1. Introduction
Modern Solid-State Drives (SSDs) are increasingly adopting Quad-Level Cell (QLC) flash memory, which stores four bits of data in a single cell, as their primary storage medium to increase storage capacity [1][2][3][4]. Although QLC offers a substantial price advantage over SSDs based on Single-Level Cell (SLC), Multi-Level Cell (MLC), or Triple-Level Cell (TLC) technologies, QLC SSDs are often limited in I/O performance and lifespan; Table 1 compares the four technologies [5]. To mitigate these drawbacks, hybrid SSD architectures have been explored. These architectures program certain blocks of QLC memory to operate in SLC mode, storing just one bit per cell, thereby harnessing the high-speed advantages of SLC memory [6][7][8]. In these hybrid designs, the SLC region is commonly used as a cache layer for the remaining QLC blocks, improving the overall performance of the storage device [9][10][11].
Table 1. Performance of SLC, MLC, TLC, and QLC flash memories.
In hybrid SSD architectures, cache size and available data storage space are inversely related: as the allocated SLC cache grows, the space available for data storage shrinks, which makes efficient management of the SLC cache essential [12][13]. This management becomes particularly critical when the SLC cache fills up, because the SSD must then start garbage collection or migrate some data from the SLC cache to the slower QLC region (Figure 1). These processes typically rely on sophisticated algorithms and often result in additional write and erase operations.
Figure 1. Operations in the SLC/QLC hybrid SSD architecture.
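The trade-off between cache size and storage space can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that a cell switched to SLC mode stores one bit instead of four, so that every gibibyte of SLC cache consumes four gibibytes of QLC capacity. The function name and the drive sizes are illustrative and do not come from the source.

```python
def qlc_region_gib(raw_qlc_gib: float, slc_cache_gib: float) -> float:
    """Return the QLC capacity left after carving out an SLC-mode cache.

    Assumption: SLC mode stores 1 bit per cell instead of 4, so the cache
    costs 4x its own size in QLC capacity.
    """
    bits_per_qlc_cell = 4
    qlc_consumed = slc_cache_gib * bits_per_qlc_cell
    if qlc_consumed > raw_qlc_gib:
        raise ValueError("requested SLC cache exceeds the drive's capacity")
    return raw_qlc_gib - qlc_consumed


# Hypothetical 1024 GiB drive with cache sizes of 0, 16, and 64 GiB.
for cache in (0, 16, 64):
    print(cache, qlc_region_gib(1024, cache))
```

Under these assumptions, a 64 GiB cache leaves 768 GiB of QLC space on the hypothetical 1024 GiB drive, which illustrates why the cache cannot simply be made as large as possible.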
A specific challenge within this management process is the handling of cold data, that is, data that is accessed infrequently and is therefore unlikely to be reused in the short term. When such data occupies space in the SLC cache, it eventually has to be transferred to the QLC region, and in many cases this detour through the cache proves unnecessary because the data is not accessed again [14][15]. A more effective strategy is to identify cold data at the outset and store it directly in the QLC region, bypassing the SLC cache altogether. This approach was explored by Im et al. [7], who classified data as hot or cold based on the size of the I/O requests, on the assumption that smaller I/O requests usually correspond to hot data. Building on these insights, Yoo et al. [16] applied reinforcement learning to determine various parameters in hybrid SSDs, including the optimal size of the SLC cache and the threshold for distinguishing between hot and cold data.
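The size-based classification described above can be sketched in a few lines. This is an illustration only, not the implementation of Im et al.: the `WriteRequest` type, the 16 KiB threshold, and the fallback to QLC when the cache has no space are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class WriteRequest:
    lba: int        # logical block address of the write
    size_kib: int   # request size in KiB


def route_write(req: WriteRequest,
                hot_size_threshold_kib: int = 16,   # hypothetical threshold
                slc_has_space: bool = True) -> str:
    """Route small (assumed hot) writes to SLC and everything else to QLC."""
    is_hot = req.size_kib <= hot_size_threshold_kib
    if is_hot and slc_has_space:
        return "SLC"
    # Large (assumed cold) requests bypass the cache entirely.
    return "QLC"


print(route_write(WriteRequest(lba=0, size_kib=4)))
print(route_write(WriteRequest(lba=8, size_kib=256)))
```

A fixed threshold like this is the simplest possible policy; the works cited above differ mainly in how they adapt the threshold to the workload.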
The task of deciding whether to allocate incoming write requests to the SLC cache or to route them directly to the QLC region in hybrid SSDs is both essential and complex. This complexity arises from the dynamic nature of workload characteristics and the fluctuating status of the SSD itself. Despite the intricate nature of this decision-making process, the fundamental goal remains clear: to optimally choose between these two storage options for each write request. Given this challenge, machine learning emerges as a particularly promising approach due to its ability to process and learn from complex and variable data.
2. Data Placement Using a Classifier for SLC/QLC Hybrid SSDs
In hybrid SSD architecture, two primary configurations have emerged: hard partitioning and soft partitioning. Hard partitioning uses physically distinct memory chips, such as SLC, MLC, TLC, or QLC, within a single hybrid SSD, as detailed by Tripathy et al. [12]. This configuration lets different memory technologies coexist in the same device, each serving a specific storage role according to its performance characteristics.
Among hard-partitioned hybrid SSDs, Kwon et al. [8] segmented the SLC cache into separate areas for file system journal data and regular data, aiming to improve sequential journal writes and thereby reduce the overhead of garbage collection. During periods of low activity, data is moved from the SLC cache to TLC flash memory. When the SLC cache is full, regular data bypasses the cache and is written directly to the TLC flash. Jung et al. [17] proposed a method that employs multiple SLC chips as a cache layer for MLC flash memory. In this system, a single SLC chip stores both random and sequential data and is used until it runs out of free blocks. Once it has no space left for additional logs or data blocks, another SLC chip is activated for continued data storage. Yao et al. [9] discussed a different system, in which SLC and MLC pages are allocated using two distinct mapping caches, one for hot data and the other for normal data. This system uses SLC pages for hot data that registers hits in its dedicated cache and defaults to MLC pages for other data.
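The chip-by-chip allocation attributed to Jung et al. can be illustrated with a small model. This is a sketch under simplifying assumptions, not their implementation: the class name `SlcChipCache` is hypothetical, each chip is reduced to a counter of free blocks, and reclaiming blocks through garbage collection or migration is not modeled.

```python
from typing import Optional


class SlcChipCache:
    """Fill one SLC chip at a time; activate the next when it is full."""

    def __init__(self, num_chips: int, blocks_per_chip: int) -> None:
        self.free_blocks = [blocks_per_chip] * num_chips
        self.active = 0  # index of the chip currently receiving writes

    def allocate_block(self) -> Optional[int]:
        """Return the chip index that receives the block, or None if all are full."""
        while self.active < len(self.free_blocks):
            if self.free_blocks[self.active] > 0:
                self.free_blocks[self.active] -= 1
                return self.active
            # Active chip has no free blocks left: switch to the next chip.
            self.active += 1
        return None


cache = SlcChipCache(num_chips=2, blocks_per_chip=2)
print([cache.allocate_block() for _ in range(5)])
```

In this model the fifth allocation fails because both chips are exhausted; a real device would at that point have to migrate data to the MLC flash or write to it directly.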
Chang et al. [18] proposed a management technique for SLC-MLC hybrid SSDs in which hot data is preferentially stored in SLC flash, whereas cold data is written directly to MLC flash. The method uses a K-means algorithm to periodically determine and adjust the threshold for estimating the hotness of data. Similarly, Hachiya et al. [19] introduced a cache management technique for MLC-TLC hybrid SSDs, in which hot data is retained in MLC flash and cold data, once identified, is transferred to TLC flash. These methods reflect an ongoing trend in hybrid SSD technology, where data management and cache utilization play crucial roles in storage efficiency and performance.
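One way a K-means step could yield a hotness threshold is sketched below. The source does not say which features Chang et al. cluster, so this example assumes a single feature (a per-address access count), two clusters, and a threshold placed midway between the two centroids. The function name and the sample counts are illustrative.

```python
def kmeans_hotness_threshold(access_counts, iterations=20):
    """Two-cluster, one-dimensional K-means over a non-empty list of counts.

    Returns the midpoint between the cold and hot centroids, which serves
    as the hot/cold boundary in this sketch.
    """
    lo, hi = float(min(access_counts)), float(max(access_counts))
    for _ in range(iterations):
        # In one dimension, nearest-centroid assignment is a split at the midpoint.
        boundary = (lo + hi) / 2
        cold = [c for c in access_counts if c <= boundary]
        hot = [c for c in access_counts if c > boundary]
        if not cold or not hot:
            break  # all samples fell into one cluster
        new_lo, new_hi = sum(cold) / len(cold), sum(hot) / len(hot)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


counts = [1, 2, 1, 3, 40, 50, 45]  # hypothetical access counts
threshold = kmeans_hotness_threshold(counts)
print([c > threshold for c in counts])
```

Re-running the function on a sliding window of recent accesses would correspond to the periodic adjustment mentioned above.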
Soft partitioning takes a contrasting approach: it lets various types of flash memory emulate SLC-type flash characteristics, offering a flexible alternative to the rigid physical constraints of hard partitioning. Using this concept, Im et al. [7] put forward a cache management strategy for SLC-MLC hybrid SSDs that classifies data as hot or cold based on the size of the incoming requests. Hot data, typically smaller and more frequently accessed, is allocated to the SLC cache for fast retrieval. Cold data, which is less likely to be accessed frequently, is written directly to the MLC flash, thereby conserving SLC cache space. Crucially, the technique dynamically adjusts the hotness threshold based on the observed frequency of data migrations to the MLC flash.
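A feedback rule of this kind might look as follows. The source states only that the threshold is calibrated from the observed migration frequency; the direction of the adjustment, the step size, the bounds, and the function name are assumptions made for this sketch.

```python
def adjust_threshold(threshold_kib: int,
                     migrations_in_window: int,
                     target_migrations: int,
                     step_kib: int = 4,
                     min_kib: int = 4,
                     max_kib: int = 256) -> int:
    """Nudge the size threshold toward a target migration rate.

    Assumption: many migrations mean the SLC cache admits too much data,
    so the threshold is lowered; few migrations mean it can admit more.
    """
    if migrations_in_window > target_migrations:
        threshold_kib -= step_kib
    elif migrations_in_window < target_migrations:
        threshold_kib += step_kib
    # Keep the threshold inside the configured bounds.
    return max(min_kib, min(max_kib, threshold_kib))


threshold = 16
for observed in (120, 90, 30):  # hypothetical migrations per window
    threshold = adjust_threshold(threshold, observed, target_migrations=50)
    print(threshold)
```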
Yoo et al. [16] introduced another management policy for SLC-QLC hybrid SSDs, based on reinforcement learning. The policy distinguishes between hot and cold data, directing hot data to the SLC cache while routing cold data straight to the QLC flash. Its distinguishing feature is that it continuously modifies both the size of the SLC cache and the threshold that separates hot from cold data, adapting in real time to changing data access patterns. Also within soft partitioning, Zhang et al. [20] developed a method for SLC-TLC hybrid SSDs. Their approach spreads SLC pages across multiple planes to enhance parallelism, thereby reducing the latency of data migration. Its plane allocation policy takes into account not only the availability of SLC pages but also the capability for parallel processing. The method balances data access speed against the use of available storage resources.
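A plane allocation policy that weighs both SLC page availability and parallelism could be sketched as below. This is not the algorithm of Zhang et al.: the `Plane` type, the use of pending operations as a proxy for how busy a plane is, and the tie-breaking rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plane:
    plane_id: int
    free_slc_pages: int  # SLC-mode pages still available on this plane
    pending_ops: int     # queued operations, a proxy for how busy the plane is


def pick_plane(planes: List[Plane]) -> Optional[Plane]:
    """Choose the least busy plane that still has free SLC pages."""
    candidates = [p for p in planes if p.free_slc_pages > 0]
    if not candidates:
        return None  # no SLC page available anywhere
    # Prefer fewer pending operations; break ties by more free SLC pages.
    return min(candidates, key=lambda p: (p.pending_ops, -p.free_slc_pages))


planes = [Plane(0, 10, 3), Plane(1, 0, 0), Plane(2, 5, 1), Plane(3, 8, 1)]
chosen = pick_plane(planes)
print(chosen.plane_id if chosen else None)
```

Plane 1 is idle but has no SLC pages, so it is skipped, which shows why both criteria have to be considered together.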
Most existing research on hybrid SSDs has concentrated on identifying and managing hot data so that the SLC cache is used effectively. In addition, several studies have applied machine-learning techniques to determine or fine-tune the size of the SLC cache and to establish thresholds for evaluating the hotness of incoming data. These thresholds determine which data benefits most from the faster SLC cache and which can be relegated to slower storage.
This entry is adapted from the peer-reviewed paper 10.3390/app14041648