### *3.1. Compression-Assisted Adaptive ECC*

From the perspective of storage efficiency, data compression is an effective way to enable a high-density storage system, and a significant fraction of user data on storage media is compressible [31,32]. We incorporate this compression mechanism to enhance error recoverability. In this system, an incoming write request goes through two steps before being programmed into flash cells, namely RAID scattering and adaptive ECC with compression, as shown in Figure 3. First, the incoming data are compressed by the compressor module. Then, the compressed data are placed at a specific position within the page buffer, and the region left unused by compression is filled with 1s, because the all-1 state is electrically stable for flash memory.

In general, NAND flash memory is read and written in page units; however, a page is divided into sub-pages for ECC encoding/decoding. The size of a page is usually 4 KB, 8 KB, or 16 KB, and each page is composed of four to eight sub-pages. For example, if a page has an 8 KB data area and a spare area holding 1 KB of ECC, it consists of eight 1 KB sub-pages, each with a 128-byte spare area for ECC. The write operation in this system proceeds as follows. When write data are sent from the host, the data are compressed by the compression module. If the compressed size is not larger than the sub-page size, the data need not occupy more than one sub-page. The compressed data are moved to a specific position within the page by the RAID-scattering module. Thereafter, the ECC is generated by the adaptive ECC module.

**Figure 3.** Data write procedure with compression-assisted adaptive ECC and RAID scattering.

Compression modules have already been applied to improve the ECC function [5,6]. In existing works, however, the area left unused by data compression is used to store additional redundancy. In this system, in contrast, we apply ECC encoding and decoding only to the compressed region, leaving the unused region intact. This method can be applied with little modification to existing ECC engine modules and encoding schemes. The procedure is as follows. After compression, the size of the compressed data is divided by the number of sub-pages, *N*, to obtain the source size of the ECC encoding. Here, *N* denotes the number of ECC operations per page, which is fixed by the implementation of the legacy flash controller. This ensures that the source length of the ECC encoding is smaller than the existing sub-page size. With the source reduced by compression, the variable-sized ECC [33] is applied to generate the ECC parity, and the generated parity size does not change. Because the parity size remains the same while the source length shrinks, the code rate (the ratio of source length to codeword length) is lowered, which means a stronger error-correction capability.

An instance of the data-writing and ECC-generation process is illustrated in Figure 3. As shown in the figure, one page of size 8 KB is assumed to be composed of eight sub-pages. The 8 KB of data transmitted from the host is compressed by the compression module, resulting in 5 KB of compressed data; hence, three sub-pages do not need to store data. The compressed data are moved to the third sub-page position by the RAID-scattering scheme, which is discussed in the next subsection. As the data are reduced from 8 KB to 5 KB by compression, the ECC encoding source for the 5 KB of data is 640 bytes, i.e., 5 KB/8. This represents a reduction in code rate, which means that the ECC-correction capability is improved. Because ECC encoding is an error-correction function for data stored within a page, the adaptive ECC scheme improves the in-page error-correction capability.
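As a rough illustration of this calculation, the short C sketch below computes the per-sub-page ECC source size and the resulting code rate for the 8 KB page example; the 128-byte parity per sub-page follows the earlier example, while the constant names and the program itself are ours and are not part of the controller firmware.

```c
#include <stdio.h>

/* Sketch only: per-sub-page ECC source size and code rate after compression,
 * using the 8 KB page / 8 sub-page / 128-byte-parity example from the text. */
#define PAGE_SIZE    8192U  /* bytes of data per page                        */
#define NUM_SUBPAGES 8U     /* N: fixed number of ECC operations per page    */
#define ECC_PARITY   128U   /* assumed parity bytes per sub-page (1 KB / 8)  */

int main(void)
{
    unsigned compressed = 5120;                                    /* 5 KB   */
    unsigned src = (compressed + NUM_SUBPAGES - 1) / NUM_SUBPAGES; /* 640 B  */

    double rate_before = (double)(PAGE_SIZE / NUM_SUBPAGES)
                       / (PAGE_SIZE / NUM_SUBPAGES + ECC_PARITY);
    double rate_after  = (double)src / (src + ECC_PARITY);

    printf("ECC source per sub-page: %u bytes\n", src);
    printf("code rate: %.3f -> %.3f (lower means stronger correction)\n",
           rate_before, rate_after);
    return 0;
}
```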

### *3.2. RAID Scattering*

If data are compressed by the compression module, the compressed data are smaller than the original size, so they occupy only part of the page. In our system, when the compressed data are stored in flash memory, they are not stored from the beginning of the page but at a specific position within the page. This position is determined by the position of the page's block within its stripe. Therefore, RAID scattering is defined as placing compressed data at a specific location in the page according to the position of that page in the corresponding stripe.

For instance, Figure 4 describes the architecture of RAID scattering with a stripe size four times larger than the page size. In this RAID configuration, *SG* (Stripe Group) is a group of blocks belonging to a stripe, and *BI* (Block Index) is the block number within the stripe group; *SB* (Scatter Base) is the page size divided by the number of blocks in a stripe, and *SI* (Scatter Index) is the offset at which each unit of compressed data is placed in the stripe group, calculated as *SB* multiplied by *BI*. For each write request, the compressed data are placed at the offset *SI* calculated from *SB* and *BI*. If the data do not exceed the page size when starting from this position, i.e., *SI*, all of the compressed data can be stored on the page as is. If the data run past the end of the page, the remaining data are placed from offset zero of the page, in a circular manner. For example, as shown in Figure 4, if the number of blocks in a stripe group is four and the page size is 8 KB, data *D*5 are placed at offset zero in the page, data *D*6 at offset 2 KB, data *D*7 at offset 4 KB, and data *D*8 at offset 6 KB, respectively. As a result, all compressed data are placed at different positions within their stripe by the RAID-scattering scheme. Then, the RAID parity is created by an XOR operation over all data in the stripe. In Figure 4, *P*2 is the RAID parity created through the XOR of *D*5, *D*6, *D*7, and *D*8.
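The placement rule can be sketched as follows; the function name `raid_scatter` and the demo values are our own illustration, assuming an 8 KB page and a four-block stripe as in Figure 4.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192U

/* Illustrative sketch: place compressed data at SI = SB * BI inside the page
 * buffer, wrapping around the end of the page in a circular manner; the
 * unused region stays filled with 1s (0xFF). */
static void raid_scatter(unsigned char *page, const unsigned char *data,
                         unsigned len, unsigned blocks_per_stripe, unsigned bi)
{
    unsigned sb = PAGE_SIZE / blocks_per_stripe;  /* Scatter Base  */
    unsigned si = sb * bi;                        /* Scatter Index */

    memset(page, 0xFF, PAGE_SIZE);
    for (unsigned i = 0; i < len; i++)
        page[(si + i) % PAGE_SIZE] = data[i];     /* circular placement */
}

int main(void)
{
    static unsigned char page[PAGE_SIZE], d6[3000];
    memset(d6, 0xAB, sizeof d6);
    raid_scatter(page, d6, sizeof d6, 4, 1);      /* D6: BI = 1 -> offset 2 KB */
    printf("D6 starts at offset %u\n", (PAGE_SIZE / 4) * 1);
    return 0;
}
```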

In our system, if data cannot be recovered by the in-page ECC, they are recovered via outer-page error recovery, i.e., the RAID parity. In an existing RAID system, when an error page is recovered using the RAID parity of the stripe group containing that page, all pages in the stripe group must be read to perform the XOR operation. In the RAID-scattering system, however, it is not necessary to read all pages of the stripe group; only the pages whose compressed data overlap the valid area of the failed page need to be read. For instance, in Figure 4, assume that errors occur during the read of data *D*5 and cannot be corrected by the in-page ECC. In this case, to recover the data with the RAID parity, the RAID-scattering system reads the parity *P*2 together with only *D*6 and *D*8 and performs the XOR operation to recover *D*5. *D*7 does not overlap with *D*5, and hence it does not need to be read for the recovery operation.
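One possible way to decide which pages must be read for such a selective recovery is sketched below; the helper names and the example lengths are hypothetical, but the logic follows the circular placement described above (the parity page and the overlapping data pages would then be read and XORed).

```c
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 8192U

/* Does the circular region [start, start+len) modulo PAGE_SIZE cover 'pos'? */
static bool covers(unsigned start, unsigned len, unsigned pos)
{
    unsigned rel = (pos + PAGE_SIZE - start) % PAGE_SIZE;
    return rel < len;
}

/* Two circular regions overlap iff one region's start lies inside the other. */
static bool regions_overlap(unsigned s1, unsigned l1, unsigned s2, unsigned l2)
{
    return covers(s1, l1, s2) || covers(s2, l2, s1);
}

int main(void)
{
    /* Hypothetical lengths: D5 at 0 (3 KB), D7 at 4 KB (1.5 KB),
     * D8 at 6 KB (3 KB, wrapping past the page end back to offset 0). */
    printf("D5 overlaps D7: %d\n", regions_overlap(0, 3072, 4096, 1536)); /* 0 */
    printf("D5 overlaps D8: %d\n", regions_overlap(0, 3072, 6144, 3072)); /* 1 */
    return 0;
}
```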

**Figure 4.** Compression-assisted RAID scattering architecture.

To identify the overlap information of pages in a stripe group, the length of the compressed data is added to the mapping table in the FTL metadata. The compressed length is an important parameter for determining which pages overlap during recovery, as well as for decompressing data on a read operation. The added length information is depicted in Figure 4; the length of the valid data is the actual length of the compressed data. In our implementation, the length field is kept compact to reduce the metadata space overhead. For instance, a 12-bit field is required to represent the maximum length of a 4 KB page, and a 13-bit field for an 8 KB page, so at most 3 bytes per entry are required for this purpose.
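A compact mapping entry along these lines might look as follows; the field widths and the presence of a PPN in the same entry are assumptions made for illustration, since the text only states that a 13-bit length field suffices for an 8 KB page.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical layout of one page-mapping entry carrying the compressed
 * length; a 13-bit field covers lengths up to 8191 bytes for an 8 KB page
 * (a fully incompressible page could be flagged by a reserved value). */
struct map_entry {
    uint32_t ppn;             /* physical page number            */
    unsigned comp_len : 13;   /* compressed data length in bytes */
    unsigned reserved :  3;
};

int main(void)
{
    struct map_entry e = { .ppn = 0x1234, .comp_len = 5120, .reserved = 0 };
    printf("entry uses %zu bytes, compressed length = %u\n",
           sizeof e, e.comp_len);
    return 0;
}
```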

Scattering within a stripe group has the effect of reducing the association among the pages of the stripe group. In other words, the stripe group size can be increased while keeping the same level of association. That is, if a RAID parity in an existing RAID system covers a stripe group of stripe size four, a RAID parity in the RAID-scattering system can cover a stripe group of stripe size eight, which reduces the overhead of RAID parity.

### *3.3. Metadata Structure and Parity Management*

The design objective of the metadata structure is to reduce the flash write operations for the metadata and parity data of the RAID5 architecture. To this end, metadata changes are updated in a logged fashion using a metadata log structure, and the parity is flushed only once, when all writes of the corresponding stripe are completed. The designed metadata structure for a RAID stripe and parity management is shown in Figure 5; this metadata resides in the FTL layer. The left part of the figure shows the in-NAND metadata snapshot structure, which is composed of the root block, the StripeMap block, the PageMap block, and the StripeMapLog block. For the root block, either flash memory block 0 or block 1 is used as the pivot root block; initially, block 0 is assigned as the root block. The root block contains information about the metadata blocks and is mainly used to find them at system boot. The last updated page of the root block contains a record called RootInfo, which holds the locations of the blocks where the metadata are stored in flash memory. Each time a metadata block changes, RootInfo is appended to the next page of the root block, as in a log. When the update reaches the last page of the current root block (e.g., block 0), the root block switches to block 1 and the updates continue.
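A conceptual sketch of the RootInfo record and its log-style update is given below; the field set, the page count, and the flash-programming stub are assumptions, since the exact controller interface is not specified in the text.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical RootInfo record: locations of the metadata blocks plus a
 * version number, appended page by page to the root block like a log. */
struct root_info {
    uint32_t stripemap_blk;
    uint32_t pagemap_blk;
    uint32_t stripemaplog_blk;
    uint32_t seq;                 /* increases with every update       */
};

#define PAGES_PER_BLOCK 256U      /* assumed device geometry           */

static uint32_t root_blk  = 0;    /* block 0 is the initial root block */
static uint32_t root_page = 0;    /* next free page in the root block  */

/* Stub standing in for the real flash-program operation. */
static void flash_program(uint32_t blk, uint32_t page, const struct root_info *ri)
{
    printf("program RootInfo seq=%u to block %u, page %u\n", ri->seq, blk, page);
}

/* Append a new RootInfo whenever a metadata block location changes; when the
 * current root block is full, switch between block 0 and block 1. */
static void rootinfo_update(struct root_info *ri)
{
    if (root_page == PAGES_PER_BLOCK) {
        root_blk  = 1 - root_blk;
        root_page = 0;
    }
    ri->seq++;
    flash_program(root_blk, root_page++, ri);
}

int main(void)
{
    struct root_info ri = { 10, 11, 12, 0 };
    rootinfo_update(&ri);
    return 0;
}
```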

The StripeMap block contains the stripe information for each stripe. This information is managed by three types of data structures: BlkInfo, StripeInfo, and StripeInfoList. BlkInfo contains the status information of each data block, and StripeInfo holds the status information of each stripe group. The size of a stripe group is the number of data blocks included in the stripe; for example, if the stripe size is *S*, one stripe group consists of *S* blocks. StripeInfoList is a hash data structure with buckets from 0 to (*S* × *N*), where *N* is the number of pages per block, so that a bucket index corresponds to the number of valid pages in a stripe group. The stripe information includes the stripe number, the valid page count of the stripe, the page offset, and so on. Each bucket holds a doubly linked list of the stripe groups that have the same number of valid pages. It is used to select a free stripe group for the next write and a victim stripe group when garbage collection (GC) is needed. The PageMap block contains the page-mapping table; each entry of the page-mapping table is an array of physical page numbers (PPNs). The last part of the metadata is the StripeMapLog block, which is used to log the metadata changes caused by data writes. The StripeMapLog block is central to our metadata log architecture: the in-NAND metadata blocks represent the latest snapshot, while the main memory holds the metadata that have changed since that snapshot. The metadata changes are written in a logged manner so that the latest metadata can be maintained.
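The in-memory StripeMap structures could be organized roughly as follows; the field contents and sizes are assumptions guided by the description above (BlkInfo per data block, StripeInfo per stripe group, and StripeInfoList as valid-page-count buckets of doubly linked stripe groups).

```c
#include <stdint.h>
#include <stddef.h>

#define STRIPE_SIZE     4U            /* S: blocks per stripe group (assumed) */
#define PAGES_PER_BLOCK 256U          /* N: pages per block (assumed)         */
#define MAX_VALID       (STRIPE_SIZE * PAGES_PER_BLOCK)

struct blk_info {                     /* status of one data block             */
    uint32_t erase_count;
    uint8_t  is_bad;
};

struct stripe_info {                  /* status of one stripe group           */
    uint32_t stripe_no;
    uint32_t valid_pages;             /* current valid page count             */
    uint32_t next_page_off;           /* next page offset to write            */
    struct stripe_info *prev, *next;  /* links within a StripeInfoList bucket */
};

/* StripeInfoList: bucket i holds every stripe group with i valid pages;
 * it is consulted to pick a free group for writes and a victim group for GC. */
static struct stripe_info *stripe_info_list[MAX_VALID + 1];

static void bucket_insert(struct stripe_info *si)
{
    struct stripe_info **head = &stripe_info_list[si->valid_pages];
    si->prev = NULL;
    si->next = *head;
    if (*head)
        (*head)->prev = si;
    *head = si;
}

int main(void)
{
    static struct stripe_info sg = { .stripe_no = 2, .valid_pages = 0 };
    bucket_insert(&sg);               /* an empty group lands in bucket 0     */
    return 0;
}
```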

**Figure 5.** Software metadata structure for RAID and parity management.

A metadata log operation is illustrated in Figure 6. Each metadata log consists of one StripeMapLogInfo structure, and a StripeMapLogInfo contains an array of the logical addresses of the corresponding stripe group, which represents its logical-to-physical mapping information. A stripe group is a group of blocks gathered together to form a RAID stripe. The stripe group is organized at the block level; however, the parity for the stripe is computed at the page level within the group. For example, if a stripe is composed of four blocks, parity is generated from the three pages with the same page offset in the group. The logging strategy applies to metadata, not to data. Thus, whenever a data write occurs, the data themselves are written to flash memory immediately, while the corresponding metadata change is not flushed right away. Each time a new write occurs, the data are allocated to a new page within the stripe group, and the LPN for the corresponding PPN is logged to the StripeMapLogInfo in memory. The updated metadata, that is, the logical addresses of the data written in this stripe group, are kept only in the in-memory StripeMapLogInfo structure until all pages of the stripe group have been used for writes. Once it has accumulated this amount of metadata updates, the StripeMapLogInfo is flushed to the next available page of the StripeMapLog block. When the StripeMapLog block has no more pages available for logging, a new block area is needed. Before allocating a new StripeMapLog block, the metadata in main memory are synchronized to flash to form the latest snapshot, and then a new StripeMapLog block is allocated. After this process, the logging mechanism described above repeats.
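A simplified version of this logging path is sketched below; the structure contents, the page count per stripe group, and the flush stub are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define STRIPE_PAGES 1024U          /* pages per stripe group (assumed)       */

/* Hypothetical in-memory log record: one logical address per page offset of
 * the stripe group, i.e., the group's logical-to-physical mapping changes. */
struct stripe_map_log_info {
    uint32_t stripe_no;
    uint32_t filled;                /* pages of the group written so far      */
    uint32_t lpn[STRIPE_PAGES];
};

static struct stripe_map_log_info cur_log;

/* Stub for programming one page of the StripeMapLog block. */
static void maplog_flush(const struct stripe_map_log_info *log)
{
    printf("flush log of stripe %u (%u pages)\n", log->stripe_no, log->filled);
}

/* Called after the data page itself has been programmed: only the mapping
 * change is recorded in memory; it is persisted once the stripe is full. */
static void on_data_write(uint32_t lpn, uint32_t page_off)
{
    cur_log.lpn[page_off] = lpn;
    if (++cur_log.filled == STRIPE_PAGES) {
        maplog_flush(&cur_log);
        uint32_t next = cur_log.stripe_no + 1;
        memset(&cur_log, 0, sizeof cur_log);
        cur_log.stripe_no = next;   /* start logging the next stripe group    */
    }
}

int main(void)
{
    for (uint32_t off = 0; off < STRIPE_PAGES; off++)
        on_data_write(1000 + off, off);
    return 0;
}
```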

In addition to the metadata logging, a full-stripe parity update operation is designed and implemented. In our system, the parity of a stripe is retained in a buffer memory until the full stripe write is done. A full stripe write means that all of the data of the corresponding stripe have been written. Since the parity is updated in the memory buffer as the data of the stripe arrive, there is no need to read the old parity and old data to update the parity; the parity is written only once, when the full stripe write is completed. Because the partial parity is maintained in memory, it can be lost on a power failure. The metadata log information is used during the recovery of this partial parity after a power failure: if power is lost while the parity is still in a partial state in the buffer, the partial parity can be reconstructed by scanning the logged information.
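The full-stripe parity path can be summarized by the following sketch; the buffer size, the three-data-pages-per-parity layout (taken from the four-block example above), and the flush stub are assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define PAGE_SIZE  8192U
#define DATA_PAGES 3U       /* assumed: 4-block stripe, 3 data + 1 parity page */

static uint8_t  parity[PAGE_SIZE];
static unsigned acc_count;

/* XOR-accumulate each incoming data page of the stripe in RAM; the parity is
 * written to flash only once, when the stripe is complete, so neither the old
 * parity nor the old data ever has to be read back. */
static void parity_accumulate(const uint8_t *data_page)
{
    if (acc_count == 0)
        memset(parity, 0, PAGE_SIZE);

    for (unsigned i = 0; i < PAGE_SIZE; i++)
        parity[i] ^= data_page[i];

    if (++acc_count == DATA_PAGES) {
        printf("full stripe written: flush parity page to flash\n");
        acc_count = 0;
    }
}

int main(void)
{
    static uint8_t d[PAGE_SIZE];
    for (unsigned k = 0; k < DATA_PAGES; k++) {
        memset(d, (int)(k + 1), sizeof d);
        parity_accumulate(d);
    }
    return 0;
}
```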

**Figure 6.** Stripe map log operations for RAID and parity management.
