*3.4. Power Loss Recovery Step of Metadata Log and Partial Parity*

The metadata log and partial parity retained in memory can be lost on power failure; however, they can be restored by a recovery process. Since the metadata changes made after the last snapshot are logged to the StripeMapLog blocks, these changes can be recovered by scanning the StripeMapLog blocks. As for the parities, almost all parity data are already stored in flash memory as full-stripe parities; the only portion that must be reconstructed is the partial parity of the data written to the corresponding stripe before the power loss.

The metadata log and parity recovery proceeds as follows; note that this is a power-loss recovery procedure, not an error-recovery procedure. In the first step, the flash storage finds the latest RootInfo by scanning the *Root* blocks. In the second step, it restores the metadata snapshot, which consists of the StripeInfoList, StripeBlkInfo, BlkInfo, and the page mapping table, by reading the StripeMap and PageMap blocks. In the third step, the metadata changes that occurred after the snapshot and were stored in the StripeMapLog blocks are recovered by reading those blocks page by page and applying each change. The last step recovers the metadata changes that were lost due to the power failure: these changes were held only in the in-memory StripeMapLogInfo and had not yet been flushed to a StripeMapLog block. They can be reconstructed by scanning the corresponding stripe group itself; the stripe group number is identified from the last valid page of the StripeMapLog block, since the number of the next stripe group is recorded in the last valid StripeMapLogInfo. The mapping changes of that stripe group are then recovered by scanning the spare region of each page. If a page holds a valid logical address in its spare region, the data were stored safely and its mapping metadata are updated; otherwise, the data were not stored safely and the page is abandoned. While scanning the pages of a stripe, if all pages hold valid logical addresses, all data of the stripe were stored safely, and the corresponding parity is also valid. If some pages do not hold valid logical addresses, the stripe has no valid parity; in that case, each valid page is read and the parity is recalculated from them. This recalculated value is exactly the partial parity that was lost by the power failure, so the partial parity of this stripe is recovered.
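
The following sketch outlines the four recovery steps described above. The `flash` object and all of its methods (`scan_root_blocks`, `read_snapshot`, `read_spare_logical_address`, etc.) are hypothetical placeholders introduced only for illustration; they do not correspond to an actual interface of the implementation.

```python
# Hedged sketch of the power-loss recovery flow; every helper on `flash`
# is an assumed, illustrative interface, not the authors' actual code.
def recover_after_power_loss(flash):
    # Step 1: find the latest RootInfo by scanning the Root blocks.
    root_info = max(flash.scan_root_blocks(), key=lambda r: r.sequence_number)

    # Step 2: restore the metadata snapshot from the StripeMap and PageMap blocks.
    snapshot = flash.read_snapshot(root_info)
    page_map, stripe_info_list = snapshot.page_map, snapshot.stripe_info_list

    # Step 3: replay the metadata changes logged after the snapshot.
    last_log = None
    for log_page in flash.read_stripemaplog_pages(root_info):
        if not log_page.is_valid():
            break
        page_map.apply(log_page.changes)
        stripe_info_list.apply(log_page.changes)
        last_log = log_page

    # Step 4: rebuild the in-memory StripeMapLogInfo that was never flushed.
    # The last valid log entry records the number of the next stripe group.
    open_group = last_log.next_stripe_group
    for stripe in flash.stripes_of_group(open_group):
        valid_pages = []
        for page in stripe.pages:
            lpn = flash.read_spare_logical_address(page)
            if lpn is not None:          # data stored safely -> update mapping
                page_map.update(lpn, page.physical_address)
                valid_pages.append(page)
            # otherwise the page is abandoned
        if valid_pages and len(valid_pages) < len(stripe.pages):
            # Partial stripe: the buffered partial parity was lost, so
            # recompute it by XOR-ing the valid pages of the stripe.
            partial = bytearray(flash.page_size)
            for page in valid_pages:
                for i, b in enumerate(flash.read_data(page)):
                    partial[i] ^= b
            stripe.partial_parity = bytes(partial)
    return page_map, stripe_info_list
```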

#### **4. Evaluation**

In order to prove the validity of the proposed RAID scattering and adaptive ECC, we implemented the proposed schemes inside FlashSim [34,35], which is widely used for storage system evaluation. This open-source simulator includes a simulated NAND flash controller and NAND flash memories whose page size, block size, etc., can be configured. The simulator also implements well-known FTLs and NAND flash operations, including read, write, and erase, as well as garbage collection. We added the RAID parity management, compression-based adaptive ECC, and RAID-scattering schemes to this simulator. The simulation configuration for the performance evaluation is as follows. A page-level FTL was used as the mapping management scheme, with an 8 KB page size and a 1 KB sub-page. Data compression uses the zlib [36] library. The BCH ECC model is 1024BCH16, an ECC configuration that can correct 16 bits per 1024 bytes; that is, 1024BCH16 is applied to each 1 KB sub-page. For bit error generation, the BER is sampled and errors are injected according to the P/E cycle. The BER-versus-P/E-cycle samples are taken from a previous study on NAND flash devices [24]. The simulated BER samples are depicted in Figure 2.
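
As an illustration of this error-injection step, the sketch below interpolates a raw BER from a table of (P/E cycle, BER) sample points and flips page bits with that probability. The sample values in `BER_SAMPLES` are placeholders, not the measured data of [24].

```python
import bisect
import random

# (P/E cycle, raw bit error rate) sample points; placeholder values only.
BER_SAMPLES = [(0, 1e-8), (1000, 1e-7), (2000, 1e-6), (3000, 5e-6), (5000, 5e-5)]

def ber_at(pe_cycles):
    """Linearly interpolate the raw BER for a given P/E cycle count."""
    xs = [x for x, _ in BER_SAMPLES]
    i = bisect.bisect_right(xs, pe_cycles)
    if i == 0:
        return BER_SAMPLES[0][1]
    if i == len(BER_SAMPLES):
        return BER_SAMPLES[-1][1]
    (x0, y0), (x1, y1) = BER_SAMPLES[i - 1], BER_SAMPLES[i]
    return y0 + (y1 - y0) * (pe_cycles - x0) / (x1 - x0)

def inject_errors(page_bytes, pe_cycles, rng=random):
    """Flip each bit of the page independently with probability BER(P/E cycle)."""
    ber = ber_at(pe_cycles)
    data = bytearray(page_bytes)
    for i in range(len(data)):
        for bit in range(8):
            if rng.random() < ber:
                data[i] ^= 1 << bit
    return bytes(data)
```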

#### *4.1. Analysis of Adaptive ECC and RAID Scattering*

To evaluate the performance of the proposed system in different user environments, various IO traces, such as filesystem workloads, database workloads, web server access data, and media streaming IO, were collected in a way that reflects real user behavior. The IO traces are categorized as F-1, F-2, F-3, F-4, and F-5 according to the characteristics of the applications. Among these IO traces, F-1 and F-2 are text-dominant filesystem and database traces, F-3 and F-4 are media-dominant streaming application traces, and F-5 is a mixed trace.

We measured the impact of compression for the various IO traces. We applied page-based data compression using the zlib [36] library and measured the compression ratio for each page. The compression ratio distribution for IO traces F-1 to F-5 is shown in Figure 7. As shown in the figure, the compression ratios are distributed differently across the IO traces. For general filesystems and databases, large chunks of data are compressed to less than 50% of their original size, which means that much of the data for general filesystems and databases is highly compressible. The portion of text-like data in filesystems and databases is large, and such data generally achieves a high compression rate. On the contrary, for web servers or media streaming applications, not much data can be compressed, as depicted for F-3 and F-4. Media-like data, such as videos, images, and audio, which are inherently compressed, are dominant on web servers, and hence the achievable compression rate is low.

**Figure 7.** Distribution of compression ratios for each dataset.
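
A minimal sketch of this per-page measurement, assuming an 8 KB page, the zlib library, and rounding the compressed size up to 1 KB sub-pages as used in the simulator; `compression_stats` is an illustrative helper, not the authors' code.

```python
import os
import zlib

PAGE_SIZE = 8 * 1024
SUBPAGE_SIZE = 1 * 1024

def compression_stats(page: bytes):
    """Return (compression ratio, number of 1 KB sub-pages actually needed)."""
    compressed = zlib.compress(page)
    ratio = len(compressed) / len(page)             # e.g. 0.45 means 45% of original
    subpages = -(-len(compressed) // SUBPAGE_SIZE)  # ceiling division
    return ratio, subpages

# A text-like page compresses well; a media-like (already compressed) page does not.
text_page = (b"the quick brown fox jumps over the lazy dog\n" * 200)[:PAGE_SIZE]
media_page = os.urandom(PAGE_SIZE)
print(compression_stats(text_page))   # small ratio, few sub-pages
print(compression_stats(media_page))  # ratio close to 1.0, all sub-pages
```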

Based on the distribution of the data-compression rate of the IO traces, we evaluated the adaptive ECC scheme and the RAID-scattering scheme. For a fair evaluation, we simulated more than 10 million IO operations and checked the errors for each trace, as bit errors occur rarely. We measured the uncorrectable bit error rate (UBER) [1] as the performance metric, increasing the P/E cycle from 0 to 5000 in each experiment for each configuration. UBER counts the bit errors of pages that cannot be recovered by the error-recovery scheme.
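
As a reference, UBER is commonly defined as the number of uncorrectable bit errors normalized by the total number of bits read; the exact normalization used in [1] may differ slightly:

$$\mathrm{UBER} = \frac{N_{\text{uncorrectable bit errors}}}{N_{\text{bits read}}}.$$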

First, the adaptive ECC scheme is evaluated to show its own contribution to error recovery. This is achieved by comparing it with the original ECC scheme, that is, fixed-size ECC encoding. The error-recovery results for adaptive ECC are plotted in Figure 8. The *x*-axis represents P/E cycles, and the *y*-axis represents UBER on a logarithmic scale; thus, the resulting error rate after ECC decoding is presented as UBER. The lower the UBER, the better the error-correction capability. As shown in Figure 8, the adaptive ECC scheme yields a better UBER than legacy ECC for all IO traces, which means that the adaptive ECC scheme corrects bit errors better than the fixed ECC scheme. The reason for this improvement is the reduction in valid data size achieved by compression. From the figure, we can also identify that the performance gap widens as the P/E cycle increases, meaning that more errors can be covered by adaptive ECC as more bit errors occur. Among the IO traces, F-1 and F-2 yield a much better UBER than the other traces: because F-1 and F-2 achieve higher compression rates, their source sizes for ECC encoding are smaller. On the contrary, F-3 and F-4 show little improvement compared with the original ECC owing to inefficient compression.

**Figure 8.** Adaptive ECC scheme for IO traces from F-1 to F-5 in comparison with legacy.
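
To make the mechanism concrete, consider the 1 KB sub-page protected by 1024BCH16 (t = 16 correctable bits); assuming, for illustration only, a BCH code over GF(2^14) with r = 14 × 16 = 224 parity bits, halving the source by compression changes the code rate and the tolerable error fraction as follows:

$$R_{\text{orig}} = \frac{8192}{8192 + 224} \approx 0.973, \qquad \frac{t}{k} = \frac{16}{8192} \approx 0.20\%,$$

$$R_{\text{comp}} = \frac{4096}{4096 + 224} \approx 0.948, \qquad \frac{t}{k} = \frac{16}{4096} \approx 0.39\%.$$

That is, the same 16-bit correction capability is spread over half as many data bits, which is why the highly compressible traces (F-1, F-2) translate into a lower UBER.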

Next, the RAID-scattering scheme is compared with the original RAID scheme using the RAID5 parity model. For this evaluation, we simulated the original RAID and RAID scattering with stripe sizes of four, eight, and 16. For all RAID and RAID-scattering configurations, UBER is evaluated as the P/E cycle increases from 0 to 5000. The experimental results are plotted in Figure 9. In all the figures, in accordance with the P/E cycle, RAID-Stripe4, RAID-Stripe8, and RAID-Stripe16 represent the UBER of the original RAID configuration with stripe sizes four, eight, and 16, respectively, while Scatter4, Scatter8, and Scatter16 represent the UBER of the RAID-scattering scheme with the same stripe sizes. We also plotted the Legacy UBER as a baseline for both the RAID-scattering scheme and the RAID system; Legacy shows the uncorrected page error rate of the original fixed ECC scheme.

From Figure 9a,b,e, we identify that RAID scattering improves UBER over the legacy RAID system for F-1, F-2, and F-5. This is due to the high compression ratio of these traces. In the RAID system, an error-containing page can be recovered using the parity page and the other data pages within the corresponding stripe; if more than one of those pages also has an uncorrected page error, the error-containing page cannot be recovered. As data are compressed and scattered within a stripe in the RAID-scattering system, the probability that error-containing pages overlap is lower than in the original RAID system, which results in a lower UBER for RAID scattering on traces F-1, F-2, and F-5. For F-1 and F-2, we identify that RAID scattering with every stripe size yields a lower UBER than the legacy RAID with even the smallest stripe size, that is, four. This means that even if the stripe size is large and the parity covers a wider range, the error-correction capability is better than that of the original RAID with a smaller stripe size. However, for IO traces with lower compression rates, such as F-3 and F-4, as shown in Figure 9c,d, there is no large difference between the UBER of the RAID-scattering scheme and that of the existing RAID scheme. As compression is less effective for F-3 and F-4, the number of pages to be recovered in the corresponding stripe, and hence the error rate, does not differ much from that of the original RAID scheme. Even so, the figure shows some performance improvement at each P/E cycle.

**Figure 9.** Experimental results for RAID scattering in comparison with the original RAID with various stripe sizes. In accordance with the P/E cycle, UBERs of RAID scattering and the original RAID are estimated for stripe sizes four, eight, and 16. (**a**) RAID Scattering for F-1, (**b**) RAID Scattering for F-2, (**c**) RAID Scattering for F-3, (**d**) RAID Scattering for F-4, (**e**) RAID Scattering for F-5.
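
The following minimal sketch shows the XOR-based stripe rebuild that underlies these results; function names and the tiny example stripe are illustrative only, not taken from the simulator.

```python
def xor_pages(pages):
    """Byte-wise XOR of equally sized pages."""
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, b in enumerate(page):
            out[i] ^= b
    return bytes(out)

def rebuild_failed_page(data_pages, parity_page, failed_index):
    """Recover one failed data page from the parity and the surviving pages."""
    survivors = [p for i, p in enumerate(data_pages) if i != failed_index]
    return xor_pages(survivors + [parity_page])

# Example: a stripe of three data pages plus one parity page (stripe size four).
data = [b"\x01" * 8, b"\x02" * 8, b"\x04" * 8]
parity = xor_pages(data)                      # full-stripe parity
assert rebuild_failed_page(data, parity, 1) == data[1]
```

If a second page of the same stripe also holds an uncorrectable error, this rebuild no longer has enough information, which is why reducing the overlap of error-containing pages through scattering lowers UBER.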

Finally, the RAID-scattering scheme with adaptive ECC is compared with the legacy RAID system. In this experiment, each IO trace is compressed, encoded with adaptive ECC, and stored scattered within the stripe of the RAID system. As in the previous evaluation, we simulated stripe sizes of four, eight, and 16. For all RAID and RAID-scattering-with-adaptive-ECC configurations, UBER is evaluated as the P/E cycle increases from 0 to 5000. The results for traces F-1 to F-5 are plotted in Figure 10a–e. Similar to the results of the RAID-scattering-only scheme, all figures show UBER for the RAID-Stripe4, RAID-Stripe8, RAID-Stripe16, Adp.ECC-Scatter4, Adp.ECC-Scatter8, and Adp.ECC-Scatter16 configurations in accordance with the P/E cycle. From the figures, we identify that RAID scattering with adaptive ECC shows a lower UBER than both the legacy RAID and the RAID-scattering-only scheme. In all cases, it clearly yields a lower UBER than the legacy RAID system, regardless of the compression rate of the IO data. Another noticeable point is that the UBER of RAID scattering with adaptive ECC is lower than that of the legacy RAID even when the stripe size is larger, which implies that the RAID-scattering scheme allows the stripe size of the RAID configuration to be enlarged while keeping a higher reliability. In other words, RAID scattering can use a smaller parity portion, and thus reduce the parity write overhead, at the same reliability level.


**Figure 10.** Experimental results for RAID scattering combined with adaptive ECC in comparison with the original RAID with stripe sizes four, eight, and 16. In accordance with the P/E cycle, RAID-Stripe4, RAID-Stripe8, RAID-Stripe16, Adp. ECC-Scatter4, Adp. ECC-Scatter8, and Adp. ECC-Scatter16 show UBER with stripe sizes four, eight, and 16, respectively. (**a**) Adp. ECC and Scattering for F-1, (**b**) Adp. ECC and Scattering for F-2, (**c**) Adp. ECC and Scattering for F-3, (**d**) Adp. ECC and Scattering for F-4, (**e**) Adp. ECC and Scattering for F-5.

To see how many pages can be skipped during the recovery stage in the RAID-scattering scheme, we measured the number of pages skipped from the recovery operations for an error-occurred page. This was done for RAID configurations with stripe sizes four, eight, and 16, and the five IO traces were run for each configuration. Figure 11 shows the ratio of pages skipped while restoring each error-occurred page in the RAID-scattering scheme. In the figure, the five IO traces are grouped by stripe size; thus, the *x*-axis represents the groups of five IO traces according to stripe size, and the *y*-axis represents the skip ratio. Since the skip ratio is the ratio of pages that need not be read during recovery, the higher the skip ratio, the better the recovery rate and the shorter the recovery time. From the results, it can be seen that for traces with a high compression ratio, such as F-1 and F-2, the skip ratio is high, which results in a reduced UBER; this is the effect of the RAID-scattering scheme. On the other hand, the skip ratio of F-3 is low, so RAID scattering has little effect and yields little UBER improvement.

**Figure 11.** The skip ratio of pages when restoring an error-occurred page within a stripe in the RAID scattering scheme.
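
The sketch below illustrates the skip decision, under the assumption that each page of a stripe stores its compressed data at a known (offset, length) range and that the unused remainder is zero-filled, so pages whose range does not overlap the failed page's range contribute nothing to the XOR rebuild; names and the example layout are illustrative.

```python
def overlaps(a, b):
    """a and b are (offset, length) byte ranges within a page."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

def skip_ratio(stripe_regions, failed_index):
    """Fraction of surviving data pages that need not be read for the rebuild."""
    failed = stripe_regions[failed_index]
    others = [r for i, r in enumerate(stripe_regions) if i != failed_index]
    skipped = sum(1 for r in others if not overlaps(r, failed))
    return skipped / len(others)

# Example: four data pages, each compressed to 2 KB and scattered to a different
# quarter of the 8 KB page; rebuilding page 0 needs none of the other data pages.
stripe = [(0, 2048), (2048, 2048), (4096, 2048), (6144, 2048)]
print(skip_ratio(stripe, failed_index=0))   # -> 1.0
```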

#### *4.2. Analysis of Metadata and Parity Overhead*

The adaptive ECC and RAID-scattering schemes provide an effective error management method by reducing the size of the valid data region on a page. However, additional overhead is required to support this operation. There is an overhead in storing and managing the size of the compressed data stored in each page. Specifically, a space for recording the compressed data size of the corresponding page is required for each page entry of the FTL mapping table; it takes about 14 bits or 3 bytes per entry. The necessary metadata size can be estimated from the page size, block size, and capacity of the flash device. For a 128 GB flash device with an 8 KB page size and a 1 MB block size, the estimated number of metadata blocks is about 119–122, which includes two Root blocks, four StripeMap blocks, 112 PageMap blocks, and 1–4 MapLog blocks; the total size is about 120 MB. If the capacity of the flash device is doubled, the estimated amount of metadata is also almost doubled. For a 512 GB flash device with an 8 KB page size and a 3 MB block size, the estimated number of metadata blocks is about 155–158, including two Root, two StripeMap, 150 PageMap, and 1–4 MapLog blocks, whose size is around 237 MB. From this estimation, we identify that the PageMap blocks occupy the largest portion of the metadata, and this portion increases as the number of pages increases. Since about 7 bytes are used per PageMap entry to manage the physical address of the compressed data, the PageMap is considerably larger than one that does not manage data compression. On the other hand, the additional metadata blocks for stripe management consume fewer than 10 blocks for hundreds of GB of flash devices, which is not much overhead.
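
A rough check of the dominant PageMap term, reproducing the 128 GB case above (8 KB pages, 1 MB blocks, roughly 7 bytes per entry for the physical address plus compressed-data size); the constants come from the text, while the helper itself is only illustrative.

```python
GiB = 1024 ** 3
CAPACITY = 128 * GiB
PAGE_SIZE = 8 * 1024
BLOCK_SIZE = 1 * 1024 * 1024
ENTRY_BYTES = 7                     # per-page mapping entry incl. compression info

n_pages = CAPACITY // PAGE_SIZE               # 16,777,216 mapping entries
pagemap_bytes = n_pages * ENTRY_BYTES         # ~112 MiB of PageMap data
pagemap_blocks = -(-pagemap_bytes // BLOCK_SIZE)
print(n_pages, pagemap_bytes // (1024 * 1024), pagemap_blocks)   # 16777216 112 112
```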

To see the metadata overhead from the perspective of NAND flash IO, we analyzed the overhead of the metadata logging IO operations. The overhead of metadata log management is composed mainly of two parts. The first is the page writes of StripeMapLogInfo, which is a log of metadata changes for the corresponding stripe group; the StripeMapLogInfo is written to a page of the StripeMapLog block whenever the current stripe group is used for data writes. The other part is the flushing of the whole metadata snapshot, which occurs when all the pages of the StripeMapLog blocks are exhausted. The whole metadata snapshot includes the PageMap, StripeMap, and RootInfo. To evaluate the metadata log overhead, we generated random write operations in the simulator and measured the number of pages written for metadata logs, which include the logging of StripeMapLogInfo and the flushing of the entire metadata. Among the metadata log overhead, the interval between metadata snapshot flushes depends on the number of StripeMapLog blocks: if there are several StripeMapLog blocks, the interval is longer, since more StripeMapLogInfo entries can be written to the StripeMapLog blocks before they are exhausted. Therefore, the experiment was conducted while changing the number of StripeMapLog blocks from one to four.
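
The two overhead components can be expressed with the simple model below; the parameters (pages per stripe group, snapshot size) are left symbolic because they are not fixed here, and the function is an illustrative back-of-the-envelope sketch rather than the simulator's accounting.

```python
def metadata_log_overhead(pages_per_stripe_group, snapshot_pages,
                          maplog_blocks, pages_per_block=128):
    """Metadata pages written per data page written (illustrative model)."""
    # One StripeMapLogInfo page is logged per stripe group of data writes.
    log_cost = 1.0 / pages_per_stripe_group
    # A full snapshot (PageMap + StripeMap + RootInfo) is flushed each time the
    # StripeMapLog blocks run out of free pages, i.e. after
    # maplog_blocks * pages_per_block stripe groups.
    groups_between_snapshots = maplog_blocks * pages_per_block
    snapshot_cost = snapshot_pages / (groups_between_snapshots * pages_per_stripe_group)
    return log_cost + snapshot_cost
```

Both terms shrink as the stripe group grows, and the snapshot term shrinks further as the number of StripeMapLog blocks increases, which matches the trend reported below.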

Figure 12 shows the metadata log overhead of the implemented metadata log system for a RAID5 architecture with stripe sizes four, eight, and 16, respectively. As shown in the figure, the metadata log overhead is almost 10 percent when the RAID5 configuration uses a stripe size of four and one StripeMapLog block. However, the overhead decreases as the stripe size and the number of StripeMapLog blocks increase. As the stripe size increases, the number of pages included in one stripe group increases, and thus the frequency of flushing the StripeMapLogInfo log decreases. In addition, if the number of StripeMapLog blocks is increased, the number of flushes of the whole metadata snapshot is reduced. From the results, it can be seen that when the number of StripeMapLog blocks is increased to four or more, the overall metadata log overhead can be reduced to 5% even when the capacity of the flash device is hundreds of GB.

**Figure 12.** The results of metadata log overhead for the metadata log system for RAID5 architecture with stripe size four, eight, and 16, respectively.

Next, to analyze the parity overhead, we measured the number of parity writes for random write requests while varying the average request size from 4 KB to 128 KB. The experiments were done with two RAID5 configurations: the full-stripe-parity scheme implemented in this paper and the conventional RAID5 scheme, in which the parity is updated whenever a write request occurs for the corresponding stripe. Both configurations were set to stripe sizes of four, eight, and 16.

The parity overhead according to the average random request size is depicted in Figure 13 for stripe sizes four, eight, and 16, respectively. In the figure, the *x*-axis represents the average request size and the *y*-axis represents the number of parity writes per request, that is, the number of parities written for each request; therefore, the lower the value, the smaller the parity overhead. Two RAID configurations, STLog and RAID5, are depicted, where STLog stands for the full-stripe-parity scheme and RAID5 for the conventional RAID5 scheme. As shown in Figure 13a, if the stripe size is four, there is little difference between STLog and RAID5: STLog has a lower parity overhead for small requests, and the overhead becomes similar as the request size increases. However, when the stripe size is larger, i.e., eight or 16, the parity overhead is greatly reduced with the STLog parity management scheme, as shown in Figure 13b,c. This is due to the partial parity buffering of the full-stripe-parity scheme. In particular, when the request size is smaller than the stripe size, the partial parity buffering of STLog is more effective, since parity updates are performed frequently in the conventional RAID5 scheme. If the request size exceeds the stripe size, the full-stripe parity is created in either case, so STLog is no different from conventional RAID5.

**Figure 13.** The number of parity writes per request according to the change in the average request size, for stripe sizes four, eight, and 16, respectively.
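
The contrast can be approximated with the simple model below, which assumes that a stripe of size *s* holds one parity page plus *s* − 1 data pages, that conventional RAID5 updates the parity once for every stripe a request touches, and that STLog writes parity only when a stripe is filled; this is an illustrative model, not the simulator itself.

```python
import math

PAGE_KB = 8

def parity_writes_per_request(request_kb, stripe_size, scheme):
    data_pages = max(1, math.ceil(request_kb / PAGE_KB))
    data_per_stripe = stripe_size - 1
    if scheme == "RAID5":
        # Conventional RAID5: parity updated for every stripe the request touches.
        return math.ceil(data_pages / data_per_stripe)
    if scheme == "STLog":
        # Full-stripe parity: buffered partial parity, written once per filled stripe.
        return data_pages / data_per_stripe
    raise ValueError(scheme)

for size_kb in (4, 32, 128):
    for stripe in (4, 8, 16):
        print(size_kb, stripe,
              parity_writes_per_request(size_kb, stripe, "RAID5"),
              round(parity_writes_per_request(size_kb, stripe, "STLog"), 2))
```

Under these assumptions, the gap between the two schemes is largest when the request is much smaller than the stripe, consistent with the trend in Figure 13.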

#### *4.3. Discussion about Compression Module*

The adaptive ECC and RAID-scattering schemes are applied on top of lossless data compression, and thus the overhead of the data compression module is a performance-deteriorating factor for NAND flash memory devices. Although the software zlib module is used in our system for data compression, the compression requires a lot of CPU computation, and hence a pure software implementation is not suitable for practical use. However, recent compression hardware [37] and accelerators [38] increase the practical feasibility of lossless data compression. In particular, [37] reports support for gzip/zlib/deflate data compression at approximately 100 Gbps. If such a compression module is included in the NAND flash controller, data compression is expected to have little effect on the performance of the other modules, although it will increase the cost. In this paper, we could not analyze the performance degradation caused by the compression module; analyzing the performance of the overall system, including a hardware compression module, remains our future work.

### **5. Conclusions**

The density of flash memory devices has increased by moving to smaller geometries and storing more bits per cell; however, this also generates many more data errors. The typical approach to handling these increasing errors is the use of well-known ECCs; however, current ECC schemes have not kept pace with the growth in errors of current flash memory. RAID technology can be employed in flash storage devices to enhance their reliability; however, RAID has an inherent parity update overhead.

In this paper, we propose enhancements to the ECC capability inside the page and to the RAID parity management outside the page, with the help of lossless data compression. The compressed data are encoded with an adaptively reduced source length, which increases the recovery ability of the ECC module. The adaptive ECC method reduces the source length in proportion to the compression ratio; as this shrinks the source to be encoded, the effectively lower code rate increases the error-correction rate. From the perspective of outside the page, the compressed data are placed at a specific position in the page. As the unused area in the page generated by compression can be used as a non-overlapping area in RAID scattering, it is effective during the RAID recovery stage. The unused area is spread out within the parity cluster, which reduces the error coverage range because pages that do not overlap with the error-occurred page can be skipped. The experimental results show that adaptive ECC enhances the error-recovery ability, and the RAID-scattering scheme achieves a higher reliability at the same RAID stripe level. As the next step of this study, we will implement the compression, ECC, and RAID-scattering modules as hardware modules to evaluate the actual overhead and performance impact. A more rigorous comparison of the performance of the proposed scheme against others is therefore an important task for improving the completeness of this work, and we set such evaluations as our future work.

**Author Contributions:** Conceptualization, S.-H.L. and K.-W.P.; methodology, S.-H.L. and K.-W.P.; software, S.-H.L.; validation, S.-H.L.; formal analysis, S.-H.L. and K.-W.P.; investigation, S.-H.L. and K.-W.P.; resources, S.-H.L. and K.-W.P.; data curation, S.-H.L. and K.-W.P.; writing–original draft preparation, S.-H.L. and K.-W.P.; writing–review and editing, S.-H.L. and K.-W.P.; visualization, S.-H.L. and K.-W.P.; supervision, S.-H.L. and K.-W.P.; project administration, S.-H.L. and K.-W.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) (NRF-2019R1F1A1057503). This work was supported by Hankuk University of Foreign Studies Research Fund. Also, this work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) (NRF-2020R1A2C4002737).

**Conflicts of Interest:** The authors declare no conflict of interest.
