Peer-Review Record

LazyRS: Improving the Performance and Reliability of High-Capacity TLC/QLC Flash-Based Storage Systems Using Lazy Reprogramming

Electronics 2023, 12(4), 843; https://doi.org/10.3390/electronics12040843
by Beomjun Kim and Myungsuk Kim *
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 2 January 2023 / Revised: 29 January 2023 / Accepted: 31 January 2023 / Published: 7 February 2023

Round 1

Reviewer 1 Report

This paper presents a technique based on a lazy reprogramming scheme, which divides a program operation into two stages with different program latency and retention capability. The authors have implemented a LazyRS-aware FTL, called lazyFTL, which takes full advantage of the LazyRS-enabled flash device. The paper targets a significant problem and is well organized and well written.

Advantages:

·        This work is quite novel in that it does not adopt the common solution to the poor program performance of flash memory (i.e., a hybrid SSD). Moreover, it can satisfy the two most important requirements of flash-based storage systems: program performance and reliability.

·        The authors provide persuasive reasons for using LazyRS in place of the common hybrid solution.

·        The description of LazyRS is largely acceptable.

Disadvantages:

·        First of all, please revise all acronyms and terms and give their definitions or functions (e.g., tPROG in 1; Vth and P/E in 2; etc.). For example, Vth: the threshold voltage distribution of a cell; P/E: program/erase. This would help avoid confusing readers and make your work more informative.

·        In WMS, the argument would be stronger if the authors explained or cited related work on the three write modes derived from the NAND retention model.

·        For the RBL, the paper does not explain how an efficient length of the RBL (u) can be determined. In addition, the two thresholds related to the RBL are not analyzed clearly. How do the RBL parameters affect performance and reliability?

·        Where are the performance and reliability results as the RBL parameters vary? This work would be stronger and more complete if the RBL parameters were well explained.

Author Response

We thank the reviewers for their valuable feedback and suggestions. First, we answered the individual questions from each reviewer. We also revised our manuscript in light of the reviewers' comments. The revised parts of the manuscript are highlighted in blue text. We also added Figure 8 and Figure 9 to the revised manuscript.

 

  1. First of all, please revise all acronyms and terms and give their definitions or functions (e.g., tPROG in 1; Vth and P/E in 2; etc.). For example, Vth: the threshold voltage distribution of a cell; P/E: program/erase. This would help avoid confusing readers and make your work more informative.

(Answer) As the reviewer suggested, we added the meaning and precise definition of several acronyms to our revised manuscript.

 

  2. In WMS, the argument would be stronger if the authors explained or cited related work on the three write modes derived from the NAND retention model.

(Answer) As mentioned in Section 2, there is a strong trade-off between tPROG and flash reliability. Based on these well-known flash characteristics, we can design many write modes depending on how much the retention capability is restricted. For example, we can create different write modes that guarantee x days of retention time for different values of x. In our paper, we selected three write modes based on the characteristics of the workloads used in the experiments. In our revised manuscript, we added more information about why these three write modes were selected.

 

  3. For the RBL, the paper does not explain how an efficient length of the RBL (u) can be determined. In addition, the two thresholds related to the RBL are not analyzed clearly. How do the RBL parameters affect performance and reliability?

(Answer) As the reviewer pointed out, the two thresholds, u(high) and u(low), are crucial parameters that determine the overall performance of our proposed technique. As explained in Section 4.2, the length of RBL(u) indicates how many blocks in a flash chip should be reprogrammed in the stage_s write mode to ensure data integrity. The more blocks are reprogrammed, the lower the flash performance. Therefore, we need to manage the length of RBL(u) depending on workload characteristics. If the length of RBL(u) exceeds a specific threshold (i.e., u(high)), the WMS switches to a write mode that allows a longer retention time in order to reduce the length of RBL(u). In our experiments, u(high) and u(low) are 60% and 30% of the entire block entry list length, respectively. To clarify the effectiveness of the two threshold values used in the LazyRS-aware FTL, we modified Section 4.2 in our revised manuscript.
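
The threshold behaviour described in this answer can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the class name, the mode labels, and the rule applied below u(low) (stepping back to a shorter-retention, faster mode) are assumptions. Only the 60% and 30% values and the u(high) rule come from the response.

```python
from dataclasses import dataclass

# Hypothetical labels for the three write modes, ordered from the shortest
# to the longest guaranteed retention time (names are not from the paper).
WRITE_MODES = ("short_retention", "medium_retention", "long_retention")

U_HIGH = 0.60  # fraction of the block entry list, as stated in the response
U_LOW = 0.30   # fraction of the block entry list, as stated in the response


@dataclass
class WriteModeSelector:
    """Toy write-mode selector driven by the RBL length."""

    total_entries: int   # length of the entire block entry list
    mode_index: int = 0  # start in the fastest (shortest-retention) mode

    def update(self, rbl_length: int) -> str:
        ratio = rbl_length / self.total_entries
        if ratio > U_HIGH and self.mode_index < len(WRITE_MODES) - 1:
            # RBL too long: move to a mode with longer retention so that
            # fewer blocks need a stage_s reprogram.
            self.mode_index += 1
        elif ratio < U_LOW and self.mode_index > 0:
            # Assumption: below u(low) the selector returns to a faster mode.
            self.mode_index -= 1
        return WRITE_MODES[self.mode_index]
```

Between the two thresholds the selector keeps its current mode, so the pair acts as a hysteresis band that avoids switching modes on every small change in the RBL length.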

 

  4. Where are the performance and reliability results as the RBL parameters vary? This work would be stronger and more complete if the RBL parameters were well explained.

(Answer) In our paper, we used fixed RBL parameter values based on the characteristics of the workloads used in the system-level experiments. However, to maximize the performance of storage systems, the RBL parameters should be adjusted dynamically depending on workload characteristics. We plan to study how to determine the optimal RBL parameters under various real-world workloads.

Reviewer 2 Report

The manuscript presents an alternate programming scheme to improve the write latency and lifetime of 3D QLC NAND flash. The topic is interesting, the manuscript is well-written, the results are nicely summarized and the content is within the journal's scope. I have a few minor concerns noted below:

1. Please define FTL - I could not find the definition anywhere in the manuscript. What does it mean?

2. The definitions of stage_t and stage_s could be clarified. What is the difference between the write schemes for these two cases? Could you give more physical details: is the ΔVispp step the only difference between the two, or are there other details?

3. The statement "For example, the shorter an idle interval between two stages, the more stage_s should be applied." is confusing. What do you mean by "more stage_s"? This sentence may need to be rewritten.

4. Figure 1 needs more references: how were these numbers obtained? Did the authors survey datasheets or published literature, or measure these metrics themselves? If measured, some details of the measurement should be mentioned.

5. Figure 5(b) shows "reliability improvement (%)" as a function of idle interval. I have a few comments on this: (a) What kind of reliability are you referring to? Is it data retention capability or endurance? Could you please clarify? (b) How is the reliability measured or estimated? (c) What is the baseline against which the % numbers are computed? (d) The maximum value I see in the figure is ~14%, but the text mentions ~31%.

Author Response

We thank the reviewers for their valuable feedback and suggestions. First, we answered the individual questions from each reviewer. We also revised our manuscript in light of the reviewers' comments. The revised parts of the manuscript are highlighted in blue text. We also added Figure 8 and Figure 9 to the revised manuscript.

 

  1. Please define FTL - I could not find the definition anywhere in the manuscript. What does it mean?

(Answer) FTL is an abbreviation for flash translation layer. The FTL is a software layer (a type of firmware) that allows the host's file system to treat the SSD like a conventional block device (e.g., an HDD). We added more information about the FTL to our revised manuscript.
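
As a rough illustration of what such a layer does, the sketch below shows a toy page-level mapping with out-of-place updates. It is a generic example and not the lazyFTL or pageFTL from the manuscript; the class and attribute names are hypothetical, and garbage collection, wear leveling, and the LazyRS stages are left out.

```python
class PageLevelFTL:
    """Toy page-level FTL: maps logical page numbers to physical pages."""

    def __init__(self, num_physical_pages):
        self.num_physical_pages = num_physical_pages
        self.mapping = {}     # logical page number -> physical page number
        self.invalid = set()  # physical pages that hold stale data
        self.next_free = 0    # next unwritten physical page

    def write(self, lpn):
        # Flash pages cannot be overwritten in place, so every write goes to
        # a fresh physical page and the old copy is only marked invalid.
        if self.next_free >= self.num_physical_pages:
            raise RuntimeError("no free pages; a real FTL would run garbage collection")
        old_ppn = self.mapping.get(lpn)
        if old_ppn is not None:
            self.invalid.add(old_ppn)
        ppn = self.next_free
        self.next_free += 1
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        # The host keeps using the same logical address across rewrites.
        return self.mapping[lpn]
```

The host sees a stable logical address while the physical location changes on each write, which is what lets the file system treat the SSD like an ordinary block device.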

 

  2. The definitions of stage_t and stage_s could be clarified. What is the difference between the write schemes for these two cases? Could you give more physical details: is the ΔVispp step the only difference between the two, or are there other details?

(Answer) Our proposed scheme, LazyRS, consists of two different program stages, stage_t and stage_s. Because stage_t guarantees only a limited retention capability for the stored data, it enables a fast program speed by relaxing ΔVispp. On the other hand, stage_s guarantees secure data at a relatively slow write speed by keeping ΔVispp narrow. As the reviewer mentioned, the main difference between stage_s and stage_t is the size of ΔVispp during a program operation. Unlike a conventional reprogramming scheme in NAND flash memory, LazyRS allows an idle interval between the first, coarse program (i.e., stage_t) and the second, fine program (i.e., stage_s).

 

  3. The statement "For example, the shorter an idle interval between two stages, the more stage_s should be applied." is confusing. What do you mean by "more stage_s"? This sentence may need to be rewritten.

(Answer) A short idle interval between the two stages means that the guaranteed retention capability of stage_t is extremely short, such as 1 day. If the data is still valid after 1 day, all stored data should be rewritten by stage_s to ensure data integrity. As a result, stage_s is invoked more frequently as the idle interval gets shorter. We rewrote that sentence to make the message clearer.

 

  4. Figure 1 needs more references: how were these numbers obtained? Did the authors survey datasheets or published literature, or measure these metrics themselves? If measured, some details of the measurement should be mentioned.

(Answer) The write throughput results in Figure 1 were obtained from datasheet values. For example, 3D TLC flash chips from vendor A showed a program latency of 680 µs, while those from other vendors showed a program latency of 660 µs. We added references to our revised manuscript.

 

  5. Figure 5(b) shows "reliability improvement (%)" as a function of idle interval. I have a few comments on this: (a) What kind of reliability are you referring to? Is it data retention capability or endurance? Could you please clarify? (b) How is the reliability measured or estimated? (c) What is the baseline against which the % numbers are computed? (d) The maximum value I see in the figure is ~14%, but the text mentions ~31%.

(Answer) As mentioned in Section 3.2, we determined the reliability of NAND flash memory based on the retention BER. The retention BER, denoted by Nret(t) in our paper, was measured after a t-day retention time at 1,000 P/E cycles. To confirm the reliability improvement, we compared the 1-year retention BER (i.e., Nret(365)) over varying idle times with that of the conventional reprogramming scheme, in which there is no interval between the two program stages. In addition, we fixed typos in Figure 5(b).
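
The response does not state the exact formula behind the percentage. One plausible reading, given here only as an assumption, is the relative reduction in Nret(365) with respect to the conventional scheme. The function name and the numbers in the usage lines are made up for illustration and are not measurements from the paper.

```python
def reliability_improvement_pct(baseline_ber: float, lazyrs_ber: float) -> float:
    """Relative reduction in retention BER, in percent.

    Assumed definition: (baseline - lazyrs) / baseline * 100, where both
    arguments are 1-year retention BER values, Nret(365).
    """
    if baseline_ber <= 0:
        raise ValueError("baseline BER must be positive")
    return (baseline_ber - lazyrs_ber) / baseline_ber * 100.0


# Illustrative, made-up values: conventional reprogramming vs. LazyRS
# at one particular idle interval.
conventional = 1.0e-3
with_idle_interval = 0.8e-3
print(f"{reliability_improvement_pct(conventional, with_idle_interval):.1f}% lower retention BER")
```

Under this reading, a larger percentage means a lower 1-year retention BER than the baseline with no idle interval.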

Reviewer 3 Report

The authors have proposed a new NAND programming scheme that improves write latency and reliability. The concept is well explained. However, I would recommend that the authors compare it with the present state of the art, as [13,29] are older techniques.

Please comment on hardware complexity/requirements in terms of area and power for the proposed concept.

Author Response

We thank the reviewers for their valuable feedback and suggestions. First, we answered the individual questions from each reviewer. We also revised our manuscript in light of the reviewers' comments. The revised parts of the manuscript are highlighted in blue text. We also added Figure 8 and Figure 9 to the revised manuscript.

 

  1. The authors have proposed a new NAND programming scheme that improves write latency and reliability. The concept is well explained. However, I would recommend that the authors compare it with the present state of the art, as [13,29] are older techniques.

(Answer) As the reviewer pointed out, if state-of-the-art techniques were employed as the baseline of our experiment, we could analyze the effectiveness of our proposed technique more clearly. However, since most reprogramming techniques were developed for 2D TLC flash memory to mitigate cell-to-cell interference, we struggled to find a suitable reprogramming technique for recent 3D TLC or QLC NAND flash memory. We hope our technique will become a new baseline reprogramming technique for high-capacity 3D TLC/QLC NAND flash memory.

  2. Please comment on hardware complexity/requirements in terms of area and power for the proposed concept.

(Answer) LazyRS does not require additional hardware because it can be implemented by exploiting the conventional reprogramming scheme.

Reviewer 4 Report

The manuscript proposes a new NAND programming scheme, called the lazy reprogramming scheme (LazyRS), which leverages two program stages and an idle interval for multi-level NAND flash memory in order to optimize the overall performance of flash-based storage systems without any hardware modification. The topic is interesting, and the method shows some innovation. Some comments are listed as follows:

1. More experiments are required to evaluate the effectiveness of the proposed method. It is suggested that the descriptions of Fig. 5 and Fig. 5 be moved to the experiments section to strengthen the evaluation.

2. Some important experimental settings are missing, such as uhigh and ulow. Moreover, the influence of threshold values such as uhigh and ulow should be analyzed in the experiments.

3. It is difficult for the reviewer to be convinced that the LazyRS-aware FTL can improve the write throughput and flash reliability by up to 2.6 times and 31.2%, respectively. Therefore, quantitative experimental results should be reported.

4. A reference for the conventional page-level FTL (pageFTL) is missing.

5. There are some minor typos; for example, a period is missing after the sentence “can be efficiently removed due to an idle interval between two stages”.

Author Response

We thank the reviewers for their valuable feedback and suggestions. First, we answered the individual questions from each reviewer. We also revised our manuscript in light of the reviewers' comments. The revised parts of the manuscript are highlighted in blue text. We also added Figure 8 and Figure 9 to the revised manuscript.

 

  1. More experiments are required to evaluate the effectiveness of the proposed method. It is suggested that the descriptions of Fig. 5 and Fig. 5 be moved to the experiments section to strengthen the evaluation.

(Answer) We agree with the reviewer that more experiments would help demonstrate the effectiveness of our proposed technique, LazyRS. LazyRS was implemented based on device characterization studies using real 3D TLC flash chips. Figure 5 demonstrates how much the performance can be enhanced by limiting the retention time and how much the reliability can be improved with different idle intervals. Therefore, we placed the analysis of Figure 5 before the system-level experiment results.

 

  2. Some important experimental settings are missing, such as uhigh and ulow. Moreover, the influence of threshold values such as uhigh and ulow should be analyzed in the experiments.

(Answer) As the reviewer pointed out, the two thresholds, u(high) and u(low), are crucial parameters that determine the overall performance of our proposed technique. As explained in Section 4.2, the length of RBL(u) indicates how many blocks in a flash chip should be reprogrammed in the stage_s write mode to ensure data integrity. The more blocks are reprogrammed, the lower the flash performance. Therefore, we need to manage the length of RBL(u) depending on workload characteristics. If the length of RBL(u) exceeds a specific threshold (i.e., u(high)), the WMS switches to a write mode that allows a longer retention time in order to reduce the length of RBL(u). In our experiments, u(high) and u(low) are 60% and 30% of the entire block entry list length, respectively. To clarify the effectiveness of the two threshold values used in the LazyRS-aware FTL, we modified Section 4.2 in our revised manuscript. In our current study, we used fixed values of u(high) and u(low). However, to maximize the performance of storage systems, u(high) and u(low) should be adjusted dynamically depending on the workload. We plan to study how to track the optimal RBL parameters in our future work.

 

  3. It is difficult for the reviewer to be convinced that the LazyRS-aware FTL can improve the write throughput and flash reliability by up to 2.6 times and 31.2%, respectively. Therefore, quantitative experimental results should be reported.

(Answer) Unfortunately, we must keep some hardware details confidential because of an NDA with the NAND manufacturer that supplied the test flash chips. Therefore, we presented the effectiveness of our proposed technique using normalized values. If needed, we can provide approximate key design parameters, for example, the program latency or the initial RBER (raw bit error rate).

 

  4. A reference for the conventional page-level FTL (pageFTL) is missing.

(Answer) As the reviewer pointed out, we added a reference for the conventional page-level FTL (pageFTL) to our revised manuscript.

 

  5. There are some minor typos; for example, a period is missing after the sentence “can be efficiently removed due to an idle interval between two stages”.

(Answer) As the reviewer pointed out, we fixed the typos in our revised manuscript.

Round 2

Reviewer 4 Report

Thanks for the authors' responses; I have no more comments. Therefore, I would like to recommend ACCEPT.
