High-Performance Garbage Collection Scheme with Low Data Transfer Overhead for NoC-Based SSDC
Abstract
1. Introduction
- We introduce a selective copy-back mechanism that prioritizes shorter data paths when no errors are detected during GC, significantly reducing interconnect delays.
- We present a hardware implementation using task queues to efficiently manage the proposed selective data flow, enabling parallelism with minimal overhead.
- We evaluate the proposed scheme under real-world workloads, demonstrating up to a 26.9% improvement in average latency and a 50.0% reduction in peak latency compared to conventional methods.
2. Background
2.1. Conventional SSD Organization
2.2. Garbage Collection in SSDs
2.3. NoC-Based SSD Controller
3. Proposed Garbage Collection Scheme
3.1. Adaptive Data Path Selection for Garbage Collection Efficiency
- Target Block/Page Identification: Blocks with a high number of invalid pages are selected for GC. Valid pages from these blocks are identified and copied to a temporary buffer in the flash memory controller (FMC).
- Error Correction: Valid data in the FMC buffer are routed via the NoC interconnect to the CPU for error correction using the low-density parity-check (LDPC) algorithm.
- Data Write-Back: Once corrected, the data are written to an empty page in the target NAND flash block.
- Round-Trip Traversal: Each copy-back operation involves two interconnect traversals—one to the CPU for error correction and another to return the data to the target NAND flash memory.
- Slice-Based Delays: Data packets pass through multiple interconnect slices, with each slice adding latency based on the interconnect size and the number of active channels.
- Copy-Back Operation: During the copy-back process, valid pages from the GC target block are temporarily stored in the FMC buffer, as presented in Figure 4a.
- Error Analysis: Data packets are routed to the CPU via the NoC for error analysis by the ECC unit, as presented in Figure 4a:
  - No Errors Detected: If no errors are found, the data stored in the FMC buffer are routed directly back to an available location in the NAND flash memory, as shown in Figure 4b.
  - Errors Detected: If errors are detected, the ECC unit within the CPU performs the necessary corrections, after which the corrected data are routed back to the NAND flash memory, as shown in Figure 4c.
- Selective Data Path Optimization: A dynamic mechanism selects the shortest path for each copy-back operation: error-free data skip the return traversal from the CPU and are written back directly from the FMC buffer, roughly halving the interconnect latency of a copy-back (see the sketch after this list).
- Error-Adaptive Control: For data with errors, full round-trip traversal ensures compatibility with standard LDPC correction processes, preserving reliability.
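To make the path selection concrete, the following minimal sketch models the two copy-back paths under stated assumptions: every page travels once over the NoC for ECC analysis, and only pages with detected errors pay the return traversal from the CPU. All names (`Page`, `copy_back_latency`, the delay constants) are illustrative rather than the authors' implementation; the per-operation delays mirror the delay table in Section 4, with the interconnect figure assumed to be per one-way traversal.

```python
from dataclasses import dataclass

# Per-operation delays in microseconds, mirroring the delay table in
# Section 4; the interconnect value is assumed per one-way traversal.
NAND_READ_US = 25
NAND_WRITE_US = 200
INTERCONNECT_US = 714

@dataclass
class Page:
    addr: int
    has_error: bool  # outcome of the CPU-side LDPC/ECC analysis

def copy_back_latency(page: Page) -> int:
    """Modeled latency (us) of one copy-back under the proposed scheme."""
    latency = NAND_READ_US        # read the valid page into the FMC buffer
    latency += INTERCONNECT_US    # route the packet to the CPU for ECC analysis
    if page.has_error:
        # Errors detected: the ECC unit corrects the data, and the corrected
        # copy takes the full return traversal over the NoC (Figure 4c).
        latency += INTERCONNECT_US
    # Error-free pages are written straight from the FMC buffer (Figure 4b),
    # skipping the return traversal.
    latency += NAND_WRITE_US
    return latency

# The conventional scheme behaves like the error case for every page.
assert copy_back_latency(Page(0, has_error=True)) == 1653   # full round trip
assert copy_back_latency(Page(1, has_error=False)) == 939   # short path
```

Under these assumptions the error-free path removes one of the two interconnect traversals, which is the source of the "roughly half" interconnect-latency reduction claimed above.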
3.2. Efficient Data Handling Through Task Queue Architecture
- Efficient Task Queue Integration: The approach integrates with task queue architectures in NoC-based SSD controllers, enabling smooth implementation alongside host read and write operations.
- Integration with ECC Analysis: The task queue enqueues a GC write request (Wgc) only when errors are detected and corrected; otherwise, the error-free data in the FMC buffer are written back directly, without generating an additional Wgc task (see the sketch below).
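A similarly hedged sketch of the queue-side behavior: a Wgc task enters the shared task queue only when the ECC unit has actually corrected data, while clean pages are written back by the FMC without a new task. The class and method names here are hypothetical, chosen only to mirror the description above, not the authors' hardware.

```python
from collections import deque

class GcTaskQueue:
    """Hypothetical model of the Wgc gating; not the authors' RTL."""

    def __init__(self):
        # In the real controller this queue is shared with host
        # read/write tasks, which is what enables parallelism.
        self.tasks = deque()

    def on_ecc_result(self, page_addr, corrected_data=None):
        """Called once the CPU's ECC analysis of a GC page completes."""
        if corrected_data is not None:
            # Errors were detected and corrected: enqueue a GC write
            # request (Wgc) carrying the corrected data back over the NoC.
            self.tasks.append(("Wgc", page_addr, corrected_data))
        else:
            # Error-free: the FMC writes its buffered copy back directly,
            # so no Wgc task is generated.
            self.fmc_direct_write_back(page_addr)

    def fmc_direct_write_back(self, page_addr):
        # Placeholder for the FMC-side direct write; outside this sketch.
        pass

# Usage: error case enqueues a Wgc task, clean case bypasses the queue.
q = GcTaskQueue()
q.on_ecc_result(0x1A, corrected_data=b"corrected page")
q.on_ecc_result(0x2B)
```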
4. Evaluation
4.1. Experimental Setup
4.2. Trace File Analysis
4.3. Experimental Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Features | Value |
---|---|
Number of Channels | 8 |
Planes per Channel | 4 |
Blocks per Plane | 512 |
Pages per Block | 64 |
Page Size (KB) | 4 |
Delay | Value |
---|---|
SRAM read (μs) | 12 |
SRAM write (μs) | 12 |
NAND read (μs) | 25 |
NAND write (μs) | 200 |
NAND erase (μs) | 700 |
Interconnect (μs) | 714 |
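Assuming the interconnect figure above is per one-way traversal, a back-of-the-envelope estimate (ignoring SRAM buffering, ECC compute time, and queuing) shows the expected per-page copy-back latency as a function of the fraction of pages that need CPU-side correction:

```python
def expected_copy_back_us(p_error: float) -> float:
    """Expected per-page copy-back latency (us) when a fraction p_error of
    pages needs CPU-side correction; simplified, with no queuing or
    SRAM/ECC processing time."""
    NAND_READ, NAND_WRITE, NOC = 25, 200, 714  # us, from the table above
    base = NAND_READ + NOC + NAND_WRITE        # 939 us: analysis trip + write
    return base + p_error * NOC                # erroneous pages pay the return trip

# The conventional scheme behaves like p_error = 1.0: every page round-trips.
print(expected_copy_back_us(1.0))  # 1653.0 us
print(expected_copy_back_us(0.0))  #  939.0 us, about 43% lower
```

The realized per-workload gains reported in Section 4.3 are smaller than this 43% ceiling, which is consistent with some pages still requiring correction and with host traffic sharing the interconnect.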
Workloads | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Read Ratio (%) | 80.2 | 0.0 | 0.0 | 0.0 | 36.9 | 50.5 |
Write Ratio (%) | 19.8 | 100.0 | 100.0 | 100.0 | 63.1 | 49.5 |
Seq. Request (%) | 56.8 | 13.2 | 9.8 | 18.7 | 18.7 | 9.9 |
Rand. Request (%) | 43.2 | 86.8 | 90.2 | 81.3 | 81.3 | 90.1 |
Avg. Read Size (KB) | 70.5 | 4.0 | 0.0 | 4.0 | 6.4 | 4.9 |
Avg. Write Size (KB) | 103.7 | 7.6 | 5.8 | 4.9 | 4.7 | 4.5 |
Avg. Read Latency (ms) | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Conventional | 1.729 | - | - | - | 0.826 | 0.856 |
Proposed | 1.712 | - | - | - | 0.824 | 0.855 |

Avg. Write Latency (ms) | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Conventional | 20.714 | 8.981 | 1.159 | 1.151 | 1.118 | 1.467 |
Proposed | 15.137 | 6.811 | 1.076 | 1.097 | 1.067 | 1.461 |

Improvement of Proposed over Conventional (%) | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Avg. Read Latency | 0.983 | - | - | - | 0.242 | 0.117 |
Avg. Write Latency | 26.924 | 24.162 | 7.161 | 4.692 | 4.562 | 0.409 |
Max. Read Latency (ms) | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Conventional | 181.257 | - | - | - | 2.832 | 21.686 |
Proposed | 90.625 | - | - | - | 2.474 | 20.625 |

Max. Write Latency (ms) | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Conventional | 1055.123 | 977.507 | 29.450 | 3.776 | 3.602 | 51.816 |
Proposed | 758.686 | 836.451 | 25.372 | 3.341 | 3.200 | 50.637 |

Improvement of Proposed over Conventional (%) | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Max. Read Latency | 50.002 | - | - | - | 12.641 | 4.893 |
Max. Write Latency | 28.095 | 14.430 | 13.847 | 11.520 | 11.160 | 2.275 |
Standard Deviation (ms) | Cassandra | Dbench | MySQL | SQLite | MongoDB | RocksDB |
---|---|---|---|---|---|---|
Avg. Read Latency | 0.003 | - | - | - | 0.000 | 0.000 |
Avg. Write Latency | 0.430 | 0.395 | 0.002 | 0.001 | 0.000 | 0.001 |
Max. Read Latency | 1.758 | - | - | - | 0.057 | 0.676 |
Max. Write Latency | 6.893 | 6.227 | 1.199 | 0.117 | 0.133 | 1.053 |