Adaptive Regression Prefetching Algorithm by Using Big Data Application Characteristics
Abstract
1. Introduction
- We analyzed the complicated, indirect memory access patterns of graph workloads and used these big-data execution characteristics to design a data prefetch algorithm that identifies memory access patterns dynamically.
- We designed an adaptive optimized regression prefetch scheme that can select from a dynamic set of prefetch engines by using a machine-learning approach.
- We proposed a novel page-management mechanism that exploits the cost-effectiveness of PCM by efficiently utilizing both DRAM and PCM.
2. Related Work
2.1. Prefetching Scheme
2.2. Hybrid Memory
3. Main Architecture
3.1. Overall Architecture
3.1.1. DRAM Buffer
3.1.2. Hybrid Main Memory Management Unit
3.1.3. Arbitrator
3.2. Cache Miss Analysis
3.3. Regression Analysis
3.4. Prefetch Table
3.4.1. Entry Table
3.4.2. Offset Table
3.5. Prefetch Engine
3.5.1. Next-Line Prefetch Engine
Algorithm 1: Next-line prefetching algorithm.

    // Step 1:
    for offset in Entry table
        find offset size
    // Step 2:
    if offsetSize < 3
        Offset x ← Last offset x
        PrefetchAdd ← 0
    else
        PrefetchAdd ← NextlinePrefetch(reqAdd)
    return PrefetchAdd
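
The next-line step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the 64-byte cache line size listed in the simulation configuration and follows the branch structure of Algorithm 1 as printed, where a result of 0 means that no prefetch is issued. The names `next_line_prefetch`, `next_line_engine`, and `min_offsets` are hypothetical.

```python
from typing import Sequence

CACHE_LINE_SIZE = 64  # bytes; taken from the simulation configuration table


def next_line_prefetch(req_addr: int, line_size: int = CACHE_LINE_SIZE) -> int:
    """Return the base address of the cache line that follows req_addr's line."""
    return (req_addr // line_size + 1) * line_size


def next_line_engine(offsets: Sequence[int], req_addr: int,
                     min_offsets: int = 3) -> int:
    """Follow Algorithm 1 as printed (hypothetical helper).

    Returns 0 (no prefetch) while fewer than min_offsets offsets are
    recorded; otherwise returns the next-line address.
    """
    if len(offsets) < min_offsets:
        return 0
    return next_line_prefetch(req_addr)
```

Aligning to the line boundary before adding one line means that any request inside a line maps to the same prefetch target.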
3.5.2. Regular Stride Prefetch Engine
Algorithm 2: Regular stride prefetch engine.

    // Step 1:
    for offset in Offset table
        find last offset size
    // Step 2:
    if offsetSize == 3
        Delta[i] = offset[x-i] − offset[x-i-1]
        if Delta[0] == Delta[1]
            PrefetchAdd ← reqAdd + Delta[0]
    // Step 3:
    else
        DeltaInter[i-1] = Delta[i] − Delta[i-1]
        if Delta[0] == Delta[1] && Delta[1] == Delta[2]
            PrefetchAdd ← reqAdd + Delta[0]
        else if DeltaInter[0] == DeltaInter[1]
            PrefetchAdd ← reqAdd + Delta[0] + DeltaInter[0]
    return PrefetchAdd
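
A hedged Python sketch of the stride logic in Algorithm 2 is given here. It is an illustration under stated assumptions rather than the original code: offsets are passed in chronological order (oldest first), deltas and second-order differences are computed as newer minus older, and `None` stands for "no prediction". The function name `stride_engine` is hypothetical.

```python
from typing import Optional, Sequence


def stride_engine(offsets: Sequence[int], req_addr: int) -> Optional[int]:
    """Predict the next address from the last three or four offsets.

    Assumption: offsets are ordered oldest to newest, and req_addr is
    the address of the newest request.
    """
    if len(offsets) < 3:
        return None  # not enough history for a stride decision

    recent = list(offsets[-4:])
    # First-order differences between consecutive offsets.
    deltas = [b - a for a, b in zip(recent, recent[1:])]

    # Constant stride: all observed deltas are equal.
    if len(set(deltas)) == 1:
        return req_addr + deltas[-1]

    # Linearly changing stride: the delta itself grows by a constant step.
    if len(deltas) == 3:
        second = [b - a for a, b in zip(deltas, deltas[1:])]
        if second[0] == second[1]:
            return req_addr + deltas[-1] + second[-1]

    return None  # irregular; left to another engine
```

For example, offsets `0x000, 0x080, 0x180, 0x300` have deltas `0x80, 0x100, 0x180`, so the next delta is taken as `0x200`.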
3.5.3. Optimized LWR Prefetch Engine
Algorithm 3: Optimized linear regression engine.

    // Step 1:
    for address in Entry table
        if offsetTable[i].dataSize > offsetSize
            offsetEntries ← offsetTable[i]
    // Step 2:
    x ← i + 1
    y ← offset[i]Value
    cc ← calCC(offsetEntries)
    Δoffset = offsetTable.max − offsetTable.min
    k ← calK(cc, Δoffset, pageSize)            // function 5
    weight[i] ← calWeight(y[i], y[maxNum], k)  // function 4
    // Step 3: find regression coefficient
    sortAscendingOrder(sortArray, offsetTable[i])
    y[i] ← weight[i] * y[i]
    coefficient ← calRegressionCoefficient(sortArray)
    // predict offset
    predictedOffset ← predictOffsetValue(coefficient, offsetSize + 1)
    PrefetchAdd ← entryTable[i] + predictedOffset
    return PrefetchAdd
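
The regression step can be illustrated with a textbook locally weighted linear regression using a Gaussian kernel. This sketch is a stand-in, not the paper's method: it weights samples by their index distance to the query point instead of using `calWeight`, and it takes the kernel width `k` as a fixed argument instead of deriving it with `calK`, because the corresponding functions are not reproduced here. The names `lwr_predict` and `lwr_engine` are hypothetical.

```python
import math
from typing import Sequence


def lwr_predict(offsets: Sequence[float], k: float = 2.0) -> float:
    """Predict the offset at position n + 1 from n recorded offsets.

    Assumption: sample i (1-based) gets Gaussian weight
    exp(-(i - (n + 1))^2 / (2 k^2)), so recent offsets dominate the fit.
    """
    n = len(offsets)
    if n < 2:
        raise ValueError("need at least two offsets")
    xs = range(1, n + 1)
    x_query = n + 1
    ws = [math.exp(-((x - x_query) ** 2) / (2.0 * k * k)) for x in xs]

    # Weighted least-squares fit of y = intercept + slope * x.
    sw = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / sw
    y_mean = sum(w * y for w, y in zip(ws, offsets)) / sw
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(ws, xs, offsets))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * x_query


def lwr_engine(entry_addr: int, offsets: Sequence[float], k: float = 2.0) -> int:
    """Prefetch address = entry base address + predicted offset."""
    return entry_addr + round(lwr_predict(offsets, k))
```

A smaller `k` makes the fit follow only the most recent offsets, while a larger `k` approaches ordinary least squares over the whole history.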
3.5.4. Adaptive Hyperparameter Setting for Gaussian Kernel
4. Evaluations
4.1. Workload Characteristics
4.2. Simulation Configurations
4.3. Performance Evaluation
4.3.1. Optimal Size of Hybrid Main Memory
4.3.2. Optimal Size of DRAM Buffer
4.3.3. Optimal Size of Offset
4.3.4. Optimal Size of DRAM Buffer
4.4. Overall Performance Analysis
- From a global perspective, pattern recognition was performed for different patterns, and three different prefetch engines were designed.
- Local linear regression was optimized and improved, and an intelligent regression algorithm for memory prefetching in irregular patterns was proposed.
- Several experiments were performed to choose the most suitable parameter configuration.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Address Sequence Pattern | Address Sequence Instance |
---|---|
Stable stride memory access pattern | 0x80000000, 0x80000080, 0x80000100, 0x80000180 |
Linear transformation stride pattern | 0x80000000, 0x80000080, 0x80000180, 0x80000300 |
Irregular memory access pattern | Other |
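
The three pattern classes in this table can be told apart with first- and second-order address differences. The Python sketch below is illustrative only. The function name `classify_pattern`, the label constants, and the rule that sequences shorter than four addresses count as irregular are assumptions made for this example.

```python
from typing import Sequence

STABLE = "stable stride"
LINEAR = "linear transformation stride"
IRREGULAR = "irregular"


def classify_pattern(addresses: Sequence[int]) -> str:
    """Label an address sequence by its difference structure."""
    if len(addresses) < 4:
        return IRREGULAR  # assumption: too little history to decide

    # First-order differences: constant for a stable stride.
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if len(set(deltas)) == 1:
        return STABLE

    # Second-order differences: constant when the stride changes linearly.
    second = [b - a for a, b in zip(deltas, deltas[1:])]
    if len(set(second)) == 1:
        return LINEAR

    return IRREGULAR
```

A sequence such as `0x80000000, 0x80000080, 0x80000180, 0x80000300` has strides `0x80, 0x100, 0x180`, which grow by a constant `0x80`, so it is labeled as a linear transformation stride.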
Workload | Computation Type, Feature, Use Case |
---|---|
BFS | Graph traversal algorithm, indirect memory access patterns (e.g., irregular, read-intensive memory requests), similarity search and finding maximum flow |
Connected component (CCOMP) | Connectivity computation for graphs, irregular memory access with read-intensiveness, social graph analysis |
Degree centrality (Dcentr) | Connectivity computation for graphs, indirect and irregular memory accesses, social graph analysis |
Shortest path (SPath) | Algorithm for finding global optima in graph structures, indirect memory accesses with read-intensive characteristics, street navigation |
PageRank (Prank) | Iterative computations for graph analysis, compute-intensive with indirect memory requests, prioritizing web pages |
Component | Configuration |
---|---|
Processor | Quad-core, 4 GHz |
L1 Instruction Cache (per core, private) | 32 KB, 8-way set associativity, 64-byte cache line size, LRU replacement |
L1 Data Cache (per core, private) | 32 KB, 8-way set associativity, 64-byte cache line size, LRU replacement |
L2 Unified Cache (per core, private) | 256 KB, 4-way set associativity, 64-byte cache line size, LRU replacement |
L3 Cache (LLC) (per processor, shared) | 8 MB, 16-way set associativity, 64-byte cache line size, LRU replacement |
DRAM Buffer | 16 MB, fully associative, 4 KB page size (managed at page granularity), LRU replacement |
DRAM | 128 MB, fully associative, 4 KB page size (managed at page granularity), LRU replacement |
PCM | 2 GB, fully associative, 4 KB page size (managed at page granularity), LRU replacement |
Parameter | DRAM | PCM | HDD |
---|---|---|---|
Write latency | 20–50 ns | 1 µs | 5 ms |
Read latency | 20–50 ns | 50 ns | 5 ms |
Write energy | 1.2 J/GB | 6 J/GB | 65 J/GB |
Read energy | 0.8 J/GB | 1 J/GB | 65 J/GB |
Idle power | 100 mW/GB | 1 mW/GB | 10 W/TB |
Density | 1× | 4× | N/A |
Cost | 4× | 1× | N/A |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, M.; Tang, Q.; Kim, J.-G.; Burgstaller, B.; Kim, S.-D. Adaptive Regression Prefetching Algorithm by Using Big Data Application Characteristics. Appl. Sci. 2023, 13, 4436. https://doi.org/10.3390/app13074436