A Machine Learning Solution for Data Center Thermal Characteristics Analysis
Abstract
1. Introduction
- RO.1. To identify a clustering (grouping) algorithm that is appropriate for the purpose of this research;
- RO.2. To determine the criteria for feature selection in the analysis of DC IT room thermal characteristics;
- RO.3. To determine the optimal number of clusters for the analysis of thermal characteristics;
- RO.4. To perform sequential clustering and interpretation of results for repeated time series of air temperature measurements;
- RO.5. To identify the servers that most frequently occur in cold or hot air temperature ranges (and clusters);
- RO.6. To provide recommendations for IT room thermal management that appropriately address the server overheating issue.
2. Background and Related Work
3. Methodology
3.1. Cluster and Dataset Description
3.2. Data Analytics
- The number of features used for clustering was small, so the formulated clustering problem was simple and did not require complex algorithms;
- K-means has linear computational complexity and runs quickly on the problem in question. Although the formulation of the problem is simple, it requires several thousand repetitions of clustering for each set of nodes, which makes the speed of the algorithm an influential factor;
- K-means has a weak point, namely the random choice of initial centroids, which can lead to different results when the random initialization differs. This does not pose an issue in this use case because the nodes are clustered repeatedly on sets of measurements taken at different timestamps, and the minor differences introduced by the randomness are mitigated by the repetition of the clustering procedure.
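The repetition argument in the list above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain Lloyd's K-means written with the standard library only, three clusters, two features per node (inlet and exhaust temperature), and it treats the cluster whose centroid has the highest exhaust temperature as the hot cluster. The names `kmeans`, `hot_counts`, `base`, and `snapshots` are hypothetical, and the data are synthetic rather than measurements from the paper. In practice a library implementation such as scikit-learn's `KMeans` would likely be used instead.

```python
import random
from collections import Counter


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's K-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids (the weak point)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


def hot_counts(snapshots, k=3, seed=0):
    """Count, per node, how many snapshots place it in the hottest cluster."""
    counts = Counter()
    for t, snapshot in enumerate(snapshots):
        node_ids = sorted(snapshot)
        points = [snapshot[n] for n in node_ids]
        # A different seed per timestamp mimics independent random initializations.
        centroids, labels = kmeans(points, k, seed=seed + t)
        # Assumption: feature index 1 is the exhaust temperature.
        hot = max(range(k), key=lambda c: centroids[c][1])
        counts.update(n for n, lab in zip(node_ids, labels) if lab == hot)
    return counts


# Synthetic demo data (NOT from the paper): node_id -> (inlet T, exhaust T) in °C.
base = {1: (18.0, 25.0), 2: (18.5, 25.5), 3: (22.0, 35.0),
        4: (22.5, 35.5), 5: (27.0, 48.0), 6: (27.5, 48.5)}
# Twenty timestamps with a small uniform drift applied to every node.
snapshots = [{n: (i + 0.1 * t, e + 0.1 * t) for n, (i, e) in base.items()}
             for t in range(20)]
counts = hot_counts(snapshots)
print(counts.most_common())
```

A single run can settle in a different partition depending on the initial centroids, so an individual node may occasionally be assigned to a neighbouring cluster. Aggregating the labels over many timestamps, as `hot_counts` does, is what makes the ranking of frequently hot nodes stable.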
4. Results and Discussions
5. Conclusions and Future Work
- Explore the effectiveness of the cooling system, first by identifying the nodes in the hot range and then by adjusting the cooling for them (e.g., changing the direction, volume, or speed of the cooling air). Additionally, directional cooling could be recommended (e.g., spot cooling for overheated nodes). Next, uncover hidden factors that lead to repeated overheating of nodes (e.g., a location next to PDUs, which have higher allowable temperature ranges);
- Revise the cluster load scheduling so that the frequently overheated servers are not overloaded in the future. The aim is an even thermal distribution within the IT room (see [11] for details). In other words, it is recommended to formulate a resource allocation policy that evens out the distribution of ambient air temperature;
- Perform continuous environmental monitoring of the IT room and evaluate the effectiveness of the recommended actions and their influence on the ambient temperature.
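As an illustration of the scheduling recommendation above, the following is a minimal sketch of one possible allocation rule, not a policy proposed in the paper. It assumes each node reports a load between 0 and 1 and that nodes flagged as frequently hot receive a lower load cap. The function `pick_node`, the cap values, and the load figures are all hypothetical. The hot node IDs 30, 31, and 32 are the first three from the hot range table.

```python
def pick_node(load_by_node, hot_nodes, normal_cap=0.9, hot_cap=0.5):
    """Return the node with the most headroom under its cap, or None if all are full.

    Nodes in hot_nodes get the lower hot_cap so they are not overloaded.
    """
    best, best_headroom = None, 0.0
    for node, load in sorted(load_by_node.items()):
        cap = hot_cap if node in hot_nodes else normal_cap
        headroom = cap - load
        if headroom > best_headroom:
            best, best_headroom = node, headroom
    return best


# Hypothetical current loads; nodes 30 and 31 are treated as frequently hot.
hot_nodes = {30, 31, 32}
loads = {29: 0.5, 30: 0.1, 31: 0.45, 33: 0.3}
print(pick_node(loads, hot_nodes))
```

Capping rather than excluding the hot nodes keeps them available for light work while steering most new load elsewhere. The caps themselves would need to be tuned against the monitoring data mentioned in the last recommendation.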
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A
References
- Hashem, I.A.T.; Chang, V.; Anuar, N.B.; Adewole, K.S.; Yaqoob, I.; Gani, A.; Ahmed, E.; Chiroma, H. The role of big data in smart city. Int. J. Inf. Manag. 2016, 36, 748–758.
- Zhang, K.; Zhang, Y.; Liu, J.; Niu, X. Recent advancements on thermal management and evaluation for data centers. Appl. Therm. Eng. 2018, 142, 215–231.
- Datacenter Knowledge. A Critical Look at Mission-Critical Infrastructure. 2018. Available online: https://www.datacenterknowledge.com/industry-perspectives/critical-look-mission-critical-infrastructure (accessed on 26 June 2020).
- Hartmann, B.; Farkas, C. Energy efficient data centre infrastructure—Development of a power loss model. Energy Build. 2016, 127, 692–699.
- He, Z.; Ding, T.; Liu, Y.; Li, Z. Analysis of a district heating system using waste heat in a distributed cooling data center. Appl. Therm. Eng. 2018, 141, 1131–1140.
- Nadjahi, C.; Louahlia, H.; Lemasson, S. A review of thermal management and innovative cooling strategies for data center. Sustain. Comput. Inform. Syst. 2018, 19, 14–28.
- AT Committee. Data Center Power Equipment Thermal Guidelines and Best Practices Whitepaper. ASHRAE, Tech. Rep., 2016. Available online: https://tc0909.ashraetcs.org/documents/ASHRAE_TC0909_Power_White_Paper_22_June_2016_REVISED.pdf (accessed on 6 June 2019).
- Patterson, M.K. The effect of data center temperature on energy efficiency. In Proceedings of the 11th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, Orlando, FL, USA, 28–31 May 2008; pp. 1167–1174.
- Grishina, A. Data Center Energy Efficiency Assessment Based on Real Data Analysis. Unpublished PERCCOM Master's Dissertation. 2019.
- Capozzoli, A.; Serale, G.; Liuzzo, L.; Chinnici, M. Thermal metrics for data centers: A critical review. Energy Procedia 2014, 62, 391–400.
- De Chiara, D.; Chinnici, M.; Kor, A.-L. Data mining for big dataset-related thermal analysis of high performance (HPC) data center. In International Conference on Computational Science; Springer: Cham, Switzerland, 2020; pp. 367–381.
- Chinnici, M.; Capozzoli, A.; Serale, G. Measuring energy efficiency in data centers. In Pervasive Computing: Next Generation Platforms for Intelligent Data Collection; Dobre, C., Xhafa, F., Eds.; Morgan Kaufmann: Burlington, MA, USA, 2016; Chapter 10; pp. 299–351. ISBN 9780128037027.
- Infoworld. Facebook Heat Maps Pinpoint Data Center Trouble Spots. 2012. Available online: https://www.infoworld.com/article/2615039/facebook-heat-maps-pinpoint-data-center-trouble-spots.html (accessed on 20 June 2020).
- Bash, C.E.; Patel, C.D.; Sharma, R. Efficient thermal management of data centers—Immediate and long-term research needs. HVAC&R Res. 2003, 9, 137–152.
- Fernández-Cerero, D.; Fernández-Montes, A.; Velasco, F.P. Productive Efficiency of Energy-Aware Data Centers. Energies 2018, 11, 2053.
- Fredriksson, S.; Gustafsson, J.; Olsson, D.; Sarkinen, J.; Beresford, A.; Kaufeler, M.; Minde, T.B.; Summers, J. Integrated thermal management of a 150 kW pilot Open Compute Project style data center. In Proceedings of the 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), Helsinki, Finland, 22–25 July 2019; Volume 1, pp. 1443–1450.
- Srinarayana, N.; Fakhim, B.; Behnia, M.; Armfield, S.W. Thermal performance of an air-cooled data center with raised-floor and non-raised-floor configurations. Heat Transf. Eng. 2013, 35, 384–397.
- Schmidt, R.R.; Cruz, E.E.; Iyengar, M. Challenges of data center thermal management. IBM J. Res. Dev. 2005, 49, 709–723.
- MirhoseiniNejad, S.; Moazamigoodarzi, H.; Badawy, G.; Down, D.G. Joint data center cooling and workload management: A thermal-aware approach. Future Gener. Comput. Syst. 2020, 104, 174–186.
- Fang, Q.; Wang, J.; Gong, Q.; Song, M.-X. Thermal-aware energy management of an HPC data center via two-time-scale control. IEEE Trans. Ind. Inform. 2017, 13, 2260–2269.
- Zhang, S.; Zhou, T.; Ahuja, N.; Refai-Ahmed, G.; Zhu, Y.; Chen, G.; Wang, Z.; Song, W.; Ahuja, N. Real time thermal management controller for data center. In Proceedings of the Fourteenth Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), Orlando, FL, USA, 27 May 2014; pp. 1346–1353.
- Sharma, R.; Bash, C.; Patel, C.; Friedrich, R.; Chase, J.S. Balance of Power: Dynamic Thermal Management for Internet Data Centers. IEEE Internet Comput. 2005, 9, 42–49.
- Kubler, S.; Rondeau, E.; Georges, J.P.; Mutua, P.L.; Chinnici, M. Benefit-cost model for comparing data center performance from a biomimicry perspective. J. Clean. Prod. 2019, 231, 817–834.
- Capozzoli, A.; Chinnici, M.; Perino, M.; Serale, G. Review on performance metrics for energy efficiency in data center: The role of thermal management. Lect. Notes Comput. Sci. 2015, 8945, 135–151.
- Grishina, A.; Chinnici, M.; De Chiara, D.; Guarnieri, G.; Kor, A.-L.; Rondeau, E.; Georges, J.-P. DC Energy Data Measurement and Analysis for Productivity and Waste Energy Assessment. In Proceedings of the 2018 IEEE International Conference on Computational Science and Engineering (CSE), Bucharest, Romania, 29–31 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–11, ISBN 978-1-5386-7649-3.
- Koronen, C.; Åhman, M.; Nilsson, L.J. Data centres in future European energy systems—Energy efficiency, integration and policy. Energy Effic. 2019, 13, 129–144.
- Grishina, A.; Chinnici, M.; De Chiara, D.; Rondeau, E.; Kor, A.L. Energy-Oriented Analysis of HPC Cluster Queues: Emerging Metrics for Sustainable Data Center; Springer: Dubrovnik, Croatia, 2019; pp. 286–300.
- Grishina, A.; Chinnici, M.; Kor, A.L.; Rondeau, E.; Georges, J.P.; De Chiara, D. Data center for smart cities: Energy and sustainability issue. In Big Data Platforms and Applications—Case Studies, Methods, Techniques, and Performance Evaluation; Pop, F., Ed.; Springer: Berlin, Germany, 2020.
- Athavale, J.; Yoda, M.; Joshi, Y.K. Comparison of data driven modeling approaches for temperature prediction in data centers. Int. J. Heat Mass Transf. 2019, 135, 1039–1052.
- Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Kassambara, A. (Ed.) Determining the Optimal Number of Clusters: 3 Must Know Methods. Available online: https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/ (accessed on 6 May 2019).
- Fernández-Cerero, D.; Fernández-Montes, A.; Ortega, J.A. Energy policies for data-center monolithic schedulers. Expert Syst. Appl. 2018, 110, 170–181.
- Yuan, H.; Bi, J.; Tan, W.; Zhou, M.; Li, B.H.; Li, J. TTSA: An Effective Scheduling Approach for Delay Bounded Tasks in Hybrid Clouds. IEEE Trans. Cybern. 2017, 47, 3658–3668.
- Yuan, H.; Bi, J.; Zhou, M.; Sedraoui, K. WARM: Workload-Aware Multi-Application Task Scheduling for Revenue Maximization in SDN-Based Cloud Data Center. IEEE Access 2018, 6, 645–657.
- Fernández-Cerero, D.; Irizo, F.J.O.; Fernández-Montes, A.; Velasco, F.P. Bullfighting extreme scenarios in efficient hyper-scale cluster computing. In Cluster Computing; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–17.
- Fernández-Cerero, D.; Fernández-Montes, A.; Jakobik, A.; Kołodziej, J.; Toro, M. SCORE: Simulator for cloud optimization of resources and energy consumption. Simul. Model. Pract. Theory 2018, 82, 160–173.
- Bi, J.; Yuan, H.; Tan, W.; Zhou, M.; Fan, Y.; Zhang, J.; Li, J. Application-Aware Dynamic Fine-Grained Resource Provisioning in a Virtualized Cloud Data Center. IEEE Trans. Autom. Sci. Eng. 2015, 14, 1172–1184.
- Klimova, A.; Rondeau, E.; Andersson, K.; Porras, J.; Rybin, A.; Zaslavsky, A. An international Master’s program in green ICT as a contribution to sustainable development. J. Clean. Prod. 2016, 135, 223–239.
Time Label | Real Time of Measurement | Node ID | Inlet T (°C) | Exhaust T (°C) | CPU 1 T (°C) | CPU 2 T (°C) | Cluster Label |
---|---|---|---|---|---|---|---|
Cluster Type | |||||
Ratio% | 2.8 | 86.0 | 11.2 | ||
Cluster Type | |||||
Ratio% | 4.2 | 20.0 | 28.4 | 31.2 | 16.2 |
Cluster Type | |||||
Ratio% | 2.0 | 63.0 | 35.0 | ||
Cluster Type | |||||
Ratio% | 0.5 | 40.0 | 8.0 | ||
Hot Range Node ID | - | - | 30, 31, 32, 45, 46, 48, 68, 79, 94, 96, 105, 117, 118, 120, 182, 183, 189, 198 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Grishina, A.; Chinnici, M.; Kor, A.-L.; Rondeau, E.; Georges, J.-P. A Machine Learning Solution for Data Center Thermal Characteristics Analysis. Energies 2020, 13, 4378. https://doi.org/10.3390/en13174378