Data Imputation in Electricity Consumption Profiles through Shape Modeling with Autoencoders
Abstract
1. Introduction
2. Proposed Methodology
2.1. By-Row Normalization
2.2. Shape Encoding Using Autoencoders
2.3. Total Profile Imputation
2.4. Partial Profile Imputation
3. Examples
3.1. Example 1
3.1.1. Shape Extraction and Modeling
- Day of the year (DOY): an index that runs from January 1 to December 31.
- Type of day (TOD): a discrete variable whose values are 0 for regular days, 1 for Saturdays, and 2 for holidays.
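As a rough illustration of how these two explanatory variables could be derived from a calendar date, the sketch below uses only the Python standard library. Several points are assumptions rather than details taken from the paper: the day of the year is taken as the plain calendar ordinal (1 for January 1), the `HOLIDAYS` set is a hypothetical placeholder, and treating Sundays as holidays is an optional guess controlled by the `sunday_is_holiday` flag.

```python
from datetime import date

# Hypothetical holiday calendar; a real one depends on the region being studied.
HOLIDAYS = {date(2023, 1, 1), date(2023, 12, 25)}


def day_of_year(d: date) -> int:
    """Calendar ordinal: 1 for January 1, 365 (or 366) for December 31.

    Assumption: the source may scale this index differently.
    """
    return d.timetuple().tm_yday


def type_of_day(d: date, holidays=HOLIDAYS, sunday_is_holiday=True) -> int:
    """Return 0 for regular days, 1 for Saturdays, 2 for holidays.

    Assumption: Sundays are grouped with holidays unless the flag is False.
    """
    if d in holidays or (sunday_is_holiday and d.weekday() == 6):
        return 2
    if d.weekday() == 5:  # Monday is 0, so Saturday is 5
        return 1
    return 0


# Explanatory variables for one date, as a (day of year, type of day) pair.
features = (day_of_year(date(2023, 1, 7)), type_of_day(date(2023, 1, 7)))
```

Checking holidays before Saturdays means a holiday that falls on a Saturday is coded as 2; the opposite priority would be equally easy to implement if the data call for it.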
3.1.2. Energy Modeling
3.1.3. Data Imputation
- The histograms of daily energy consumption confirm that the low values corresponding to the missing days have been corrected. Both show a bimodal distribution, but the corrected one has more high values.
- Three typical profiles were found using the k-means clustering algorithm. The profiles are quite similar, showing that the imputation has maintained the typical shapes.
- The variability of the data can be visualized using the box plots of each slot. Comparison of these plots shows that the imputation of the data has transformed the zero values into values closer to the boxes. In addition, it is observed that in the bands with lower variability (early morning), the imputation is performed both above and below the central values.
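The three checks listed above (daily energy, typical profiles, per-slot variability) can be sketched numerically as follows. This is only an illustration on synthetic data, not the authors' code: the arrays `raw` and `imputed`, the 24-slot layout, the stand-in imputation by column means, and the hand-written `kmeans` helper (plain Lloyd's algorithm) are all assumptions made for the example.

```python
import numpy as np


def daily_energy(profiles):
    """Total energy per day: the sum over the time slots of each row."""
    return np.nansum(profiles, axis=1)


def slot_quartiles(profiles):
    """Q1, median and Q3 of every time slot (the numbers behind a box plot)."""
    return np.nanpercentile(profiles, [25, 50, 75], axis=0)


def kmeans(profiles, k=3, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = profiles[rng.choice(len(profiles), size=k, replace=False)]
    for _ in range(iters):
        # Euclidean distance from every profile to every centroid.
        dist = np.linalg.norm(profiles[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new = np.array([
            profiles[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


# Synthetic data: 60 days x 24 slots, with 5 missing days recorded as zeros.
rng = np.random.default_rng(1)
raw = rng.random((60, 24)) + 1.0
raw[:5] = 0.0
imputed = raw.copy()
imputed[:5] = raw[5:].mean(axis=0)  # stand-in for a real imputation step

energy_before, energy_after = daily_energy(raw), daily_energy(imputed)
typical_profiles, labels = kmeans(imputed, k=3)
quartiles_after = slot_quartiles(imputed)
```

Comparing `energy_before` with `energy_after`, and the quartiles or centroids of `raw` with those of `imputed`, mirrors the histogram, clustering and box-plot comparisons described in the list.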
3.2. Example 2
3.2.1. Experiment 1
3.2.2. Experiment 2
3.3. Example 3
- Zero Order Hold (ZOH): missing data are replaced with the last valid data available.
- Mean: missing data are replaced by the average of the available data.
- Linear interpolation: missing data are replaced by linear interpolation between the last and the next available data.
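The three baseline methods above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the paper; it assumes missing samples are marked as `NaN` in a one-dimensional series, that the mean is taken over that same series, and the function names are hypothetical.

```python
import numpy as np


def impute_zoh(x):
    """Zero Order Hold: replace each gap with the last valid value.

    Gaps before the first valid sample stay NaN (nothing to hold yet).
    """
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    # Position of the most recent valid sample at each index.
    idx = np.maximum.accumulate(np.where(valid, np.arange(x.size), 0))
    return x[idx]


def impute_mean(x):
    """Replace each gap with the average of the available samples."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), np.nanmean(x), x)


def impute_linear(x):
    """Interpolate linearly between the last and the next valid sample.

    np.interp holds the edge value constant for gaps at either end.
    """
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    pos = np.arange(x.size)
    return np.interp(pos, pos[valid], x[valid])


profile = [1.0, np.nan, np.nan, 4.0, 2.0]
zoh = impute_zoh(profile)        # gaps filled with 1.0
mean = impute_mean(profile)      # gaps filled with (1 + 4 + 2) / 3
linear = impute_linear(profile)  # gaps filled with 2.0 and 3.0
```

None of these baselines use the shape of neighbouring days, which is what makes them useful reference points for a shape-based method.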
4. Some Remarks on Shape Encoding
5. Imputation Python Library
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Dim | Rep. 1 | Rep. 2 | Rep. 3 | Rep. 4 | Rep. 5 | Rep. 6 | Rep. 7 | Rep. 8 | Rep. 9 | Rep. 10 | Mean | Std
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 0.097 | 0.091 | 0.090 | 0.090 | 0.090 | 0.090 | 0.089 | 0.089 | 0.089 | 0.089 | 0.090 | 0.002
2 | 0.141 | 0.110 | 0.091 | 0.147 | 0.091 | 0.090 | 0.091 | 0.090 | 0.091 | 0.101 | 0.104 | 0.021
3 | 0.101 | 0.099 | 0.090 | 0.113 | 0.090 | 0.090 | 0.136 | 0.089 | 0.134 | 0.092 | 0.103 | 0.017
4 | 0.095 | 0.095 | 0.131 | 0.110 | 0.090 | 0.115 | 0.092 | 0.132 | 0.133 | 0.092 | 0.109 | 0.017
5 | 0.134 | 0.093 | 0.133 | 0.099 | 0.160 | 0.094 | 0.133 | 0.097 | 0.134 | 0.090 | 0.117 | 0.023
6 | 0.102 | 0.135 | 0.133 | 0.133 | 0.133 | 0.133 | 0.131 | 0.099 | 0.131 | 0.134 | 0.126 | 0.013

No. | Shape Model: Type | Shape Model: Explanatory Variables | Energy Model: Type | Energy Model: Explanatory Variables
---|---|---|---|---
1 | Equation (9) | DOY | Equation (9) | DOY
2 | Neural Network | DOY | Equation (9) | DOY
3 | Neural Network | TOD | Equation (9) | DOY
4 | Neural Network | DOY-TOD | Equation (9) | DOY
5 | Equation (9) | DOY | Neural Network | DOY
6 | Equation (9) | DOY | Neural Network | TOD
7 | Equation (9) | DOY | Neural Network | DOY-TOD
8 | Equation (9) | DOY | Neural Network | Latent variable

Case | Rep. 1 | Rep. 2 | Rep. 3 | Rep. 4 | Rep. 5 | Rep. 6 | Rep. 7 | Rep. 8 | Rep. 9 | Rep. 10 | Mean | Std
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 0.089 | 0.090 | 0.089 | 0.095 | 0.089 | 0.090 | 0.091 | 0.094 | 0.095 | 0.089 | 0.091 | 0.002
2 | 0.170 | 0.174 | 0.171 | 0.170 | 0.175 | 0.171 | 0.170 | 0.170 | 0.171 | 0.174 | 0.172 | 0.002
3 | 0.101 | 0.104 | 0.101 | 0.101 | 0.102 | 0.103 | 0.107 | 0.102 | 0.100 | 0.100 | 0.102 | 0.002
4 | 0.175 | 0.171 | 0.104 | 0.103 | 0.099 | 0.105 | 0.102 | 0.103 | 0.105 | 0.106 | 0.117 | 0.028
5 | 0.262 | 0.262 | 0.260 | 0.263 | 0.263 | 0.256 | 0.261 | 0.260 | 0.261 | 0.262 | 0.261 | 0.002
6 | 0.193 | 0.144 | 0.188 | 0.125 | 0.143 | 0.157 | 0.138 | 0.130 | 0.136 | 0.139 | 0.149 | 0.022
7 | 0.129 | 0.163 | 0.138 | 0.165 | 0.151 | 0.137 | 0.196 | 0.149 | 0.147 | 0.125 | 0.150 | 0.020
8 | 0.132 | 0.108 | 0.113 | 0.136 | 0.212 | 0.254 | 0.111 | 0.120 | 0.110 | 0.112 | 0.141 | 0.048
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Duarte, O.; Duarte, J.E.; Rosero-Garcia, J. Data Imputation in Electricity Consumption Profiles through Shape Modeling with Autoencoders. Mathematics 2024, 12, 3004. https://doi.org/10.3390/math12193004