AI–Big Data Analytics Platform for Energy Forecasting in Modern Power Systems
Abstract
1. Introduction
- Improved grid operations and management. Big Data Analytics identifies issues in a timely manner through real-time monitoring of grid performance, overloads, and voltage imbalances. It allows the prediction of potential failures by analyzing data from sensors and equipment, with maintenance proactively scheduled to minimize downtime and improve system reliability. It allows for the optimization of the dispatch of power generation resources through data analysis, ensuring a balance between supply and demand while minimizing costs and carbon emissions [10]. It helps maintain grid stability by predicting and managing fluctuations in energy supply and demand, which is important given the increasing integration of renewable energy sources [11].
- Improved energy forecasting and load management. Big Data Analytics helps predict future energy demand more accurately, allowing the electricity system to plan optimal resource allocation and reduce the risk of shortages or surpluses [12]. It allows for the analysis of customer consumption patterns, enabling the implementation of dynamic pricing strategies and demand response programs, shifting consumption away from peak hours and optimizing grid utilization. It enables the development of tailored energy solutions for each customer, promoting energy efficiency and cost savings [13].
- Improved efficiency and cost reduction. Big Data Analytics optimizes energy flow and reduces transmission losses by identifying energy losses in distribution networks [14]. It optimizes maintenance programs and minimizes outages by improving asset management, significantly reducing system operating and maintenance costs [15,16]. It allows for resource planning and investment decisions for electrical infrastructure by identifying energy consumption patterns and market trends [17].
- Enabling smart grids and integrating renewable energy. Big Data Analytics is a strategic partner for the development and operation of smart grids, facilitating two-way communication between power grids and customers and the integration of distributed energy resources [18]. It enables the management of the intermittency of renewable energy sources such as solar and wind, optimizing their integration into the grid and ensuring a reliable and stable energy supply. It facilitates efficient energy management and promotes the use of renewable energy to achieve decarbonization goals [19].
- First, a Big Data Analytics Platform to implement and automate intelligent models in electrical systems is proposed. This platform allows processing raw data from the electrical systems and transforming it into knowledge that adds value for operational and strategic decision-making.
- Second, a comparative analysis of statistical, machine learning (ML), and deep learning (DL) models for electricity price forecasting is presented. Nine models were evaluated: two classical statistical models, Autoregressive Integrated Moving Average (ARIMA) and Box–Cox transformation, ARMA errors, Trend and Seasonal components (BATS); six ML models, Random Forest (RF), Gradient Boosting (GB), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), Support Vector Machine (SVM), and artificial neural networks (ANN); and one DL model, long short-term memory (LSTM).
- Third, the best forecasting model is implemented in the Big Data Analytics Platform to display the day-ahead electricity price forecast per node through dynamic graphical reports, providing a descriptive and prescriptive data analytics system for timely operational decision-making.
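To make the day-ahead forecasting task concrete, the following minimal sketch shows one common way to turn an hourly price series into supervised training samples. The windowing scheme, the function name `make_lagged_samples`, the 24-hour lag and horizon, and the synthetic prices are all assumptions for illustration; the feature construction actually used in this work may differ.

```python
from typing import List, Tuple


def make_lagged_samples(
    prices: List[float], n_lags: int = 24, horizon: int = 24
) -> Tuple[List[List[float]], List[float]]:
    """Turn an hourly price series into (features, target) pairs.

    Each feature row holds `n_lags` consecutive prices; the target is the
    price `horizon` hours after the last value in that row.
    """
    features, targets = [], []
    last_start = len(prices) - n_lags - horizon
    for start in range(last_start + 1):
        window = prices[start : start + n_lags]
        target = prices[start + n_lags + horizon - 1]
        features.append(window)
        targets.append(target)
    return features, targets


# Hypothetical hourly prices for one node (three days, synthetic values).
series = [500.0 + (hour % 24) * 10.0 for hour in range(72)]
X, y = make_lagged_samples(series, n_lags=24, horizon=24)
print(len(X), len(X[0]), y[0])
```

Any of the regressors compared in this work could, in principle, be fitted on feature rows such as `X` with targets `y`.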
2. Big Data Platforms
3. Big Data Analytics Platform Proposed
- Apache Hadoop: A tool for repository integration (Data Lake) that enables distributed data storage and processing.
- Apache Sqoop: A tool for massive data ingestion from relational databases.
- Apache Hive: A tool for managing repositories in the data lake and querying information using SQL.
- Apache Spark: A distributed data processing engine with advanced processing capabilities, mainly used for stream processing.
- Apache Airflow: A tool for deploying and periodically running task executor agents.
- Anaconda Navigator: Data science tools for advanced data analysis and model development.
- It is an on-premises platform adapted to the information technology infrastructure of the electric company.
- It is a cost-effective solution, compared with the use of third-party commercial tools that require paid licenses and specialized hardware.
- It allows the integration of data from various information sources: structured, semi-structured, and unstructured.
- It uses a reduced, customized subset of the Hadoop and Spark ecosystems, which allows the implementation of advanced analytics using ML on demand.
- It can scale horizontally (by adding more virtual machines) to expand the space and processing capacity of the data lake.
- It is a suitable alternative for integrating Big Data infrastructure into the power grids and leveraging distributed processing capabilities.
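The way the listed tools could be chained into a scheduled workflow can be illustrated with a small dependency graph. This is only a sketch using the Python standard library, not the Apache Airflow API; the task names and their ordering are hypothetical and are not taken from the platform itself.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key lists the tasks it depends on.
# Task names are illustrative and do not come from the platform itself.
PIPELINE = {
    "ingest_relational_data": set(),                     # e.g., a Sqoop import
    "register_hive_tables": {"ingest_relational_data"},  # expose data via SQL
    "transform_with_spark": {"register_hive_tables"},    # distributed processing
    "train_or_score_model": {"transform_with_spark"},    # data science step
    "publish_report": {"train_or_score_model"},          # graphical reports
}


def execution_order(graph):
    """Return one valid run order in which every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
for step, task in enumerate(order, start=1):
    print(step, task)
```

A workflow scheduler applies the same idea: it runs a task only after all of its upstream tasks have finished, and repeats the whole graph on a fixed schedule.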
4. Materials and Methods for Electricity Price-Forecasting Models
4.1. Problem Analysis
- Energy Component, which represents the energy production cost calculated by CENACE.
- Congestion Component, which represents the cost derived from adding each additional megawatt to the grid due to transmission restrictions.
- Losses Component, which represents the cost caused by the increase in grid losses when supplying each additional megawatt.
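As a minimal illustration of how the three components relate, the sketch below assumes that the nodal price is the sum of its energy, congestion, and losses components, which is the usual convention for locational marginal prices. The class name `NodalPrice` and the numeric values are hypothetical and are not taken from CENACE data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodalPrice:
    """Components of a nodal price, all in the same currency per MWh."""

    energy: float
    congestion: float
    losses: float

    @property
    def total(self) -> float:
        # Assumption: the nodal price is the sum of its three components.
        return self.energy + self.congestion + self.losses

    @property
    def congestion_sign(self) -> str:
        # Classify the congestion component as positive, negative, or zero.
        if self.congestion > 0:
            return "positive"
        if self.congestion < 0:
            return "negative"
        return "zero"


# Hypothetical values for one node and one hour.
price = NodalPrice(energy=600.0, congestion=-25.0, losses=12.5)
print(price.total, price.congestion_sign)
```

The sign of the congestion component is the property used later to characterize the selected nodes (positive, negative, or zero congestion).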
4.2. Data Collection
4.3. Data Analysis
4.4. Pre-Processing
4.5. Training and Evaluation Software
5. Comparative Analysis of Electricity Price-Forecasting Models
5.1. Data Selection
5.2. Forecasting Model Training
5.3. Forecast Model Performance for Node 01AUO-115
5.4. Forecasting Models for Six Nodes
6. AI–Big Data Analytics Platform
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liao, H.; Michalenko, E.; Vegunta, S.C. Review of Big Data Analytics for Smart Electrical Energy Systems. Energies 2023, 16, 3581.
- Arroyo-Figueroa, G. Editorial: An overview of Applied Artificial Intelligence in Power Grids. Int. J. Comb. Optim. Probl. Inform. 2024, 15, 1–6.
- Wang, Y.; Chen, Q.; Hong, T.; Kang, C. Review of Smart Meter Data Analytics: Applications, Methodologies, and Challenges. IEEE Trans. Smart Grid 2019, 10, 3125–3148.
- Zhou, K.; Fu, C.; Yang, S. Big data driven smart energy management: From big data to big insights. Renew. Sustain. Energy Rev. 2016, 56, 215–225.
- Jiang, H.; Wang, K.; Wang, Y.; Gao, M.; Zhang, Y. Energy big data: A survey. IEEE Access 2016, 4, 3844–3861.
- Guerrero-Prado, J.S.; Alfonso-Morales, W.; Caicedo-Bravo, E.; Zayas-Pérez, B.; Espinosa-Reza, A. The Power of Big Data and Data Analytics for AMI Data: A Case Study. Sensors 2020, 20, 3289.
- Zhang, Y.; Huang, T.; Bompard, E.F. Big data analytics in smart grids: A review. Energy Inform. 2018, 1, 8.
- Kezunovic, M.; Pinson, P.; Obradovic, Z.; Grijalva, S.; Hong, T.; Bessa, R. Big data analytics for future electricity grids. Electr. Power Syst. Res. 2020, 189, 106788.
- Syed, D.; Zainab, A.; Ghrayeb, A.; Refaat, S.S.; Abu-Rub, H.; Bouhali, O. Smart Grid Big Data Analytics: Survey of Technologies, Techniques, and Applications. IEEE Access 2021, 9, 59564–59585.
- Escobedo, G.; Jacome, N.; Arroyo-Figueroa, G. Big Data & Analytics to Support the Renewable Energy Integration of Smart Grids—Case Study: Power Solar Generation. In Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security IoTBDS, Porto, Portugal, 24–26 April 2017; Volume 1, pp. 267–275.
- Alhamrouni, I.; Abdul Kahar, N.H.; Salem, M.; Swadi, M.; Zahroui, Y.; Kadhim, D.J.; Mohamed, F.A.; Alhuyi Nazari, M. A Comprehensive Review on the Role of Artificial Intelligence in Power System Stability, Control, and Protection: Insights and Future Directions. Appl. Sci. 2024, 14, 6214.
- Seyedan, M.; Mafakheri, F. Predictive big data analytics for supply chain demand forecasting: Methods, applications, and research opportunities. J. Big Data 2020, 7, 53.
- Mohanty, A.; Ramasamy, A.K.; Verayiah, R.; Bastia, S.; Dash, S.S.; Elahi, M.; Soudagar, M.; Khan, T.M.Y.; Cuce, E. Smart grid and application of big data: Opportunities and challenges. Sustain. Energy Technol. Assess. 2024, 71, 104011.
- Barja-Martinez, S.; Aragüés-Peñalba, M.; Munné-Collado, Í.; Lloret-Gallego, P.; Bullich-Massagué, E.; Villafafila-Robles, R. Artificial intelligence techniques for enabling Big Data services in distribution networks: A review. Renew. Sustain. Energy Rev. 2021, 150, 111459.
- Huang, L. Intelligent Condition Monitoring and Fault Diagnosis of Generator based on Internet of Things and Big Data Technology. In Proceedings of the 2023 IEEE 13th International Conference on Electronics Information and Emergency Communication (ICEIEC), Beijing, China, 14–16 July 2023; pp. 85–89.
- Wang, X.; Duan, Z. Application of Artificial Intelligence Technology in Power Equipment Condition Prediction and Maintenance. In Proceedings of the International Conference on Power, Electrical Engineering, Electronics and Control (PEEEC), Athens, Greece, 25–27 September 2023; pp. 86–90.
- Kaytez, F.; Taplamacioglu, M.C.; Cam, E.; Hardalac, F. Forecasting electricity consumption: A comparison of regression analysis, neural networks and least squares support vector machines. Int. J. Electr. Power Energy Syst. 2015, 67, 431–438.
- Ahmad, T.; Madonski, R.; Zhang, D.; Huang, C.; Mujeeb, A. Data-driven probabilistic machine learning in sustainable smart energy/smart energy systems: Key developments, challenges, and future research opportunities in the context of smart grid paradigm. Renew. Sustain. Energy Rev. 2022, 160, 112128.
- Diamantoulakis, P.D.; Kapinas, V.M.; Karagiannidis, G.K. Big Data Analytics for Dynamic Energy Management in Smart Grids. Big Data Res. 2015, 2, 94–101.
- Hong, T.; Pinson, P.; Wang, Y.; Weron, R.; Yang, D.; Zareipour, H. Energy Forecasting: A Review and Outlook. J. Power Energy 2020, 7, 376–388.
- Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustain. Cities Soc. 2017, 35, 257–270.
- Ren, Y.; Suganthan, P.N.; Srikanth, N. Ensemble methods for wind and solar power forecasting—A state-of-the-art review. Renew. Sustain. Energy Rev. 2015, 50, 82–91.
- Klyuev, R.V.; Morgoev, I.D.; Morgoeva, A.D.; Gavrina, O.A.; Martyushev, N.V.; Efremenkov, E.A.; Mengxu, Q. Methods of Forecasting Electric Energy Consumption: A Literature Review. Energies 2022, 15, 8919.
- Lago, J.; Marcjasz, G.; De Schutter, B.; Weron, R. Forecasting day-ahead electricity prices: A review of state-of-the-art algorithms, best practices and an open-access benchmark. Appl. Energy 2021, 293, 116983.
- Lago, J.; De Ridder, F.; De Schutter, B. Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms. Appl. Energy 2018, 221, 386–405.
- Yang, Z.; Ce, L.; Lian, L. Electricity price forecasting by a hybrid model, combining wavelet transform, ARMA and kernel-based extreme learning machine methods. Appl. Energy 2017, 190, 291–305.
- Kılıç, D.K.; Nielsen, P.; Thibbotuwawa, A. Intraday Electricity Price Forecasting via LSTM and Trading Strategy for the Power Market: A Case Study of the West Denmark DK1 Grid Region. Energies 2024, 17, 2909.
- Dudek, G. A Comprehensive Study of Random Forest for Short-Term Load Forecasting. Energies 2022, 15, 7547.
- Zhao, X.; Li, Q.; Xue, W.; Zhao, Y.; Zhao, H.; Guo, S. Research on Ultra-Short-Term Load Forecasting Based on Real-Time Electricity Price and Window-Based XGBoost Model. Energies 2022, 15, 7367.
- Narajewski, M. Probabilistic forecasting of German electricity imbalance prices. Energies 2022, 15, 4976.
- O’Connor, C.; Collins, J.; Prestwich, S.; Visentin, A. Electricity Price Forecasting in the Irish Balancing Market. Energy Strategy Rev. 2024, 54, 101436.
- Zahid, M.; Ahmed, F.; Javaid, N.; Abbasi, R.A.; Zainab Kazmi, H.S.; Javaid, A.; Bilal, M.; Akbar, M.; Ilahi, M. Electricity price and load forecasting using enhanced convolutional neural network and enhanced support vector regression in smart grids. Electronics 2019, 8, 122.
- Heidarpanah, M.; Hooshyaripor, F.; Fazeli, M. Daily electricity price forecasting using artificial intelligence models in the Iranian electricity market. Energy 2023, 263, 126011.
- Sarnovsky, M.; Bednar, P.; Smatana, M. Big Data Processing and Analytics Platform Architecture for Process Industry Factories. Big Data Cogn. Comput. 2018, 2, 3.
- Chen, C.L.P.; Zhang, C.-Y. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Inf. Sci. 2014, 275, 314–347.
- Pravin, A.K.; Dhawale, G.; Kumbhar, S.; Patil, U.; Magdum, P. A comprehensive review: Machine learning and its application in integrated power system. Energy Rep. 2021, 7, 5467–5474.
- El-Afifi, M.I.; Sedhom, B.E.; Eladl, A.A.; Padmanaban, S. Survey of technologies, techniques, and applications for big data analytics in smart energy hub. Energy Strategy Rev. 2024, 56, 101582.
- Ajah, I.A.; Nweke, H.F. Big Data and Business Analytics: Trends, Platforms, Success Factors and Applications. Big Data Cogn. Comput. 2019, 3, 32.
- Nambiar, A.; Mundra, D. An Overview of Data Warehouse and Data Lake in Modern Enterprise Data Management. Big Data Cogn. Comput. 2022, 6, 132.
- Buyya, R.; Calheiros, R.N.; Dastjerdi, A.V. Big Data Principles and Paradigms; Morgan Kaufmann: San Francisco, CA, USA, 2023.
- Escobedo, G.; Jacome, N.; Arroyo-Figueroa, G. Design of a Technology Management Infrastructure for Large Volumes of Data in an Intelligent Power Network. Res. Comput. Sci. 2016, 122, 113–126.
- Amaya-Sanchez, Q.; Argumedo, M.J.D.M.; Aguilar-Lasserre, A.A.; Reyes Martinez, O.A.; Arroyo-Figueroa, G. Fault Diagnosis in Power Generators: A Comparative Analysis of Machine Learning Models. Big Data Cogn. Comput. 2024, 8, 145.
- Programa Sectorial de Energía 2020–2024. Secretaria de Energia (SENER). 2024. Available online: https://www.gob.mx/cms/uploads/attachment/file/562631/PS_SENER_CACEC-DOF_08-07-2020.pdf (accessed on 23 May 2024).
- Sistema de Información del Mercado (SIM). Centro Nacional de Control de Energía (CENACE). 2024. Available online: https://www.gob.mx/cenace/acciones-y-programas/sistema-de-informacion-de-mercado-sim (accessed on 23 May 2024).
- Nielsen, A. Practical Time Series Analysis: Prediction with Statistics and Machine Learning; O’Reilly Media Inc.: Sebastopol, CA, USA, 2019.
- Anaconda Software Distribution. 2025. Conda (Version 3-13.5). Available online: https://anaconda.org/anaconda/python (accessed on 27 July 2024).
- Oliphant, T.E. A Guide to NumPy. 2006. Volume 1. Available online: https://web.mit.edu/dvp/Public/numpybook.pdf (accessed on 27 July 2024).
- McKinney, W. Pandas: A foundational python library for data analysis and statistics. Python High Perform. Sci. Comput. 2011, 14, 1–9.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Mystakidis, A.; Koukaras, P.; Tsalikidis, N.; Ioannidis, D.; Tjortjis, C. Energy Forecasting: A Comprehensive Review of Techniques and Technologies. Energies 2024, 17, 1662.
- St-Aubin, P.; Agard, B. Precision and Reliability of Forecasts Performance Metrics. Forecasting 2022, 4, 882–903.









| Big Data Platforms | Main Characteristics | Type |
|---|---|---|
| Apache Hadoop | An open-source framework that enables distributed processing of massive datasets across clusters. Hadoop provides a scalable and cost-effective solution for storing, processing, and analyzing massive amounts of structured and unstructured data. | Open source |
| Apache Spark | A unified analytics engine for batch processing, streaming data, machine learning, and graph processing. | Open source |
| Google Cloud BigQuery | A powerful and accessible platform for organizations to unify data, connect it to AI, and automate data tasks, which provides a fully managed and serverless data warehouse solution. | Commercial |
| Amazon EMR (MapReduce) | A managed cluster platform from AWS for processing and analyzing large datasets using open source Big Data frameworks. | Commercial |
| Microsoft Azure HDInsight | Provides a fully managed cloud analytics service for processing and analyzing large datasets using open-source platforms in the Azure environment. | Commercial |
| Cloudera | A comprehensive suite of tools and services based on open source Big Data frameworks designed to manage and analyze large volumes of data. | Commercial |
| IBM InfoSphere BigInsights | An enterprise-focused Apache Hadoop platform that offers a range of tools to manage and analyze large volumes of structured as well as unstructured data in a reliable manner. | Commercial |
| Databricks | A platform built on Apache Spark that simplifies the process of building, deploying, and managing big data and machine learning workflows by providing a cloud-based environment. | Commercial |

| Node Code | Voltage Level (kV) | Transmission Zone | Characteristics |
|---|---|---|---|
| 01AUO-115 | 115 | Central | Central regional control center node that presents negative congestion in most of the data. |
| 07SAF-115 | 115 | Baja California Sur | Baja California Sur regional control center node that presents zero congestion in most cases. |
| 01TTH-230 | 230 | Noreste | Northeast regional control center node that presents zero congestion in most cases, followed by positive congestion. |
| 08SLC-230 | 230 | Peninsular | Eastern regional control center node that presents positive congestion in most of the data. |
| 03AGM-400 | 400 | Occidental | Western regional control center node that presents zero congestion in most cases, followed by negative congestion. |
| 02CBE-400 | 400 | Oriental | Eastern regional control center node that presents zero congestion in most cases, followed by negative congestion. |

| Model | Parameter | Value Range |
|---|---|---|
| ARIMA | order | p = 3, d = 0, q = 4 |
| BATS | Seasonal period | 120 |
| RFRegressor | n_estimators | 200 |
| | random_state | 42 |
| GBRegressor | n_estimators | 200 |
| | max_depth | 5 |
| | random_state | 42 |
| LGBMRegressor | n_estimators | 200 |
| | max_depth | 5 |
| | random_state | 42 |
| XGBRegressor | n_estimators | 200 |
| | max_depth | 5 |
| | base_score | 0.5 |
| SVR | kernel | rbf |
| | C | 100 |
| | epsilon | 0.5 |
| ANN-MLP | Activation function | ReLU |
| | Optimizer | Adam |
| LSTM | Learning_rate | 0.001–0.4 |
| | Epochs | 100 |
| | batch_size | 32 |
| | Activation function | ReLU |
| | Optimizer | Adam |

| Model | MAE | RMSE | MAPE (%) | R² |
|---|---|---|---|---|
| ARIMA | 57.36 | 76.18 | 6.25 | 0.902412 |
| BATS | 27.49 | 34.09 | 3.15 | 0.984756 |
| RF | 7.09 | 15.69 | 2.38 | 0.991265 |
| GB | 4.31 | 9.38 | 1.51 | 0.996879 |
| LGBM | 8.85 | 19.27 | 3.13 | 0.986819 |
| XGB | 4.35 | 10.46 | 1.41 | 0.996114 |
| SVR | 2.95 | 5.66 | 1.21 | 0.998864 |
| ANN | 34.19 | 91.19 | 10.20 | 0.700000 |
| LSTM | 7.59 | 18.77 | 2.60 | 0.990568 |
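
The four error metrics reported in the table above follow standard definitions, which can be sketched in plain Python as shown below. The function names and the sample values are illustrative assumptions; MAPE is expressed here as a percentage and requires non-zero actual values.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices and forecasts, only to exercise the functions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(mae(actual, predicted), rmse(actual, predicted))
print(round(mape(actual, predicted), 2), round(r2(actual, predicted), 3))
```

Because RMSE squares each error before averaging, it penalizes large deviations more heavily than MAE, which is why a model can rank differently under the two metrics.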

| Node | Model | MAE | RMSE | MAPE (%) | R² |
|---|---|---|---|---|---|
| 01AUO-115 | SVR | 2.95 | 5.66 | 1.21 | 0.998864 |
| 07SAF-115 | XGB | 3.24 | 4.51 | 1.15 | 0.999775 |
| 01TTH-230 | SVR | 1.87 | 4.48 | 0.8 | 0.999239 |
| 04PLD-230 | XGB | 110.19 | 651.33 | 2.99 | 0.897721 |
| 03AGM-400 | SVR | 1.64 | 3.11 | 0.94 | 0.999547 |
| 02CBE-400 | XGB | 44.39 | 342.50 | 2.44 | 0.927826 |
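
Selecting the best model for a node amounts to ranking the candidates by an error metric. The sketch below uses the values reported for node 01AUO-115 and assumes that the lowest MAE (or, alternatively, the highest R²) is the selection criterion; the source does not state the exact rule, and the helper names are hypothetical.

```python
# (MAE, RMSE, MAPE, R2) for node 01AUO-115, copied from the model comparison table.
RESULTS = {
    "ARIMA": (57.36, 76.18, 6.25, 0.902412),
    "BATS": (27.49, 34.09, 3.15, 0.984756),
    "RF": (7.09, 15.69, 2.38, 0.991265),
    "GB": (4.31, 9.38, 1.51, 0.996879),
    "LGBM": (8.85, 19.27, 3.13, 0.986819),
    "XGB": (4.35, 10.46, 1.41, 0.996114),
    "SVR": (2.95, 5.66, 1.21, 0.998864),
    "ANN": (34.19, 91.19, 10.20, 0.700000),
    "LSTM": (7.59, 18.77, 2.60, 0.990568),
}


def rank_models(results, metric_index=0, higher_is_better=False):
    """Return model names sorted from best to worst on the chosen metric."""
    return sorted(
        results,
        key=lambda name: results[name][metric_index],
        reverse=higher_is_better,
    )


by_mae = rank_models(RESULTS)                                     # lower is better
by_r2 = rank_models(RESULTS, metric_index=3, higher_is_better=True)
print(by_mae[:3])
print(by_r2[0])
```

For this node both criteria point to SVR, which matches the first row of the per-node table.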
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Santos-Dominguez, M.; Hernandez Flores, N.; Parra-Ramirez, I.A.; Arroyo-Figueroa, G. AI–Big Data Analytics Platform for Energy Forecasting in Modern Power Systems. Big Data Cogn. Comput. 2025, 9, 272. https://doi.org/10.3390/bdcc9110272

