Historical Hourly Information of Four European Wind Farms for Wind Energy Forecasting and Maintenance
Abstract
1. Introduction
- Developing robust predictive models capable of handling the inherent variability of wind energy production.
- Identifying key patterns and factors that influence wind energy generation.
- Providing tools for early fault detection and optimizing preventive maintenance.
- Contributing to the scientific understanding of renewable energy forecasting.
2. Data Description
2.1. Metadata
- Site: A textual identifier for each wind energy production site, serving as a unique name or designation that distinguishes each location.
- Wind_generator_number: The total number of wind generators (turbines) installed at each site. This variable indicates the scale of the wind energy operation at each location, providing insight into the site’s capacity to generate electricity from wind.
- Capacity kWh: The total capacity of the wind energy site, expressed in kilowatt-hours (kWh). Given the hourly resolution of the data, this figure can be read as the maximum electrical energy the site can deliver in one hour under optimal conditions, which is numerically equal to its rated power in kW. It highlights the site’s potential contribution to the energy grid.
- ID: An additional numerical identifier for each wind energy site, likely used for internal tracking or database management purposes. It serves as an alternative reference to the site variable.
- Latitude: The geographical latitude of the wind energy site, expressed in decimal degrees. Latitude is a critical factor in determining the amount of solar exposure and potentially influences wind patterns at the site.
- Longitude: The geographical longitude of the wind energy site, also in decimal degrees. Longitude, along with latitude, helps to precisely locate the site on the globe, facilitating spatial analysis and the assessment of geographical influences on wind energy production.
- City: The name of the nearest city to the wind energy site. This variable provides a local context, helping to associate each site with a nearby urban area for logistical, administrative, and social considerations.
- State: The state, province, or regional administrative division where the wind energy site is located. This variable further specifies the site’s location within a country, offering insights into regional policies, wind energy incentives, and environmental conditions that might affect the site.
- Region: A broader geographical categorization that encompasses the site, often reflecting ecological, climatic, or administrative commonalities among sites within the same area.
- Country_iso: The ISO 3166-1 alpha-2 code of the country where the wind energy site is located (e.g., FR, ES, IT). ISO codes provide a standardized short-form representation of country names, facilitating data analysis and international comparisons.
- Country: The full name of the country hosting the wind energy site. This variable situates each site within a national context, highlighting the global distribution of wind energy operations and allowing for country-specific analyses of wind energy production.
Site | Wind_Generator_Number | Capacity kWh | ID | Latitude | Longitude | City | State | Region | Country_iso | Country |
---|---|---|---|---|---|---|---|---|---|---|
Canacoloma | 7 | 20,800 | 1 | 45.9993 | 6.6405 | Magland | Haute-Savoie | Auvergne-Rhône-Alpes | FR | France |
ElSasoG | 6 | 17,400 | 2 | 46.0845 | 6.7098 | Samoëns | Haute-Savoie | Auvergne-Rhône-Alpes | FR | France |
LasMajas | 20 | 38,000 | 3 | 41.97745 | −0.94812 | Castejón de Valdejasa | Zaragoza | Aragón | ES | Spain |
SierradeLuna | 8 | 21,400 | 4 | 45.6784 | 8.12859 | Valdilana | Biella | Piemonte | IT | Italy |
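The latitude and longitude columns support simple spatial analyses. One example is measuring how far apart the farms are when judging whether two sites are likely to share a wind regime. The Python sketch below computes pairwise great-circle distances from the coordinates in the table using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km. The names `SITES` and `haversine_km` are illustrative and are not part of the dataset.

```python
import math

# Coordinates (decimal degrees) copied from the metadata table.
SITES = {
    "Canacoloma": (45.9993, 6.6405),
    "ElSasoG": (46.0845, 6.7098),
    "LasMajas": (41.97745, -0.94812),
    "SierradeLuna": (45.6784, 8.12859),
}

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    names = list(SITES)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            print(f"{p} - {q}: {haversine_km(SITES[p], SITES[q]):.1f} km")
```

Run as a script, it prints all six site pairs. The two Haute-Savoie farms (Canacoloma and ElSasoG) come out roughly 11 km apart, whereas LasMajas lies several hundred kilometers from the other three sites.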
2.2. Data_Wind Dataset
- Timestamps (index): The dataset contains 15,253 unique timestamps recorded at hourly intervals. Each timestamp follows the ISO 8601 format YYYY-MM-DDTHH:MM:SS.000Z, where the trailing Z denotes UTC, providing a granular view of wind energy production dynamics over time.
- Site Identification (site): Data are aggregated from four distinct wind energy sites, each assigned a numerical identifier, allowing for site-specific analyses and comparisons.
- Energy Output (Energy_kWh): These values represent the energy produced over each hourly period, ranging from −37.40 kWh to 29,148.28 kWh. Negative values may indicate periods of low wind activity or technical downtime, when the turbines’ auxiliary systems draw more energy from the grid than the turbines generate.
- Ambient Temperature (AmbientTemperature_value): Ambient temperatures vary significantly, from −5.54 °C to 39.06 °C, highlighting the diverse environmental conditions across the sites.
- Nacelle Angle (NacelleAngle_value): The nacelle is the structure at the top of the tower that houses the internal components, such as the transmission system and the power generator. Its angle (orientation) is a crucial factor in optimizing wind energy capture, because the rotor must face the wind to extract the most energy. Values range from 4.88° to 355°, reflecting the yaw adjustments made to align the turbine with the wind direction.
- Rotor Speed (RotorSpeed_value): Rotor speed, an indicator of turbine activity, is measured in revolutions per minute (rpm). It ranges from 0 rpm (standstill) to 12.24 rpm, covering the operating speeds at which energy is produced.
- Wind Direction (WindDirection_value): Wind direction, measured in degrees, ranges from 21.66° to 337.56°, providing insight into the prevailing wind patterns at the sites.
- Wind Speed (WindSpeed_value): Wind speed is available for 14,439 records. Values range from 0 m/s (calm conditions) to 27.17 m/s (strong wind), underscoring the variability of the wind resource.
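Nacelle angle and wind direction are circular quantities, so ordinary arithmetic can mislead. For example, the plain mean of 350° and 10° is 180°, even though both directions point close to north. The Python sketch below shows two standard circular operations that may help when relating these variables:

- the signed yaw misalignment between nacelle and wind, wrapped to [-180°, 180°);
- the vector (circular) mean, used to summarize a prevailing direction.

The sketch assumes both angles are measured in the same reference frame (degrees clockwise from north), which the dataset description does not state. The function names are hypothetical.

```python
import math


def yaw_misalignment(nacelle_deg, wind_dir_deg):
    """Signed angle in [-180, 180) from the nacelle heading to the wind direction.

    Assumption: both angles share one reference frame (degrees clockwise from north).
    """
    return (wind_dir_deg - nacelle_deg + 180.0) % 360.0 - 180.0


def circular_mean_deg(angles_deg):
    """Mean direction in [0, 360), computed by averaging unit vectors, not raw angles."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    if math.isclose(s, 0.0, abs_tol=1e-12) and math.isclose(c, 0.0, abs_tol=1e-12):
        # The vectors cancel out (e.g., 0 and 180 degrees): no mean direction exists.
        raise ValueError("mean direction is undefined for these angles")
    return math.degrees(math.atan2(s, c)) % 360.0
```

Large misalignment values computed this way are a cue to check angle conventions before using the feature in a model.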
Index | Site | Energy_kWh | AmbientTemperature_Value | NacelleAngle_Value | RotorSpeed_Value | WindDirection_Value | WindSpeed_Value |
---|---|---|---|---|---|---|---|
14 April 2021 15:00:00,000 | 1 | 370.159 | 14.122 | 102 | 7.408 | 195.625 | 12.2 |
14 April 2021 15:00:00,000 | 2 | 207.683 | 14.921 | 86.291 | 7.484 | 162.667 | 10.06 |
14 April 2021 15:00:00,000 | 3 | 376.644 | 15.668 | 100.212 | 7.075 | 193.450 | 3.92 |
12 May 2021 10:00:00,000 | 4 | 9313.650 | 11.806 | 232.311 | 8.298 | 168.891 | 10.1 |
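Two consistency checks follow directly from the variable descriptions: negative hourly energy marks net consumption, and hourly production should not exceed a site’s capacity. The Python sketch below applies both checks to the sample rows in the table. It rests on two assumptions:

- the metadata ID matches the `site` number in Data_Wind;
- Capacity kWh is the maximum energy a site can produce in one hour.

`CAPACITY_KWH`, `SAMPLE`, and `classify` are illustrative names.

```python
# Capacity kWh from the metadata table, keyed by ID.
# Assumption: metadata ID == `site` number in Data_Wind.
CAPACITY_KWH = {1: 20_800, 2: 17_400, 3: 38_000, 4: 21_400}

# (site, Energy_kWh) pairs copied from the sample rows of the Data_Wind table.
SAMPLE = [(1, 370.159), (2, 207.683), (3, 376.644), (4, 9313.650)]


def classify(site, energy_kwh):
    """Label one hourly record and return its share of the site's hourly capacity.

    Assumption: Capacity kWh is the most energy the site can produce in one hour.
    """
    share = energy_kwh / CAPACITY_KWH[site]
    if energy_kwh < 0:
        label = "net consumption"  # auxiliary loads exceeded generation
    elif share > 1.0:
        label = "exceeds capacity"  # physically implausible; check units or sensors
    else:
        label = "production"
    return label, share


if __name__ == "__main__":
    for site, energy in SAMPLE:
        label, share = classify(site, energy)
        print(f"site {site}: {label}, {share:.1%} of hourly capacity")
```

Under these assumptions, the site 4 sample row amounts to about 43.5% of that site’s hourly capacity, while a reading such as −37.40 kWh would be labeled as net consumption.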
3. Methods
3.1. Data Understanding
3.2. Data Preparation
- First, data ingestion was performed by retrieving the datasets from the company’s storage systems, ensuring that all available records were loaded into the pipeline. The data sources consisted of supervisory control and data acquisition (SCADA) system logs and maintenance records, which were imported in CSV and SQL formats.
- Next, the two datasets were consolidated through an inner join operation in Dataiku, ensuring temporal alignment of the measurements. The join was performed on the “Timestamp” field, which was first standardized to a common format to avoid mismatches due to time zone differences or different timestamp representations.
- A data integrity check was then conducted to identify missing values and inconsistencies. Missing values in critical fields (such as wind speed or power output) were handled using forward-fill or interpolation techniques, while categorical missing values (such as turbine status) were replaced using the mode of the corresponding category.
- Unnecessary data were filtered out, and units of measurement were standardized. Because fewer measurements were recorded in MWh than in kWh, the MWh values were converted to kWh by multiplying them by 1000.
- To further enhance consistency, all numerical values were converted to float64 format, avoiding potential issues with integer division in subsequent processing steps.
- A pivot operation was applied to structure the data by turbine and date, facilitating analysis, and irrelevant columns were removed.
- Additionally, an outlier detection step was implemented using the interquartile range (IQR) method. Data points lying more than 1.5 × IQR below the first quartile (Q1) or above the third quartile (Q3) were flagged for review. This step was crucial for removing erroneous readings that could distort the predictive model.
- Categorical variables, such as turbine operational modes, were one-hot encoded to allow for their integration into machine learning models.
- The final dataset structure was optimized for understanding and manipulating the electric production metrics in relation to the specific characteristics of each wind turbine. Unnecessary columns were discarded to improve data quality.
- Finally, the cleaned and structured dataset was exported in both CSV and Parquet formats, ensuring compatibility with a wide range of analytical tools.
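The steps above were carried out in Dataiku. As a rough illustration of what several of them do, the following standard-library Python sketch operates on plain lists of records. It:

- normalizes timestamps to UTC and performs the inner join on `Timestamp`;
- fills numeric gaps by linear interpolation, with forward-fill at the end of a series;
- imputes categorical gaps with the mode;
- converts MWh to kWh as float64;
- flags IQR outliers.

It is not the authors’ Dataiku recipe. All function names are hypothetical, and timestamps without an offset are assumed to be UTC.

```python
import statistics
from collections import Counter
from datetime import datetime, timezone


def to_utc(ts):
    """Parse an ISO 8601 timestamp and return an aware datetime in UTC.

    Assumption: strings without an offset are already expressed in UTC.
    """
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"  # fromisoformat rejects "Z" before Python 3.11
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def inner_join(left, right):
    """Inner join two lists of dicts on their normalized "Timestamp" field."""
    right_by_ts = {to_utc(r["Timestamp"]): r for r in right}
    joined = []
    for row in left:
        key = to_utc(row["Timestamp"])
        if key in right_by_ts:
            # Fields from `left` win on name collisions; the key is stored in UTC.
            joined.append({**right_by_ts[key], **row, "Timestamp": key})
    return joined


def fill_numeric(values):
    """Linearly interpolate interior gaps (None) and forward-fill trailing ones."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    if known:
        for i in range(known[-1] + 1, len(out)):
            out[i] = out[known[-1]]
    return out  # leading gaps stay None: there is nothing to fill them from


def fill_mode(values):
    """Replace missing categorical values with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def to_kwh(value, unit):
    """Return an energy reading in kWh as a Python float (an IEEE 754 double)."""
    factor = {"kWh": 1.0, "MWh": 1000.0}[unit]
    return float(value) * factor


def iqr_outliers(values, k=1.5):
    """Return indices of values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

`statistics.quantiles` defaults to the "exclusive" method. Its quartiles, and therefore the flagged points, can differ slightly from those produced by tools that use other quantile conventions.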
3.3. Modeling
3.4. Evaluation
4. User Notes
- Optimal Turbine Control Strategy. Developing a model to determine the optimal settings for a wind turbine’s nacelle angle and rotor speed involves using advanced machine learning techniques like reinforcement learning or supervised learning algorithms (e.g., decision trees or gradient boosting machines). Trained on historical data, this model identifies the correlation between wind conditions and optimal turbine settings to maximize power output, ensuring that the turbine operates at peak efficiency, as we can see in Figure 4.
- Wind Speed and Direction Forecasting. Forecasting wind speed and direction at turbine sites is critical for short-term power generation, grid stability, and financial planning. Using time series forecasting models, such as Autoregressive Integrated Moving Average (ARIMA) or Long Short-Term Memory (LSTM), the system leverages historical wind data to predict future conditions accurately. This model helps optimize energy production and maintain a stable power supply. To aid in understanding, an interactive wind map is proposed, showcasing forecasted wind speeds and directions at various turbine locations. This tool supports informed decision-making by highlighting potential wind condition shifts and merging predictive insights with spatial and temporal data for proactive wind farm management.
- Energy Yield Optimization. This model analyzes data to find patterns and correlations between environmental factors (wind speed, direction, temperature) and energy yield. Using regression models or deep learning networks, it aims to identify the most efficient operating conditions for turbines. Advanced visualization techniques like scatter plots and heatmaps illustrate the optimal conditions for maximum energy production, providing actionable insights to enhance turbine efficiency. Figure 5 showcases the intricate relationship between power production, wind direction, and the angle of the turbine’s nacelle.
- Geospatial Analysis for Optimal Wind Farm Location. Geospatial analysis, potentially enhanced by machine learning, is used to evaluate historical wind data, terrain characteristics, and other environmental factors to find optimal wind farm locations. This analysis ensures strategic turbine placement for maximum efficiency and energy production. Maps serve as dynamic visualization tools, marked with color coding or symbols to denote site suitability. These maps, featuring wind patterns, terrain, and infrastructure overlays, offer a comprehensive view of each site’s advantages and limitations, facilitating informed decision-making for new wind farm installations that are efficient and sustainable.
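Full ARIMA or LSTM models are beyond a short example, but the autoregressive idea underlying ARIMA can be sketched. The Python code below fits an AR(1) model, x_t = c + φ·x_(t-1), to an hourly series (for example, wind speed at one turbine) by ordinary least squares. It then iterates the model forward to produce multi-step forecasts. This is a deliberately simplified stand-in, not the forecasting model proposed above, and the function names are hypothetical.

```python
def fit_ar1(series):
    """Ordinary least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("lagged series is constant; phi is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast_ar1(series, steps):
    """Iterate the fitted AR(1) recursion from the last observation."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```

In practice, a library implementation such as `statsmodels.tsa.arima.model.ARIMA` would handle differencing, moving-average terms, and order selection. Any model should also be compared against a simple persistence forecast, in which the next hour equals the current hour.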
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Hidden Layers (HL) | Units | Learning Rate (LR) | Batch Size | Dropout | L2 | L1 | EVS | MAE | MAPE | RMSE | R² |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 140 | 0.001 | 64 | 0.4 | 0.8 | 0.2 | 0.968 | 134.116 | 0.558 | 277.864 | 0.957 |
5 | 180 | 0.0001 | 64 | 0.2 | 0.4 | 0.2 | 0.969 | 133.649 | 0.602 | 276.398 | 0.956 |
7 | 100 | 0.001 | 64 | 0.4 | 0.4 | 0.2 | 0.956 | 148.897 | 0.592 | 288.345 | 0.954 |
9 | 100 | 0.0001 | 64 | 0.4 | 0.4 | 0.2 | 0.956 | 140.325 | 0.592 | 282.813 | 0.955 |
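The table reports five regression metrics for four network configurations; the columns from HL to L1 appear to describe the hyperparameters. The metrics are:

- explained variance score (evs);
- mean absolute error (mae);
- mean absolute percentage error (mape);
- root mean squared error (rmse);
- coefficient of determination (r2).

The Python sketch below shows their standard definitions. It is not the authors’ evaluation code. It assumes MAPE is reported as a fraction rather than a percentage, and `regression_metrics` is a hypothetical name.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return EVS, MAE, MAPE, RMSE and R² for paired true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    var_true = statistics.pvariance(y_true)  # assumed non-zero
    mse = sum(e * e for e in errors) / n
    return {
        # Explained variance: 1 - Var(error) / Var(y).
        "evs": 1 - statistics.pvariance(errors) / var_true,
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE as a fraction; NaN if any true value is zero.
        "mape": (sum(abs(e / t) for e, t in zip(errors, y_true)) / n
                 if all(t != 0 for t in y_true) else math.nan),
        "rmse": math.sqrt(mse),
        # Coefficient of determination: 1 - SS_res / SS_tot (both divided by n here).
        "r2": 1 - mse / var_true,
    }
```

scikit-learn offers equivalents in `sklearn.metrics`: `explained_variance_score`, `mean_absolute_error`, `mean_absolute_percentage_error`, `mean_squared_error`, and `r2_score`. Because the mean squared error equals the error variance plus the squared mean error, R² can never exceed EVS, and the two coincide only when the mean error is zero.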
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sánchez-Soriano, J.; Paniagua-Falo, P.J.; Gómez Muñoz, C.Q. Historical Hourly Information of Four European Wind Farms for Wind Energy Forecasting and Maintenance. Data 2025, 10, 38. https://doi.org/10.3390/data10030038