Comparative Analysis of Solar Radiation Forecasting Techniques in Zacatecas, Mexico

Escalona-Llaguno, Martha Isabel; Solís-Sánchez, Luis Octavio; Castañeda-Miranda, Celina L.; Olvera-Olvera, Carlos A.; Martinez-Blanco, Ma. del Rosario; Guerrero-Osuna, Héctor A.; Castañeda-Miranda, Rodrigo; Díaz-Flórez, Germán; Ornelas-Vargas, Gerardo

doi:10.3390/app14177449


by Martha Isabel Escalona-Llaguno †, Luis Octavio Solís-Sánchez *,†, Celina L. Castañeda-Miranda *, Carlos A. Olvera-Olvera, Ma. del Rosario Martinez-Blanco, Héctor A. Guerrero-Osuna, Rodrigo Castañeda-Miranda, Germán Díaz-Flórez and Gerardo Ornelas-Vargas

Laboratorio de Sistemas Inteligentes de Visión Artificial, Posgrado en Ingeniería y Tecnología Aplicada, Universidad Autonoma de Zacatecas, Zacatecas 98000, Mexico

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Appl. Sci. 2024, 14(17), 7449; https://doi.org/10.3390/app14177449 (registering DOI)

Submission received: 23 July 2024 / Revised: 14 August 2024 / Accepted: 20 August 2024 / Published: 23 August 2024

(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Volume)



Featured Application

Accurate solar radiation forecasting is essential for optimizing solar energy systems in Zacatecas, Mexico. Our comparative analysis identifies the most reliable methods for the region, enhancing solar power plant operations, energy grid management, agricultural planning, and climate studies. Implementing the best techniques can provide significant economic, environmental, and social benefits for Zacatecas and similar regions.

Abstract

This work explores the prediction of daily Global Horizontal Irradiance (GHI) patterns in the region of Zacatecas, Mexico, using a diverse range of predictive models, encompassing traditional regressors and advanced approaches such as Evolutionary Neural Architecture Search (ENAS), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Meta’s Prophet. It addresses a notable gap in regional research and aims to democratize access to accurate solar radiation forecasting methodologies. Evaluations using time series data obtained by Comisión Nacional del Agua (CONAGUA) for the period from 2015 to 2018 show that model performance varies with sky conditions: the models forecast clear and partially cloudy days well but encounter challenges with cloudy conditions. Overall, correlation coefficients (r) ranged between 0.55 and 0.72, with Root Mean Square Error % (RMSE %) values spanning from 20.05% to 20.54%, indicating moderate to good predictive accuracy. This study underscores the need for longer datasets to bolster future predictive capabilities. By democratizing access to these predictive tools, this research facilitates informed decision-making in renewable energy planning and sustainable development strategies tailored to the unique environmental dynamics of the region of Zacatecas and comparable regions.

Keywords:

solar radiation forecasting; artificial neural networks; Zacatecas; predictions solar patterns; convolutional neural networks

1. Introduction

The transition towards renewable energy is a topic of significant relevance, with solar energy standing out as a promising option. However, the challenge of intermittency in its generation persists, which is evident in technologies such as photovoltaic panels [1]. Additionally, climatic and geographic factors hinder the effective integration of solar energy into conventional electrical grids [2,3]. To overcome these challenges, innovative solutions are being researched and developed, such as algorithms based on artificial intelligence for energy management, to improve solar energy’s reliability and efficiency as a primary renewable energy source [4].

In the city of Zacatecas, Mexico, research on solar radiation is limited due to several factors, despite its significant energy potential. The primary limitations include a lack of comprehensive and long-term data on solar radiation, which restricts the ability to develop and validate accurate predictive models. The lack of comprehensive studies on this topic in the region highlights the need to develop prediction models that consider local climate variability, such as cloud cover, incident radiation, and the position of the sun [5]. These models are essential to accurately estimate the amount of solar radiation available during specific periods, which is crucial to optimize the efficiency of photovoltaic systems and improve the management of solar energy generation and storage in Zacatecas [6]. This approach can contribute significantly to the development of a sustainable and resilient energy sector in the region, despite the limited previous research in this area.

Currently, there is a trend towards the construction of new photovoltaic power plants in the region. However, these plants do not usually take advantage of innovative technologies, such as neural networks, to improve the accuracy of predictions. Neural networks are designed to analyze complex and non-linear data. They have the potential to capture local weather patterns and generate reliable forecasts of solar radiation in the city of Zacatecas [7]. By integrating historical daily Global Horizontal Irradiance (GHI) data that are specific to this area into models based on neural networks, it is possible to obtain a detailed and precise estimation of solar radiation, which is crucial for strategic planning and decision-making in the solar energy sector of the region [8].

Harnessing the power of neural networks could create new opportunities to improve the efficiency and profitability of solar energy systems in Zacatecas. The use of specialized models represents a promising innovation to address problems in predicting solar radiation [9]. This approach, based on machine learning and time series techniques, has proven effective in various fields and offers the ability to capture complex patterns in climate data [10]. However, successful implementation of the model requires access to accurate climate data and continuous validation of the results to ensure its reliability and utility in the region [10]. To address the challenge of accurately predicting solar radiation, this work explores the use of 15 different models, ranging from traditional statistical approaches to advanced machine learning techniques, to provide a comprehensive evaluation of their predictive capabilities. The selection includes persistence models, artificial neural networks (Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Evolutionary Neural Architecture Search (ENAS)), linear regressors, random forests, XGBoost, Support Vector Machines (SVMs), K-Nearest Neighbors (KNN), extra trees, AdaBoost, ElasticNet, Ridge, Lasso, and Meta’s Prophet. Each model brings unique strengths: for example, neural networks are well-suited for capturing non-linear relationships, while ensemble methods like random forests and XGBoost are robust against overfitting [11,12,13]. By comparing these diverse methodologies, we aim to identify the most effective approaches for the specific climatic and geographical context of Zacatecas.

Therefore, this research focuses on improving the prediction of solar radiation in Zacatecas by applying these techniques to historical data provided by Comisión Nacional del Agua (CONAGUA). The goal is to evaluate the advantages and limitations of each model to obtain accurate and reliable results, which is crucial to optimize the planning and management of solar energy systems in the region. This work will contribute to the development of a sustainable and efficient energy sector in the city of Zacatecas.

2. Materials and Methods

In this section, we detail the approach employed to enhance solar radiation prediction in Zacatecas, using historical data of GHI collected by CONAGUA from 2015 to 2018. The performance of various models was rigorously compared and evaluated using two key metrics: Root Mean Squared Error (RMSE %) and the coefficient of correlation (r).

The goal of RMSE % is to measure the accuracy of a prediction by expressing the prediction error as a percentage of the actual observed values, providing a normalized error metric that facilitates comparisons across different datasets or models [14]. In solar radiation forecasting, the aim is to minimize RMSE %, indicating that the predictions closely match actual GHI values. A lower RMSE % signifies a more accurate and reliable model, which is crucial for optimizing solar energy management. Meanwhile, r measures the strength and direction of the linear relationship between predicted and actual values, ranging from −1 to 1. In forecasting, achieving an r value close to 1 reflects a strong correlation, demonstrating the model’s effectiveness in capturing data patterns and ensuring reliable solar energy forecasts [15].
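The two metrics can be illustrated with a short sketch. The normalization used for RMSE % is an assumption here (RMSE divided by the mean of the observed values), since the text only states that the error is expressed as a percentage of the observed values; the sample numbers are hypothetical and are not taken from the CONAGUA dataset.

```python
import math


def rmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    n = len(observed)
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    return 100.0 * math.sqrt(mse) / (sum(observed) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (spread_o * spread_p)


# Hypothetical daily GHI values, for illustration only.
observed = [6.0, 5.0, 7.0, 6.0]
predicted = [5.0, 5.0, 8.0, 6.0]

error_pct = rmse_percent(observed, predicted)
correlation = pearson_r(observed, predicted)
```

A lower `error_pct` and a `correlation` closer to 1 would both indicate a better forecast under these definitions.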

This study began with the implementation of a persistence model, which served as a baseline to compare the predictive performance of more advanced techniques. This model assumes that future values will be equal to the last observed value, providing a straightforward but valuable reference point.

Subsequently, more sophisticated methods were applied. A Recurrent Neural Network (RNN) [16] was developed to capture the complex nonlinear relationships inherent in solar radiation data. The network architecture was specifically tuned using a grid search to optimize hyperparameters, including the number of hidden layers and neurons per layer, to best fit the GHI time series data. Additionally, Meta’s Prophet model [17] was customized for the study by incorporating seasonal components specific to the Zacatecas region, which allowed it to effectively model local trends and cyclic patterns observed in the GHI data.

For comparison, classical machine learning algorithms were also employed, with a particular focus on regression techniques. Linear regression [18] was implemented using past GHI values as input features, capturing the linear trend over time. More complex models like random forests [12,19], XGBoost [13] and extra trees [20] were configured to handle the high-dimensional feature space generated from the time-lagged values of GHI. These ensemble methods were fine-tuned using cross-validation to select the optimal number of trees and maximum depth, ensuring the models could robustly predict future values despite the variability in solar radiation.

Additionally, supervised learning techniques like Support Vector Machines (SVMs) [21] and k-Nearest Neighbors (KNN) [22] were adapted by constructing feature sets based on past GHI values to enhance the capture of local solar radiation variability. Boosting techniques, particularly AdaBoost [23], were also explored, focusing on minimizing prediction errors by iteratively refining weaker learners.

To address regularization and feature selection challenges, ElasticNet [24], ridge [25], and lasso [26] regression models were utilized. These methods were particularly effective in handling multicollinearity among the time-lagged features and ensuring model generalization by balancing between feature selection and regularization.

Finally, advanced approaches like Evolutionary Neural Architecture Search (ENAS) [27] were used to automatically design optimized neural network architectures tailored to the specific characteristics of local solar radiation data. Furthermore, Convolutional Neural Networks (CNNs) [16] were implemented to exploit temporal dependencies in the GHI time series, capturing intricate patterns through the use of convolutional layers applied to sequences of past observations.

2.1. Experimental Data

The historical data of GHI were obtained from observations taken every 15 min at the Zacatecas Meteorological Observatory (OMZ) during the measurement period from 2015 to 2018. The meteorological station operates at 2650 m above mean sea level. The selection of this period is justified by the limited availability of continuous and reliable data beyond this time frame. The interruption of long-term measurements and difficulties in accessing various local databases have been limiting factors. Several quality tests were applied to the irradiance data used in this study to ensure the reliability and consistency of the measurements [28].

To explore how cloud cover affects the accuracy of time series forecasts, we classified each day into three categories based on the sky’s conditions (clear, partly cloudy, and cloudy) using the daily clear-sky index, K_{c}(t), which relates the measured solar radiation, H(t), to ideal clear-sky radiation. To simplify and keep H(t) as the only measured variable, we applied the most basic daily model of [29] to estimate solar radiation in clear skies, H_{cs}(t). This allowed us to calculate the average daily K_{c} index, as shown in Equation (1). This approach helps us understand how variations in the sky conditions affect forecast accuracy using validated methods to characterize solar radiation in different atmospheric environments [30,31].

K_{c} = \frac{H}{H_{cs}}    (1)

The index K_{c}(t) was normalized to obtain a uniform distribution for the impact of cloudiness. The maximum value of K_{c}(t) is 1, which corresponds to perfectly clear skies. The minimum value of K_{c}(t) is 0, indicating completely obscured skies. Thus, the ranges of values are defined for the three conditions:

Cloudy day: K_{c}(t) < 0.33

Partly cloudy day: 0.33 \leq K_{c}(t) < 0.66

Clear day: 0.66 \leq K_{c}(t)

Table 1 shows the number of days that belong to each condition, using the previously defined ranges.
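The classification described above can be sketched as follows. The function names are hypothetical, the clear-sky value is taken as a given input rather than computed from the model of [29], and clipping the ratio to [0, 1] is an assumption standing in for the normalization step, whose details are not given here.

```python
def clear_sky_index(h_measured, h_clear_sky):
    """Ratio of measured to clear-sky daily radiation, clipped to [0, 1] (assumed)."""
    ratio = h_measured / h_clear_sky
    return min(1.0, max(0.0, ratio))


def classify_day(kc):
    """Map a daily clear-sky index to one of the three sky conditions."""
    if kc < 0.33:
        return "cloudy"
    if kc < 0.66:
        return "partly cloudy"
    return "clear"


# Hypothetical (measured, clear-sky) pairs in the same units, for illustration only.
days = [(2.0, 8.0), (4.0, 8.0), (7.0, 8.0)]
labels = [classify_day(clear_sky_index(h, h_cs)) for h, h_cs in days]
```

Counting the labels over the full record would reproduce the kind of per-condition tally reported in Table 1.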

2.2. Persistence Model

The persistence model has established itself as a solid basis for the prediction of solar radiation, being widely used in studies related to meteorology [32]. This approach, although simple, has proven useful by assuming that solar energy in the future will be equal to the last measurement obtained. Mathematically, this concept is described directly and understandably [33]. By providing an initial reference for comparison with other more complex models, the persistence model plays an important role in the evaluation and improvement of solar radiation prediction techniques [34]. The equation for this model is as follows:

H(t + 1) = H(t)    (2)

In Equation (2), H(t + 1) represents the forecasted value, while H(t) represents the last observed value.

The persistence model serves as a foundational baseline for assessing solar radiation forecasting methodologies. This approach provides a straightforward method to establish initial predictions without additional variables or complex computations by capturing basic temporal dependencies in solar radiation patterns.
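As a minimal sketch of Equation (2), the one-day-ahead persistence forecast is simply the series shifted by one day. The function name and the sample values are hypothetical.

```python
def persistence_forecast(series):
    """Forecast for day t + 1 is the observation at day t (Equation (2))."""
    return series[:-1]


# Hypothetical daily GHI values, for illustration only.
ghi = [6.1, 5.8, 3.2, 6.4, 6.5]

forecasts = persistence_forecast(ghi)  # forecasts for days 1..4
targets = ghi[1:]                      # observations the forecasts are compared with
absolute_errors = [abs(f - y) for f, y in zip(forecasts, targets)]
```

The largest errors appear around the abrupt drop on the third day, which is the kind of day-to-day change a persistence baseline cannot anticipate.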

2.3. Neural Network-Based Methods

2.3.1. Convolutional Neural Networks (CNNs)

The CNN operates by applying convolutional filters across the GHI data. Each filter captures unique local characteristics, like rapid shifts or daily trends, as it slides through the temporal sequence. Subsequent convolutional layers combine and aggregate these features to construct a hierarchical representation of the information, allowing the CNN to discern complex spatial and temporal patterns [35]. This approach enables CNN to effectively capture relationships within GHI data, facilitating precise predictions based on automatically identified patterns [16,36]. Figure 1 graphically presents the use of the CNN for time series prediction.
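The sliding-filter operation can be illustrated with a small sketch. The two filters below are set by hand purely for illustration; in a trained CNN the filter weights are learned from the data, and the architecture used in this study is not reproduced here.

```python
def conv1d(sequence, kernel):
    """Valid-mode 1D cross-correlation, the sliding operation in a convolutional layer."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


# Hypothetical daily GHI values, for illustration only.
ghi = [6.0, 6.2, 3.0, 5.9, 6.1]

difference_filter = [-1.0, 1.0]          # responds to day-to-day increases
smoothing_filter = [1 / 3, 1 / 3, 1 / 3]  # responds to the local level

rises = relu(conv1d(ghi, difference_filter))
trend = conv1d(ghi, smoothing_filter)
```

Each filter produces its own feature sequence (`rises`, `trend`); stacking further layers on such sequences is what builds the hierarchical representation described above.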

2.3.2. Recurrent Neural Networks (RNNs)

RNNs are a type of neural network designed to handle sequential data, such as time series. Unlike traditional neural networks, RNNs have feedback connections that allow them to retain previous states. This makes them ideal for predicting patterns in temporal data like daily GHI. RNNs utilize Long Short-Term Memory (LSTM) units to capture long-term dependencies and nonlinear trends in the data [37,38].

RNNs propagate information through memory units across a sequence of data, such as daily solar radiation records. Each unit in the network, equipped with LSTM, remembers and updates its internal state based on the current input and previously stored information. This enables the RNN to capture complex temporal dependencies, such as daily and seasonal variations in GHI, dynamically adapting to different patterns as it processes each sequential data point [37]. Figure 2 illustrates the comparison between the compressed (left) and expanded (right) configuration of a basic RNN.
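The state update of an LSTM unit can be sketched for a single unit with a scalar input. This is the standard LSTM cell formulation, not the network trained in this study; the weights are hypothetical and untrained, and the inputs are assumed to be GHI values scaled to [0, 1].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM with a scalar input."""
    pre = {}
    for name in ("forget", "input", "output", "candidate"):
        w_x, w_h, b = params[name]
        pre[name] = w_x * x + w_h * h_prev + b
    f = sigmoid(pre["forget"])         # how much of the old cell state to keep
    i = sigmoid(pre["input"])          # how much new information to write
    o = sigmoid(pre["output"])         # how much of the cell state to expose
    g = math.tanh(pre["candidate"])    # candidate values to write
    c = f * c_prev + i * g             # updated cell (memory) state
    h = o * math.tanh(c)               # updated hidden state
    return h, c


# Hypothetical, untrained weights: (input weight, recurrent weight, bias) per gate.
params = {name: (0.5, 0.1, 0.0) for name in ("forget", "input", "output", "candidate")}

h, c = 0.0, 0.0
for x in [0.8, 0.7, 0.2, 0.9]:  # hypothetical scaled GHI sequence
    h, c = lstm_step(x, h, c, params)
```

The pair `(h, c)` carried from one step to the next is what lets the unit retain information from earlier days in the sequence.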

2.3.3. Evolutionary Neural Architecture Search (ENAS)

ENAS is an advanced technique that automates the search and optimization of neural network architectures. It works by evaluating multiple network configurations and automatically adjusting hyperparameters to find the most efficient and accurate structure for a specific dataset [27]. ENAS uses evolutionary strategies to explore and select architectures capable of capturing complex patterns in GHI data, gradually improving their performance through iterations.

ENAS employs an optimization approach based on evolutionary algorithms to automatically enhance neural network architectures. It functions by generating and evaluating a series of candidate architectures, adjusting their components and hyperparameters across multiple iterations.

For training ENAS, an educational open-source codebase was employed [39]. The initial parameters included a starting population of 100 individuals, each randomly generated, which evolved over 10 generations to robustly explore the search space. A multilayer perceptron was implemented using MATLAB® NarNet version 5 [40], aiming to optimize the number of neurons in the hidden layer and lag length while maximizing r on the test dataset. The evolutionary algorithm hyperparameters included an elitism rate of 30% to preserve the top individuals across generations, with crossover and mutation probabilities set at 90% and 20%, respectively, to enhance the population diversity. Following optimization, the best architecture was identified, trained, and tested using data from the final year (2018). Figure 3 presents the basic architecture of a multilayer perceptron (MLP).
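The evolutionary loop with the stated settings (population of 100, 10 generations, 30% elitism, 90% crossover, 20% mutation) can be sketched as below. This is not the codebase of [39]: the search bounds, the crossover and mutation operators, and above all `toy_fitness` are hypothetical stand-ins. In the study, the fitness of an individual would be the correlation r obtained by training a network with that number of neurons and lag length.

```python
import random

POPULATION_SIZE = 100
GENERATIONS = 10
ELITISM_RATE = 0.30
CROSSOVER_PROB = 0.90
MUTATION_PROB = 0.20
NEURON_RANGE = (1, 50)  # hypothetical search bounds
LAG_RANGE = (1, 30)     # hypothetical search bounds


def toy_fitness(individual):
    """Hypothetical stand-in for r; peaks at 12 neurons and 7 lags."""
    neurons, lags = individual
    return -((neurons - 12) ** 2 + (lags - 7) ** 2)


def random_individual(rng):
    return (rng.randint(*NEURON_RANGE), rng.randint(*LAG_RANGE))


def crossover(a, b, rng):
    """With probability CROSSOVER_PROB, combine neurons from a and lags from b."""
    if rng.random() < CROSSOVER_PROB:
        return (a[0], b[1])
    return a


def mutate(individual, rng):
    """Independently resample each gene with probability MUTATION_PROB."""
    neurons, lags = individual
    if rng.random() < MUTATION_PROB:
        neurons = rng.randint(*NEURON_RANGE)
    if rng.random() < MUTATION_PROB:
        lags = rng.randint(*LAG_RANGE)
    return (neurons, lags)


def evolve(fitness, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(POPULATION_SIZE)]
    n_elite = round(ELITISM_RATE * POPULATION_SIZE)
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        elite = population[:n_elite]  # top individuals survive unchanged
        children = []
        while len(elite) + len(children) < POPULATION_SIZE:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = elite + children
    return max(population, key=fitness)


best_neurons, best_lags = evolve(toy_fitness)
```

Because the elite is carried over unchanged, the best fitness in the population never decreases from one generation to the next.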

2.4. Regression Models

The regression models used in this work include linear regression, random forest, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), extra trees, AdaBoost, ElasticNet, ridge regression, and lasso regression. Each model offers unique advantages and is suited for different types of data and modeling scenarios.

Linear regression, a conventional method in statistical modeling, assumes a linear relationship between the past GHI values used as inputs and the predicted values, making it both interpretable and simple to implement [41]. In contrast, ensemble methods like random forests [19] and extra trees combine predictions from multiple decision trees, helping to mitigate overfitting and capture complex interactions within the GHI data through randomized tree construction processes [19,42,43].

XGBoost, an optimized gradient-boosting algorithm, sequentially builds decision trees to improve predictive accuracy by focusing on previously underperforming predictions, refining the model with each iteration [13]. SVM, on the other hand, is used here in its regression form: it maps the inputs into a high-dimensional feature space and fits a function that keeps the deviations between predicted and actual GHI values within a tolerance margin [21].

KNN, a non-parametric method, predicts GHI values by averaging those of its nearest neighbors, making it an intuitive and effective tool for recognizing local patterns within the data. AdaBoost, another ensemble method, sequentially adjusts the weights of weak learners based on their performance, improving overall prediction accuracy by focusing on GHI values that were previously mispredicted [23].

Ridge regression mitigates the effects of multicollinearity by shrinking coefficients toward zero, particularly in datasets with highly correlated predictors. In contrast, lasso regression promotes sparsity by eliminating less-significant predictors, enhancing model interpretability and reducing complexity [26]. ElasticNet combines the penalties of lasso and ridge regression to balance feature selection and coefficient regularization, which is particularly useful when dealing with multicollinearity in the GHI data [11].

Each regression model was iteratively optimized during training using techniques such as cross-validation and grid search to identify the optimal hyperparameters, ensuring robust performance. The 2018 dataset served as a test set to evaluate and compare the model’s predictive accuracy using metrics like RMSE % and r. This comprehensive approach allowed for the selection of the most suitable regression model for accurate GHI forecasting in Zacatecas under various sky conditions.
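The time-lagged feature sets and the chronological hold-out shared by these regressors can be sketched as follows. The function names, the lag length, and the sample values are hypothetical. In the study, the held-out portion is the 2018 data.

```python
def make_lagged_dataset(series, n_lags):
    """Build (features, target) rows: the previous n_lags values predict the next value."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows


def chronological_split(rows, n_test):
    """Keep time order: the last n_test rows form the test set, the rest are for training."""
    return rows[:-n_test], rows[-n_test:]


# Hypothetical daily GHI values, for illustration only.
ghi = [6.0, 6.2, 3.0, 5.9, 6.1, 6.3, 2.8]

rows = make_lagged_dataset(ghi, n_lags=3)
train_rows, test_rows = chronological_split(rows, n_test=2)
```

Splitting by time rather than at random avoids training on days that come after the days being predicted, which would otherwise inflate the test scores.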

2.5. Meta’s Prophet

Meta’s Prophet has been established as a highly effective tool in predicting time series data, including solar radiation, and has demonstrated its ability to automatically adapt to the non-linear trends and seasonal patterns present in the data [44]. Unlike neural network-based models, which often require a delicate initial setup, Prophet uses a flexible and adaptable approach that eliminates the need for manual parameter adjustments. This allows users to effectively capture long-term variations, as well as annual, weekly, and daily seasonalities in GHI data [17]. Additionally, Prophet has proven to be robust to missing data and outliers, contributing to greater reliability in the forecasts.

Another notable advantage of using this model is its accessibility and simplicity in the implementation process. The code needed to apply the model is completely free and available for download, removing the barriers associated with expensive software or licenses [45]. This means that any interested researcher or professional can use this model without incurring additional costs. Additionally, being open-source and publicly available, the documentation and supporting resources are usually well-structured and easily accessible [17]. This significantly simplifies the process of learning and utilizing the model, as users only need to follow the instructions provided on the author’s page to start using it effectively [17].

3. Results

In this section, we present the results of our comprehensive study on the prediction of daily GHI patterns in Zacatecas using a diverse array of regression and neural network models. The findings from our study highlight the importance of selecting appropriate forecasting models for predicting daily solar radiation patterns in Zacatecas. The use of both traditional and advanced techniques allowed for a comprehensive evaluation of their effectiveness in capturing the complexities of GHI data. The correlation coefficient and RMSE % metrics provided a robust framework for comparing model performance, emphasizing the need for accurate and reliable prediction methods.

This comparative analysis underscores that while traditional regression models offer a baseline, more advanced techniques, including advanced regression methods and neural networks, are crucial for addressing the inherent challenges of forecasting in variable environmental conditions. The diverse array of models tested in this study demonstrates that no single approach universally excels; rather, the choice of model should be guided by its ability to handle specific data characteristics and forecasting requirements. These insights contribute to a deeper understanding of model performance in GHI prediction and offer valuable guidance for future research and practical applications in similar regions.

Each model and technique, including traditional methods such as linear regression, random forest, XGBoost, SVM, KNN, extra trees, AdaBoost, ElasticNet, ridge, lasso, and advanced approaches like ENAS, CNN, RNN, and Meta’s Prophet, was meticulously trained and optimized to find the best configuration of their respective hyperparameters. This rigorous approach aimed to uncover the most effective models for accurately forecasting GHI levels under the different environmental conditions observed in Zacatecas from 2015 to 2018. We utilized three years (2015 to 2017) for training and validation across each of the models employed in this study. This method ensured a fair and robust comparison of the models in forecasting daily solar radiation levels in Zacatecas.

When evaluating the forecasting models, it became apparent that their performance varied significantly across different weather conditions. The models were tested across a range of scenarios, including clear, partly cloudy, and cloudy conditions. The results indicate that while the models generally performed well under clear and partly cloudy conditions, their accuracy diminished under cloudy conditions. This disparity is attributed to the limited number of cloudy days in the dataset, which restricts the models’ ability to generalize effectively to such scenarios. This variability in performance highlights the need for further refinement and potential augmentation of the models to better handle less-frequent weather conditions. The RMSE % results are displayed in Figure 4, showcasing the performance of each model under the defined conditions.

Due to the limited number of cloudy days, the models struggled to generalize accurately; however, they demonstrated excellent performances under other weather conditions. Table 2 presents the evaluation metrics and the actual performance of each model.

While overall, the models performed well, ENAS demonstrated the best performance across the varied conditions. RNN excelled in predicting clear days, while linear regression stood out for partially cloudy days. The persistence model showed the lowest error percentage on cloudy days, primarily due to the limited occurrence of cloudy days (<9%), which challenges more sophisticated methods when learning their characteristics.

Figure 5 illustrates scatter plots comparing forecasted and measured outcomes for ENAS (best) and Prophet (worst), segmented according to three daily cloudiness categories. The ENAS predictions show a notable alignment with measured values, particularly on days categorized as partially cloudy. In contrast, both the ENAS and Prophet models demonstrated substantial overestimation for cloudy days. This discrepancy highlights the challenge of accurately predicting GHI under heavily clouded conditions, where the models struggled to adjust for the reduced solar input effectively.

Figure 6 shows the GHI time series predicted one day in advance by the Prophet, ENAS, and Persistence (baseline) models. There was generally good agreement between the predicted and actual values. Although Prophet occasionally yielded less-accurate results, this could potentially be mitigated by retraining the models with longer time series data. Future work will incorporate larger data sets, with the goal of improving model accuracy. The persistence model demonstrates why, despite being an extremely simple model, it remains an efficient baseline for predicting GHI.

Despite the initial differences between each model, particularly during the optimization of their hyperparameters, the resulting performance metrics showed consistency. Figure 7 illustrates the behavior of r, indicating a stable trend hovering around 0.70 across the various models. This stability suggests that, despite model variations and tuning efforts, their ability to predict solar radiation remains relatively consistent within this range of correlation. Regarding the RMSE %, we observed a more unpredictable behavior, with values ranging between 20% and 25%. It is interesting to note that under all sky conditions, the ENAS model showed notable robustness, standing out for its more consistent performance and lower variability compared to the other models evaluated. This finding suggests that model accuracy can vary significantly depending on environmental conditions, underscoring the importance of considering climate variability when evaluating and selecting predictive models for applications related to solar energy and other areas that are sensitive to atmospheric conditions.

4. Conclusions

This work addresses a significant gap in the research landscape of GHI prediction in Zacatecas, where such comprehensive investigations are scarce. By employing a variety of predictive models ranging from traditional regressors to advanced neural networks, we have democratized access to accurate GHI forecasting techniques. This diversity not only enhances the reliability of predictions but also provides stakeholders with a range of tools to make informed decisions regarding renewable energy strategies and agricultural planning in the region.

Moving forward, future research efforts in Zacatecas and similar regions would greatly benefit from access to longer and high-quality datasets. The limited availability of extensive and reliable data has been a significant challenge in this study, particularly affecting the accuracy of predictions under certain weather conditions such as cloudy days. Longer time series and improved data quality would enable models to capture more nuanced patterns and variations in solar radiation, thereby enhancing their predictive capabilities and robustness in diverse environmental scenarios.

Expanding the forecast horizon for GHI prediction is a critical area for future work due to its potential to significantly enhance the planning and management of solar energy resources. While short-term forecasts (1–2 days ahead) are highly accurate and valuable for immediate operational decisions, extending the forecast horizon could provide substantial benefits for long-term energy planning, grid stability, and the integration of renewable energy into the power supply.

For instance, accurate long-term solar radiation forecasts could enable better scheduling of maintenance for photovoltaic (PV) systems, optimization of energy storage solutions over longer periods, and improved forecasting of energy production for both utility-scale solar farms and distributed energy resources. Moreover, longer-term predictions could aid in strategic planning for energy trading and resource allocation, reducing the reliance on fossil fuels and enhancing the overall sustainability of the energy grid.

However, extending the forecast horizon presents challenges, including the increased uncertainty and complexity of predicting GHI over longer periods. Future research should focus on developing advanced modeling techniques, such as hybrid models that combine machine learning with physical-based approaches, as well as improving data assimilation methods that incorporate real-time observations to continually refine and update predictions. By addressing these challenges, the development of reliable long-term forecasts could greatly enhance the utility and impact of solar energy forecasting, contributing to more efficient and sustainable energy systems.

Moreover, the democratization of predictive models through this study highlights the potential for broader applications across various sectors in Zacatecas. By making these methodologies accessible and adaptable to local conditions, policymakers, energy analysts, and researchers can leverage these tools to optimize energy production, improve resource management, and foster sustainable development initiatives in the region. This approach not only supports informed decision-making but also lays the groundwork for future advancements in renewable energy technologies and climate adaptation strategies tailored to local needs.

While this study provides valuable insights into GHI prediction methodologies in Zacatecas, it also underscores the critical need for ongoing efforts to expand data collection, since the available time series covers only 2015 to 2018. With longer datasets, future studies can build upon this foundation, advancing our understanding and capabilities in solar energy forecasting and environmental management in the region and beyond.

Author Contributions

Methodology, M.I.E.-L.; validation, M.I.E.-L., L.O.S.-S. and C.L.C.-M.; formal analysis, M.I.E.-L., C.A.O.-O., L.O.S.-S., M.d.R.M.-B. and C.L.C.-M.; writing—review and editing, H.A.G.-O., R.C.-M. and G.D.-F.; supervision, C.L.C.-M., L.O.S.-S., G.O.-V. and R.C.-M.; project administration, M.I.E.-L., C.L.C.-M., G.D.-F., H.A.G.-O. and M.d.R.M.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used can be found at the following GitHub link: https://github.com/RosaInfernal66/Dataset-Solar-Radiation-Forecasting-T. (accessed on 21 July 2024).

Acknowledgments

The authors would like to acknowledge the financial support provided by Consejo Nacional de Humanidades, Ciencias y Tecnologías (CONAHCyT) through scholarship 772743. We thank Comisión Nacional del Agua (CONAGUA), and especially Carlos Alean Rocha, for providing the dataset. Without their support, this work would not have been possible.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. CNN applied to time series forecasting.

Figure 2. Compressed (left) and unfolded (right) basic RNN.

Figure 3. Basic architecture for ENAS.

Figure 4. RMSE for 1-day-ahead forecasting of daily GHI, calculated for all models on the test dataset (2018) at the OMZ station and categorized by the three sky conditions defined in the text.

Figure 5. Comparison of the forecasted and measured next-day GHI for the test year 2018 using (a) the Prophet model and (b) ENAS, according to three types of daily sky conditions: clear, partly cloudy, and cloudy. Statistical indicators of model performance are also provided.

Figure 6. Time series of 1-day-ahead GHI forecasted by the best and worst models, together with the corresponding measurements, using the test data (2018).

Figure 7. Model comparison for the test year (2018) using (a) r and (b) RMSE %.

Table 1. Number of clear, partly cloudy, and cloudy days at OMZ station.

Station	Lat. (°)	Long. (°)	Elev. (m)	Clear Days	Partly Cloudy Days	Cloudy Days
OMZ	22.779	−102.565	2650	640 (43.84%)	693 (47.47%)	127 (8.69%)

Table 2. Performance measures across all the models including RMSE % and r. Marked by an asterisk, ENAS demonstrates the best balance of these metrics across all sky conditions. The best results are highlighted with bold text.

Model	All-Sky r	All-Sky RMSE %	Clear r	Clear RMSE %	Partly Cloudy r	Partly Cloudy RMSE %	Cloudy r	Cloudy RMSE %
Persistence	0.71	22.55	0.82	14.39	0.52	27.79	0.39	65.22
RNN	0.71	24.36	0.89	10.83	0.62	29.81	0.66	102.12
Linear Regressor	0.70	21.26	0.89	17.95	0.64	19.23	0.71	89.49
Random Forest	0.70	20.75	0.87	15.21	0.60	20.78	0.57	85.26
XGBoost	0.71	20.64	0.87	15.01	0.62	20.35	0.63	87.69
SVM	0.71	20.43	0.89	13.95	0.60	21.85	0.62	81.81
KNN	0.57	24.10	0.81	15.96	0.53	24.27	0.46	109.28
Extra trees	0.71	20.51	0.89	14.95	0.63	20.23	0.62	86.85
Adaboost	0.70	20.94	0.88	13.73	0.58	22.11	0.46	89.45
ElasticNet	0.72	20.70	0.90	14.93	0.64	20.09	0.63	90.68
Ridge	0.71	20.43	0.88	14.27	0.59	21.53	0.63	81.85
Lasso	0.72	20.27	0.89	14.27	0.61	20.75	0.59	84.57
ENAS *	0.73 *	20.05 *	0.86	14.17	0.57	21.42	0.56	77.1
CNN	0.66	22.38	0.86	18.77	0.59	20.18	0.64	87.21
Meta’s Prophet	0.55	24.54	0.82	14.43	0.58	23.98	0.58	125.32
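The two indicators reported in Table 2 can be computed as in the following minimal Python sketch. It assumes that RMSE % is the RMSE divided by the mean of the measured values, which is a common convention but is not confirmed by this section. The persistence function is the naive form (tomorrow equals today) and may differ from the persistence model defined in Section 2.2. The function names and the sample values are hypothetical and are not the study's data.

```python
import numpy as np


def pearson_r(measured, forecast):
    """Pearson correlation coefficient between measurements and forecasts."""
    m = np.asarray(measured, dtype=float)
    f = np.asarray(forecast, dtype=float)
    dm, df = m - m.mean(), f - f.mean()
    return float((dm * df).sum() / np.sqrt((dm ** 2).sum() * (df ** 2).sum()))


def rmse_percent(measured, forecast):
    """RMSE as a percentage of the mean measured value (assumed normalization)."""
    m = np.asarray(measured, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(100.0 * np.sqrt(np.mean((f - m) ** 2)) / m.mean())


def persistence_forecast(series):
    """Naive 1-day-ahead persistence: the forecast for tomorrow is today's value.

    Returns the forecasts and the measurements they are compared against.
    """
    s = np.asarray(series, dtype=float)
    return s[:-1], s[1:]


# Hypothetical daily GHI values, for illustration only.
ghi = [6.1, 6.3, 5.8, 3.2, 4.9, 6.0, 6.2]
forecast, measured = persistence_forecast(ghi)
print(f"r = {pearson_r(measured, forecast):.2f}")
print(f"RMSE % = {rmse_percent(measured, forecast):.2f}")
```

Under this assumed normalization, the same absolute error produces a larger percentage on days with low measured GHI, which would be consistent with the much higher RMSE % values in the cloudy columns of Table 2.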

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Escalona-Llaguno, M.I.; Solís-Sánchez, L.O.; Castañeda-Miranda, C.L.; Olvera-Olvera, C.A.; Martinez-Blanco, M.d.R.; Guerrero-Osuna, H.A.; Castañeda-Miranda, R.; Díaz-Flórez, G.; Ornelas-Vargas, G. Comparative Analysis of Solar Radiation Forecasting Techniques in Zacatecas, Mexico. Appl. Sci. 2024, 14, 7449. https://doi.org/10.3390/app14177449


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
