1. Introduction
Renewable energy sources (RESs) are crucial for solving several social, economic, and environmental problems. The transition to RESs is largely driven by the need to combat climate change and to secure a sustainable means of generating power. Photovoltaic (PV) solar, one of the most widely used RESs, significantly reduces greenhouse gas emissions, lowers the carbon footprint, and mitigates global warming. PV systems presently provide 1.7% of the world’s energy, and by 2025, their output power should be close to 1 TW [1]. PV systems can be classified into two main categories, on-grid and off-grid, depending on their connection to the electrical grid. On-grid systems are interconnected with the utility grid, which acts as a storage mechanism: surplus electricity produced by the solar panels is injected back into the grid. Off-grid systems, in contrast, have no connection to the electrical grid. They function autonomously and are engineered to meet the electrical requirements of a particular site, such as a distant cabin or a secluded facility, without depending on external sources of power [2].
The growth of industry and the development of the services sector indicate that electricity demand in these sectors will have a significant impact on the future energy landscape of the Kingdom of Saudi Arabia (KSA). The prioritization of industrialization and the expansion of service-oriented enterprises are significant contributors to the increasing demand depicted in Figure 1. As can be seen, power consumption has risen consistently and approximately linearly over the past decade, while the decline observed in 2019 and 2020 can be attributed to the impact of the coronavirus pandemic. Figure 2 shows how much energy, in gigawatt-hours (GWh), is consumed by the different sectors, i.e., residential, industrial, government, commercial, and other loads [3].
Saudi Arabia’s Vision 2030 aims to diversify the economy, reduce dependency on oil, and promote social and cultural development to transform the country into a more dynamic and globally competitive nation. A national renewable energy program is a crucial component of the strategic blueprint outlined in Vision 2030 [4]. Saudi Arabia’s geographic position and dry weather make investments in renewable energy sources more attractive and reliable, and the KSA has achieved very competitive pricing on a worldwide scale for power generated by wind and solar farms. According to the vision, around 50% of overall electricity generation is projected to come from clean energy sources, namely PV solar and wind turbine systems, by the end of 2030. The government supports and encourages partnerships between private companies and public organizations to invest in the renewable energy sector. PV solar and wind turbine systems are both significant forms of renewable energy, each with distinct benefits and concerns. The choice between them is often influenced by the availability of resources, geographical attributes, and the particular demands of a project. In the KSA, where solar irradiation is abundant, PV has some advantages over wind energy, an assertion supported by the installed PV and wind projects [5]. This commitment became a reality when the government announced 12 megawatt-scale projects across the kingdom, with more similar projects to be announced [4]. As a result of establishing these projects, oil usage is expected to be reduced by 18.533 million barrels/year, which will have a substantial positive impact on air pollution.
Integrating RESs with the power grid should comply with international codes and standards. Unlike small-scale PV plants, large-scale solar plants must meet specific requirements and codes when connected to the transmission network, i.e., the solar energy grid connection code (SEGCC) and the grid code (GC) [6]. In addition to these codes and standards, it is particularly important to know in advance how much power from RESs will be injected into the electricity grid. Predicting a solar farm’s output power is therefore essential for the power utility to plan its operations correctly. However, the accuracy of such predictions is normally low because environmental conditions such as rain, temperature, and cloud cover are themselves uncertain [7]. As a result, dedicated power forecasting techniques are normally used to predict the output power of these RESs more precisely.
Accurate forecasting of PV-generated power is of paramount importance for optimizing energy management, facilitating grid integration, and ensuring the overall dependability of the system. Precise forecasting enables better integration of solar energy into the electrical grid: utilities and grid operators can plan for variations in power production and so manage the balance between supply and demand to maintain system stability [8]. The power prediction model developed in this study can offer several benefits that enhance both the operational efficiency and the financial performance of the solar farm. First, it can accurately estimate the amount of power the solar farm will generate, allowing for better integration with the power grid. Second, if the solar company decides to install an energy storage system, the predictive data can be used to manage it more efficiently. Third, accurate power predictions help maintain grid stability by ensuring that the energy supplied by the solar farm matches the grid’s demand. Fourth, predictive data can guide the scheduling of maintenance: by anticipating periods of lower power production, maintenance can be planned for these times to minimize the impact on overall energy output. Fifth, the model provides data essential for financial planning and risk management, since predicting the power output allows the farm to forecast revenue more accurately. Finally, accurate power prediction models are crucial for integrating larger shares of renewable energy into the power grid, as they help balance and manage the variability and intermittency associated with solar power.
Solar power forecasting methods can typically be classified into two main approaches: statistical models [9] and artificial intelligence (AI) models [10,11]. Statistical approaches are usually used when historical time-series data are available, whereas AI techniques are used for their precise learning and regression capabilities. A comprehensive study and comparison of these two approaches and their different models is presented in [12,13]. The following paragraph describes in more detail the popular methods used for predicting the output power of a PV system.
The multi-linear adaptive regression splines model is used when historical meteorological data (temperature, irradiance, humidity, etc.) are available [14]. Machine learning methods are widely used for predicting the output power of a solar farm from input data [15]. The historical data can be classified by weather condition (sunny, cloudy, rainy, etc.), and an artificial neural network (ANN) model can then be employed to predict short-term PV power, as shown in [16]. The extreme learning machine (ELM) method is used to forecast parameters in the near future (short horizons, e.g., 15 or 30 min). The ELM can be optimized using particle swarm optimization (PSO) to obtain high accuracy, and the optimized ELM shows better results than the ANN model [17]. Data-driven models, i.e., the support vector machine (SVM), boosted regression tree (BRT), least absolute shrinkage and selection operator (LASSO), and ANN, are usually used for multi-step-ahead prediction [18]. Another method used for PV power prediction is the hybrid forecasting model, which combines PSO/SVM with wavelet transformation to predict the PV output power in the short term (day ahead) [19]. Deep learning is currently an active research area within machine learning and AI. Unlike traditional feature selection methods, it learns useful features from the given data automatically, and it has shown outstanding results in PV power prediction [20,21].
More recent solar power prediction methods have been proposed in many studies, in particular a selecting/clustering approach based on relevancy and redundancy criteria combined with a hybrid classification-regression forecasting (HCRF) engine [22]. The selecting/clustering approach filters out unrelated features and divides the relevant features into two subsets to minimize feature redundancy. Each subset is connected to an HCRF engine, which categorizes its training samples via a set of regression models based on their training. This technique showed better results than seven well-known forecasters: multilayer perceptron (MLP), radial basis function (RBF) network, support vector regression (SVR), convolutional neural network (CNN), long short-term memory (LSTM), deep belief network (DBN), and gradient boosting machine (GBM). However, its error metrics, i.e., mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), are higher during the winter months [22].
Several forecasting approaches have been proposed to estimate the output power of a solar farm. A comprehensive review presented in [23] evaluates research studies published between 2010 and 2020 on PV output power forecasting using machine learning and deep learning methods, covering the approaches used, the datasets employed, and the performance evaluation of the methods. However, the PV solar farms considered for short-term prediction are only in the range of a few MW [23]. The work in [24] presents an effective algorithm, combining support vector machines and weather classification, to predict the one-day-ahead power output of PV systems. It was evaluated using a 20 kW PV station in China, where the model proved reliable in forecasting the power output of grid-connected PV systems under varying weather conditions. The findings from [
25] show that by implementing predefined data preprocessing, the model’s regression coefficient (
) can be enhanced. However, for PV systems with large datasets, the smoothing technique is not an ideal solution for the preprocessing method. Another study presented in [
26] examines two methods for forecasting the output power of a 20 MW solar farm in China. Statistical and AI techniques based on time-series analysis were used to predict the hourly output power under different environmental conditions. For one-day-ahead prediction, combining two forecasting techniques performs better than using a single forecasting method, as proposed in [27]. Most of the previous work in the literature focuses on short-term forecasting and validates the proposed methods on small-scale solar farms ranging from kW to a few MW in capacity. In contrast, this paper classifies the output power data into three categories (low, medium, and high) and validates the proposed approach using data from a 300 MW solar farm.
It is essential for electric power utilities to know in advance the amount of power produced by grid-connected RESs so that they can efficiently plan and dispatch energy from RESs and traditional sources. Accurate prediction of the injected RES power also helps maintain the balance between supplied and consumed power so that power outages or surges are avoided. Power utilities, such as the Saudi Electricity Company (SEC), usually operate megawatt (MW)-scale turbine generators, and these large units are difficult to manage and operate efficiently. Many research articles in the literature focus on developing power prediction models for solar PV systems on a scale of a few megawatts, which does not match the ratings of real-life power generators. Therefore, these models cannot provide the high prediction accuracy that electric utilities require to operate safely and efficiently. Hence, this work presents different machine learning (ML) models to accurately predict the generated power of the investigated solar farm. These models are developed taking into consideration the classification of the produced 300 MW presented in Table 1, since traditional power turbines are normally rated in tens of megawatts.
The contribution of this article is to use data obtained from the 300 MW solar farm located in the north of Saudi Arabia to test various machine learning (ML) models for predicting the output power of the PV facility. The developed ML models are tested with these data classified into three categories, namely low, medium, and high, to achieve high-accuracy prediction of the produced power.
The remainder of this paper is organized as follows. Section 2 describes the data collection and preparation used in this study. The methods based on the machine learning (ML) approach are developed in Section 3. The experimental results for the developed ML approaches are given in Section 4, and the concluding remarks are given in Section 5.
4. Experimental Results of Real PV Solar Farm Data
The different machine learning techniques were implemented using the Keras library with TensorFlow as a backend. The implementation was written in the Python programming language. The data used cover May, June, July, and August, with 1476 readings on average for each month. The measurements of total solar irradiance on an inclined plane, total solar irradiance on a horizontal plane, ambient temperature (degrees Celsius), and module surface temperature (degrees Celsius) are taken every half hour.
To evaluate the classification approaches, five-fold cross-validation was used: the whole dataset is divided into five folds, and one fold is left out for testing in each round. This methodology gives a more rigorous and reliable evaluation than a single division of the data into training and testing sets. One fold out of the five is held out as the testing set, and the remaining four folds are combined into the training set used to train the machine learning method. This process is repeated five times, holding out a different fold for testing each time. The average of the results from the five folds is taken as the final prediction result.
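The fold rotation described above can be sketched as follows. This is a minimal illustration rather than the study’s implementation: it assumes scikit-learn, an RBF-kernel SVM on standardized inputs, shuffled folds, and synthetic stand-in data with four features and three classes. The function name `five_fold_accuracy` and all data values are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def five_fold_accuracy(X, y, seed=0):
    """Return per-fold accuracies and their mean for an RBF-kernel SVM."""
    # Shuffling before splitting is an assumption; the paper does not state it.
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X):
        # A fresh model per round: train on four folds, test on the held-out fold.
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return scores, float(np.mean(scores))


# Synthetic stand-in for the four measured inputs (NOT the solar farm data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
# Three hypothetical classes (0 = low, 1 = medium, 2 = high) driven by feature 0.
y_demo = np.digitize(X_demo[:, 0] + 0.1 * rng.normal(size=300), bins=[-0.5, 0.5])

fold_scores, mean_score = five_fold_accuracy(X_demo, y_demo)
print(fold_scores, mean_score)
```

Building a new pipeline inside the loop keeps the scaler and classifier from seeing the held-out fold, so each round is an independent train/test evaluation.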
Table 2 provides a comprehensive evaluation of six machine learning models, namely the SVM with the RBF kernel, the SVM with the polynomial kernel, the SVM with the sigmoid kernel, the SVM with the linear kernel, the deep neural network, and the decision tree. The metrics used for comparison are accuracy, precision, recall, F1 measure, mean squared error (MSE), and R-squared. The SVM with the RBF kernel shows exceptional performance, leading in almost all metrics, notably in accuracy, precision, recall, and F1 measure. This indicates its effectiveness in making correct predictions as well as a good balance between precision and recall. The SVM with the linear kernel shows similarly high performance, particularly in accuracy and precision, suggesting its suitability for output power classification. In contrast, the SVM with the sigmoid kernel underperforms across all metrics. Its lower scores in precision, recall, and F1 measure indicate a tendency to make incorrect predictions and a poor balance between identifying true positives and negatives. This makes it less suitable for output power classification. The deep neural network and decision tree models display robust performance, with high accuracy and R-squared scores. The deep neural network’s high accuracy suggests its effectiveness in learning from the training data, while the decision tree’s competitive R-squared indicates its capability to explain variance in the data. The SVM with the polynomial kernel, while not leading in any metric, shows strong results, especially in precision and R-squared, making it a reliable choice for output power classification. Overall, Table 2 highlights the varied strengths and weaknesses of these models, providing valuable insights for selecting the most appropriate model for output power classification.
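As an illustration of how the classification metrics named above relate to one another, the following sketch computes accuracy, precision, recall, and F1 measure from a three-class confusion matrix. It assumes macro averaging (an unweighted mean over classes), which the text does not specify; the function name `macro_metrics` and the toy labels are hypothetical and do not reproduce Table 2.

```python
import numpy as np


def macro_metrics(y_true, y_pred, n_classes=3):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    # cm[i, j] counts samples of actual class i predicted as class j.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1

    tp = np.diag(cm).astype(float)
    predicted_totals = cm.sum(axis=0)  # column sums: how often each class is predicted
    actual_totals = cm.sum(axis=1)     # row sums: how often each class occurs

    # Guard against division by zero for classes never predicted or never present.
    precision = np.divide(tp, predicted_totals, out=np.zeros_like(tp),
                          where=predicted_totals > 0)
    recall = np.divide(tp, actual_totals, out=np.zeros_like(tp),
                       where=actual_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp),
                   where=denom > 0)

    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
    }


# Toy labels: 0 = low, 1 = medium, 2 = high; one medium sample is predicted as high.
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
print(metrics)
```

In this toy case the single error lowers recall for the medium class and precision for the high class, which is why the macro-averaged precision and recall differ even though there is only one misclassification.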
Figure 11, Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16 show the receiver operating characteristic (ROC) curves, which are used to evaluate the performance of the six classification models as threshold-independent measures. The ROC curve is a plot with the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The TPR, also known as sensitivity, is the ratio of correctly predicted positive observations to all actual positives. It is calculated as TPR = TP/(TP + FN), where TP is the number of true positives and FN is the number of false negatives. The FPR is the ratio of negative observations incorrectly predicted as positive to all actual negatives. It is calculated as FPR = FP/(FP + TN), where FP is the number of false positives and TN is the number of true negatives.
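The two formulas above can be made concrete with a small worked example. The sketch below counts TP, FN, FP, and TN for a binary problem at a given decision threshold and returns the corresponding (TPR, FPR) point on the ROC curve. The helper names, labels, scores, and thresholds are all hypothetical illustrations, not values from the study.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FN, FP, TN when scores >= threshold are predicted positive."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def roc_point(y_true, scores, threshold):
    """Return (TPR, FPR) with TPR = TP/(TP + FN) and FPR = FP/(FP + TN)."""
    tp, fn, fp, tn = confusion_counts(y_true, scores, threshold)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical binary labels (1 = positive) and classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# Lowering the threshold moves the operating point up and to the right.
points = [roc_point(labels, scores, t) for t in (0.8, 0.5, 0.25)]
print(points)
```

Sweeping the threshold over all distinct score values and connecting the resulting points traces the full ROC curve, which is why the curve itself does not depend on any single threshold choice.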
The classification models here handle more than two classes: three classes (0, 1, and 2) represent low, medium, and high power outputs, respectively. As presented in Figure 11, the ROC curves of the SVM with the RBF kernel report the area under the curve (AUC) for each class, indicating how well the model distinguishes between classes. Class 0, Class 1, and Class 2 each have an AUC of 0.99, which indicates that the model distinguishes all three classes well. The micro-average ROC curve, which aggregates the performance across all classes, shows an AUC of 1.00, suggesting that the SVM with the RBF kernel performs exceptionally well across all classes. It is important to note that the specifics of the data and the task at hand determine the type of kernel that should be used with the SVM.
Since solar power generation often exhibits complex, non-linear patterns due to factors such as weather conditions, time of day, and seasonal changes, the SVM with the RBF kernel, which is particularly effective at capturing non-linear relationships in the data, is a potentially effective approach for solar power prediction.
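Per-class and micro-average AUC values of the kind reported here can be computed as in the following sketch. It assumes a one-vs-rest treatment of the three classes and uses scikit-learn’s `roc_auc_score` and `label_binarize`; whether the study used exactly this procedure is not stated. The function name `one_vs_rest_auc` and the score matrix are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize


def one_vs_rest_auc(y_true, scores, classes=(0, 1, 2)):
    """Return ({class: AUC}, micro-average AUC) for per-class score columns."""
    # One indicator column per class: 1 where the sample belongs to that class.
    y_bin = label_binarize(y_true, classes=list(classes))
    per_class = {
        c: float(roc_auc_score(y_bin[:, i], scores[:, i]))
        for i, c in enumerate(classes)
    }
    # Micro-average: pool every (sample, class) decision into one binary problem.
    micro = float(roc_auc_score(y_bin.ravel(), scores.ravel()))
    return per_class, micro


# Hypothetical class scores for six samples (0 = low, 1 = medium, 2 = high).
y_true = np.array([0, 0, 1, 1, 2, 2])
scores = np.array([
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
    [0.30, 0.50, 0.20],
    [0.50, 0.25, 0.25],  # a medium sample that is scored mostly as low
    [0.10, 0.20, 0.70],
    [0.10, 0.30, 0.60],
])

per_class_auc, micro_auc = one_vs_rest_auc(y_true, scores)
print(per_class_auc, micro_auc)
```

Because the micro-average pools all class decisions, a weak result for one class can be masked by strong results for the others, so the per-class values are worth inspecting alongside it.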
Figure 12 shows that the SVM with the polynomial kernel achieves AUCs of 1.00, 0.92, and 0.99 for Class 0, Class 1, and Class 2, respectively. The AUC of Class 1 shows that the SVM with the polynomial kernel is not as good as the SVM with the RBF kernel at distinguishing Class 1. As presented in Figure 13, the SVM with the linear kernel achieves AUCs of 1.00, 0.83, and 1.00 for Class 0, Class 1, and Class 2, respectively. The AUC of Class 1 shows that the SVM with the linear kernel has difficulty distinguishing Class 1. Nevertheless, its micro-average AUC is 0.99, which indicates that the model performs very well when all classes are considered together. The SVM with the sigmoid kernel, as depicted in Figure 14, shows AUCs of 0.98, 0.03, and 0.84 for Class 0, Class 1, and Class 2, respectively. The micro-average AUC of the SVM with the sigmoid kernel is 0.84. These results show that the SVM with the sigmoid kernel has the poorest performance among the SVM models.
Figure 15 illustrates the decision tree model, whose performance is similar to that of the SVM with the polynomial kernel. The deep neural network shows AUCs of 0.99, 0.97, and 0.99 for Class 0, Class 1, and Class 2, respectively, as shown in Figure 16. Its micro-average AUC is 0.99. These results show that the deep neural network performs well in distinguishing between the classes.