**Comparison of Power Output Forecasting on the Photovoltaic System Using Adaptive Neuro-Fuzzy Inference Systems and Particle Swarm Optimization-Artificial Neural Network Model**


Received: 15 November 2019; Accepted: 27 December 2019; Published: 10 January 2020

**Abstract:** Forecasting the power output of a photovoltaic (PV) system is essential before deciding to install one in Nakhon Ratchasima, Thailand, because power production is uneven and the underlying data are unstable. This research simulates the power output forecasting of PV systems using adaptive neuro-fuzzy inference systems (ANFIS) and compares its accuracy with that of particle swarm optimization combined with artificial neural networks (PSO-ANN). The simulation results show that forecasting with the ANFIS method is more accurate than with the PSO-ANN method. The performance of the ANFIS and PSO-ANN models was verified with the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percent error (MAPE). The accuracy of the ANFIS model is 99.8532%, and that of the PSO-ANN method is 98.9157%. The evaluation of the forecast results shows that the proposed ANFIS method is more beneficial than the existing method for computing the power output and supporting investment decisions. Analyzing the power output produced by PV systems is therefore essential both for using the systems to their fullest benefit and for analyzing the investment cost.

**Keywords:** PVs power output forecasting; adaptive neuro-fuzzy inference systems; particle swarm optimization-artificial neural networks; solar irradiation

#### **1. Introduction**

At present, the world population has become more alert to the use of renewable energy due to the impact of the use of energy from coal, petroleum, and natural gas, emitting large amounts of carbon dioxide gas, which causes global warming [1]. There are many interesting renewable energy sources such as solar energy, wind energy, water energy, ocean tidal energy, geothermal energy, and biofuels. These renewable energy sources are non-polluting and do not negatively affect the environment [2].

Nowadays, many scientists and researchers have studied and developed the use of renewable energy, such as solar, wind, and water energy, and it is expected that by 2030, 100% of energy will come from such renewable resources. There is therefore a clear trend that these renewable energies will play an important role in Thailand as well [3,4]. The most popular renewable energy in Thailand is solar energy. Since Thailand is located near the equator, it receives a high amount of solar energy: the average energy obtainable nationwide is about 4 to 4.5 kilowatt-hours per square meter per day, consisting of approximately 50% direct radiation, with the rest diffuse radiation caused by water droplets (clouds) in the atmosphere; this diffuse share is higher than in areas farther from the equator to the north and south [5]. According to energy consumption estimates by the International Energy Agency in 2011, solar thermal energy may become the main energy source for electricity generation in the world within the next 50 years, which would reduce the greenhouse gas emissions that affect the environment [6]. Solar radiation is measured to evaluate the energy potential, with hourly measurements in the 295–2800 nm and 695–2800 nm bands and the ultraviolet (UV) 295–385 nm band at the National Observatory of Athens (NOA) [7]. The intensity of solar radiation varies with the area, day, time, and season; it is very high in the afternoon, and sky conditions and wind speed also affect the intensity and distribution of solar radiation [8]. However, PV power output has a significant limitation in terms of power system stability, as its output varies over a wide range throughout the day according to the available solar radiation. Energy storage can improve the stability of the PV system during periods of low solar irradiance, but storage is still expensive, so the system should be used to best advantage [9].
Today, the industrial internet of things (IIoT) plays an essential role in creating tools that help entrepreneurs and investment decision-makers understand the benefits of investing. IIoT applications of machine learning and deep learning provide a new way to develop models of complex systems, instead of using physics-based models to describe system behavior. Some algorithms can infer a model of the system's operation from sample input data; these models are then used to forecast the state of the system, which is often called predictive analytics [10].

Power forecasting of PV systems can be carried out with a variety of methods, depending on the data that are available. One example is PV power forecasting with the 1D5P method, which uses the parameters of the solar panel and the solar radiation intensity to calculate the PV power of the solar system and can also improve the accuracy of the power output with a weight function appropriate for each area [11]. Later, the HIstorical SImilar MIning (HISIMI) method was used to predict the short-term power of the solar system, tuned with genetic algorithms and historical forecasting data [12]. Discrepancies arise in power output forecasting for renewable energy because of fluctuations in renewable energy, and machine learning models are more accurate than traditional predictive models. Precise forecasting helps stakeholders decide how to plan investment and install solar farms connected to grid systems [13,14]. The auto-regressive moving average (ARMA) model has been used, including exponentially weighted moving average (EWMA) improvements by Piorno et al. [15]. Biological systems and natural phenomena exhibit chaotic behaviors that have been exploited for PV power output forecasting; for example, least-squares support vector machine (SVM) methods can predict experimental chaotic time series [16]. Power forecasting with artificial neural networks (ANN) is one method that provides precise values, learning from actual measurement data to predict future power [17]. Hybrid forecasting methods include artificial neural networks combined with fuzzy inference systems (ANFIS) [18] and particle swarm optimization combined with artificial neural networks (PSO-ANN) [19].
PV power forecasting with these hybrid methods is more accurate than with other methods, which has led many researchers to improve the power efficiency of PV systems and to develop different forecast models. In this research, we focus on the ANFIS model because it is a currently interesting forecasting method based on deep learning techniques that has not yet been widely applied to solar power generation systems. It is a forecasting technique that can also be applied to other complex systems, providing accurate and quick predictions. The ANFIS model is then compared with the PSO-ANN method, which uses similar forecasting techniques. PSO-ANN is among the more accurate forecasting techniques: as the number of iterations and the size of the particle swarm increase, so does the accuracy. Therefore, this study focuses on these two forecasting techniques.

The aim of this work was therefore to study power output forecasting of PV systems with a hybrid model that also operates quickly, so a forecasting model using ANFIS was applied and its simulation results were compared with those of the PSO-ANN model. This article is divided into six sections. The first section is the introduction. The second section explains energy efficiency analysis in Thailand. The third section explains the PV power output forecasting models. The fourth section discusses the case studies of this research and the PV power data analysis, using data from a solar system in Thailand. The fifth section presents the simulation results and analysis, and the last section gives the conclusions.

#### **2. Energy Efficiency Analysis in Thailand**

#### *2.1. Energy Sources in Thailand*

Natural gas is the primary fuel for electricity generation in the country: power plants that use natural gas as fuel account for approximately 67 percent of the installed production capacity. Natural gas is a hydrocarbon compound consisting of hydrogen and carbon, formed by the accumulation of fossilized micro-organisms hundreds of millions of years old. It can also be separated into components such as methane, ethane, propane, butane, pentane, and petroleum by gasification.

Natural gas is a clean energy source, and the cost of electricity generated with natural gas is lower than with fuel oil, though slightly higher than with coal. Because the electricity generating system in Thailand relies heavily on natural gas, diversification into other forms of fuel is desirable. The advantage of natural gas is that it is a petroleum fuel that can be used with high efficiency: it combusts completely and is highly safe in use because it is lighter than air and therefore rises when a leak occurs. Most of the natural gas used in Thailand is produced from domestic sources, which helps reduce imports of other fuels and saves a great deal of foreign currency.

Lignite coal is a natural fuel used in electricity generation; it is a combustible fuel mineral consisting of four essential elements: carbon, hydrogen, nitrogen, and oxygen. It is the second most used fuel after natural gas. The advantage of lignite is that electricity production costs are lower than with other fuels, whether natural gas or renewable energy. Coal reserves are large, especially of lignite and sub-bituminous coal, which are found mainly in Mae Moh District and Li District of Lampang Province, and in Mueang District, Krabi Province. Currently, clean coal technology is available that can eliminate more than 99% of coal pollution.

Normally, two types of fuel, namely furnace oil and diesel oil, are used for electricity generation in Thailand. Owing to price hikes in the international market, such fuels are becoming more and more expensive, resulting in higher electricity prices. Additionally, fuel oil causes more pollution than diesel and natural gas. Therefore, fuel oil is used as a secondary rather than a primary fuel.

Diesel is used as a fuel for electricity generation in diesel power plants, of which there are only three locations in Thailand. Nowadays, the price of diesel has increased dramatically, resulting in a high cost, and it also causes more pollution than natural gas. Diesel fuel is therefore used as a secondary rather than a primary fuel.

Solar energy is a natural energy source that is clean and free from pollution, and it is now used widely around the world. It is a renewable energy with high potential that can be used endlessly. In particular, using solar energy to produce electricity will help strengthen the electrical system of Thailand and also help reduce global warming. The advantage of solar energy is that it is the largest natural energy source; it will never run out, and there is no fuel cost. Solar energy can be used at sites that have no electricity and are far from power transmission and distribution systems. It is a clean energy that causes no pollution during the electricity production process. There is also renewable energy from water power, wind power, biomass, biogas, and waste. Figure 1 shows the proportion of electricity production from fuel and renewable energy sources in Thailand. Among renewables in Thailand, electricity production from biomass is the highest, followed by water and solar energy [20].

**Figure 1.** The proportion of electricity production from fuel and renewable energy sources in Thailand.

#### *2.2. Solar Radiation*

Solar radiation is the energy released from the sun; the portion that hits the edge of the atmosphere is called extraterrestrial solar radiation. In Thailand, ten provinces have the highest potential for producing electricity from solar energy, with Nakhon Ratchasima Province having the highest. However, the availability of high-intensity solar radiation does not by itself guarantee the most efficient site for installing generation facilities; many other factors must be considered [21,22].

Studies of the solar potential in Thailand have found that most areas receive the highest solar radiation between April and May, in the range of 5.56–6.67 kWh/m²/day. The area with the highest average annual solar radiation is in the Northeast, covering parts of Nakhon Ratchasima, Buriram, Surin, Sisaket, Roi Et, Yasothon, Ubon Ratchathani, and Udon Thani, and parts of the central region, with an average annual solar radiation of 5.28–5.56 kWh/m²/day. This area accounts for 14.3% of the country's total area. Additionally, 50.2% of the total area receives an average annual solar radiation of 5–5.28 kWh/m²/day, and only 0.5% of the total area receives less than 4.45 kWh/m²/day [23]. Figure 2 shows the average daily solar radiation intensity in Thailand.

**Figure 2.** The average daily solar radiation intensity in Thailand [23].

#### **3. PV Power Output Forecasting Model**

Accurate forecasting of PV power generation is essential for estimating cost and breakeven. Many researchers, including the present study, are interested in forecasting the power output produced by solar cell systems. In this section, we explain the principles of the ANFIS and PSO-ANN forecasting models. The PSO-ANN model, which has higher accuracy than many other methods and can be applied in other areas of research [18], was used for comparison of the forecasting results. Both models are described in the following subsections.

#### *3.1. ANFIS Model*

Adaptive neuro-fuzzy inference systems (ANFIS) are a type of adaptive network based on fuzzy inference systems (FIS), a theory adapted from fuzzy logic. The fuzzy rules are created from groups of input and output data using the constructive basis of a neural network. An artificial neural network (ANN) is a computational model whose functions and methods are based on the structure of human brain cells. The network follows a graph topology in which neurons are the nodes of the graph and weights are the edges. It consists of multiple layers, whose number should be limited to keep the problem-solving time manageable. Training the network consists of changing the weights of the connections between layers so as to achieve the expected output. The neuro-fuzzy model combines fuzzy logic and neural networks to solve a variety of problems efficiently.

This theory is used to analyze problems involving information with wide uncertainty. The structure of ANFIS is a fuzzy inference system; the one considered here has two inputs (*x* and *y*) and one output *f*, following the first-order Sugeno fuzzy model [24,25]. This study uses the hybrid learning algorithm with the following principles. Figure 3 shows the fuzzy inference system [26].

**Figure 3.** The fuzzy inference system [26].

Layer 1 consists of a membership function for each input variable, which can be adjusted. The adjustable parameters in this layer are known as premise parameters, and the layer output is given by Equation (1).

$$
\rho_{1,i} = \mu_{A_i}(x) \tag{1}
$$

Layer 2 computes the firing strength of each rule as the product of the incoming membership grades, as expressed in Equation (2). With two membership functions per input, there are 2² = 4 fuzzy rules.

$$
\rho_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y) \tag{2}
$$

Layer 3 normalizes the firing strengths obtained from Layer 2, as shown in Equation (3).

$$\rho_{3,i} = \overline{w}_i = \frac{w_i}{w_1 + w_2} \tag{3}$$

*Energies* **2020**, *13*, 351

Layer 4 is called the standard perceptron layer and can be written according to Equation (4), where (*p*, *q*, *r*) are called the consequent parameters.

$$
\rho_{4,i} = \overline{w}_i f_i = \overline{w}_i (p_i x + q_i y + r_i) \tag{4}
$$

Layer 5 computes the overall output as a real number, as written in Equation (5).

$$
\rho_{5,i} = \sum_{i} \overline{w}_{i} f_{i} \tag{5}
$$

In this study, the consequent parameters are determined with the least-squares estimator method, and the premise parameters are modified in the backward pass using the gradient descent method. The structure of the ANFIS is shown in Figure 4. The ANFIS model used to predict the PV power output in this article has two inputs, panel temperature and solar radiation, and the model is trained with one output, the measured power output of the PV system. In the five-layer structure, a circle identifies a fixed node, while a square identifies an adaptive node whose parameters are changed during training.
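As a concrete illustration, the five layers above can be traced numerically. The following is a minimal Python/NumPy sketch of a first-order Sugeno ANFIS forward pass with two inputs and two Gaussian membership functions per input (so 2² = 4 rules, matching Equation (2)); the membership-function and consequent parameter values are purely illustrative, not values fitted to the PV data.

```python
import numpy as np

def gauss_mf(v, c, s):
    # Gaussian membership function with centers c and widths s (Layer 1)
    return np.exp(-((v - c) ** 2) / (2 * s ** 2))

def anfis_forward(x, y, premise, consequent):
    """One forward pass of a first-order Sugeno ANFIS:
    2 inputs, 2 membership functions per input, 2**2 = 4 rules."""
    (cA, sA), (cB, sB) = premise          # premise parameters per input
    mu_A = gauss_mf(x, cA, sA)            # Layer 1: membership grades of x
    mu_B = gauss_mf(y, cB, sB)            # Layer 1: membership grades of y
    w = np.outer(mu_A, mu_B).ravel()      # Layer 2: rule firing strengths w_i
    w_bar = w / w.sum()                   # Layer 3: normalized firing strengths
    p, q, r = consequent                  # consequent parameters, each shape (4,)
    f = p * x + q * y + r                 # Layer 4: linear rule outputs f_i
    return np.dot(w_bar, f)               # Layer 5: weighted sum = crisp output

# illustrative parameters (not fitted to the plant's measurements)
premise = ((np.array([25.0, 45.0]), np.array([8.0, 8.0])),   # panel temperature MFs
           (np.array([0.2, 0.8]), np.array([0.3, 0.3])))     # irradiance MFs (kW/m^2)
consequent = (np.full(4, 0.1), np.full(4, 10.0), np.zeros(4))
print(anfis_forward(35.0, 0.6, premise, consequent))
```

In ANFIS training, the consequent vectors `p, q, r` would be fitted by least squares in the forward pass and the premise centers/widths adjusted by gradient descent in the backward pass, as described above.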

**Figure 4.** The structure of the adaptive neuro-fuzzy inference systems (ANFIS) model.

#### *3.2. PSO-ANN Model*

Particle swarm optimization (PSO) is a nature-inspired algorithm, modeled in particular on the movement of fish schools and bird flocks. In both cases, small individuals move together in synchrony: fish or birds can move in a swarm, separate from the swarm, and then reunite with it again. The movement of such a particle swarm can be considered social behavior. Details of the PSO process were presented in 1995 by James Kennedy and Russell Eberhart [27].

A bird is comparable to one particle, and each particle remembers its current position along with the direction and speed of its movement. Figure 5 shows the position and direction of particle movement. As each particle moves, it records its own best position (*Pbest*), and these are compared to find the best position among all particles (*Gbest*). In every cycle at time *t*, the speed of movement is updated using the best position of each particle and the best position of the whole swarm [28,29].

**Figure 5.** The position and direction of particle movement [28].

PSO has many features in common with evolutionary computation methods such as the genetic algorithm (GA). An initial population is randomly generated and used to search for the best answer by adjusting that population in every calculation cycle. The solution of the system is represented by particles moving through the search space toward the particles that are currently closest to the most suitable answer. PSO has been applied successfully in many areas, such as function optimization, training of artificial neural networks, and fuzzy control systems, including finding or forecasting the suitable power output of photovoltaic power generation systems [30].
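The *Pbest*/*Gbest* mechanism described above can be sketched in a few lines. This is a minimal PSO implementation in Python/NumPy, not the authors' MATLAB code; the sphere test function, inertia weight, and learning coefficients are illustrative choices.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, lb=-5.0, ub=5.0,
        w=0.7, c1=1.5, c2=2.0, seed=0):
    """Minimal particle swarm optimizer (minimization)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lb, ub, (n_particles, dim))    # particle positions
    vel = np.zeros((n_particles, dim))               # particle velocities
    pbest = pos.copy()                               # best position of each particle
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()         # best position of the swarm
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + cognitive pull to Pbest + social pull to Gbest
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lb, ub)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# sphere function: global minimum 0 at the origin
best, val = pso(lambda p: np.sum(p ** 2), dim=3)
print(best, val)
```

The same loop applies unchanged when the "position" is a vector of neural network weights and the fitness is a training error, which is the basis of the hybrid model in the next subsection.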

An artificial neural network (ANN) is a simulation of the human nervous system based on repeated learning: when something is learned many times over, the network becomes able to find relationships from past learning [31,32]. Figure 6 shows the structure of an artificial neural network simulated from the human nervous system. This article uses a supervised learning network, which defines the output target for the network, in the form of a multilayer feed-forward neural network trained with back-propagation, a rather complicated and non-linear scheme. Each neuron has a weight and a bias, which are initialized at random, and an activation (transfer) function, such as tan-sigmoid, log-sigmoid, or linear, that computes the suitable output values. An ANN consists of neurons, drawn as circles and also called nodes; the lines connecting the nodes are the weights, which indicate the connections between the neurons. The artificial neural network has three layers: the input layer, the hidden layer, and the output layer.

**Figure 6.** Forecast model based on artificial neural networks (ANN).

The hybrid PSO-ANN forecast model combines the particle swarm optimization algorithm and the back-propagation ANN (BP-ANN) algorithm, implemented in MATLAB® [33,34]. The first step is to determine the number of particles for the ANN structure, beginning by sampling particles that represent the weights and determining the position and velocity of each particle. Next, the artificial neural network is simulated and the fitness of the initial particles is evaluated. The best values *Gbest* and *Pbest* are found by calculating the fitness of each particle in the ANN structure; the best fitness in the group is identified in each calculation cycle, and the particle velocities and positions are updated. The best particle found so far is then recorded, and the process repeats until the maximum number of iterations is reached [30,35]. Figure 7 shows a diagram of the operation of the hybrid PSO-ANN forecast model.
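The procedure above can be sketched in code. The following Python/NumPy fragment uses PSO to optimize the weights of a small feed-forward network on synthetic data; the 2-4-1 network size and the synthetic (temperature, irradiance) → power target are illustrative stand-ins, while the PSO settings (inertia weight 1, damping ratio 0.99, learning coefficients 1.5 and 2.0, bounds ±5) follow the values reported for Table 1.

```python
import numpy as np

def ann_forward(params, X, n_hidden=4):
    """A 2-input, 4-hidden-neuron, 1-output feed-forward net; params is a flat vector."""
    n_in = X.shape[1]
    i = 0
    W1 = params[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = params[i:i + n_hidden]; i += n_hidden
    W2 = params[i:i + n_hidden]; i += n_hidden
    b2 = params[i]
    h = np.tanh(X @ W1 + b1)             # tan-sigmoid hidden layer
    return h @ W2 + b2                   # linear output layer

def fitness(params, X, y):
    return np.mean((ann_forward(params, X) - y) ** 2)  # training MSE

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (64, 2))       # stand-in for (panel temperature, irradiance)
y = 0.8 * X[:, 1] - 0.1 * X[:, 0]        # toy "power" target, not real plant data

dim = 2 * 4 + 4 + 4 + 1                  # total number of weights and biases
n_part, iters = 50, 150
w, damp, c1, c2 = 1.0, 0.99, 1.5, 2.0    # PSO settings as listed for Table 1
pos = rng.uniform(-5.0, 5.0, (n_part, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p, X, y) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()
for _ in range(iters):
    r1, r2 = rng.random((2, n_part, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -5.0, 5.0)
    vals = np.array([fitness(p, X, y) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[pbest_val.argmin()].copy()
    w *= damp                             # inertia weight damping
print("best training MSE:", pbest_val.min())
```

In the article's actual setup the fitness would be evaluated on the measured PV data and the network would have the larger 2-8-8-8-4-1 structure; the loop itself is unchanged.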

The parameters of the PSO algorithm are set before the optimization of the ANN model, which has two inputs: panel temperature and solar radiation. The training output is the PV power output from the PV systems. The structure of the ANN model is therefore 2-8-8-8-4-1. This article uses a particle swarm of 100, a maximum of 100 iterations, lower and upper variable bounds of −5 and 5, an inertia weight of 1, an inertia weight damping ratio of 0.99, a personal learning coefficient of 1.5, a global learning coefficient of 2.0, and 4 neurons, as shown in Table 1.

**Table 1.** PSO-ANN parameters.

**Figure 7.** Diagram of the operation of the hybrid PSO-ANN forecast model.

#### *3.3. Accuracy of the Simulation Results*

After the simulation, the results are checked for accuracy, since all forms of measurement have inaccuracies or uncertainties. An effective experiment must begin with the smallest possible data error, and the percentage error determines the accuracy and reliability of the experiment. A realistic and accurate reference quantity is therefore needed for comparison. If *S* is defined as the standard physical quantity and *E* is the same physical quantity obtained from the experiment, then the percentage error can be determined by Equation (6).

$$\text{The percentage error} = \frac{|E - S|}{S} \times 100\% \tag{6}$$

Statistical analysis can find the best numerical representation of a data set and determine the statistical error of the answer; the best representative number is the average or mean value [36]. To assess the accuracy of the prediction methods, the mean absolute percent error (MAPE) and root mean squared error (RMSE) are used as the main criteria [31], which can be found with the following equations.

The mean square error (MSE) is an indication of the variance of the forecasting error, which can be obtained by Equation (7).

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - Y_i}{Y_i} \right)^2 \tag{7}$$

where *Yi* is the measured power value, *Xi* is the predicted value, and *n* is the amount of data to be tested; it is the power in every period of measurement.

Root mean square error (RMSE) or standard error (SE) is shown by Equation (8).

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\frac{X_i - Y_i}{Y_i}\right)^2} \tag{8}$$

The mean absolute error (MAE) is given by Equation (9).

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{X_i - Y_i}{Y_i} \right| \tag{9}$$

The mean absolute percent error (MAPE) is given by Equation (10). The forecast accuracy (Acc), which indicates how close the forecast power value is to the actual value, can then be obtained from Equation (11).

$$MAPE(\%) = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{X_i - Y_i}{Y_i} \right| \times 100 \tag{10}$$

$$Acc = 100 - MAPE\ (\%) \tag{11}$$
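The indices of Equations (7)–(11) can be computed directly; note that, as written in the text, the MSE and RMSE here are taken over the relative error (X − Y)/Y rather than the raw error. A small Python/NumPy sketch with made-up values:

```python
import numpy as np

def forecast_metrics(y_pred, y_true):
    """Error indices as defined in Equations (7)-(11); MSE and RMSE
    are computed on the relative error (X_i - Y_i) / Y_i, following the text."""
    rel = (y_pred - y_true) / y_true
    mse = np.mean(rel ** 2)            # Eq. (7)
    rmse = np.sqrt(mse)                # Eq. (8)
    mae = np.mean(np.abs(rel))         # Eq. (9)
    mape = 100.0 * mae                 # Eq. (10)
    acc = 100.0 - mape                 # Eq. (11)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "Acc": acc}

# illustrative values only (kW): 10% over, 5% under, exact
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 400.0])
print(forecast_metrics(y_pred, y_true))
```

With these toy values the MAPE is 5% and the accuracy 95%, showing how Equation (11) converts an average relative error into the accuracy figures reported later.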

#### **4. PV Power Output Data Analysis**

This research uses a case study area in the northeastern region of Thailand with a 14 MW solar cell installation using 330 W polycrystalline solar panels, a maximum solar irradiation of 1.142 kW/m², a highest ambient temperature of 39.4 °C, and a highest panel temperature of 57.44 °C. The data were obtained from a PV plant in Nakhon Ratchasima province, and one year of measurements is used as the forecasting data. Figures 8 and 9 show the annual solar irradiation and solar panel temperature.

**Figure 8.** Annual solar irradiation.

**Figure 9.** Annual solar panels temperature.

This PV power generation system is connected to the grid and produces electricity for the power distribution systems that supply residential electricity loads. To study the energy efficiency, the output power of this solar power generation system must also be analyzed. Figure 10 shows the PV power output produced by the case study solar system together with the solar irradiance, for example over 5 days during April 2018. The input data used for training the models are the solar panel temperature, ambient temperature, solar irradiance, and PV output power.
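The arrangement of the measured series into model inputs and a training target can be sketched as follows; the synthetic hourly series and the toy power expression below are illustrative stand-ins for the plant measurements, which are not reproduced here.

```python
import numpy as np

# hypothetical hourly measurements over one year (all values are synthetic)
hours = 24 * 365
rng = np.random.default_rng(0)
panel_temp = 30.0 + 15.0 * rng.random(hours)          # panel temperature, deg C
irradiance = rng.random(hours)                        # solar irradiance, kW/m^2
# toy stand-in for measured PV power: scales with irradiance,
# derated slightly as the panel heats above 25 deg C
pv_power = 12000.0 * irradiance * (1.0 - 0.004 * (panel_temp - 25.0))  # kW

X = np.column_stack([panel_temp, irradiance])         # 2 model inputs per hour
y = pv_power                                          # 1 training target per hour

# chronological 80/20 split into training and test sets
split = int(0.8 * hours)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
print(X_train.shape, X_test.shape)
```

The same `(X, y)` arrays feed both models: the two columns of `X` correspond to the two ANFIS inputs and the two ANN inputs, and `y` is the measured power output used as the training target.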

**Figure 10.** PV power output and solar irradiance.

#### **5. Simulation Results and Discussion**

A MATLAB program is used to simulate the PV power output forecasting with one year of input data: the actual measured power output, solar irradiation, panel temperature, and ambient temperature. The PSO-ANN technique is simulated with 100 calculation cycles, a population of 100, and 10 neurons. The annual power output forecasts are shown in Figure 11; some of the predicted electrical power is less than zero because the method is stochastic. The maximum error is 1721 kW, as shown in Figure 12. April is the month in which the most solar energy was produced; it is summer in Thailand, with the highest solar radiation intensity, so the forecasting results of April 1–7 were selected. Figure 13 shows the weekly PV power output forecasting with PSO-ANN, and Figure 14 shows the weekly percentage error of the PV forecasting using PSO-ANN. A positive value means the predicted value is higher than the actual value, and a negative value means it is lower. The percentage error of the forecast reaches as high as 190.7%, because the forecast PV power is much higher than the actual PV power of the system (an actual value of 238.5 kW against a forecast of 693.5 kW). PSO-ANN determines optimal values by sampling, so high errors may occur owing to input instability. Comparing the simulation results with the real power output shows that at night, without irradiation, the PSO-ANN model has a rather high discrepancy. The computation time of the PSO-ANN method was 429.522 s.

**Figure 12.** Annual PV power output error with PSO-ANN forecasting.

**Figure 14.** Weekly PV power output percentage error with PSO-ANN forecasting.

The simulation of PV power output forecasting with ANFIS is shown in Figure 15, which compares the actual PV power output with the PV power forecast. Figure 16 shows the annual error of the PV power output forecasting, with a maximum of 1742.2 kW occurring in the period from October to December. Figure 17 shows the weekly PV power output forecasting with ANFIS, and Figure 18 shows the weekly percentage error of the PV power forecasting using ANFIS. The forecast results at night have less error than with the PSO-ANN method. Figure 19 shows the comparison of the power output forecasts, which shows that PV power forecasting using the ANFIS method provides more accurate results than the PSO-ANN method. The computation time of the ANFIS method was 3.5675 s.

**Figure 16.** Annual power output error with ANFIS forecasting.

**Figure 17.** Weekly PV power output forecasting with ANFIS.

**Figure 18.** Weekly PV power output percentage error with ANFIS forecasting.

**Figure 19.** Comparison of PV power output forecasting.

Table 2 shows the calculation results assessing the accuracy of the PV power forecasting simulations using the PSO-ANN and ANFIS models. Indicators such as the MSE, RMSE, MAE, MAPE, and accuracy are used to evaluate the forecasting performance. The simulation results show that the ANFIS model has less error than the PSO-ANN model, with an accuracy of 99.8532%. The RMSE is another indicator used to check the accuracy of the simulation results: the ANFIS model gives an RMSE of only 0.1184, which is less than that of the PSO-ANN model. Compared with the PV power output forecasting of D. Lee and K. Kim [37], who predicted the power output with long short-term memory (LSTM)-based models and obtained an RMSE of 0.563 in summer, the simulation results with the ANFIS model are more accurate. The ANFIS model also requires less computation time than the PSO-ANN model. This article therefore shows that the ANFIS model is more efficient for PV power forecasting than the PSO-ANN model, while also computing quickly.


**Table 2.** Performance and forecast accuracy of the model.

#### **6. Conclusions**

In this article, the PV power output is forecast using one year of electricity production data from a solar power plant in northeastern Thailand. A comparison of PV power output forecasting using the ANFIS and PSO-ANN methods was undertaken. The performance of the ANFIS and PSO-ANN models was verified with the MSE, RMSE, MAE, and MAPE. The accuracy of the ANFIS model is 99.8532%, and that of the PSO-ANN method is 98.9157%. The ANFIS model also requires less computation time than the PSO-ANN model. The simulation results show that the ANFIS method produces more accurate results than the PSO-ANN method. For the most efficient use of PV power generation systems, it is also necessary to analyze the energy consumption of the user loads (household, industrial, and department store loads). The forecasting results of the PSO-ANN model show more discrepancies than the ANFIS method at night, so nighttime inputs may be omitted before the simulation. The ANFIS model is a currently interesting forecasting method that uses deep learning techniques to solve the problem, and it can be applied to other complex systems, providing accurate and quick predictions. Power output forecasting is essential for planning the installation of a PV power system, and PV power output forecasting with the ANFIS model is one method that can predict and analyze the energy, cost, and cost-effectiveness to be expected in the future. Future work will study PV power output forecasting with new and more efficient methods, improve the input data to have smaller deviations so that the simulation results become more accurate, and analyze the simulation results using reliable and widely used methods.

**Author Contributions:** P.D. and T.B. conducted the initial design, conceptualization, writing, editing, and developing the proposed forecasting method; N.J., K.S., and S.K. contributed the resources, data curation, and analysis; W.T. and S.N. conceived the theoretical approaches, review, and investigation. All authors have read and agreed to the published version of the manuscript.

**Funding:** The APC was funded by King Mongkut's Institute of Technology Ladkrabang (KMITL).

**Acknowledgments:** The authors thank King Mongkut's Institute of Technology Ladkrabang for supporting this research. The authors would also like to thank the solar energy technology laboratory, National Electronics and Computer Technology Center, National Science and Technology Development Agency (NSTDA) for providing the data for this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Multi-Step Solar Irradiance Forecasting and Domain Adaptation of Deep Neural Networks**

**Giorgio Guariso <sup>1</sup>, Giuseppe Nunnari <sup>2</sup> and Matteo Sangiorgio <sup>1,\*</sup>**


Received: 2 June 2020; Accepted: 27 July 2020; Published: 2 August 2020

**Abstract:** The problem of forecasting hourly solar irradiance over a multi-step horizon is dealt with by using three kinds of predictor structures. Two approaches are introduced: Multi-Model (*MM*) and Multi-Output (*MO*). Model parameters are identified for two kinds of neural networks, namely the traditional feed-forward (FF) type and a class of recurrent networks, those with long short-term memory (LSTM) hidden neurons, which is relatively new for solar radiation forecasting. The performances of the considered approaches are rigorously assessed by appropriate indices and compared with standard benchmarks: the clear sky irradiance and two persistent predictors. Experimental results on a relatively long time series of global solar irradiance show that all the network architectures perform in a similar way, guaranteeing a slower decrease of forecasting ability on horizons up to several hours, in comparison to the benchmark predictors. The domain adaptation of the neural predictors is investigated by evaluating their accuracy on other irradiance time series with different geographical conditions. The performances of the FF and LSTM models remain good and similar to each other, suggesting the possibility of adopting a unique predictor at the regional level. Some conceptual and computational differences between the network architectures are also discussed.

**Keywords:** feed-forward neural networks; recurrent neural networks; LSTM cell; performances evaluation; clear sky irradiance; persistent predictor

#### **1. Introduction**

As is well known, the key challenge with integrating renewable energies, such as solar power, into the electric grid is that their generation fluctuates. A reliable prediction of the power that can be produced over a horizon of a few hours is instrumental in helping grid managers balance electricity production and consumption [1–3]. Other useful applications could benefit from solar radiation forecasting, such as the management of charging stations [4,5]. Indeed, such micro-grids are developing in several urban areas to provide energy for electric vehicles by using Photo Voltaic (PV) systems [6]. To optimize all these applications, a classical single-step-ahead forecast is normally insufficient, and a prediction over multiple steps is necessary, even if with decreasing precision.

Reviews concerning machine learning approaches to forecasting solar radiation were provided, for instance, in [7–10], but the recent scientific literature has seen a continuous growth of papers presenting different approaches to the problem of forecasting solar energy. According to Google Scholar, the papers dealing with solar irradiance forecasting numbered about 5000 in 2009 and became more than 15,000 in 2019, with a yearly increase of more than 15% in the last period. The growth of papers utilizing neural networks as a forecasting tool has been considerably more rapid: such papers grew at a pace of almost 30% a year in the last decade and constituted about half of those cataloged in 2019.

The short-term prediction of solar radiation and/or photovoltaic power output has expanded from traditional ARMA (Auto-Regressive Moving Average) models and their evolutions, such as ARMA-GARCH (Generalized Autoregressive Conditional Heteroscedasticity) models [11], to Bayesian statistical models [12], Naïve Bayes classifiers [13], Neuro-Fuzzy systems [14–16], and Fuzzy with Genetic Algorithms models [17]. More recently, other approaches such as Markov Switching Models [18], Support Vector Machines [19–21], wavelet transformation [22], kernel density estimation [23], and tree-based algorithms [24] have been suggested. A traditional Feed-Forward (FF) neural network for an ultra-short time horizon (one-minute resolution) was proposed in [25]. Specific classes of neural networks have also been used, such as TDNN (Time Delay Neural Networks) [26], recurrent [27–29] and Convolutional Neural Networks [30], and other configurations [31–35]. Other works specifically concentrate on the use of Long Short-Term Memory (LSTM) networks [36–38]. In particular, Abdel-Nasser and Mahmoud (2017) [36] proposed a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to forecast the output power of PV systems, based on the idea that such a network can model the temporal changes in PV output power thanks to its recurrent architecture and memory units. The proposed method was evaluated using hourly datasets of different sites for a year. The authors implemented five different model structures to predict the average hourly time series one step ahead. The accuracy of the models was assessed by comparing their RMSE with that of other techniques, namely Multiple Linear Regression (MLR) models, Bagged Regression Trees (BRT), and classical Feed-Forward (FF) Neural Networks (NN), showing that one of the five studied models outperforms the others. One limitation of this attempt seems to be the very short time horizon considered (1 h).
On the contrary, several regression-based trees were considered by Fouilloy et al. (2018) [39] for prediction horizons from 1 to 6 h ahead. They compared the performance of these machine learning approaches with the more traditional ARMA models and Multi-Layer Perceptrons (MLP) over three different data sets recorded in France and Greece. The performances, in terms of nRMSE and MAE, seem to point out that there is no prevalent method among those considered, and that their ranking depends on the variability of the data set. The recent paper [40] provides a large bibliography on neural network applications to solar power forecasting. The paper by Husein et al. [41] also analyzes several preceding studies and compares the results of LSTM-RNN and FF networks for hour-ahead solar irradiance forecasting using only indirect weather measures (namely, dry-bulb temperature, dew-point temperature, and relative humidity), even if irradiance data are still needed for network training. Various deep learning approaches, namely Gated Recurrent Units (GRUs), LSTM, RNN, and FF neural networks, were considered by Aslam et al. [42] to represent the statistical relations of solar radiation data over a very long horizon (one year ahead).

What clearly emerges is that the very large majority of these studies concentrate on a single (very short to very long) forecast step. However, the implementation of advanced control methods, such as model predictive control requires the availability of forecasted values for some steps ahead. This paper thus contributes to the field by clearly defining the possible model structures for a multi-step prediction and comparing the performances of feed-forward and LSTM recurrent networks in the problem of multi-step ahead forecasting of the solar irradiance at a station in Northern Italy, which is characterized by a relatively high annual and daily variability. The forecasting approaches are rigorously evaluated using appropriate indices and compared with standard benchmarks: the clear sky irradiance and two persistent predictors.

Furthermore, the adaptation capability of the neural predictors is investigated evaluating their accuracy on three locations with different geographical conditions with respect to that used for the training.

The paper is organized as follows: In Section 2, the basic characteristics of the multi-step feed-forward and LSTM forecasting model are presented together with the different ways of formulating and approaching the prediction task. Some features of solar irradiance time series are pointed out through statistical and spectral analysis. In addition, the benchmark predictors and the evaluation metrics are presented. Section 3 shows the results of solar irradiance forecasting performed by autoregressive neural models, along with the domain adaptation capability of such predictors. Section 4 reports a discussion on the intrinsic differences in the identification phase of feed-forward and LSTM neural predictors. In Section 5, some concluding remarks are drawn.

#### **2. Materials and Methods**

#### *2.1. Structure of the Multi-Step Neural Predictors*

The time-series forecasting task is inherently dynamic because the variable to be predicted can be seen as the output of a dynamical system. Recurrent neural networks, which are themselves dynamical systems due to the presence of one or more internal states, are a natural choice. On the other hand, the more traditional feed-forward (FF) nets are static and can reproduce any arbitrarily complex mapping from the input to the output space. Static models can be adapted to deal with dynamic tasks in different ways.

Considering, for instance, an hourly autoregressive forecast, we can identify three different ways to predict the *h* step-ahead hourly sequence [*Î*(*t* + *h* − 1), ... , *Î*(*t* + 1), *Î*(*t*)] based on the observations of the last *d* steps (i.e., [*I*(*t* − 1), *I*(*t* − 2), ... , *I*(*t* − *d*)]) with a FF network. Here, the symbols *Î*(*t*) and *I*(*t*) indicate the forecasted and observed solar irradiance, respectively.

#### 2.1.1. The Recursive (*Rec*) Approach

The *Rec* approach repeatedly uses the same one-step-ahead model, assuming as input for the second step the forecast obtained one step before [43]. Only one model, *fRec*, is needed in this approach, whatever the length of the forecasting horizon, and this explains the large diffusion of this approach.

$$\hat{I}(t) = f\_{\text{Rec}}(I(t-1), \ I(t-2), \dots, \ I(t-d)) \tag{1}$$

$$\hat{I}(t+1) = f\_{\text{Rec}}(\hat{I}(t), I(t-1), \dots, I(t-d+1))\tag{2}$$

Equations (1) and (2) show the first two steps of the procedure, which has to be repeated *h* times.
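The recursion of Equations (1) and (2) can be sketched as follows; `forecast_recursive` is an illustrative helper, and the persistence lambda merely stands in for a trained one-step network:

```python
import numpy as np

def forecast_recursive(f_rec, history, h):
    """Roll a one-step-ahead model f_rec forward h steps.

    f_rec maps a length-d input window to a single forecast; history
    holds the last d observed irradiance values, most recent first,
    as in Equations (1)-(2).
    """
    window = list(history)          # [I(t-1), I(t-2), ..., I(t-d)]
    forecasts = []
    for _ in range(h):
        y = f_rec(np.asarray(window))
        forecasts.append(y)
        window = [y] + window[:-1]  # the forecast becomes the newest input
    return forecasts

# Toy one-step model: persistence (repeat the most recent value).
print(forecast_recursive(lambda w: w[0], [300.0, 280.0, 250.0], 3))
# → [300.0, 300.0, 300.0]
```

The single model is reused at every step, which is what makes the *Rec* structure so economical, but forecast errors are fed back as inputs and can accumulate along the horizon.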

#### 2.1.2. The Multi-Model (*MM*) Approach

Although the *Rec* model structure is the most natural one also for *h*-steps-ahead prediction, other alternatives are possible, and in this work we have explored two further structures, respectively referred to as multi-model (*MM*) and multi-output (*MO*), which are illustrated below. The peculiarity of the *MM* model is that, unlike *fRec*, its parameters are optimized separately for each prediction time horizon, following the framework expressed by Equations (3)–(5). *MM* thus requires *h* different models to cover the prediction horizon.

It is trivial to note that for *h* = 1 the model works exactly as a *Rec* model, but it represents a generalization when *h* > 1.

$$\hat{I}(t) = f\_{\text{MM},1}(I(t-1), I(t-2), \dots, I(t-d))\tag{3}$$

$$\hat{I}(t+1) = f\_{\text{MM},2}(I(t-1), I(t-2), \dots, I(t-d))\tag{4}$$

$$\hat{I}(t+h-1) = f\_{\text{MM},h}(I(t-1), I(t-2), \dots, I(t-d))\tag{5}$$
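A minimal sketch of the *MM* scheme of Equations (3)–(5), with hypothetical per-horizon models standing in for the trained networks:

```python
import numpy as np

def forecast_multi_model(models, history):
    """MM approach: one independently trained model per lead time.

    models[k] maps the same observed window [I(t-1), ..., I(t-d)]
    to the forecast k+1 steps ahead, as in Equations (3)-(5).
    """
    window = np.asarray(history)
    return [f(window) for f in models]

# Toy stand-in models: lead-time-dependent damping of the last observation.
models = [lambda w, k=k: w[0] * (0.9 ** k) for k in range(3)]
out = forecast_multi_model(models, [200.0, 180.0, 150.0])
print(out)
```

Note that all *h* models share the same observed input window; only their parameters differ, so no forecast is ever fed back as an input.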

#### 2.1.3. The Multi-Output (*MO*) Approach

The *MO* model [44], expressed by Equation (6), can be considered a trade-off between the *Rec* and the *MM*, since it develops a single model with a vector output composed of the *h* values predicted for each time step.

$$\begin{bmatrix} \hat{I}(t+h-1), \ \ldots, \hat{I}(t+1), \hat{I}(t) \end{bmatrix} = f\_{\text{MO}}(I(t-1), I(t-2), \ \ldots, I(t-d))\tag{6}$$

The number of parameters of the *MO* is a bit higher than that of the *Rec* (a richer output layer has to be trained), but much lower than that of the *MM*. With respect to the *Rec*, it could appear, at least in principle, more accurate, since it allows the performance to be optimized over a higher dimensional space. In comparison to the *Rec*, a *MO* model offers the user a synoptic representation of the various prediction scenarios and therefore can be more attractive from the application point of view. Furthermore, in case the *Rec* model has some exogenous inputs, one must limit the forecasting horizon *h* to the maximum delay between the external input and the output. Nothing changes, on the contrary, for the other two approaches.
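For completeness, the *MO* scheme of Equation (6) reduces to a single call that returns the whole forecast vector; the vector-output lambda below is an illustrative stand-in for a trained network:

```python
import numpy as np

def forecast_multi_output(f_mo, history):
    """MO approach of Equation (6): a single model emits the whole
    h-step forecast vector in one call, from the observed window."""
    return f_mo(np.asarray(history))

# Toy vector-output model: repeat the last observation over 4 steps.
f_mo = lambda w: np.full(4, w[0])
print(forecast_multi_output(f_mo, [310.0, 295.0, 260.0]))
# → [310. 310. 310. 310.]
```

Because the loss can be evaluated jointly on all *h* outputs during training, the *MO* model optimizes the whole predicted sequence at once, unlike the step-by-step *Rec* scheme.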

#### *2.2. Model Identification Strategies*

Neural networks are among the most popular approaches for identifying nonlinear models starting from time series. Probably this popularity is due to the reliability and efficiency of the optimization algorithms, capable of operating in the presence of many parameters (many hundreds or even thousands, as in our case). Two different kinds of neural networks have been used in this work. One of the purposes behind this work was in fact to explore, on rigorous experimental bases, if more complex but more promising neural network architectures, such as the LSTM neural networks, could offer greater accuracy for predicting solar radiation, compared to simpler and more consolidated architectures such as feed-forward (FF) neural networks.

LSTM cells were originally proposed by Hochreiter and Schmidhuber in 1997 [45] as a tool to retain a memory of past errors without increasing the dimension of the network explosively. In practice, they introduced a dynamic within the neuron that can combine both long- and short-term memory. In classical FF neural networks, the long-term memory is stored in the values of parameters that are calibrated on past data. The short-term memory, on the other side, is stored in the autoregressive inputs, i.e., the most recent values. This means that the memory structure of the model is fixed. LSTM networks are different because they allow balancing the role of long and short-term in a continuous way to best adapt to the specific process.

Each LSTM cell has three gates (input, output, and forget gate) and a two-dimensional state vector *s*(*t*) whose elements are the so-called cell state and hidden state [46,47]. The cell state is responsible for keeping track of the long-term effects of the input. The hidden state synthesizes the information provided by the current input, the cell state, and the previous hidden state. The input and forget gates define how much a new input and the current state, respectively, affect the new state of the cell, balancing the long- and short-term effects. The output gate defines how much the output depends on the current state.

Recurrent nets with LSTM cells appear particularly suitable for solar radiation forecast since the underlying physical process is characterized by both slow (the annual cycle) and fast (the daily evolution) dynamics.

An LSTM network can be defined as in Equations (7)–(8), namely as a function computing an output and an updated state at each time step.

$$\left[\hat{I}(t), s(t)\right] = f\_{\text{LSTM}}(I(t-1), s(t-1))\tag{7}$$

$$\left[\hat{I}(t+1), s(t+1)\right] = f\_{\text{LSTM}}(\hat{I}(t), s(t)) \tag{8}$$

The predictor iteratively makes use of *fLSTM* starting from an initial state *s*(*t* − *d* − 1) of the LSTM cell and processes the input sequence [*I*(*t* − 1), *I*(*t* − 2), ... , *I*(*t* − *d*)], updating the neurons' internal states in order to store all the relevant information contained in the input. The LSTM net is thus able to consider all the inputs iteratively and can directly be used to forecast several steps ahead (*h*) at each time *t*. In a sense, these recurrent networks unify the advantages of the FF recursive and multi-output approaches mentioned above: they explicitly take into account the sequential nature of the time series like the FF recursive, and are optimized on the whole predicted sequence like the FF multi-output.

The four neural predictors presented above have been developed through an extensive trial-and-error procedure implemented on an Intel i5-7500 3.40 GHz processor with a GeForce GTX 1050 Ti GPU 768 CUDA Cores. FF nets have been coded using the Python library Keras with Tensorflow as backend [48]. For LSTM networks, we used Python library PyTorch [49]. The hyperparameters of the neural nets were tuned by systematic grid search together with the number of neurons per layer, the number of hidden layers, and all the other features defining the network architecture.
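As an illustration of the recurrent predictor of Equations (7) and (8), the following PyTorch sketch consumes a 24-lag input window and emits a 12-step forecast vector in one shot; the hidden size and the use of a single linear head are illustrative assumptions, not the tuned architecture of the paper:

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Minimal sketch of the recurrent predictor of Eqs. (7)-(8):
    the LSTM consumes the input window step by step, updating its
    (hidden, cell) state, and a linear head maps the final hidden
    state to the h forecasts."""
    def __init__(self, hidden_size=32, horizon=12):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x):                 # x: (batch, d, 1)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden)
        return self.head(h_n.squeeze(0))  # (batch, horizon)

# One batch of 24-lag windows, the delay d used in Section 3.1.
net = LSTMForecaster()
y = net(torch.randn(8, 24, 1))
print(y.shape)  # torch.Size([8, 12])
```

Training such a network on all *h* outputs jointly is what lets it combine the sequential processing of the *Rec* scheme with the whole-sequence optimization of the *MO* scheme.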

#### *2.3. Preliminary Analysis of Solar Data*

The primary dataset considered for this study was recorded from 2014 to 2019 by a Davis Vantage 2 weather station installed and managed by the Politecnico di Milano, Italy, at the Como Campus. The station is continuously monitored and checked for consistency as part of the dense measurement network of the Centro Meteorologico Lombardo (www.centrometeolombardo.com). Its geographic coordinates are: *Lat* = 45.80079, *Lon* = 9.08065, and *Elevation* = 215 m a.s.l. Together with the solar irradiance *I*(*t*), the following physical variables are recorded every 5 min: air temperature, relative humidity, wind speed and direction, atmospheric pressure, rain, and the UV index. However, as explained in Section 2.2, the current study only adopts purely autoregressive models; namely, the forecasted values of solar irradiance *Î*(*t*) are computed only on the basis of preceding values.

A detail of the time series recorded at 00:00 each hour is shown in Figure 1a. We can interpret this time series as the sum of three different components: the astronomical condition (namely the position of the sun), that produces the evident annual cycle; the current meteorological situation (the attenuation due to atmosphere, including clouds); and the specific position of the receptor that may be shadowed by the passage of clouds in the direction of the sun. The first component is deterministically known, the second can be forecasted with a certain accuracy, while the third is much trickier and may easily vary within minutes without a clear dynamic.

**Figure 1.** Hourly solar irradiance time series (**a**), compared with clear sky values for a few specific days (**b**).

The expected global solar radiation in average clear sky conditions *IClsky*(*t*) (see Figure 1b) was computed by using the Ineichen and Perez model, as presented in [50] and [51]. The Python code that implements this model is part of the SNL PVLib Toolbox, provided by the Sandia National Labs PV Modeling Collaborative (PVMC) platform [52].

#### 2.3.1. Fluctuation of Solar Radiation

Solar radiation time series, as well as other geophysical signals, belong to the class of so-called 1/*f* noises (also known as pink noise), i.e., long-memory processes whose power density spectrum exhibits a slope, α, ranging in [0.5, 1.5]. In other words, they are random processes lying between white noise processes, characterized by α = 0, and random walks, characterized by α = 2 (see, for instance, [53]). Indeed, the slope of the hourly solar irradiance recorded at Como is about α = 1.1 (Figure 2), while the daily average time series exhibits a slope of about α = 0.6, meaning that solar radiation at the daily scale is more similar to a white process.

**Figure 2.** Power spectral density of the hourly solar irradiance (grey) and trend 1/f <sup>α</sup> characterized by α = 1.1 (black).

Figure 2 also shows that the power spectral density has some peaks corresponding to the periodicity of 24 h (1.16·10<sup>−5</sup> Hz) and its multiples.
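The slope α can be estimated by a log-log linear fit to the power spectral density; the sketch below is an assumption-based illustration that validates the estimator on synthetic white noise (for which α ≈ 0) rather than on the Como data:

```python
import numpy as np
from scipy.signal import welch

def spectral_slope(x, fs=1.0):
    """Estimate alpha in S(f) ~ 1/f^alpha by a linear fit of
    log-PSD versus log-frequency (zero frequency excluded)."""
    f, pxx = welch(x, fs=fs, nperseg=1024)
    mask = f > 0
    coeffs = np.polyfit(np.log(f[mask]), np.log(pxx[mask]), 1)
    return -coeffs[0]  # negated slope of the 1/f^alpha trend

# Sanity check on white noise: alpha should be close to 0.
rng = np.random.default_rng(0)
alpha = spectral_slope(rng.standard_normal(8192))
print(round(alpha, 2))
```

Applied to the hourly irradiance series, the same fit would recover the α ≈ 1.1 trend shown in Figure 2, provided the daily peaks are excluded or averaged out.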

#### 2.3.2. Mutual Information

To capture the nonlinear dependence of solar irradiance time series from its preceding values, we computed the so-called mutual information *M*(*k*), defined as in Equation (9) [54].

$$M(k) = \sum\_{i,j} p\_{ij}(k) \cdot \ln \frac{p\_{ij}(k)}{p\_i \, p\_j} \tag{9}$$

In this expression, for some partition of the time series values, *pi* is the probability of finding a time series value in the *i*-th interval, and *pij*(*k*) is the joint probability that an observation falls in the *i*-th interval and the observation *k* time steps later falls into the *j*-th interval. The partition of the time series can be made with different criteria, for instance by dividing the range of values between the minimum and maximum into a predetermined number of intervals, or by taking intervals with equal probability [55]. In our case, we chose to divide the whole range of values into 16 intervals. The normalized mutual information of the solar irradiance time series at Como is shown in Figure 3. In the case of the Como hourly time series, it gradually decays, reaching zero after about six lags. Moreover, it can be observed that the mutual information for the daily values decays more rapidly, thus confirming the greater difficulty of forecasting solar radiation at a daily scale.
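A plugin estimate of Equation (9) with the 16-interval partition described above can be sketched as follows; the uniform test series is only a sanity check, since for independent samples the mutual information should be close to zero:

```python
import numpy as np

def mutual_information(x, k, bins=16):
    """M(k) of Eq. (9): mutual information between the series and a
    copy lagged by k steps, from a 2-D histogram over a fixed
    equal-width partition of the value range."""
    a, b = x[:-k], x[k:]
    pij, _, _ = np.histogram2d(a, b, bins=bins)
    pij = pij / pij.sum()        # joint probabilities
    pi = pij.sum(axis=1)         # marginal of the series
    pj = pij.sum(axis=0)         # marginal of the lagged copy
    nz = pij > 0                 # skip empty cells (0 * ln 0 = 0)
    outer = np.outer(pi, pj)
    return np.sum(pij[nz] * np.log(pij[nz] / outer[nz]))

# Independent uniform samples: M(1) should be near zero.
rng = np.random.default_rng(1)
x = rng.uniform(size=50000)
print(round(mutual_information(x, k=1), 3))
```

For a strongly dependent series (e.g., a slowly increasing ramp) the same estimator returns values close to ln 16, the maximum achievable with this partition.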

**Figure 3.** Normalized mutual information of hourly and daily solar irradiance at Como.

#### *2.4. Benchmark Predictors of Hourly Solar Irradiance*

Multi-step ahead forecasting of the global solar irradiance *I*(*t*) at hourly scale was performed by using models of the form (1–8) defined above.

The performance of such predictors has been compared with that of some standard baseline models. More specifically, we computed:

- the clear sky irradiance *Clsky*, i.e., the hourly irradiance expected under average clear sky conditions (Section 2.3);
- the persistent predictor *Pers*, which assumes the future irradiance equal to the last observed value;
- the 24-h persistent predictor *Pers24*, which assumes the future irradiance equal to the value observed at the same hour of the previous day.
#### *2.5. Performance Assessment Metrics*

The performances of the various predictors were assessed by computing the following error indices [56]: *Bias* (10), Mean Absolute Error—*MAE* (11), Root Mean Square Error—*RMSE* (12), and *R2*, also known as Nash-Sutcliffe Efficiency—*NSE* (13).

$$Bias = \frac{1}{T} \sum\_{t=1}^{T} \left( I(t) - \hat{I}(t) \right) \tag{10}$$

$$MAE = \frac{1}{T} \sum\_{t=1}^{T} \left| I(t) - \hat{I}(t) \right| \tag{11}$$

$$RMSE = \sqrt{\frac{1}{T} \sum\_{t=1}^{T} \left( I(t) - \hat{I}(t) \right)^2} \tag{12}$$

$$NSE = 1 - \frac{\sum\_{t=1}^{T} \left( I(t) - \hat{I}(t) \right)^2}{\sum\_{t=1}^{T} \left( I(t) - \bar{I} \right)^2} \tag{13}$$

where *T* is the length of the time series, while $\bar{I}$ is the average of the observed data.

Concerning the NSE, an index originally developed for evaluating hydrological models [57], it is worth bearing in mind that it can range from −∞ to 1. An efficiency of 1 (*NSE* = 1) means that the model perfectly interprets the observed data, while an efficiency of 0 (*NSE* = 0) indicates that the model predictions are only as accurate as the mean of the observed data. It is worth stressing here that, in general, a model is considered sufficiently accurate if *NSE* > 0.6.
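The four indices of Equations (10)–(13) are straightforward to compute; a minimal NumPy sketch with illustrative data:

```python
import numpy as np

def bias(obs, pred):   # Eq. (10)
    return np.mean(obs - pred)

def mae(obs, pred):    # Eq. (11)
    return np.mean(np.abs(obs - pred))

def rmse(obs, pred):   # Eq. (12)
    return np.sqrt(np.mean((obs - pred) ** 2))

def nse(obs, pred):    # Eq. (13): 1 minus error variance over data variance
    return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Illustrative observed/predicted irradiance values (W/m^2).
obs = np.array([0.0, 100.0, 400.0, 300.0, 50.0])
pred = np.array([10.0, 120.0, 380.0, 310.0, 40.0])
print(nse(obs, pred))  # close to 1: predictions track the observations
```

A prediction equal to the observed mean at every step gives *NSE* = 0, which is why values above 0.6 are usually required before a model is considered useful.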

Regardless of which performance index is considered, it is worth noticing that the above classical indicators may overestimate the actual performances of models when applied to the complete time series. When dealing with solar radiation, there is always a strong bias due to the presence of many zero values. In the case at hand, they are about 57% of the sample, due to some additional shadowing by the nearby mountains and to the sensitivity of the sensors. When the recorded value is zero, the forecast is also zero (or very close), and all these small errors substantially reduce the average errors and increase the *NSE*. Additionally, forecasting the solar radiation during the night is useless, and the power network dispatcher may well turn the forecasting model off. In order to overcome this deficiency, which unfortunately is present in many works in the current literature, and to allow the models' performances to be compared when they may indeed be useful, we have also computed the same indicators considering only values above 25 Wm<sup>−2</sup> (*daytime* in what follows), a small value normally reached before dawn and after sunset. These are indeed the conditions when an accurate energy forecast may turn out to be useful.

Since the *Clsky* represents what can be forecasted even without using any information about the current situation, it can be assumed as a reference, and the skill index *Sf* (14) can be computed to measure the improvement gained using the *fLSTM* and *fFF* models:

$$S\_f = 1 - \frac{RMSE\_f}{RMSE\_{Clsky}} \tag{14}$$

#### **3. Results**

#### *3.1. Forecasting Performances*

The comparison of the multi-step forecast of solar irradiance with LSTM and FF networks was performed setting the delay parameter *d* to 24, although the mutual information indicated that *d* = 6 would be enough. This choice is motivated by the intrinsic periodicity of solar radiation at an hourly scale [54]. We have experimentally observed that this choice gives more accurate estimates for all the models considered, probably because it helps the model take into account the natural persistence of solar radiation (see the comments about the performance of the *Pers24* model, below). The length of the forecasting horizon *h*, representing the number of steps ahead predicted in the future, was varied from 1 to 12. Data from 2014 to 2017 were used for network training, 2018 for validating the architecture, and 2019 for testing.

The average performances computed on the first 3 hours of the forecasting horizon of the *Pers*, *Pers24*, and *Clsky* models for the test year 2019 are shown in Tables 1 and 2. The *Pers* predictor performances rapidly deteriorate when increasing the horizon. They are acceptable only in the short term: after 1 h, the *NSE* is equal to 0.79 (considering whole day samples) and 0.59 (daytime samples only); after two hours, the *NSE* decreases to 0.54 (whole day) and 0.14 (daytime only); and six steps ahead, the *NSE* becomes −0.93 (whole day) and −1.64 (daytime). The *Pers24* and *Clsky* predictors preserve the same performances for each step ahead, since they are independent of the horizon. Such models inherently take into account the presence of the daily pseudo-periodic component, which affects hourly global solar radiation.


**Table 1.** Average performances of the *Pers*, *Pers24*, and *Clsky* predictors on the first 3 hours (whole day).

**Table 2.** Average performances of the *Pers*, *Pers24*, and *Clsky* predictors on the first 3 hours (daytime samples only).


The *Pers24* predictor appears to be superior to the *Clsky* (lower error indicators, higher *NSE*), confirming that the information of the last 24 h is much more relevant for a correct prediction than the long-term annual cycle. Indeed, the sun position does not change much between one day and the following, and the meteorological conditions have a certain, statistically relevant tendency to persist. Additionally, the *Pers24* predictor is the only one with practically zero bias, since the small difference that appears in the first column of Table 1 is simply due to the differences between 31/12/2018 (which is used to compute the predicted values of 1/1/2019) and 31/12/2019. The clear sky model, which operates by definition in the absence of cloud cover, overestimates the values above the threshold by 89.86 Wm<sup>−2</sup> on average.

From Table 2, it appears that *Pers*, *Pers24*, and *Clsky* are not reliable models, on average, especially if the *NSE* is evaluated by using daytime samples only.

Figure 4 reports the results obtained with the three different FF approaches (Figure 4a–c) and the LSTM forecasting model (Figure 4d) in terms of *NSE* (both in the whole day and in daytime samples only).

**Figure 4.** NSE of hourly solar irradiance at Como obtained with recursive FF (**a**), multi-output FF (**b**), multi-model FF (**c**), and LSTM (**d**). Solid line represents the performance in the whole day, dashed line in daytime only. Values refer to the test year 2019.

Generally speaking, Figure 4 shows that all the considered neural predictors exhibit an NSE which reaches an asymptotic value around six steps ahead. This is coherent with the previous analysis about the mutual information (see Figure 3), which, at an hourly scale, is almost zero after six lags.

If the evaluation is carried out considering whole day samples, all the models would have to be considered reliable enough since *NSE* is only slightly below 0.8, even for prediction horizons of 12 h. On the contrary, if the evaluation is made considering daytime samples only, it clearly appears that models are reliable for a maximum of 5 h ahead, as for higher horizons the *NSE* value typically falls below 0.6. Therefore, removing the nighttime values of the time series is decisive for a realistic assessment of a solar radiation forecasting model, that would otherwise be strongly biased.

Going in deeper details, the following considerations can be made.

The FF recursive approach performs slightly worse than the others, particularly as measured by the *NSE* and specifically beyond a forecasting horizon of 5 h. The FF multi-output and multi-model approaches show performances similar to the LSTM. Additionally, one can note that the performance decreases regularly with the length of the horizon for the FF recursive approach and the LSTM net, since they explicitly take into account the sequential nature of the task. Conversely, the FF multi-output and multi-model curves show irregularities, particularly the latter, since each predictor for a specific time horizon is completely independent of those for shorter horizons. If perfect training were possible, such irregularities might perhaps be reduced, but they cannot be completely avoided, particularly on the test dataset, because they are inherent to an approach that considers each predicted value as a separate task.

For all the considered benchmarks and neural predictors, the difference between the whole time series (average value 140.37 Wm<sup>−2</sup>) and the case with a threshold (daytime only), which excludes nighttime values (average 328.62 Wm<sup>−2</sup>), emerges clearly, given that during all nights the values are zero or close to it, and thus the corresponding errors are also low.

FF nets and LSTM perform similarly also considering the indices computed along the first 3 h of the forecasting horizon as shown in Tables 3 and 4, for the whole day and daytime, respectively.


**Table 3.** Average performances of FF and LSTM predictors on the first 3 hours (whole day).

**Table 4.** Average performances of FF and LSTM predictors on the first 3 hours (daytime samples only).
All the neural predictors provide a definite improvement in comparison to the *Pers*, *Pers24,* and *Clsky* models. Looking, for instance, at the *NSE* the best baseline predictor is the *Pers24*, scoring 0.63 (whole day) and 0.28 (daytime only). The corresponding values for the neural networks exceed 0.86 and 0.73, respectively.

An in-depth analysis should compare the neural predictors' performance at each step with the best benchmark for that specific step. The latter can be considered an ensemble of benchmarks composed of the *Pers* model, the best performing one step ahead (*NSE* equal to 0.79), and the *Pers24* on the following steps (*NSE* equal to 0.63 for *h* from 2 to 12). Under this perspective, the neural nets clearly outperform the considered baseline, since their *NSE* score varies from 0.90 to 0.75 (see the solid lines in Figure 4 referring to the whole day). The same analysis performed excluding nighttime values leads to quite similar results, confirming that the neural networks always provide a performance much higher than that of the benchmarks considered here.

An additional way to examine the model performances is presented in Table 5, which reports the NSE of the LSTM network predictions on three horizons, namely 1, 3, and 6 h ahead. The daytime (i.e., above 25 Wm<sup>−2</sup>) test series is partitioned into three classes: cloudy, partly cloudy, and sunny days, which constitute about 30%, 30%, and 40% of the sample, respectively. More precisely, cloudy days are defined as those whose daily average irradiance is below 60% of the clear sky value, and sunny days as those above 90% (remember that the clear sky index already accounts for the average sky cloudiness).

**Table 5.** LSTM performances in terms of NSE for cloudy, partly cloudy and sunny days (daytime samples only).


It is quite apparent that the model performance decreases consistently from sunny, to partly cloudy, to cloudy days. This result is better illustrated in Figure 5, where the 3-hour-ahead predictions are shown for three typical days. On the sunny day, on the right, the process is almost deterministic (governed mainly by astronomical conditions), while the situation is completely different on a cloudy day. In the latter case, the forecasting error is of the same order of magnitude as the process itself (NSE close to zero) and can be even larger at 6 hours ahead, which explains the negative NSE value shown in Table 5.

**Figure 5.** Three-hour-ahead LSTM predictions versus observations in three typical days with different cloudiness: a cloudy day (**a**), a partly cloudy day (**b**), and a sunny day (**c**). Values refer to the test year (2019).

#### *3.2. Domain Adaptation*

Besides the accuracy of the forecasted values, another important characteristic of the forecasting models is their generalization capability, often mentioned as domain adaptation in the neural networks literature [58]. This means the possibility of storing knowledge gained while solving one problem and applying it to different, though similar, datasets [59].

To test this feature, the FF and LSTM networks developed for the Como station (source domain) have been used, without retraining, at other sites (target domains) spanning more than one degree of latitude and representing quite different geographical settings: from the low, open plain at 35 m a.s.l. up to the mountains at 800 m a.s.l. In addition, the test year has been changed, because solar radiation is far from truly periodic and some years (e.g., 2017) show significantly higher values than others (e.g., 2011), with yearly average values differing by about 25%.

Figure 6 shows the *NSE* of the multi-output FF and LSTM networks for three additional stations. All the graphs reach a plateau after about six steps ahead, as suggested by the mutual information computed on the Como station, and the differences between the FF and LSTM networks are very small or even negligible at almost all the other stations. Six hours ahead, the difference in *NSE* between Como in the test year (2019), for which the networks were trained, and Bema in 2017, which appears to be the most different dataset, is only about 3% for both the FF models and the LSTM.

**Figure 6.** *NSE* of hourly solar irradiance forecast at Casatenovo (**a**–**d**) in 2011, Bigarello (**e**–**h**) in 2016, and Bema (**i**–**l**) in 2017. Each column is relative to a different neural predictor. Solid line represents the performance in the whole day, dashed line in daytime only.

As a further trial, both the FF models and the LSTM have been tested on a slightly different process, i.e., the hourly average solar radiation recorded at the Como station. While this process has the same mean as the original dataset, its variability is different: the standard deviation decreases by about 5%, since the averaging filters out the high frequencies. Forecasting results are shown in Figure 7. For this process too, the neural models perform roughly as well as for the hourly values on which they were trained. The accuracy of both the LSTM and FF networks improves by about 0.02 (or 8%) in terms of standard *MAE*, and slightly less in terms of *NSE*, compared with the original process. For a correct comparison with Figure 4, however, it is worth bearing in mind that the 1-hour-ahead prediction corresponds to (*t* + 2) in the graph, since the average computed at hour (*t* + 1) covers the interval from (*t*) to (*t* + 1) and thus includes values that are only 5 minutes ahead of the instant at which the prediction is issued.
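The variance-reducing effect of hourly averaging can be reproduced on synthetic data (a sketch assuming 5-minute raw samples; the diurnal shape and noise level are illustrative only, not fitted to the Como series):

```python
import numpy as np

rng = np.random.default_rng(42)
n_hours = 24 * 30
t = np.arange(n_hours * 12) / 12.0                       # 5-minute grid, in hours
signal = np.clip(np.sin(2 * np.pi * t / 24), 0, None)    # crude diurnal cycle
noise = 0.1 * rng.standard_normal(t.size)                # high-frequency fluctuations
x = signal + noise

# Hourly averaging: 12 five-minute samples per hour
hourly = x.reshape(-1, 12).mean(axis=1)

# The mean is unchanged while the standard deviation shrinks,
# because block averaging acts as a low-pass filter on the noise.
print(x.std(), hourly.std())
```
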

An ad-hoc training on each sequence would undoubtedly improve the performance, but the purpose of this section is precisely to show the potential of networks calibrated on different stations, and to evaluate the possibility of adopting a predictor developed elsewhere when a sufficiently long series of values is missing. The forecasting models we developed for a specific site could thus be used with acceptable accuracy at sites where recording stations are not available or where the existing time series are not long enough. This suggests the possibility of developing a single forecasting model for the entire region.

**Figure 7.** *NSE* of hourly average solar irradiance at Como obtained with recursive FF (**a**), multi-output FF (**b**), multi-model FF (**c**), and LSTM (**d**). Solid line represents the performance in the whole day, dashed line in daytime only. Values refer to the test year (2019).

#### **4. Some Remarks on Network Implementations**

The development of many successful deep learning models in various applications has been made possible by three joint factors. First, the availability of big data, which is necessary to identify complex models characterized by thousands of parameters. Second, the use of fast parallel processing units (GPUs) able to handle the high computational effort required. Third, the availability of efficient gradient-based methods to train these kinds of neural networks.

The latter is the well-known backpropagation (BP) technique, which allows efficient computation of the gradients of the loss function with respect to each model weight and bias. To apply the backpropagation of the gradient, it is necessary to have a feed-forward architecture (i.e., without self-loops). In this case, the optimization is extremely efficient, since the process can be entirely parallelized, exploiting the GPU.

When the neural architecture presents some loops, as happens with recurrent cells, the BP technique has to be slightly modified in order to fit the new situation. This can be done by unfolding the neural network through time so as to remove the self-loops. This extension of the BP technique is known as backpropagation through time (BPTT) in the machine learning literature. The issue with BPTT is that the unfolding should in principle last for an infinite number of steps, making the technique useless for practical purposes. For this reason, it is necessary to limit the number of unfolding steps, considering only the time steps that carry useful information for the prediction (in this case, the BPTT is said to be truncated). As is easy to understand, BPTT is not as efficient as traditional BP, because the computation cannot be fully parallelized. As a consequence, the GPU's computing power cannot be fully exploited, resulting in slower training. The presence of recurrent units also produces a much more complex optimization problem, due to the presence of a significant number of local optima [60].
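The effect of truncation is easy to see on a toy linear recurrence, where the BPTT gradient is an explicit sum with one term per unfolded step (a didactic sketch, not the training code used in the study):

```python
def run(w, xs):
    """Roll a toy linear recurrence h_t = w * h_{t-1} + x_t (h_0 = 0)."""
    h, hs = 0.0, [0.0]
    for x in xs:
        h = w * h + x
        hs.append(h)
    return hs

def grad_w(w, xs, k=None):
    """d h_T / d w via backpropagation through time.
    Full BPTT sums one term per unfolded step; truncated BPTT (k given)
    keeps only the k most recent steps."""
    hs = run(w, xs)
    T = len(xs)
    first = 1 if k is None else max(1, T - k + 1)
    # each unfolded step t contributes w^(T-t) * h_{t-1} to the gradient
    return sum(w ** (T - t) * hs[t - 1] for t in range(first, T + 1))

xs = [1.0] * 10
print(grad_w(0.5, xs), grad_w(0.5, xs, k=4))
```

With |*w*| < 1, the older contributions decay geometrically, which is why a short truncation window loses little gradient information while keeping the unfolding finite.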

Figure 8 shows the substantial difference in the evolution of the training process. As usual, the mean of the squared errors of the FF network slowly decreases toward a minimum, while in the case of the LSTM this function shows sudden jumps followed by several epochs of stagnation. The training algorithm of the LSTM can avoid being trapped for too many epochs in a local minimum but, on the other hand, these local minima are more frequent.

**Figure 8.** Evolution of the *MSE* across the training epochs for recursive FF (**a**) and LSTM (**b**) predictors.

For the sake of comparison, we trained the four neural architectures using the same hyperparameter grid, considering the ranges reported in Table 6. As the training algorithm, we used the Adam optimizer. Each training procedure was repeated three times to avoid the problems of an unlucky weight initialization.
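The tuning loop can be sketched as a plain grid search with three restarts per configuration (the grid values and the scoring function below are placeholders, not those of Table 6; a real run would train the network with Adam and score it on a validation set):

```python
import itertools
import random

# Hypothetical grid; the actual ranges are those reported in Table 6
grid = {
    "learning_rate": [1e-3, 1e-4],
    "decay_rate": [0.0, 1e-4],
    "batch_size": [128, 256, 512],
}

def train_and_score(config, seed):
    """Stand-in for one full training run returning a validation score.
    The toy landscape simply penalizes the smallest batch size."""
    rng = random.Random(seed)
    base = 0.9 - 0.1 * (config["batch_size"] == 128)
    return base + 0.01 * rng.random()      # seed-dependent noise

best_config, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # three restarts guard against an unlucky weight initialization
    score = max(train_and_score(config, seed) for seed in range(3))
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)
```
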


**Table 6.** Hyperparameter values considered in the grid-search tuning process for FF and LSTM predictors.

| Hyperparameter | Range | Selected values |
|---|---|---|
| Decay rate | 0–10<sup>−4</sup> | 0, 10<sup>−4</sup>, 10<sup>−4</sup>, 10<sup>−4</sup> |
| Batch size | 128–512 | 512, 128, 512, 512 |

While performing similarly from many viewpoints, the FF and LSTM architectures show significant differences in their sensitivity to the hyperparameter values. Figure 9 shows the sensitivity bands obtained for each architecture. The upper bound represents, for each step ahead, the best *NSE* score achieved across the hyperparameter combinations. The cases in which the optimization process fails due to a strongly inefficient initialization of the weights have been excluded.

**Figure 9.** Sensitivity bands obtained with FF-recursive (**a**), FF-multi-output (**b**), FF-multi-model (**c**), and LSTM (**d**) on the 12-step-ahead prediction. Values refer to the test year (2019).

The variability of the LSTM performances across the hyperparameter space is quite limited compared with that of the FF architectures. This is probably because the LSTM has a sequential structure and is optimized over the whole 12-step forecasting horizon. The recursive FF model is identified as the optimal one-step-ahead predictor, and thus there are some cases characterized by poor performances over the rest of the horizon. The multi-output and multi-model FF seem to suffer from the same problem because, as already pointed out, they predict the value at each time step as an independent variable.

#### **5. Conclusions**

The availability of accurate multi-step-ahead forecasts of solar irradiance (and, hence, power) is of extreme importance for the efficient balancing and management of power networks, since they allow the implementation of accurate and efficient control procedures, such as model predictive control.

The results reported in this study further confirm the well-known accuracy of FF and LSTM networks for the above purpose and, more generally, for predicting time series of environmental variables. Another interesting conclusion is that, among the *Rec*, *MM*, and *MO* models, the *Rec* is the one that exhibits the lowest performance. A rough explanation probably lies in the fact that its parameters are optimized over a one-step horizon but are then used over a longer horizon, thus propagating the error. One of the merits of this study is therefore to clarify that a common practice, namely identifying a model of the type *x*(*t* + 1) = *f*(*x*(*t*)) and then iterating it to predict *x*(*t* + *h*), is not the best choice. To this end, the proposed *MM* and *MO* represent more appropriate alternatives.
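The difference between iterating a one-step model (*Rec*) and training directly on the target horizon can be illustrated on a linear AR(1) toy process, where both strategies estimate the same quantity but only the direct model is optimized for the long horizon (an illustrative sketch on our own synthetic data, not the paper's series):

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.8
x = np.zeros(5000)
for t in range(1, x.size):                 # AR(1) toy process
    x[t] = a * x[t - 1] + rng.standard_normal()

def fit_slope(y, h):
    """Least-squares slope of x(t+h) on x(t): a 'direct' h-step model."""
    X, Y = y[:-h], y[h:]
    return (X @ Y) / (X @ X)

h = 6
a1 = fit_slope(x, 1)         # one-step model, as in the Rec approach
direct = fit_slope(x, h)     # model trained directly on the h-step horizon
recursive = a1 ** h          # one-step model iterated h times

# For a linear process both strategies estimate a**h, but the Rec model
# is never optimized for the long horizon; with nonlinear models the
# iterated one-step map also propagates its own prediction errors.
print(recursive, direct, a ** h)
```
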

However, such good performances come at a cost in terms of the data and time required to train the networks. In actual applications, one has to trade off these costs against the improvement in forecasting precision. In this respect, the *MO* appears to offer the best compromise between accuracy and complexity. Indeed, it can reach a very good performance with a minimal increase in the number of parameters compared to the recursive approach. The *MM* approach performs slightly better than the *MO* one, but requires a separate training (and possibly a separate architecture) for each forecasting horizon, which, in the present study, means training over a million parameters.

In more general terms, the forecasting model should be selected by weighing the comparative advantage of better precision against the effort needed to obtain it. Though the economic cost and the computation time required to synthesize even a very complex LSTM network are already rather low and still decreasing, one may also consider adopting a classical FF neural network model, which outperforms the traditional *Pers24* model and is much easier to train than the corresponding LSTM.

Another peculiarity of this work is showing how performance indices are strongly affected by the presence of null values; in this respect, nighttime samples should be removed for a correct assessment of model performance.

Both the FF and LSTM networks developed in this study have proved able to forecast solar radiation at other stations of a relatively wide domain with a minimal loss of precision and without the need to retrain them. This opens the way to the development of country-wide or regional predictors, valid for ungauged locations with different geographical settings. Such precision may perhaps be further improved by using other meteorological data as inputs to the model, thus extending the purely autoregressive approach adopted in this study.

**Author Contributions:** Conceptualization, G.G., G.N. and M.S.; methodology, G.G., G.N. and M.S.; software, M.S.; validation, G.G., G.N. and M.S.; data curation, G.G., G.N. and M.S.; writing—original draft preparation, G.G., G.N. and M.S.; writing—review and editing, G.G., G.N. and M.S.; visualization, M.S.; supervision, G.G. and G.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
