#### *2.2. Data Processing (Sliding Window)*

When the sliding window approach is used to prepare data for CNNs, a time series dataset is split as follows. The input data column is split into vectors consisting of an equal number of time steps. For example, if the input data has 10 time steps, it is split into 5 vectors of 2 time steps each. Each vector is then mapped to a label, which is an output value from the training data. In this way, 5 vectors are mapped to 5 output values and the remaining 5 output values are dropped, which reduces the computational burden during model training. The sliding window approach is presented in Algorithm 1.
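The 10-step example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reproduction of Algorithm 1: the function name `sliding_window` is hypothetical, the windows are taken as non-overlapping, and each window is assumed to be labelled with the output recorded at its last time step.

```python
def sliding_window(inputs, outputs, window=2):
    """Split `inputs` into non-overlapping windows and pair each with a label.

    Assumption: a window is labelled with the output at its last time step;
    the outputs at the other time steps are dropped.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    X, y = [], []
    # Move forward one full window at a time (stride equals window length).
    for start in range(0, len(inputs) - window + 1, window):
        end = start + window
        X.append(inputs[start:end])
        y.append(outputs[end - 1])
    return X, y


inputs = list(range(10))                 # 10 time steps of one input column
outputs = [10 * t for t in range(10)]    # 10 matching output values
X, y = sliding_window(inputs, outputs)
# X -> [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
# y -> [10, 30, 50, 70, 90]   (5 outputs kept, 5 dropped)
```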


While a general definition of the sliding window algorithm is presented here, every CNN model needs data to be prepared according to its structure. The sliding window for the CNN model in this study is applied to multivariate time series data (more than one variable for every time step). In this case, every window determined by the algorithm has 2 time steps, and its associated variables are mapped to one output. The multi-headed CNN has 4 convolutional layers, one for every available input variable, hence the input time series is split into 4 univariate time series (one variable per time step), one for each convolutional layer. The sliding window algorithm is then applied to each univariate series, so that every window has 2 time steps and its associated variable is mapped to an output.
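The difference between the two input layouts can be illustrated with plain Python lists. The sketch below assumes non-overlapping windows of 2 time steps and 4 input variables; the helper names `make_windows` and `split_heads` are hypothetical. Splitting the windowed multivariate data per variable gives the same result as windowing each univariate series separately.

```python
def make_windows(rows, window=2):
    """Cut a list of time steps into non-overlapping windows (assumed stride)."""
    return [rows[s:s + window] for s in range(0, len(rows) - window + 1, window)]


def split_heads(windows):
    """Regroup multivariate windows into one univariate window list per variable."""
    n_vars = len(windows[0][0])
    return [[[step[v] for step in win] for win in windows] for v in range(n_vars)]


# 6 time steps, 4 variables per time step (toy values).
rows = [[t, t + 100, t + 200, t + 300] for t in range(6)]

multivariate = make_windows(rows)     # input for the simple CNN: 3 windows x 2 steps x 4 variables
heads = split_heads(multivariate)     # input for the multi-headed CNN: 4 lists of 3 windows x 2 steps
# heads[0] -> [[0, 1], [2, 3], [4, 5]]
# heads[3] -> [[300, 301], [302, 303], [304, 305]]
```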

The CNN-LSTM model reads input data in a different manner. In this case, the sliding window is applied first to the multivariate time series data, with every window containing 4 time steps. Each window is then reshaped into 2 subsequences containing the associated variables and is mapped to an output.
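A minimal sketch of this reshaping step is given below, assuming that the 2 subsequences are consecutive halves of the 4-step window. The function name `to_subsequences` and the toy values are illustrative. In array terms, the operation corresponds to reshaping data of shape (samples, 4, variables) into (samples, 2, 2, variables).

```python
def to_subsequences(window, n_subseq=2):
    """Split one window of time steps into equally long consecutive subsequences."""
    if len(window) % n_subseq != 0:
        raise ValueError("window length must be divisible by n_subseq")
    size = len(window) // n_subseq
    return [window[i * size:(i + 1) * size] for i in range(n_subseq)]


# One window with 4 time steps and 2 variables per time step (toy values).
window = [[1, 10], [2, 20], [3, 30], [4, 40]]
subsequences = to_subsequences(window)
# subsequences -> [[[1, 10], [2, 20]], [[3, 30], [4, 40]]]
```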

#### *2.3. Evaluation Metrics*

The evaluation metrics for this study were chosen based on the recommendations of studies and reports in the field of solar PV output forecasting [6,14]. The metrics are the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Bias Error (MBE). RMSE is widely used in forecast studies. According to [29], it is suitable for such data since it penalizes the largest errors most heavily, which the MAE and the MBE are unable to do. MAE is calculated as the average of the absolute forecast errors. The MBE also averages the forecast errors but retains their sign rather than taking the absolute magnitude alone; this indicates whether the model has a tendency to over- or under-forecast. The metrics are as follows:

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} e_i^2} \tag{7}$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} |e_i| \tag{8}$$

$$MBE = \frac{1}{N} \sum_{i=1}^{N} e_i \tag{9}$$

$$e_i = y_{i(forecast)} - y_{i(observed)} \tag{10}$$

where $y_{i(forecast)}$ and $y_{i(observed)}$ represent the forecast and observed values at the *i*-th time step, $e_i$ is the error at the *i*-th time step, and $i = 1, \ldots, N$ indexes all the time steps within the data.
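As a minimal sketch, Equations (7) to (10) translate directly into Python. The function names and the sample values are illustrative and are not taken from the study.

```python
import math


def forecast_errors(forecast, observed):
    """Equation (10): error = forecast - observed, per time step."""
    return [f - o for f, o in zip(forecast, observed)]


def rmse(errors):
    """Equation (7): square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Equation (8): mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def mbe(errors):
    """Equation (9): mean of the signed errors."""
    return sum(errors) / len(errors)


forecast = [2.0, 4.0, 6.0, 8.0]   # toy values
observed = [1.0, 5.0, 6.0, 6.0]
errors = forecast_errors(forecast, observed)   # [1.0, -1.0, 0.0, 2.0]
print(rmse(errors), mae(errors), mbe(errors))
```

With the sign convention of Equation (10), a positive MBE (here 0.5) indicates over-forecasting on average, while a negative value indicates under-forecasting.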

The evaluation metrics presented in the results section were calculated on the basis of original data after normalized prediction values were converted back using the inverse of the min–max scaling algorithm presented in Equation (1).
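The back-conversion can be sketched as follows, assuming that Equation (1) is the usual min–max form, i.e., scaled = (x − min) / (max − min). The function names and the sample values are illustrative.

```python
def min_max_scale(values, lo, hi):
    """Map values from [lo, hi] to [0, 1] (assumed form of the scaling)."""
    if hi == lo:
        raise ValueError("hi and lo must differ")
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


power_kw = [0.0, 2.5, 5.0]                       # toy values in kW
scaled = min_max_scale(power_kw, lo=0.0, hi=5.0)
restored = min_max_inverse(scaled, lo=0.0, hi=5.0)
```

The same `lo` and `hi` used to scale the training data have to be reused for the inverse step, so that the metrics are expressed in the original units (kW).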

#### **3. Results**

All models were built in Python using Jupyter Notebook. The deep learning tools used to assemble the models were TensorFlow and Keras. Additionally, scikit-learn and other basic Python libraries were used for data processing and data handling. The computer used for this purpose was equipped with an Intel® Core™ i5-4210U CPU @ 1.70 GHz 2.40 GHz processor and 8 GB of installed RAM, running Windows 10. It was also equipped with a 2048 MB Nvidia GeForce 840M graphics card. The training times for the CNN, Multi-CNN, and CNN-LSTM models were 1364 s, 1657 s, and 3534 s, respectively. All architectures used the same data stretching over 6 years for model training and were trained for 100 epochs. The ARMA and MLR models were fit almost instantaneously, which gives them an advantage over the CNN-based models with regard to the computational cost of model fitting. Once the models are fit, they are quite easy to use for predictions. There is no significant difference in ease of use between the statistical and the CNN-based techniques. Both types of model would need refitting from time to time in order to take changes in climate into account.

The data used for training the models were 6+ years' worth of data recorded from 1 March 2012 up to 31 December 2018. The validation split (test/train split) used was 20%, meaning that 80% of the data was used to train the CNN models and 20% was used to test them. The evaluation metrics for 1 h, 1 day, and 1 week for both summer and winter months were obtained by testing the models on data from July and December 2019, which the models had not seen during training. There was no validation split for the MLR and ARMA models. They were fit to the whole data and were tested with the July and December 2019 data, as were the CNN models.
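The 80/20 split can be sketched as below. Whether the split in this study was chronological or shuffled is not stated; the sketch assumes a chronological split in which the most recent 20% of the samples are held out. The function name `chronological_split` is hypothetical.

```python
def chronological_split(samples, holdout_fraction=0.2):
    """Keep the earliest samples for training and hold out the latest ones."""
    n_holdout = round(len(samples) * holdout_fraction)
    cut = len(samples) - n_holdout
    return samples[:cut], samples[cut:]


samples = list(range(10))          # toy sample indices in time order
train, held_out = chronological_split(samples)
# train -> [0, 1, 2, 3, 4, 5, 6, 7], held_out -> [8, 9]
```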

Figure 4 presents the time series data used in this study for the ARMA model, without the validation split. It is evident that the data has seasonality, with the peaks in power output observed during the summer. Hence, the periodicity for this study is taken as 12 months. The ACF with 20 lags indicates significant correlation, and a clear pattern becomes visible when the number of lags is increased to 60 and above. The PACF of the data also does not show any large cut-offs after the initial value. Hence, the time series is non-stationary and has to be converted to a stationary time series before the ARMA model is fit to the data.

**Figure 4.** Solar panel output data with an auto correlation function (ACF) and a partial auto correlation function (PACF) analysis.

Figure 5 presents the differentiated time series. It fluctuates around zero, which is a defining characteristic of a stationary signal. Furthermore, in comparison with Figure 4, the ACF is not significant and does not show a trend, which is also the case for the PACF. In both cases, there is a sharp cutoff at lag 12, indicating seasonality at 12, which is in line with the selected periodicity of 12.
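The two operations discussed here, differencing and the sample ACF, can be sketched in plain Python. The order and lag of the differencing used in the study are not stated, so the `lag` parameter below is an assumption left to the caller, and the toy series is purely illustrative.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


# Toy series with a period of 12 (three full cycles).
toy = [float(t % 12) for t in range(36)]
r = acf(toy, max_lag=12)                  # r[12] is large: seasonality at lag 12
seasonal_diff = difference(toy, lag=12)   # the periodic pattern is removed
```

For this toy series, the ACF shows a pronounced positive value at lag 12, and differencing at lag 12 removes the periodic component entirely.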

The ADF test on the differentiated signal resulted in a *p*-value of 0.001, which confirms that the signal is stationary. The ARMA model parameters can now be determined. Since the ACF and PACF are negligible beyond lag 2, *m* and *n* can have a maximum value of 2. In this study, *m* and *n* are taken as 1 and 2, respectively, and the following ARMA model is obtained.

Table 1 presents the ARMA model parameters that are used to predict solar output values for 1 h, 1 day, and 1 week. The model ignores the constant term due to its high *p*-value. The evaluation metrics for the model predictions are presented in Table 2, and a comparison of the predictions with those of the other methods is shown later on.

Figure 6 shows how an appropriate forecasting model is obtained by the different CNN architectures. Figure 6a shows the loss value that is optimized in every epoch for the multi-headed CNN structure. For this model, the loss function does not improve further over many epochs of training: after an initial drop, the loss value remains constant, which means that training the architecture for a small number of epochs is sufficient for an accurate model. Figure 6b shows the loss minimization for the simple CNN structure. In contrast to the multi-headed CNN structure, the loss minimization is more gradual, yet a satisfactory model is still obtained in a small number of epochs. Several trials showed that, in the simple CNN structure, the loss keeps improving up to 1000 epochs and more. However, the improvement in forecast accuracy is not significant relative to the time it takes to train the model for a high number of epochs.

**Figure 5.** Differentiated output with ACF and PACF analysis.

**Table 1.** Autoregressive moving average (ARMA) model parameters.


φ1: AR coefficient 1; θ1: MA coefficient 1; θ2: MA coefficient 2.
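To illustrate how one AR coefficient and two MA coefficients combine in an ARMA(1,2) model without a constant term, the sketch below computes one-step-ahead predictions on an already stationary series. The coefficient values are hypothetical placeholders, not the fitted values of Table 1, and the sign convention (MA terms added to the prediction) is an assumption that may differ between software packages.

```python
def arma_1_2_one_step(series, phi1, theta1, theta2):
    """One-step-ahead predictions for a zero-constant ARMA(1,2) model.

    prediction[t] = phi1 * y[t-1] + theta1 * e[t-1] + theta2 * e[t-2],
    where e[t] = y[t] - prediction[t]; missing history is treated as 0.
    """
    preds, resid = [], []
    for t, y in enumerate(series):
        prev_y = series[t - 1] if t >= 1 else 0.0
        e1 = resid[t - 1] if t >= 1 else 0.0
        e2 = resid[t - 2] if t >= 2 else 0.0
        pred = phi1 * prev_y + theta1 * e1 + theta2 * e2
        preds.append(pred)
        resid.append(y - pred)
    return preds, resid


# Hypothetical coefficients and a toy stationary series.
preds, resid = arma_1_2_one_step([1.0, 2.0, 3.0], phi1=0.5, theta1=0.5, theta2=0.25)
# preds -> [0.0, 1.0, 1.75], resid -> [1.0, 1.0, 1.25]
```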

**Figure 6.** Model fitting test and train loss minimization for the (**a**) multi-headed CNN and (**b**) simple CNN structure.

Figure 7 shows the loss minimization for the CNN-LSTM architecture. In comparison with Figure 6a,b, the model fitting takes slightly longer, yet it is completed with sufficient accuracy within 20 epochs. The model keeps improving with an increasing number of epochs, but it has been observed that, with a higher number of epochs (>500), the model tends to overfit, with the loss curves of the test and train data crossing over one another. For comparison purposes, and keeping in mind the time needed for model fitting, 100 epochs was considered to be sufficient for all models.

**Figure 7.** Model fitting test and train loss minimization for the CNN-LSTM network.

Table 2 presents the metrics described in the previous section, which help to assess the accuracy of the forecasts. The metrics are calculated for 1 hour (h), 1 day (D), and 1 week (W) in order to assess the consistency of the forecasts over the short and medium term. Table 2 covers the summer months specifically, and the week in question is the 1st week of July 2019.


**Table 2.** Forecast metrics for the short and long term during the summer months.

The RMSE, MAE, and BIAS values are all in kW; h = hour, D = day, W = week.

It can be noticed that the values of RMSE, MAE, and BIAS for the 1 h forecast are nearly the same for all CNN-based methods. This is because the number of observations within an hour is limited to four (due to the 15 min time step), and the neural network methods take multiple inputs in order to make one prediction. In fact, the CNN-LSTM model needs four inputs for one prediction, hence the RMSE, MAE, and BIAS are the same for the 1 h forecast. The BIAS values show that all methods except the ARMA model have a slight tendency to overpredict. The ARMA model has performed as well as any of the CNN methods and has the most accurate prediction, followed by the CNN-LSTM.

For the 1-day forecasting (2 July 2019), the CNN-simple and the CNN-LSTM perform in a similar manner, with an RMSE of around 0.051 kW. All methods have shown better performance than the MLR. The BIAS still indicates a tendency to overpredict, except for the ARMA model. While the ARMA model provided very accurate results for the 1-h predictions, its RMSE value has increased considerably for the 1-day forecasts. The multi-headed CNN performs the worst amongst the CNN models.

For the 1-week forecasting (1st week of July), the CNN-LSTM makes more accurate forecasts than the CNN-simple and multi-headed CNN models. In general, it can be noticed that with longer forecasts the accuracy metrics improve, indicating a general improvement or at least consistency in predictions for the CNN-based models. In contrast, for the ARMA model, the RMSE value has increased considerably compared with the 1-h and 1-day predictions. The BIAS again indicates a tendency to overpredict, except for the ARMA model.

Figure 8 presents the 1-day forecasts (2 July 2019) made by the different algorithms explained previously. It can be inferred from the figure that the most accurate forecasts for the day are made by the CNN and the CNN-LSTM models, which almost overlap the actual values. These are followed by the multi-headed CNN model, which has a higher error in its predicted values, and the ARMA model, which, despite being more accurate than the MLR, underpredicts at moments of sharp changes.

**Figure 8.** Comparison of different methods for the 1-day forecast (summer).

Figure 9 presents the forecasts made by the different approaches for a week (the 1st week of July). For the most part, the forecasts from the CNN-LSTM, CNN-simple (CNN), and Multi-CNN closely match the actual values, though some inaccuracy can be noticed at the peaks in the Multi-CNN forecasts. The ARMA model slightly underpredicts, especially during peak output. The MLR is also inaccurate at the peaks.

**Figure 9.** Comparison of different methods for the 1-week forecast (summer).

Table 3 provides the evaluation metrics of the models for the winter months. The models are less accurate for the 1-h predictions than for their own predictions during the summer. Again, the ARMA model performs with the highest accuracy for the 1-h predictions, followed by the CNN-LSTM model. For the longer 1-day (28 December 2019) and 1-week (3rd week of December) periods, the performance of all CNN-based models is very consistent with their performance during the summer months. The most accurate 1-day forecast is made by the simple CNN model, whereas for the 1-week forecasting it is the CNN-LSTM model. An intriguing difference between the summer and winter results is that the gap between the RMSE and MAE values is larger during the winter period. For the CNN-based methods, the RMSE values are about two times the MAE values for the 1-day predictions and almost 3 times the MAE values for the 1-week predictions. This indicates that, when errors are made, they are larger in magnitude than those made in the summer, because the RMSE inherently gives more weight to bigger errors.
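The link between the RMSE/MAE ratio and the size of individual errors can be illustrated with a small, purely synthetic example (the error values below are not from the study). Both error sets have the same MAE, but the set in which the error is concentrated in a single time step has an RMSE twice as large.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


even_errors = [1.0, 1.0, 1.0, 1.0]    # many small errors
spiky_errors = [0.0, 0.0, 0.0, 4.0]   # one large error, same total

ratio_even = rmse(even_errors) / mae(even_errors)      # 1.0
ratio_spiky = rmse(spiky_errors) / mae(spiky_errors)   # 2.0
```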

**Table 3.** Forecast metrics for the short and long term during the winter months.


The RMSE, MAE, and BIAS values are all in kW; h = hour, D = day, W = week.

Figure 10 presents the forecasts made for 1 day (28 December 2019) during the winter. In comparison with the 1-day prediction for the summer, the ARMA and MLR forecast values have significantly improved, and all methods predict quite accurately.

**Figure 10.** Comparison of different methods for the 1-day forecast (winter).

Figure 11 presents the forecasts made by the different approaches used in this study for the 3rd week of December. It can immediately be noticed that the output values are lower than in the summer week, with one day having almost no output. Except for the MLR, all models predict quite closely to the real values, with all of them underpredicting a little during peak power output.

**Figure 11.** Comparison of different methods for the 1-week forecast (winter).

#### **4. Discussion**

This paper presents a forecasting approach using deep learning neural networks. The neural network structures, used primarily for image recognition, have been adapted to handle time series data with a seasonal characteristic. In order to make this possible, a data processing approach, the sliding window algorithm, has been used. The performance of different possible structures of the neural network has been compared with that of a multiple linear regression and an ARMA model. It has been noticed that the CNN-simple and the CNN-LSTM methods perform best for all 1-h, 1-day, and 1-week predictions, with the CNN-LSTM providing better results on certain occasions. The ARMA model performed exceptionally for the 1-h forecasts. The forecasting was carried out for 1 h, 1 day, and 1 week with the functioning of electricity markets in mind. From accuracy metrics such as RMSE, MAE, and BIAS, it can be concluded that the forecasting algorithms perform satisfactorily. Since the approach has been tested in the short and medium term at the university location, this will be followed by rigorous testing at other locations in order to establish its applicability across geographical regions with different seasonal characteristics. Future work in this regard includes investigating the performance of other architectures that possess more abstraction (the level of complexity of the neural network). Abstraction can be increased by increasing the number of convolutional layers, which may or may not improve the accuracy metrics. It also has an effect on the training time needed to fit an appropriate model. Furthermore, different combinations of CNNs and RNNs could be considered. Additionally, the effect of clustering is to be explored.

On the whole, the fitted CNN-based models perform quite uniformly across all seasons, which we believe is because, during training, a set of values from the past is used to train the model at every step. This enables the model to capture seasonality, as the weather-related variables from past values are clearly season-dependent. The approach is similar in the ARMA model since it also uses past values, but it is important to remember that the ARMA class of models is applicable only to univariate data (in this case, the output values of the PV plant). Moreover, this study is part of building a stochastic Energy Management System (EMS) for microgrids, hence the forecasts will be used as inputs for the optimization algorithms managing the EMS.

**Author Contributions:** Conceptualization and methodology were developed by V.S. and P.J.; software, validation, formal analysis, and investigation were done by V.S.; resources, data curation, and writing (original draft preparation) were done by V.S. and J.R.; writing (review and editing), visualization, supervision, and project administration were done by P.J. and Z.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received funding from the Chair of Electrical Engineering Fundamentals (K38W05D02), Wroclaw University of Technology, Wroclaw, Poland.

**Conflicts of Interest:** The authors declare no conflict of interest.
