**1. Introduction**

Global efforts to keep the increase in average temperature below 2 °C, with the aim of limiting it to 1.5 °C, were agreed upon in the Paris Agreement of 2015. The recent "Climate action and support trends—2019" report notes that current greenhouse gas emission levels and reduction efforts are not in line with meeting the targets that were set out [1].

Due to such environmental concerns and ambitious targets, there has been an increasing penetration of renewable energy sources in the power sector, especially in the form of solar photovoltaic panels. One of the biggest concerns connected with solar energy is its stochastic nature and variability, which threatens grid stability. A well-known approach to mitigate such uncertainty is the use of accurate forecasts [2].

The motivation for this study is the need to build a forecasting algorithm for a stochastic energy management system for the microgrid at the Wroclaw University of Science and Technology. The microgrid currently employs a deterministic management system but, given the stochastic nature of solar PV output, a stochastic system was considered necessary. Convolutional neural network-based forecasting architectures are mainly used to study images of the sky, as explained later, in tandem with statistical techniques. This microgrid facility does not possess a device to record sky images, yet a deep learning approach to forecasting was decided upon. Hence, a data-level approach using the sliding window algorithm was adopted and the results were analyzed.

Forecasting is a widely researched, long-established field, aiming to predict solar PV outputs, wind turbine power outputs, and loads in electrical power systems. A short literature review reveals numerous approaches, some of which are described as follows. In [3], short-term forecasts for PV outputs were obtained using Support Vector Regression models wherein the parameters of the models were optimized using intelligent methods, such as the Cuckoo Search and Differential Evolution algorithms. In that study, the authors used data from an in-house rooftop solar PV unit at Virginia Tech. In [4], multiple linear regression was employed to make forecasts for solar energy output. That study used extensive data obtained from the European Centre for Medium-Range Weather Forecasts, including as many as 12 independent variables. The study described in [5] presents a generalized fuzzy logic approach to making short-term output forecasts from measured irradiance data. The input data in this case covered one particular month (October 2014), and the inputs and outputs were normalized within a range of 0.1–0.9. A comprehensive review and analysis of different methods and associated results regarding the forecasting of solar irradiance and solar PV output is presented in [6].

With regard to the application of Convolutional Neural Networks (CNN) to solar PV output forecasts, there is little available literature. One approach, seen in [7,8], is to use a combination of historical data and sky images. The sky images are crucial in order to capture the effect clouds have on PV output. The study described in [8] used a total sky imager, which provides images of the sky, whereas [7] used videos recorded by a 6-megapixel 360-degree fish-eye camera by HiKvision. Other approaches, which do not use images but only historical data, have adjusted the CNN so that it can deal with time series data. The CNN is, in fact, a machine learning tool explicitly designed for image detection and classification, but, given the way it processes data, its ability to capture non-linear relationships between inputs and outputs can be leveraged for time series data. A hybridized approach, where a CNN is used for pattern recognition and a long short-term memory network is then used for prediction, is seen in [9], where this framework is applied to 30 min ahead forecasting of global solar radiation. In [2], a method in which suitable data processing is applied before training the CNN is presented: the time series data is split into various frequencies through variational mode decomposition and then converted into a 2-D form from which features are extracted by convolutional kernels. Finally, the approach in [10] proposes another hybrid method in which a chaotic Genetic Algorithm/Particle Swarm Optimization is used to optimize the hyperparameters of the CNN, which is then used to make solar irradiance predictions.

This paper's forecasting approach is to be applied in developing a stochastic energy management system for microgrids. A few related contributions in this regard are as follows. A comprehensive review of weather forecasts, forecast errors, data sources, the different methodologies used, and their importance in microgrid scheduling is given in [11], with the focus kept on wind energy forecasts, solar generation, and load forecasts. Another popular approach to forecasting using the ARMA (Autoregressive Moving Average) model, especially for load forecasting followed by solving a microgrid unit commitment problem, is described in [12]. An advanced forecasting method using artificial neural networks, support vector regression, and random forests, subsequently incorporated into a Horizon 2020 project involving several countries, is described in [13].

This paper utilizes a sliding window approach in order to prepare data in such a way that it can be used to train the CNN with historical data and make accurate predictions.
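To make the sliding window data preparation concrete, the following minimal sketch turns a multivariate time series into supervised-learning samples. The window length of 8 steps and the choice of the first column as the target are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def sliding_windows(series, window, horizon=1):
    """Split a (time_steps, n_features) series into supervised-learning pairs.

    Returns X of shape (n_samples, window, n_features) and y of shape (n_samples,).
    """
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        # Assumption for illustration: the first column is the forecast target
        y.append(series[start + window + horizon - 1, 0])
    return np.array(X), np.array(y)

# Toy example: 100 time steps, 4 input variables, window of 8 steps
data = np.random.rand(100, 4)
X, y = sliding_windows(data, window=8)
print(X.shape, y.shape)  # (92, 8, 4) (92,)
```

Each sample thus pairs a block of consecutive measurements with the value one step ahead, which is the supervised format a CNN can be trained on.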

## **2. Forecasting Models, Data Processing, and Evaluation Metrics**

## *2.1. Forecasting Models*

The data for this study come from a PV panel installed at a university building of the Wroclaw University of Science and Technology. It is part of a power plant with a peak power capacity of 5 kW. The input measurements, obtained from associated sensors, are irradiation (W/m²), wind speed (m/s), ambient temperature (°C), and PV module temperature (°C). The output of the panel (W) and all inputs are measured at 15 min intervals, and the forecasting is likewise done in steps of 15 min.

The inputs were chosen according to the recommendations of the IEA (International Energy Agency) report on "Photovoltaic and Solar Forecasting" [14] and other reliable sources [15]. The evaluation and benchmarking techniques used for the forecasts were also taken from [14–16] in order to establish the reliability of the results of this study. The metrics are discussed in detail further on.

The structure of the CNN model is shown in Figure 1.

The CNN is a specialized neural network explicitly used for image recognition, in which the input images are represented as a two-dimensional grid of pixels. In order to use CNNs for time series data, a 1-D structure is more appropriate. The input time series data used in this study form a 175,200 × 4 matrix: the number of rows represents the time steps of the input data, whereas the four columns (irradiation, wind speed, ambient temperature, PV module temperature) represent the width. This can be equated to the height and width in pixels of the images used as input data when training CNNs for image recognition.
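To illustrate how a 1-D filter treats such a matrix, the sketch below slides a single kernel vertically (along the time axis) across a toy multivariate series. This is a hand-rolled numpy illustration of the convolution operation, not the authors' trained network; the kernel values are arbitrary:

```python
import numpy as np

def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution (cross-correlation) over a multivariate series.

    x:      (time_steps, n_features)
    kernel: (kernel_size, n_features) -- one filter spanning all input variables
    Returns a 1-D feature map of length time_steps - kernel_size + 1.
    """
    k = kernel.shape[0]
    out = np.empty(len(x) - k + 1)
    for t in range(len(out)):
        # The window moves only vertically, i.e., along the time axis
        out[t] = np.sum(x[t:t + k] * kernel)
    return out

x = np.arange(12, dtype=float).reshape(6, 2)   # 6 time steps, 2 variables
kernel = np.ones((3, 2)) / 6.0                 # illustrative averaging filter
fmap = conv1d_valid(x, kernel)
print(fmap.shape)  # (4,)
```

A deep learning framework performs the same operation with many learned filters in parallel; this loop form only makes the sliding mechanics visible.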

For efficient and quick training of all networks, the min–max scaling algorithm was used. This is necessary since the distribution and scale of the data vary from variable to variable. Moreover, the units of measurement also differ between variables, which could lead to large weight values; models with large weight values often learn poorly and are sensitive to changes in input values [17]. Min–max scaling was applied to normalize the data within the range of [0, 1]. The formula for the same is described in (1):

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}\tag{1}$$

where *x<sub>min</sub>* and *x<sub>max</sub>* are the minimum and maximum values of the given variable.
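A minimal sketch of this per-variable min–max normalization (the column values are made up for the example):

```python
import numpy as np

def min_max_scale(x):
    """Scale each column of x into [0, 1]: x' = (x - min) / (max - min)."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

data = np.array([[10.0, 200.0],
                 [20.0, 400.0],
                 [30.0, 600.0]])
scaled = min_max_scale(data)
print(scaled[:, 0])  # [0.  0.5 1. ]
```

In practice the minima and maxima computed on the training set are reused to scale validation and test data, so that no information leaks between the splits.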

**Figure 1.** Convolutional Neural Network (CNN) structure.

The convolutional layer that follows the input data processing is responsible for feature extraction [18]. The layer is made up of as many filters (neurons) as there are variables (four). These filters carry out convolution, which, by definition, is a function applied to the input data to extract specific information from it. The filters are moved across the entire input data in a sliding-window-like manner. In the case of 2-D images the sliding window is moved both horizontally and vertically, but since this study employs 1-D data the window is moved only vertically. The activation function used in this case is the Rectified Linear Activation Function (RLAF), which is described below; the sliding window algorithm is described later.

The RLAF behaves like a linear function but is actually non-linear in nature, which enables the learning of complex relationships in the input data. It is widely used and easy to define: when the input is greater than 0.0, the output value equals the input value, whereas if the input is less than 0.0 the output is 0.0. Mathematically, it is defined as:

$$g(z) = \max\left\{0, z\right\}\tag{2}$$

where *z* is the input value and *g* is the RLAF. The advantages of this function include computational ease, sparsity, and ease of implementation in neural networks due to its linear behavior despite being non-linear [19].
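Equation (2) is a one-liner in code; this trivial sketch only spells it out:

```python
def rlaf(z):
    """Rectified Linear Activation Function: g(z) = max(0, z)."""
    return max(0.0, z)

print(rlaf(2.5), rlaf(-1.0))  # 2.5 0.0
```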

The outputs of the filters in the convolutional layer are called feature maps. The feature maps hold relationships and patterns from the input data, and the feature maps from all filters together complete the convolutional layer. This layer is followed by the pooling layer, the objective of which is to reduce the feature maps of the convolutional layer (it summarizes the features learnt in the previous layer). This is done in order to prevent overfitting. It also reduces the size of the input data, which results in increased processing speeds and reduced memory demand. While there are numerous pooling functions, such as max, average, and sum [18], this study employs the max function, hence the max pooling layer.
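The max pooling operation can be sketched as follows; the pool size of 2 is an illustrative choice, and the feature map values are made up:

```python
import numpy as np

def max_pool_1d(feature_map, pool_size=2):
    """Non-overlapping 1-D max pooling: keep the largest value in each window."""
    n = len(feature_map) // pool_size * pool_size   # drop any trailing remainder
    return feature_map[:n].reshape(-1, pool_size).max(axis=1)

fmap = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
print(max_pool_1d(fmap))  # [3. 5. 4.]
```

The pooled map is half the length of the input, which is exactly the size reduction the text describes.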

The flattening layer succeeding the max pooling layer converts the output into a 1-D input vector that can be given to the dense or fully connected layer. The dense layer in this case is a regular neural network that has a non-linear activation function.

The model in this case is fit using the Adam optimization algorithm, the advantage of which is that the learning rate is adjusted as the error is reduced. It is in fact a combination of two well-known extensions of stochastic gradient descent: the Adaptive Gradient algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). Adam is discussed in detail in [20].
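A single Adam parameter update, following the standard formulation in [20], can be sketched as below; the gradient value and hyperparameters here are illustrative defaults, not values from this study:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    squared gradient (v), with bias correction for early steps."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)        # bias-corrected first moment
    v_hat = v / (1 - beta2**t)        # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, grad=np.array([0.5]), m=m, v=v, t=1)
print(theta)  # the first step moves the parameter by roughly lr
```

Because the step is divided by the root of the squared-gradient average, the effective learning rate per parameter shrinks as training progresses, which is the adaptive behavior mentioned above.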

The second CNN structure used in this study is the multi-headed CNN. This approach involves handling every input series with its own CNN, which offers some flexibility. While there is no conclusive evidence in the literature of the advantages of the multi-headed CNN over a regular CNN using multiple filters, a multi-headed CNN with three 2-D convolutional nets has been used for enhanced image classification in [21]. This paper uses a similar, yet different, architecture. The structure of the multi-headed CNN is shown in Figure 2.

**Figure 2.** Multi-headed CNN structure.

In this study, as described in Figure 2, the multi-headed CNN has 4 CNNs, one for each input. These are followed by 4 max pooling layers and then by 4 flattening layers; the results from these layers are combined before the information is fed to the dense neural network, which makes the final prediction.

The third approach for forecasting is the CNN-LSTM (CNN-Long Short-Term Memory) network. Recently, the CNN-LSTM has been implemented in many areas for time series predictions. Study [22] presents a problem where water demand in urban cities is predicted. The correlation between water demand and changes in temperature and holiday periods is obtained using CNN-LSTM networks, and an improvement in predictions was observed. Similarly, an improvement in weather predictions was demonstrated in [23] by using such a hybrid CNN-LSTM architecture.

The LSTM is in fact an RNN (Recurrent Neural Network), which is efficient in working with time series data and is known to be a powerful tool for classification and forecasting associated with time series data. The uniqueness of the LSTM comes from the memory cell, which behaves as a collector of state information. Whenever new information arrives, it is accumulated in the cell if the input gate is triggered, and past information is forgotten if the forget gate is triggered. The latest cell state obtained in this process is propagated to the final stage only if the output gate is triggered. This gating prevents the gradients trapped in the cell from vanishing too quickly; it is characteristic of the LSTM and makes it better suited to handling and predicting time series data than other RNN structures [24].
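The gate behavior described above follows the textbook LSTM equations, which a single cell step makes explicit. This is a generic numpy illustration with random weights, not the network trained in this study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the input, forget, cell, and output
    gate parameters (4 * hidden rows)."""
    hidden = len(h_prev)
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:hidden])             # input gate: admit new information
    f = sigmoid(z[hidden:2*hidden])      # forget gate: discard past information
    g = np.tanh(z[2*hidden:3*hidden])    # candidate cell state
    o = sigmoid(z[3*hidden:4*hidden])    # output gate: expose the cell state
    c = f * c_prev + i * g               # accumulate / forget in the memory cell
    h = o * np.tanh(c)                   # cell propagates out only via o
    return h, c

rng = np.random.default_rng(0)
n_in, hidden = 4, 3
W = rng.normal(size=(4 * hidden, n_in))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(hidden), np.zeros(hidden), W, U, b)
print(h.shape, c.shape)  # (3,) (3,)
```

The additive update `c = f * c_prev + i * g` is what keeps the gradient path through the cell largely intact, which is the vanishing-gradient property noted above.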

The advantage of using a hybrid CNN-LSTM architecture is that the CNN is used to extract features from the raw input time series and then these features are given as an input to the LSTM, which is efficient with time series data.

Figure 3 provides the CNN-LSTM architecture. It can be noticed that, overall, the structure is similar to the CNN structure in Figure 1, with the exception of the LSTM layer, which enables the whole network to process the time series data more efficiently.

**Figure 3.** CNN-LSTM structure.

In order to provide a benchmark with an established technique for forecasting, the ARMA model is proposed. The ARMA model is utilized mainly for stationary time series data. In this method, the predicted variable is calculated on the basis of a linear relationship with its past values [25,26]. In cases when the data is non-stationary and has seasonal characteristics, as will be explained in the next section, it has to be transformed into a stationary one before an ARMA model is fit. The model consists of two parts, AR (Autoregressive) and MA (Moving Average), and is defined as ARMA (m, n) where m, n represent the orders of the model.

$$y\_t^{AR} = \sum\_{i=1}^{m} \phi\_i x\_{t-i} + \omega\_t = \phi\_1 x\_{t-1} + \phi\_2 x\_{t-2} + \dots + \phi\_m x\_{t-m} + \omega\_t \tag{3}$$

$$y\_t^{MA} = \sum\_{j=0}^{n} \theta\_j \omega\_{t-j} = \omega\_t + \theta\_1 \omega\_{t-1} + \theta\_2 \omega\_{t-2} + \dots + \theta\_n \omega\_{t-n} \tag{4}$$

$$y\_t^{ARMA} = \sum\_{i=1}^{m} \phi\_i x\_{t-i} + \sum\_{j=0}^{n} \theta\_j \omega\_{t-j} \tag{5}$$

where *y<sub>t</sub> AR*, *y<sub>t</sub> MA*, and *y<sub>t</sub> ARMA* represent the time series values of the autoregression (AR), the Moving average (MA), and the Autoregression moving average (ARMA), respectively. φ*<sup>i</sup>* is the autoregressive coefficient and θ*<sup>j</sup>* is the moving average coefficient. ω*<sup>t</sup>* is the noise.

The autoregressive (AR) part involves representing the current value as a result of a linear combination of the previous values and the noise ω*t*. It is represented in Equation (3). The Moving average part is a combination of previous individual noise components, which is used to create a time series, as shown in Equation (4). ARMA is a combination of both AR and MA [27].
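A one-step ARMA prediction per Equation (5) can be sketched directly; the orders and coefficient values below are made up for illustration:

```python
def arma_predict(x_past, noise_past, phi, theta):
    """One-step ARMA(m, n) prediction: AR part over past values plus MA part
    over noise terms, with the most recent entry first in each list."""
    ar = sum(p * x for p, x in zip(phi, x_past))      # sum of phi_i * x_{t-i}
    ma = sum(t * w for t, w in zip(theta, noise_past))  # sum of theta_j * omega_{t-j}
    return ar + ma

# Illustrative ARMA(2, 1) with assumed coefficients
phi = [0.6, 0.2]           # autoregressive coefficients phi_1, phi_2
theta = [1.0, 0.3]         # theta_0 = 1 applies to the current noise term
x_past = [1.5, 1.0]        # x_{t-1}, x_{t-2}
noise_past = [0.1, -0.05]  # omega_t, omega_{t-1}
print(arma_predict(x_past, noise_past, phi, theta))  # 1.185
```

In practice the coefficients are estimated from the stationary series (e.g., by maximum likelihood), and the noise terms are the fitting residuals.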

The orders of the model, m and n, are chosen on the basis of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The ACF gives the correlation between a value of a given time series and past values of the same series, whereas the PACF gives the correlation between a value and a lagged value after the effects of the intermediate lags have been removed. If the ACF decays to a minimum value after a few lags and the PACF shows a sharp cut-off after the initial lags, the time series can be considered stationary. This is then finally confirmed by the Augmented Dickey Fuller (ADF) test, which is explained in [25]. A confidence level of 95% is assumed for this study, hence a p-value of less than 0.05 is a confirmation of stationarity.
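The sample ACF used in this kind of analysis is straightforward to compute; the sketch below uses a made-up periodic series rather than the study's PV data:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# A strongly autocorrelated toy series: its ACF decays slowly from lag 0
t = np.arange(200)
series = np.sin(2 * np.pi * t / 50)
rho = acf(series, max_lag=3)
print(rho[0])  # 1.0 -- the lag-0 autocorrelation is always 1
```

Statistical packages such as statsmodels provide ACF, PACF, and the ADF test ready-made; the loop above only shows what the ACF measures.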

The analysis of the time series data according to the ACF, PACF, and ADF, in addition to its conversion to a stationary time series followed by the fitting of an ARMA model, is discussed in the next section.

Finally, the same data is also fit with a linear regression model. The linear regression model is explained below. A comprehensive study on the use of linear regression along with an improved model for hourly forecasting can be found in [28].

$$Y = \beta\_0 + \beta\_1 X\_1 + \beta\_2 X\_2 + \dots + \beta\_k X\_k + \epsilon\tag{6}$$

where *Y* is the dependent variable, *X<sub>1</sub>*, . . . , *X<sub>k</sub>* are the independent variables, β<sub>0</sub> is the constant term, β<sub>k</sub> is the coefficient corresponding to the slope of each independent variable, and ε is the model's error, also known as the residual.
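Fitting Equation (6) amounts to ordinary least squares; the sketch below recovers known coefficients from noiseless synthetic data (the data and coefficients are made up for the example):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares: returns [beta_0, beta_1, ..., beta_k]."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend the constant term
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Noiseless data generated from Y = 2 + 3*X1 - 1*X2
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1]
beta = fit_linear_regression(X, y)
print(np.round(beta, 6))  # [ 2.  3. -1.]
```

With real measurements the residual ε is non-zero, and the same call returns the least-squares estimates of the coefficients.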
