3.4.4. Processing

For all ANN, DNN, and LSTM approaches, we used the ReLU activation function, the Adam optimizer, mean squared error as the loss for both training and validation, and an early stopping mechanism (the EarlyStopping callback in Keras [56], with a patience of 5 epochs) to avoid overfitting [57]. For the LSTM approach, the previous 365 energy values were used as input to forecast the consumption of the next 365 days. After training, all models were evaluated on the testing portion of the dataset; their performance was measured with the metrics described in the following section.
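The 365-in/365-out windowing used for the LSTM input can be sketched as follows. This is an illustrative implementation, not the study's actual code; the function name and the toy series (with small window sizes so the output stays readable) are assumptions.

```python
def make_windows(series, n_in=365, n_out=365):
    """Build (input, target) pairs from a time series: each sample uses
    the previous n_in values to forecast the next n_out values."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i : i + n_in])                  # previous n_in values
        y.append(series[i + n_in : i + n_in + n_out])   # next n_out values
    return X, y

# Toy example with a short series and small windows
series = list(range(10))
X, y = make_windows(series, n_in=3, n_out=2)
print(X[0], y[0])  # → [0, 1, 2] [3, 4]
```

With the paper's settings (`n_in=365`, `n_out=365`), each training sample pairs one year of past consumption with the following year as the forecasting target.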

#### **4. Evaluation metrics**

The performance and robustness of each studied natural gas forecasting model were assessed with four of the most common evaluation metrics: mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the coefficient of determination (R²), which together determine the best-performing model [5,7,43].

All modelling, testing, and evaluation were performed with Python 3.7 and the TensorFlow 1.14, Keras 2.3, scikit-learn 0.21, pandas 0.25, NumPy 1.17, Matplotlib 3.1, and Seaborn 0.9 libraries. The evaluation metrics are defined as follows:

Mean Squared Error:

$$\text{MSE} = \frac{1}{T} \sum_{t=1}^{T} \left( Z(t) - X(t) \right)^2,\tag{1}$$

Mean Absolute Error:

$$\text{MAE} = \frac{1}{T} \sum_{t=1}^{T} \left| Z(t) - X(t) \right|,\tag{2}$$

Mean Absolute Percentage Error:

$$\text{MAPE} = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{Z(t) - X(t)}{Z(t)} \right|,\tag{3}$$

Coefficient of correlation (the coefficient of determination R² is its square):

$$\mathcal{R} = \frac{T\sum_{t=1}^{T} Z(t) \cdot X(t) - \left(\sum_{t=1}^{T} Z(t)\right)\left(\sum_{t=1}^{T} X(t)\right)}{\sqrt{T\sum_{t=1}^{T} \left(Z(t)\right)^{2} - \left(\sum_{t=1}^{T} Z(t)\right)^{2}} \cdot \sqrt{T\sum_{t=1}^{T} \left(X(t)\right)^{2} - \left(\sum_{t=1}^{T} X(t)\right)^{2}}},\tag{4}$$

where *X*(*t*) is the predicted value, *Z*(*t*) is the real value, *t* indexes each testing record (*t* = 1, ... , *T*), and *T* is the total number of testing records.

Low MSE, MAE, and MAPE values signify small errors and therefore higher accuracy. Conversely, an R<sup>2</sup> value close to 1 is preferred, indicating better model performance and a regression curve that fits the data well. A coefficient of determination of exactly 1 would mean the regression line fits the data perfectly; however, this could also denote overfitting.
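As a concrete illustration, Equations (1)–(4) can be implemented directly. The following is a plain-Python sketch with invented sample values; in practice, library routines such as those in `sklearn.metrics` provide equivalent implementations.

```python
import math

def mse(z, x):
    """Mean squared error, Eq. (1)."""
    return sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / len(z)

def mae(z, x):
    """Mean absolute error, Eq. (2)."""
    return sum(abs(zi - xi) for zi, xi in zip(z, x)) / len(z)

def mape(z, x):
    """Mean absolute percentage error, Eq. (3)."""
    return sum(abs((zi - xi) / zi) for zi, xi in zip(z, x)) / len(z)

def r_coef(z, x):
    """Pearson correlation coefficient, Eq. (4); square it to get R^2."""
    t = len(z)
    num = t * sum(zi * xi for zi, xi in zip(z, x)) - sum(z) * sum(x)
    den = math.sqrt(t * sum(zi ** 2 for zi in z) - sum(z) ** 2) * \
          math.sqrt(t * sum(xi ** 2 for xi in x) - sum(x) ** 2)
    return num / den

z = [3.0, 5.0, 4.0, 7.0]  # real values Z(t) (toy data)
x = [2.5, 5.5, 4.0, 6.0]  # predicted values X(t) (toy data)
print(mse(z, x), mae(z, x), round(r_coef(z, x) ** 2, 3))  # → 0.375 0.5 0.857
```

Here `z` and `x` play the roles of *Z*(*t*) and *X*(*t*), and `len(z)` corresponds to *T*.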

To summarize, the whole process can be represented as an algorithmic flowchart. All consecutive steps, from data preprocessing to model training and prediction, are shown in Figure 4.

**Figure 4.** Process flow for the algorithm that has been used for the study.
