#### *3.1. ARIMA Model*

Autoregressive integrated moving average (ARIMA) is a time series model designed to predict future values of a series from its historical data [27,28]. ARIMA assumes stationary data, so nonstationary data are differenced one or more times until a stationary time series is obtained. The AR part of ARIMA specifies that the variable of interest is regressed on its own lagged values. The MA part specifies that the variable depends on its prior forecast errors. The I (integrated) part indicates that the data values are replaced by the difference between each value and the previous value. The aim of each of these features is to make the model fit the data. The model used for prediction is given by ARIMA (*p*, *d*, *q*)*s*, where '*p*' denotes the number of autoregressive terms and '*d*' represents the number of nonseasonal differences required for stationarity. The '*q*' is the number of lagged forecast errors in the prediction equation, and '*s*' is the number of periods per season (generally 12 in the present case). The structure of the ARIMA model is shown in Figure 2.

**Figure 2.** Architectural view of the ARIMA model.
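The differencing step described above can be illustrated with a minimal sketch. The function name `difference` and the sample values are hypothetical and are not taken from the paper; the sketch only shows how applying first-order differencing *d* times removes a trend.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' part of ARIMA)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series by value[t] - value[t-1].
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical consumption values with a linear upward trend.
demand = [10.0, 12.0, 14.0, 16.0, 18.0]
print(difference(demand, d=1))  # [2.0, 2.0, 2.0, 2.0]
```

Each pass shortens the series by one value, so a series differenced *d* times has *d* fewer points than the original.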

The AR and MA components can be represented mathematically as follows. In a pure autoregressive (AR) model, *Zt* depends only on its own lags, that is, *Zt* is a function of the lag components of *Zt*. In a pure moving average (MA) model, *Zt* depends only on the lagged forecast errors. Combining the AR and MA terms, the mathematical representation becomes:

$$Z\_{t} = \alpha + \beta\_{1} Z\_{t-1} + \beta\_{2} Z\_{t-2} + \dots + \beta\_{n} Z\_{t-n} + \epsilon\_{t} + \phi\_{1} \epsilon\_{t-1} + \phi\_{2} \epsilon\_{t-2} + \dots + \phi\_{n} \epsilon\_{t-n} \tag{1}$$
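As an illustration of how the autoregressive and moving-average terms of Equation (1) combine, the following sketch computes a one-step prediction. The function name `arma_one_step` and all coefficient values are hypothetical; the current error term is assumed to take its expected value of zero.

```python
def arma_one_step(alpha, betas, phis, past_values, past_errors):
    """One-step prediction of Z_t from lagged values and lagged errors.

    past_values[0] is Z_{t-1}, past_values[1] is Z_{t-2}, and so on;
    past_errors is ordered the same way. The current error term is
    assumed to be zero (its expected value).
    """
    ar_part = sum(b * z for b, z in zip(betas, past_values))
    ma_part = sum(p * e for p, e in zip(phis, past_errors))
    return alpha + ar_part + ma_part


# Hypothetical coefficients: two AR lags and one MA lag.
print(arma_one_step(1.0, [0.5, 0.25], [0.4], [10.0, 8.0], [0.5]))
```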

The architecture of an ARIMA model works on the basis of time series analysis. Time series analysis (TSA) is a method to determine the future trend of an event in view of its past trend. The technique is based on the assumption that the future trend will be similar to the historical trend. TSA focuses on two aspects: the identification of the nature of the event (with respect to the series of observations) and the forecasting thereof. Electricity demand forecasting using the ARIMA model consists of the following steps:

**Step 1:** *Collection of dataset:* Forecasting is always triggered by a set of values.


Therefore, a multiplicative model, given as *Sm* = *Lm* × *Tm* × *Nm*, is preferable. The multiplicative model can be transformed into an additive model with the introduction of logarithms, given as:

$$
\log S\_m = \log L\_m + \log T\_m + \log N\_m \tag{2}
$$
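The logarithmic transformation can be checked numerically with a short sketch. The component values below are hypothetical placeholders; the check relies only on the identity that the logarithm of a product equals the sum of the logarithms of its factors.

```python
import math


def log_components(l_m, t_m, n_m):
    """Return the logarithm of each multiplicative component."""
    return math.log(l_m), math.log(t_m), math.log(n_m)


# Hypothetical positive component values for one time index m.
l_m, t_m, n_m = 120.0, 1.05, 0.98
s_m = l_m * t_m * n_m  # multiplicative model

# The log of the product equals the sum of the component logs.
print(math.isclose(math.log(s_m), sum(log_components(l_m, t_m, n_m))))
```

The transformation requires all components to be strictly positive, since the logarithm is undefined otherwise.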

#### *3.2. Support Vector Regression Forecasting Model*

Support vector networks are a branch of machine learning that analyzes data using associated learning algorithms. Support vector regression (SVR) uses the principles of the support vector machine (SVM), except that SVR fits the prediction function within a threshold error. SVR tries to minimize the generalization error so as to achieve generalized performance. In contrast, most other regression techniques try to decrease the observed error between the forecasted value and the original value. SVR is a common application of SVM and can be applied to time series prediction, financial forecasting, estimation in challenging engineering tasks, etc. SVR can use linear, polynomial, or radial basis function (rbf) kernels. Due to its high accuracy, the rbf kernel model is considered in this paper.

In this paper, ε-SVR was implemented, and the value of ε was set as 0.2. The regularization parameter 'C' and the gamma 'γ' were defined through grid search.
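To make the role of ε and γ concrete, the sketch below implements the standard ε-insensitive loss and the rbf kernel in plain Python. It is not the implementation used in the paper: the function names are hypothetical, and the γ value in the example is an arbitrary placeholder, since the grid search result is not reported here.

```python
import math

EPSILON = 0.2  # tube width reported in the text


def epsilon_insensitive_loss(y_true, y_pred, epsilon=EPSILON):
    """Zero loss inside the epsilon tube, linear loss outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, gamma):
    """rbf kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Errors smaller than epsilon are ignored; larger ones are penalized linearly.
print(epsilon_insensitive_loss(1.0, 1.1))
print(epsilon_insensitive_loss(1.0, 1.5))
# gamma = 0.5 is a placeholder value, not a tuned parameter.
print(rbf_kernel([1.0, 2.0], [1.5, 2.5], gamma=0.5))
```

A larger γ makes the kernel decay faster with distance, so each training point influences a narrower region of the input space.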

The train test split function was used to randomly assign 70% of the data to the train set and the remaining 30% to the test set. The train set was fitted using the rbf kernel, and the data were forecasted. The forecasted data were compared with the test set to verify the result, and the mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE) [35,36] indices were calculated. The following steps were implemented for the SVR model forecasting, as also shown in Figure 3:


**Step 5:** Calculate MSE, MAE, and RMSE.

**Figure 3.** Architecture of support vector regression.
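The random 70%/30% split can be sketched in plain Python as follows. This is a stand-in for the split function mentioned in the text, not a reproduction of it: the name `random_split`, the seed, and the rounding rule are assumptions, and a library routine may round the set sizes differently.

```python
import random


def random_split(values, train_fraction=0.7, seed=0):
    """Randomly assign a fraction of the values to train, the rest to test."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    cut = int(train_fraction * len(values))
    train = [values[i] for i in indices[:cut]]
    test = [values[i] for i in indices[cut:]]
    return train, test


# Placeholder series with the same length as the dataset (727 values).
series = list(range(727))
train, test = random_split(series)
print(len(train), len(test))
```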

#### *3.3. Linear Regression Forecasting Model*

Linear regression is a concept derived from statistics. It is a linear technique for modelling the relationship between a dependent variable and one or more independent variables. If one independent variable is considered, the process is a simple linear regression; if several independent variables are considered, it is a multiple linear regression. Unlike multivariate analysis, which focuses on the joint probability distribution of all the variables, linear regression focuses on the conditional probability distribution of the dependent variable given the independent variables.

Linear regression is extensively used in practical applications, since models that depend linearly on historical data are easy to fit. Hence, linear regression has great significance in forecasting applications.

The simple linear regression model is typically formulated as *y* = *a* + *bx*, where '*y*' is the output, '*x*' is the input, '*b*' is the input coefficient, and '*a*' is a constant. In the case of multiple inputs, such as *x*<sub>1</sub>, *x*<sub>2</sub>, *x*<sub>3</sub>, the model representation is *y* = *a* + *b*<sub>1</sub>*x*<sub>1</sub> + *b*<sub>2</sub>*x*<sub>2</sub> + *b*<sub>3</sub>*x*<sub>3</sub>.
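For the single-input case, the constant *a* and the coefficient *b* can be estimated by ordinary least squares. The sketch below uses the standard closed-form solution; the function name and the sample data are hypothetical, and the paper does not state which fitting routine was actually used.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of a and b in y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on the line y = 1 + 2x.
a, b = fit_simple_linear_regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)
```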

Additionally, in this case, 70% of data are randomly considered as train data, and the remaining 30% are considered as test data, using the train\_test\_split function. Finally, the energy demand with respect to the test data is predicted and compared with the original data. To determine the accuracy of the model, MSE, MAE, and RMSE are calculated. The architecture of linear regression forecasting is presented in Figure 4. The model implementation is done according to the listed steps:


**Figure 4.** Architecture of linear regression.

#### *3.4. Long Short Term Memory Model*

The LSTM is a type of RNN that is capable of remembering information for long periods of time. In contrast to basic neural networks, where each node is characterized by a single activation function, each node in an LSTM acts as a memory cell that can store additional information. In particular, LSTMs have their own cell state, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. These characteristics allow LSTMs to address the problem of vanishing gradients from prior time steps [23–25]. In our application, there are three LSTM layers, each with 40 units, a *tanh* activation function, and a dropout value of 0.15. The input sequence length is set to 20.

**Step 1:** Create testing and training datasets with the train\_test\_split function. Here, the data were divided into a 70% training dataset and a 30% testing dataset.

**Step 2:** Build the long short term memory model with the predefined parameters and fit it to the training dataset.

**Step 3:** Forecast the consumption values for the testing dataset.

**Step 4:** Plot the actual and forecasted values of the testing dataset.

**Step 5:** Calculate MSE, MAE, and RMSE.
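Before an LSTM can be trained, the series has to be arranged into fixed-length input sequences. The sketch below shows one common way to do this with the sequence length of 20 stated above. The function name `make_sequences` is hypothetical, and pairing each window with the next value as its target is an assumption, since the text does not describe how the targets were formed.

```python
def make_sequences(series, sequence_length=20):
    """Slide a fixed-length window over the series.

    Each input is `sequence_length` consecutive values; the target is
    assumed to be the value immediately following the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - sequence_length):
        inputs.append(series[start:start + sequence_length])
        targets.append(series[start + sequence_length])
    return inputs, targets


# Placeholder series of 25 values yields 5 windows of length 20.
inputs, targets = make_sequences(list(range(25)))
print(len(inputs), targets)
```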

#### *3.5. Recurrent Neural Networks Model*

RNNs are networks that contain loops, which allow information to persist. They are used to model data that change over time [30]. The data are fed into the network one element at a time, and the network's nodes save their current state at one time step and use it to influence the following time step. RNNs exploit the temporal information in the input data and, for this reason, are well suited to time series data. The capability of an RNN lies in its recurrent connections between neurons and can generally be described by the following equation [31]:

$$\mathbf{x}\_{t} = \begin{cases} \mathbf{0}, & \text{if } t = 0 \\ \phi(\mathbf{x}\_{t-1}, a\_t), & \text{otherwise} \end{cases} \tag{3}$$
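The recurrence in Equation (3) can be unrolled in a few lines for a scalar state. In this sketch, the transition function φ is assumed to be a *tanh* of a weighted sum of the previous state and the current input; this choice and the weight values are illustrative assumptions, as the equation itself leaves φ unspecified.

```python
import math


def run_recurrence(inputs, w_state=0.5, w_input=1.0):
    """Unroll x_0 = 0, x_t = phi(x_{t-1}, a_t) over a sequence of inputs.

    phi is assumed here to be tanh(w_state * x_{t-1} + w_input * a_t).
    """
    state = 0.0  # x_0 = 0
    states = [state]
    for a_t in inputs:
        state = math.tanh(w_state * state + w_input * a_t)
        states.append(state)
    return states


# Hypothetical input sequence; each state depends on all earlier inputs.
print(run_recurrence([1.0, 0.5, -0.2]))
```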

#### *3.6. Proposed RNN-GBRT Hybrid Model*

An RNN's goal is to forecast the next step in a sequence of observations in relation to the previous steps in the series [31]. To predict future trends, an RNN makes use of consecutive observations and learns from previous steps. Information from the earlier steps must be retained while estimating the following ones. The hidden layers in an RNN serve as internal storage for the information obtained during the previous steps of the sequential data processing.

The GBRT algorithm, which combines the CART (classification and regression trees) and GB (gradient boosting) algorithms, is also considered [31]. It is noted that CART outperforms most artificial intelligence models in terms of prediction, since it can model nonlinear interactions without prior knowledge of the probability distribution of the variables. Inspired by the contributions of [30,31], we propose the hybrid RNN-GBRT model in order to exploit the advantages of the two methods and obtain better forecasting performance. In GBRT, the model at the current iteration reduces the residuals of the previous iteration. At each iteration, it builds a new regression tree to reduce the residuals along the gradient descent direction of the objective function. In the proposed hybrid model, RNN-GBRT, the series generated by the RNN forms the training examples for GBRT. Generally, the performance of GBRT depends on the learning rate and the total number of regression trees. In this paper, the learning rate is varied from 0.1 to 0.3, and the total number of regression trees is varied from 20 to 150.
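The residual-fitting loop of GBRT can be sketched in plain Python under simplifying assumptions: a squared-error objective, a single input feature, and one-split regression trees (stumps) in place of full CART trees. All names and data are hypothetical; in the hybrid model the inputs would be the RNN outputs, represented here by the placeholder list `rnn_outputs`.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbrt(xs, ys, n_trees=20, learning_rate=0.1):
    """Boosting loop: each new stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial constant prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient equals the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_value(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict_gbrt(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)


# Placeholder RNN outputs and actual consumption values.
rnn_outputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
actual = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_gbrt(rnn_outputs, actual, n_trees=20, learning_rate=0.1)
print(predict_gbrt(model, 2.0), predict_gbrt(model, 5.0))
```

A smaller learning rate shrinks the contribution of each tree, so more trees are needed to reach the same training error, which is why the two parameters are usually tuned together.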

#### *3.7. Power Theft Detection Algorithm*

Energy is the fundamental resource behind every application in domestic, commercial, and industrial environments. The electric grid refers to the combination of transformers, transmission lines, substations, and other components that make energy delivery possible, from the source to the point of use in each sector. The complexity of energy generation and distribution systems makes it necessary to manage and solve several issues and challenges. In this context, one of the most significant issues to be addressed is power theft. Particularly in India, power theft is a serious issue that the country has dealt with for many years, and no effective solutions have yet been found. A recent survey by the Central Electricity Authority suggests that over 27 percent of the total energy produced from various sources is lost due to the illegal practice of power theft [36]. Consequently, this affects over 5 percent of the country's GDP (gross domestic product).

To address the issue, a new scheme is proposed in this paper for analyzing consumer energy and thereby detecting the source of theft, according to the procedure shown in Figure 5. To detect the source of power theft, the developed system is trained with the energy consumption data of the most recent 90 days from the available dataset. The system starts by evaluating the mean of the collected data. The statistical mean of the given data is the sum of all the energy consumption values divided by the number of values, which is 90 in our case. The calculated mean '*A*' is then stored in the memory manager of the proposed system. Afterwards, the system estimates the standard deviation of the data.

**Figure 5.** Flowchart of power theft detection procedure.

Standard deviation measures how far the individual data values deviate from their mean. The purpose of standard deviation in our system is to define the degree of deviation between adjacent data points. All the odd values of standard deviation are recorded. Among the recorded values, the maximum and minimum deviations are denoted '*d*' and '*D*', respectively. To test the developed system, we consider the current day's energy expenditure. In particular, the difference between the current energy expenditure '*B*' and the recorded mean value of the training data '*A*' is calculated. The obtained value is denoted by '*K*'.

The final step of the theft detection scheme compares the *K* and *D* values. If the *K* value is greater than the *D* value, it can be concluded that power theft is occurring. The intensity of power theft can also be determined from the magnitude of the calculated difference '*K*'.
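The comparison step can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name is hypothetical, *K* is assumed to be computed as *B* − *A*, and the threshold *D* is passed in as a parameter rather than derived from the recorded deviations. The consumption values are placeholders.

```python
import statistics


def detect_power_theft(history, current_usage, threshold_d):
    """Flag theft when the current usage exceeds the historical mean by more than D.

    history: consumption values of the training window (90 days in the text).
    Returns (is_theft, K, spread), where K = B - A is an assumed sign convention
    and spread is the standard deviation of the training window.
    """
    mean_a = statistics.mean(history)      # 'A'
    spread = statistics.pstdev(history)    # standard deviation of the window
    k = current_usage - mean_a             # 'K', assumed to be B - A
    return k > threshold_d, k, spread


# Placeholder 90-day history with constant consumption of 10 units.
history = [10.0] * 90
print(detect_power_theft(history, current_usage=20.0, threshold_d=7.5))
print(detect_power_theft(history, current_usage=12.0, threshold_d=7.5))
```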

#### **4. Implementation of Energy Forecasting and Power Theft Detection Models**

Let us recall that the available dataset from Sceaux, Paris (France), is randomly split into train and test datasets. The train dataset consists of 70% of the data, and the test dataset consists of the remaining 30% of the 727 total values. The daily energy consumption values were forecasted by fitting an ARIMA model with parameters (7,1,6). The household energy demand forecasting results, obtained by applying the ARIMA, linear regression, SVR, LSTM, and RNN models, are presented in Figures 6–10, respectively.

**Figure 6.** Forecast result of ARIMA model with respect to test dataset over time (day).

**Figure 7.** Forecast result of linear regression with respect to test dataset over time (day).

For better prediction performances, the authors propose the hybrid RNN-GBRT model, as described in Section 3.6. From the simulation results, it is evident that the RNN-GBRT model shows better prediction performance than other conventional methods, as presented in Figure 11.

**Figure 8.** Forecast result of support vector model with respect to test dataset over time (day).

**Figure 9.** Forecast result of LSTM model with respect to test dataset over time (day).

**Figure 10.** Forecast result of simple-RNN with respect to test dataset over time (day).

**Figure 11.** Forecast result of RNN-GBRT with respect to test dataset over time (day).

#### *4.1. Comparative Analysis Based on Error Indices Calculations*

Three error indices are used in this paper to compare the prediction accuracy of the analyzed forecasting models. The first error index is the MSE [30], which is the mean squared difference between the estimated value and the original value. The second index is the MAE [31], which is the mean absolute difference between the estimated value and the original value. Finally, the RMSE index [31] is the square root of the MSE; it corresponds to the standard deviation of the residuals and represents their dispersion in the series. The comparison of the error values is presented in Figure 12. As is evident, the proposed RNN-GBRT performs better than the other forecasting methods.

**Figure 12.** Comparison of Errors for accuracy analysis.
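The three error indices follow directly from their standard definitions, as in the sketch below. The function names and the sample values are hypothetical and do not correspond to the results reported in Figure 12.

```python
import math


def mse(actual, forecast):
    """Mean squared error between actual and forecasted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error between actual and forecasted values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, forecast))


# Hypothetical actual and forecasted consumption values.
actual_values = [3.0, 5.0, 7.0]
forecast_values = [2.0, 5.0, 9.0]
print(mse(actual_values, forecast_values),
      mae(actual_values, forecast_values),
      rmse(actual_values, forecast_values))
```

Because the MSE and RMSE square the errors, they penalize large deviations more heavily than the MAE, which is why reporting all three gives a fuller picture of forecast quality.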

#### *4.2. Power Theft Detection Results Analysis*

The proposed scheme to detect the illegal practice of power theft is simulated in this section in order to show its effectiveness.

The range of theft detection is shown in Figure 13, where hourly data samples are evaluated and no power theft occurs. The figure shows the threshold limit to detect power theft as a 7.5 kW mean difference, as proposed in [34,35]. If the power consumption crosses this threshold, users receive a message regarding the power theft occurring on their connection line. The use of a conventional distribution network and energy meters in the city of Vellore, India, leaves room for power theft events. Therefore, to illustrate a power theft case, the authors refer to a 727-day dataset from Vellore city. In particular, Figure 14 shows a case of power theft detection in which two power theft periods are highlighted.

**Figure 13.** The optimal range of power theft (kW), no power theft detected.

**Figure 14.** The case of power theft detection.
