#### 2.3.1. Generalized Regression Neural Network

Donald F. Specht first presented the General Regression Neural Network (GRNN) in 1991 [25]; it is a variant of the radial basis function (RBF) neural network [28]. Compared with RBF, GRNN offers better approximation capability and faster learning [31]. Its operation is based on nonlinear (kernel) regression, in which the output is estimated as a weighted average conditioned on the input. GRNNs may be utilized for prediction, modeling, mapping, and interpolation, as well as serving as controllers [25].

The GRNN architecture, as seen in Figure 5, is composed of four layers: the input layer, the pattern layer, the summation layer, and the output layer. The input layer receives and stores the input data *Xi* = [*x*1, *x*2, ... , *xn*], with one neuron per input variable. The input layer's result is then transmitted to the pattern layer. The pattern layer is nonlinear, and its neurons retain information about the relationship between the input neurons and the pattern layer [31]. The output *Pi* of pattern neuron *i*, based on the Gaussian function, can be expressed as follows:

$$P\_i = \exp\left[-\frac{(X - X\_i)^T (X - X\_i)}{2\sigma^2}\right], \quad i = 1, 2, \dots, n \tag{1}$$

where *σ* is the smoothing or spreading parameter, *X* denotes the input variable, and *Xi* denotes the training sample stored in pattern neuron *i*.

**Figure 5.** The architecture of General Regression Neural Network.

Following the pattern layer, the summation layer performs two distinct computations, referred to as the numerator and the denominator. The first computes the sum of the weighted outputs of the pattern layer, whereas the second computes the sum of the unweighted outputs of the pattern layer [26]. The summation layer's computations are as follows:

$$S\_s = \sum\_{i=1}^{n} P\_i \tag{2}$$

$$S\_w = \sum\_{i=1}^{n} w\_i P\_i \tag{3}$$

where *Ss* is the denominator, *Sw* is the numerator, and *wi* is the weight of the pattern neuron *i* connected to the summation layer.

The last layer is the output layer, whose result is produced by dividing the numerator neuron *Sw* by the denominator neuron *Ss*. The output layer performs the following calculation:

$$y = \frac{S\_w}{S\_s} \tag{4}$$
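To make Equations (1)–(4) concrete, the following sketch (our illustration, not part of the original formulation) implements a basic GRNN prediction in Python with NumPy, using the training targets directly as the pattern weights *wi*:

```python
import numpy as np

def grnn_predict(X_train, y_train, x_query, sigma=1.0):
    """Predict y for a single query point with a basic GRNN (Eqs. 1-4).

    X_train : (n, d) training inputs, one row per pattern neuron
    y_train : (n,) training targets, used here as the pattern weights w_i
    x_query : (d,) query input X
    sigma   : smoothing (spread) parameter
    """
    # Pattern layer (Eq. 1): Gaussian kernel between the query and each training sample
    diff = X_train - x_query                   # (n, d)
    sq_dist = np.sum(diff ** 2, axis=1)        # (X - X_i)^T (X - X_i)
    P = np.exp(-sq_dist / (2.0 * sigma ** 2))  # P_i

    # Summation layer (Eqs. 2-3): unweighted and weighted sums of the pattern outputs
    S_s = np.sum(P)               # denominator
    S_w = np.sum(y_train * P)     # numerator

    # Output layer (Eq. 4): ratio of the two summation neurons
    return S_w / S_s

# Example usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=50)
print(grnn_predict(X_train, y_train, x_query=np.array([5.0]), sigma=0.5))
```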

Compared with other approaches, the primary advantage of GRNN is that it is simple to train and requires only one independent parameter, the smoothing parameter *σ* [26]. GRNN does not require iterative training and can therefore be trained in a short time. Its disadvantage relative to other algorithms is that evaluating new points requires substantial computation, since every training sample contributes to each prediction. These shortcomings, however, can be mitigated by adopting a clustering version of GRNN or by executing the computations on a parallel structure implemented in a semiconductor chip [25].
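Since *σ* is the only free parameter, training essentially amounts to selecting its value; a simple holdout grid search, sketched below under the assumption that the `grnn_predict` function above is available, is one common way to do this:

```python
import numpy as np

def tune_sigma(X_train, y_train, X_val, y_val, candidates):
    """Pick the smoothing parameter sigma that minimizes validation MSE."""
    best_sigma, best_mse = None, np.inf
    for sigma in candidates:
        preds = np.array([grnn_predict(X_train, y_train, x, sigma) for x in X_val])
        mse = np.mean((preds - y_val) ** 2)
        if mse < best_mse:
            best_sigma, best_mse = sigma, mse
    return best_sigma, best_mse

# Example: search a coarse grid of candidate spreads on a holdout split
# best_sigma, best_mse = tune_sigma(X_tr, y_tr, X_va, y_va, candidates=[0.1, 0.3, 0.5, 1.0, 2.0])
```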

#### 2.3.2. Support Vector Machine

Vapnik et al. pioneered the Support Vector Machine (SVM) in 1999 [32]. SVM is a classification and regression technique used in machine learning [23]. It is built on Structural Risk Minimization (SRM), which makes it more effective than techniques based on Empirical Risk Minimization (ERM) [24]. Support Vector Regression (SVR) is a machine learning model that trades off minimizing empirical errors against the complexity of the resulting fitted function, thereby lowering the risk of overfitting [33]. SVR employs a soft-margin approach to achieve the highest degree of generalization; the regression problem is handled using an alternative loss function and two slack variables [24]. The nonlinear regression problem in the SVR model is defined as follows:

$$y = f(\mathbf{x}) = \omega \cdot \psi(\mathbf{x}) + b \tag{5}$$

where *ω* is the weight vector, *b* is a constant bias, and *ψ*(*x*) is the feature-space mapping function. The coefficients *ω* and *b* are obtained by solving the following minimization problem:

$$\text{Minimize } \frac{1}{2}\|\omega\|^2 + C\,\frac{1}{N} \sum\_{i=1}^{N} (\xi\_i + \xi\_i^\*) \tag{6}$$

$$\text{Subject to } \left\{ \begin{array}{l} y\_i - (\omega \cdot \psi(\mathbf{x}\_i) + b) \le \varepsilon + \xi\_i \\ (\omega \cdot \psi(\mathbf{x}\_i) + b) - y\_i \le \varepsilon + \xi\_i^\* \\ \xi\_i, \ \xi\_i^\* \ge 0 \end{array} \right. \tag{7}$$

where the parameters *C* and *ε* are model-defined. *C* controls the trade-off between empirical risk and smoothness, whereas $\frac{1}{2}\|\omega\|^2$ quantifies the smoothness (flatness) of the function. $\xi\_i$ and $\xi\_i^\*$ are positive slack variables that measure the deviation of each sample beyond the corresponding bound of the approximation function's *ε*-tube.

After applying the Lagrangian multipliers and the optimality conditions, the nonlinear regression function *f*(*x*) is obtained as follows:

$$f(\mathbf{x}) = \sum\_{i=1}^{N} (\delta\_i - \delta\_i^\*) \, K(\mathbf{x}\_i, \mathbf{x}\_j) + b \tag{8}$$

where $K(\mathbf{x}\_i, \mathbf{x}\_j)$ is a kernel function that describes the inner product in the D-dimensional feature space [34], and $\delta\_i$ and $\delta\_i^\*$ are Lagrangian multipliers.
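In practice, the optimization in Equations (5)–(8) is usually delegated to an off-the-shelf solver. The sketch below is a minimal, illustrative SVR setup using scikit-learn with an RBF kernel, where the `C` and `epsilon` arguments correspond to *C* and *ε* above; the data are synthetic placeholders rather than the load data used in this study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic placeholder data: features (e.g., weather variables) and a target load
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)

# RBF-kernel SVR; C controls the trade-off between flatness and empirical error,
# epsilon sets the width of the epsilon-tube in Eq. (7)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, y)
print(model.predict(X[:5]))
```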

The GRNN and SVR methods are used to design machine-learning-based models for electricity load forecasting, with weather parameters as input features. In the next section, we perform exploratory data analysis to calculate the correlations between the weather parameters and the electricity load in Bali.
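As a minimal sketch of this exploratory step, Pearson correlations between weather parameters and the load can be computed with pandas; the file name and column names below are hypothetical placeholders, not the actual dataset fields.

```python
import pandas as pd

# Hypothetical file and column names; the real dataset and its fields may differ
df = pd.read_csv("bali_load_weather.csv")
corr = df[["temperature", "humidity", "wind_speed", "load"]].corr(method="pearson")
print(corr["load"].sort_values(ascending=False))
```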
