*2.3. Data-Driven Techniques Used*

#### 2.3.1. Artificial Neural Network

The ANN methodology is a tool used to replicate the problem-solving mechanism of the human brain. ANNs are highly effective at modeling and simulating both linear and non-linear systems. Among ANN architectures, the feed-forward back-propagation network was adopted in the present study because of its relatively low complexity [36,37]. An ANN consists of an input layer, an output layer, and one or more hidden layers between them. Each node in a layer is connected to every node in the following layer, while nodes within the same layer are not connected to one another [29]. Each neuron receives, processes, and transmits signals to establish functional relationships between past and future events. The layers are linked by the interconnection weights Wij (between the input and hidden layers) and Wjk (between the hidden and output layers). The typical structure using the input variables is shown in Figure 3.

For this analysis, a network with a single hidden layer was used, since it was considered flexible enough to forecast meteorological variables. Transfer functions are required to establish the input–output relationship for each layer of neurons. In this analysis, the Levenberg–Marquardt algorithm was used to train the model, and a hyperbolic tangent sigmoid transfer function was used to compute a layer's output from its net input. The neural network learns by adjusting the connection weights between the neurons: using a suitable learning algorithm, the weights are altered according to the training data set. The number of hidden-layer neurons is typically determined by trial and error. A comprehensive overview of ANNs is available in [25,38,39].
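As a hedged illustration of this setup, the sketch below fits a single-hidden-layer network with a tanh transfer function using scikit-learn. Note that scikit-learn does not implement the Levenberg–Marquardt optimizer (typically available as MATLAB's trainlm), so the L-BFGS solver stands in for it here; the data are random placeholders, not the study's meteorological inputs.

```python
# Minimal sketch of the single-hidden-layer ANN described above.
# Assumptions: X holds the six predictors (Tmax, Tmin, RH-1, RH-2,
# WS, SSH) and y holds daily pan evaporation EPan (mm); both are
# random placeholders here. scikit-learn has no Levenberg-Marquardt
# solver, so L-BFGS stands in for it.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))                                  # placeholder predictors
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=1000)   # placeholder target

ann = make_pipeline(
    StandardScaler(),                       # scale inputs for the tanh units
    MLPRegressor(hidden_layer_sizes=(10,),  # one hidden layer; size set by trial and error
                 activation="tanh",         # hyperbolic tangent sigmoid transfer function
                 solver="lbfgs",            # stand-in for Levenberg-Marquardt
                 max_iter=2000,
                 random_state=42),
)
ann.fit(X, y)
print("training R^2:", ann.score(X, y))
```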

**Figure 3.** Three-layered structure of the artificial neural network.

#### 2.3.2. Wavelet Artificial Neural Network (WANN)

Wavelet analysis (WA) offers a time-dependent spectral analysis that describes processes and their relationships in time-frequency space by decomposing time series [40]. WA is an effective time-frequency processing method with more benefits than Fourier analysis [41]; it improves on the Fourier transform by detecting how the frequency content of the data varies over time [40]. Wavelet transformation, which breaks a time series down into basis functions at different frequencies, improves the potential of a predictive model by gathering sufficient information from different resolution levels [25]. There is excellent literature on wavelet transform theory [42,43]; we will not go into it in depth here. It is vital to choose the basis function (called the mother wavelet) carefully; the basis functions are generated from it by translation and dilation [44]. In general, the discrete wavelet transform (DWT) has been preferred over the continuous wavelet transform (CWT) for data decomposition, because the CWT is time-consuming [3,18].

The present study used the DWT method for daily EPan (mm) estimation. The DWT decomposes the original input time series of Tmax, Tmin, RH-1, RH-2, WS, and SSH into different frequency components (Figure 4, adapted from Rajaee [44]).
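As an illustration of this decomposition step, the PyWavelets library provides a stationary (à trous) wavelet transform. The sketch below applies a three-level Haar decomposition to a synthetic series, since the station data are not reproduced here; in the study, each of the six predictors would be decomposed in the same way.

```python
# Sketch: three-level stationary (a trous) Haar decomposition with PyWavelets.
# The input series is synthetic; in the study each predictor (Tmax, Tmin,
# RH-1, RH-2, WS, SSH) would be decomposed like this.
import numpy as np
import pywt

series = np.sin(np.linspace(0, 20, 1024)) \
         + 0.2 * np.random.default_rng(0).normal(size=1024)
# pywt.swt requires the length to be divisible by 2**level (1024 is).
coeffs = pywt.swt(series, wavelet="haar", level=3)
# coeffs is a list of (approximation, detail) pairs, coarsest level first.
for cA, cD in coeffs:
    print("approximation", cA.shape, "| detail", cD.shape)
```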

This analysis used three levels of the Haar à trous decomposition algorithm, given by Equations (1) and (2):

$$C\_r(t) = \sum\_{l=0}^{+\infty} h(l)\, C\_{r-1}\left(t + 2^r l\right) \qquad (r = 1,\ 2,\ 3,\ \dots,\ n) \tag{1}$$

$$W\_r(t) = C\_{r-1}(t) - C\_r(t) \tag{2}$$

where *h*(*l*) is the discrete low-pass filter, and *Cr*(*t*) and *Wr*(*t*) (*r* = 1, 2, 3, ..., *n*) are the scale coefficient and wavelet coefficient at resolution level *r*. The DWT employs two sets of filters, a low pass and a high pass, to decompose the main time series. The Haar wavelet is discontinuous and resembles a step function, which makes it suitable for time series with abrupt transitions. The abovementioned wavelet types were evaluated, and finally, the measured time series, H, was decomposed by the optimum DWT into multi-frequency time series comprising details (HD1, HD2, ..., HDn) and an approximation (Ha) (Qasem et al., 2019).
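To make Equations (1) and (2) concrete, the following sketch implements the Haar à trous recursion directly, with the low-pass filter h = (1/2, 1/2). The boundary handling (wrap-around indexing) is an assumption, as the paper does not specify it.

```python
# Sketch of the Haar a trous algorithm of Eqs. (1)-(2).
# h = (1/2, 1/2) is the Haar low-pass filter; boundaries are handled
# by wrap-around indexing, which is one common (assumed) choice.
import numpy as np

def haar_a_trous(series: np.ndarray, levels: int = 3):
    """Return (details, approximation) for a 1-D series."""
    c_prev = series.astype(float)          # C_0(t): the original series
    details = []
    n = len(series)
    for r in range(1, levels + 1):
        shift = 2 ** r                     # dilation grows with level r
        # Eq. (1): C_r(t) = sum_l h(l) C_{r-1}(t + 2^r l), h = (1/2, 1/2)
        c_r = 0.5 * (c_prev + c_prev[(np.arange(n) + shift) % n])
        # Eq. (2): W_r(t) = C_{r-1}(t) - C_r(t)
        details.append(c_prev - c_r)
        c_prev = c_r
    return details, c_prev                 # [W_1..W_levels], C_levels

# Reconstruction check: the details plus the final approximation
# telescope back to the original series.
x = np.random.default_rng(1).normal(size=256)
ds, approx = haar_a_trous(x, levels=3)
print(np.allclose(x, sum(ds) + approx))    # True
```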

**Figure 4.** Schematic representation of WANN.

The decomposed frequency components then serve as inputs to the ANN. Hybridizing the decomposed input time series of Tmax, Tmin, RH-1, RH-2, WS, and SSH with the ANN yields the wavelet artificial neural network (WANN) [42]. Three levels of the Haar à trous decomposition algorithm were used in this study. The Levenberg–Marquardt algorithm was used for model training, and the hyperbolic tangent sigmoid transfer function was again used to compute a layer's output from its net input.
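A hedged sketch of this hybridization step follows: each predictor is decomposed with the à trous routine above, and the resulting sub-series are concatenated into the ANN's input matrix. The function and variable names are illustrative, not the authors' code.

```python
# Sketch: building the WANN input matrix from decomposed predictors.
# Reuses haar_a_trous() from the previous sketch; predictors is a dict
# mapping variable names (Tmax, Tmin, ...) to 1-D numpy arrays.
import numpy as np

def wann_inputs(predictors: dict, levels: int = 3) -> np.ndarray:
    columns = []
    for name, series in predictors.items():
        details, approx = haar_a_trous(series, levels=levels)
        columns.extend(details)            # W_1 ... W_levels per variable
        columns.append(approx)             # plus the approximation C_levels
    return np.column_stack(columns)        # shape: (n_days, 6 * (levels + 1))

# The resulting matrix feeds the same single-hidden-layer ANN as before,
# e.g. ann.fit(wann_inputs(predictors), y).
```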

#### 2.3.3. Support Vector Machine

The support vector machine (SVM) was developed in [33] for classification and regression procedures. The fundamental concept of an SVM is to apply a kernel function, map the input data by a non-linear mapping into a high-dimensional feature space, and then perform a linear regression in that feature space [45]. The SVM is a modern classifier based on two principles (Figure 5, adapted from Lin et al. [46]). First, transforming the data into a high-dimensional space can make complicated problems easier to solve using linear discriminant functions. Second, SVM training uses only those inputs nearest to the decision boundary, since they carry the most information about the classification [47].

**Figure 5.** SVM Layout.

We assume a non-linear function *f*(*x*) is given by:

$$f(\mathbf{x}) = \mathbf{w}^T \Phi(\mathbf{x}) + b \tag{3}$$

where **w** is the weight vector, *b* is the bias, and **Φ**(**x**) is the non-linear mapping of the input space *x* into a high-dimensional feature space, in which the linear regression is performed. By introducing Lagrange multipliers, Equation (3) can be transformed into its dual form, giving the final expression:

$$f(\mathbf{x}) = \sum\_{i=1}^{m} \left( \alpha\_i^+ - \alpha\_i^- \right) K\left( \mathbf{x}\_i, \mathbf{x}\_j \right) + b \tag{4}$$

where $\alpha\_i^+$ and $\alpha\_i^-$ are the Lagrangian multipliers used to eliminate some of the primal variables, and the term $K(\mathbf{x}\_i, \mathbf{x}\_j)$ is the kernel function. The derivation and excellent literature on SVM can be obtained from [48]. The kernel functions used in this study were the linear function (LF) and the radial basis function (RBF).

• Linear kernel function (LF): the most basic form of kernel function is written as:

$$K(\mathbf{x}\_i, \mathbf{x}\_j) = \mathbf{x}\_i^T \mathbf{x}\_j \tag{5}$$

• Radial basis function (RBF): the RBF mapping is represented by a Gaussian bell-shaped function:

$$K(\mathbf{x}\_i, \mathbf{x}\_j) = \exp\left(-\gamma ||\mathbf{x}\_i - \mathbf{x}\_j||^2\right) \tag{6}$$

where *γ* is the width parameter of the Gaussian RBF kernel; the RBF is the most widely used of all the kernel functions in the SVM technique.
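As an illustration of these two kernels, scikit-learn's SVR can be fitted with each; the data and parameter values below are placeholders rather than the study's settings.

```python
# Sketch: epsilon-SVR with the linear and RBF kernels of Eqs. (5)-(6).
# X and y are random placeholders; the study's predictors and daily
# EPan series would take their place.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

for kernel in ("linear", "rbf"):
    svr = make_pipeline(
        StandardScaler(),                   # SVR is sensitive to input scale
        SVR(kernel=kernel, C=1.0, epsilon=0.1, gamma="scale"),
    )
    svr.fit(X, y)
    print(kernel, "training R^2:", round(svr.score(X, y), 3))
```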

The efficiency of the SVR technique with an ε-insensitive loss function depends on the choice of kernel and of three training parameters (C, *γ*, and ε). For every specific type of kernel, the values of C and ε influence the complexity of the final model. The ε value determines the number of support vectors (SVs) used for prediction: intuitively, the best value of ε results in fewer support vectors, leading to less complicated regression estimates. The value of C sets the trade-off between model complexity and the degree of deviation permitted in the optimization formulation, so a larger value of *C* increases model complexity [49]. The selection of optimum values for these training parameters (*C*, *γ*, and ε) that guarantee less complex models is an active research area.
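Since the paper does not state how the optimum C, *γ*, and ε were obtained, one common (assumed) approach is a cross-validated grid search, sketched below with scikit-learn; the grid values are illustrative.

```python
# Sketch: cross-validated grid search over the SVR training parameters
# C, gamma, and epsilon discussed above. Grid values are illustrative,
# and X, y are the placeholders from the previous sketch.
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.1, 1, 10, 100],           # complexity / deviation trade-off
    "gamma": [0.01, 0.1, 1],          # RBF kernel width parameter
    "epsilon": [0.01, 0.1, 0.5],      # width of the insensitive tube
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
print("best parameters:", search.best_params_)
```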
