*3.1. Multilayer Perceptron*

Rosenblatt [46] introduced the single-perceptron concept in 1958, upon which the multilayer perceptron (MLP) is built; it consists of an input layer, middle layers, and an output layer. The input layer is the interface between the outside world and the network. The middle layers are called hidden layers because they have no direct connection with the outside world, so their values are not observed in the training set. The number of neurons in the input layer corresponds to the number of input parameters, while the number of neurons in the hidden layer is usually determined by trial and error. The output layer contains as many neurons as there are desired outputs, e.g., the forecasted value in forecasting problems. A set of weights connects the neurons (see Figure 1).
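To make this architecture concrete, the following is a minimal sketch using scikit-learn's MLPRegressor; the library, the hidden layer size, and the illustrative data are assumptions, not choices stated in the text. The hidden layer size is the quantity one would tune by trial and error.

```python
# Minimal sketch of the described architecture (scikit-learn is an
# assumption; the paper does not name a library).
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(100, 5)   # M = 5 input parameters (illustrative data)
y = np.random.rand(100)      # one forecasted output value per sample

model = MLPRegressor(
    hidden_layer_sizes=(64,),  # N hidden neurons, tuned by trial and error
    max_iter=2000,
)
model.fit(X, y)              # learns the weights connecting the neurons
```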

The output value *y* of a three-layer perceptron can be formulated as:

$$y = \varphi_2\left(\sum_{j=1}^{N} v_j z_j + b_0\right) \tag{1}$$

where *N* is the number of neurons in the hidden layer, *v<sub>j</sub>* is the weight of the second layer, *z<sub>j</sub>* is the output of neuron *j*, *b*<sub>0</sub> is the bias of the output neuron, and *ϕ*<sub>2</sub> is the activation function of the output neuron. Several training algorithms have been used to fit MLP models, such as scaled conjugate gradient (SCG), Levenberg–Marquardt (LM), gradient descent with adaptive learning rate (GDA), gradient descent with momentum (GDM), and others. The output value of neuron *j* in the hidden layer is given by:

$$z_j = \varphi_1\left(\sum_{i=1}^{M} w_{ij} x_i + b_j\right), \quad j = 1, \dots, N \tag{2}$$

where *M* is the number of inputs, *w<sub>ij</sub>* are the weights of the first layer, *x<sub>i</sub>* are the inputs, *b<sub>j</sub>* is the bias of neuron *j*, and *ϕ*<sub>1</sub> is the activation function of the hidden layer. MLP was chosen because it is fast to train, which makes a hidden layer of 256 neurons affordable instead of the more usual 32–64, and because the high variance among individually trained networks yields a strong ensemble even with a single model type.
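Equations (1) and (2) translate directly into a forward pass. Below is a minimal NumPy sketch under two assumptions not fixed by the text: tanh as the hidden activation *ϕ*<sub>1</sub> and a linear output *ϕ*<sub>2</sub>, a common pairing in forecasting MLPs.

```python
import numpy as np

def mlp_forward(x, W, b, v, b0, phi1=np.tanh, phi2=lambda a: a):
    """Three-layer perceptron forward pass, Eqs. (1)-(2).

    x  : inputs x_i, shape (M,)
    W  : first-layer weights w_ij, shape (M, N)
    b  : hidden biases b_j, shape (N,)
    v  : second-layer weights v_j, shape (N,)
    b0 : output bias (scalar)
    """
    z = phi1(x @ W + b)      # Eq. (2): hidden-layer outputs z_j
    return phi2(z @ v + b0)  # Eq. (1): network output y

# Illustrative call with random weights
rng = np.random.default_rng(0)
M, N = 5, 64
y_hat = mlp_forward(rng.normal(size=M), rng.normal(size=(M, N)),
                    np.zeros(N), rng.normal(size=N), 0.0)
```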
