3.3.5. Autoencoder (AE)

AE is a deep-neural-network-based algorithm that uses unsupervised learning to encode and decode the input data; it is commonly used for feature extraction and denoising [68]. The AE performs two processes, encoding and decoding, so its structure is symmetrical. The input data pass through three layers that make up the AE architecture: the input, latent, and output layers (Figure 9). The input and output layers have the same size, while the latent layer is smaller than the input layer [69]. Encoding and decoding are achieved with the following equations, respectively:

$$\mathbf{e} = f_{\theta}(\mathbf{x}) = s(\mathcal{W}\mathbf{x} + b) \tag{10}$$

$$\tilde{\mathbf{x}} = g_{\theta'}(\mathbf{e}) = s\left(\mathcal{W}'\mathbf{e} + b'\right) \tag{11}$$

where *x* is the input vector, *e* ∈ [0, 1]^d represents the latent vector, and *x̃* ∈ [0, 1]^D is the reconstructed vector. The encoding process maps the input layer to the latent layer, and the decoding process then maps the latent layer to the output layer. *W* and *W′* represent the weights from the input to the latent layer and from the latent to the output layer, respectively, while *b* and *b′* denote the bias vectors of the latent layer and the output layer. The encoding and decoding functions are represented by *f*θ and *g*θ′, with *s* the activation function applied to the latent-layer and output-layer neurons. The weight and bias parameters of the AE are learned by minimizing the reconstruction error. Equation (12) measures the error between the reconstruction *x̃* and the input *x* for an individual instance:

$$J(\mathcal{W}, b; \mathbf{x}, \tilde{\mathbf{x}}) = \frac{1}{2} \left\| h_{\mathcal{W},b}(\mathbf{x}) - \mathbf{x} \right\|^2 \tag{12}$$

**Figure 9.** The structure of the autoencoder (AE) model.
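To make Equations (10)–(12) concrete, the following is a minimal NumPy sketch of a single encode–decode pass. The layer sizes, the sigmoid activation, and the parameter initialization are illustrative assumptions rather than choices taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input dimension D and smaller latent dimension d
D, d = 8, 3
W   = rng.normal(scale=0.1, size=(d, D))   # encoder weights (input -> latent)
b   = np.zeros(d)                          # encoder bias
W_p = rng.normal(scale=0.1, size=(D, d))   # decoder weights W' (latent -> output)
b_p = np.zeros(D)                          # decoder bias b'

def s(z):
    """Sigmoid activation used in Eqs. (10) and (11)."""
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    """Eq. (10): e = f_theta(x) = s(W x + b)."""
    return s(W @ x + b)

def decode(e):
    """Eq. (11): x_tilde = g_theta'(e) = s(W' e + b')."""
    return s(W_p @ e + b_p)

x = rng.uniform(size=D)        # one input instance with entries in [0, 1]
x_tilde = decode(encode(x))    # reconstruction h_{W,b}(x)

# Eq. (12): reconstruction error of this single instance
error = 0.5 * np.sum((x_tilde - x) ** 2)
print(error)
```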

For a training dataset containing *D* instances, the cost function is defined as follows:

$$J(\mathcal{W}, b) = \left[ \frac{1}{D} \sum_{i=1}^{D} \frac{1}{2} \left\| h_{\mathcal{W},b}\left(\mathbf{x}^{(i)}\right) - \mathbf{x}^{(i)} \right\|^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \mathcal{W}_{ji}^{(l)} \right)^2 \tag{13}$$

where *D* refers to the total number of instances, *n*l to the number of layers, *s*l to the number of neurons in layer *l*, and *λ* represents the weight decay (attenuation) parameter; the squared-error term is the reconstruction error of each training instance.
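Continuing the sketch above (reusing `encode`, `decode`, `W`, and `W_p`), the cost of Equation (13) could be evaluated over a small batch as shown below; the weight decay value and the toy dataset are assumptions made only for illustration.

```python
def cost(X, lam=1e-4):
    """Eq. (13): mean reconstruction error plus the weight-decay penalty.

    X   : array of shape (num_instances, D), one training instance per row
    lam : weight decay parameter lambda (illustrative value)
    """
    recon = np.mean([0.5 * np.sum((decode(encode(x)) - x) ** 2) for x in X])
    decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W_p ** 2))  # sum over all layer weights
    return recon + decay

X = rng.uniform(size=(100, D))   # a toy training set of 100 instances
print(cost(X))
```

Minimizing this cost with gradient descent (or a variant such as Adam) yields the learned weights and biases described above.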
