#### *3.2. Study Area*

The study areas were Taipei, Hsinchu, Taichung, and Kaohsiung, with pollution data consisting of nitrogen oxide (NOx), atmospheric PM2.5, atmospheric PM10, and sulfur dioxide (SO2) levels. The monitoring locations were those established by the Taiwan Environmental Protection Administration, Executive Yuan. Table 1 shows statistical summaries of air pollution levels at the four studied locations. The findings show that Taichung has the highest concentrations of PM10, PM2.5, and NOx, whereas in Kaohsiung, SO2 is the dominant pollutant. Figure 1 gives an overview of the genetic algorithm's training and evaluation phases. Because each type of air pollutant has a different distribution, we trained a separate model for each dataset using the same model architecture.


**Table 1.** Descriptive statistics.

**Figure 1.** An overview of the genetic algorithm training and evaluation phases.

The training samples were split into two halves: alternating training and assessment were performed on the first half and, after this part was complete, the second half was used for forest training. The first half was further divided into smaller sections called stages. We performed simulations for training:testing ratios of 90:10, 80:20, 70:30, 60:40, and 50:50. In the training part, the training samples of each stage were presented to all chromosomes, including chromosomes newly created at the previous stage. Before new chromosomes were formed, all forests were trained in parallel. Once all forests had been trained in the training part, genetic operators were applied in the assessment part, where fitness values were calculated to drive selection in the genetic pool. The algorithm moved the replacement operator to the first position, and it operated only when a new chromosome had been generated at the previous stage.
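As a minimal illustration of this alternating scheme, the sketch below (in Python) uses hypothetical helper names `train_forest` and `fitness` to stand in for the actual forest-training and error-based fitness routines; the replacement operator is simplified to an elitist swap.

```python
import random

def train_forest(chromosome, batch):
    """Hypothetical stand-in: training would update the weights encoded in the chromosome."""
    return chromosome

def fitness(chromosome, batch):
    """Hypothetical stand-in for the error-based fitness (lower is better)."""
    return random.random()

def alternate_ga(samples, pop, n_stages=5):
    # Split the samples into two halves; the first half is divided into stages.
    first, second = samples[:len(samples) // 2], samples[len(samples) // 2:]
    stage_len = len(first) // n_stages
    for s in range(n_stages):
        batch = first[s * stage_len:(s + 1) * stage_len]
        # Training part: all forests are trained in parallel on the stage batch,
        # including chromosomes newly created at the previous stage.
        pop = [train_forest(c, batch) for c in pop]
        # Assessment part: fitness values drive selection in the genetic pool;
        # the replacement operator runs first, swapping the worst chromosome
        # for a copy of the best so a new chromosome enters the next stage.
        pop.sort(key=lambda c: fitness(c, batch))
        pop[-1] = list(pop[0])
    # After the alternating phase, the second half is used for forest training.
    return [train_forest(c, second) for c in pop]

best = alternate_ga(list(range(100)), pop=[[0.1], [0.5], [0.9]])[0]
```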

#### *3.3. Air Pollution Forecasting Using VAR-Cascade-GA*

Poor air quality in Taiwan has mostly been attributed to household burning, a major source of greenhouse gas emissions. Taiwan's geography has also been identified as a primary contributor to its environmental problems, hindering dispersion and locking in pollutants. Taipei, Taiwan's capital and most populous city, is surrounded by mountains, and the advanced manufacturing facilities along the western and northern coastlines of Taiwan were also built near mountain ranges. In Section 3, we discussed the construction step and simulation studies. During the input construction stage, we employed the VAR pollution space–time dataset covering Taichung (Y1), Taipei (Y2), Hsinchu (Y3), and Kaohsiung (Y4) in Taiwan.
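To illustrate how such a space–time input can be assembled, the following sketch stacks the four city series into a VAR lag-1 design matrix; the array contents are hypothetical placeholders, not the actual monitoring data.

```python
import numpy as np

# Hypothetical daily pollutant series for the four cities
# (columns Y1..Y4: Taichung, Taipei, Hsinchu, Kaohsiung).
rng = np.random.default_rng(0)
Y = rng.random((200, 4))

# VAR(1)-style construction: each city's value at time t is predicted from
# the previous observations of all four locations, so every model input
# carries the space-time dependence between the cities.
X = Y[:-1]        # lagged inputs Y_{t-1}, shape (199, 4)
target = Y[1:]    # outputs Y_t, shape (199, 4)
```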

Figure 2 shows the model architecture, with five hidden neurons, and the training:testing ratio was selected by assessing the error values of the testing results shown in Table 2. The training and testing results are shown in Figure 3 for PM2.5, Figure 4 for PM10, Figure 5 for NOx, and Figure 6 for SO2. In this context, the cascade neural network genetic algorithm model can be used to study nonlinear and nonstationary air pollution data. The metrics used to evaluate the test set results were the root-mean-squared error (RMSE), mean absolute error (MAE), and symmetric mean absolute percentage error (sMAPE) between the actual air pollution values and the predicted values. These metrics are commonly used in regression problems such as our air pollution prediction; smaller values indicate better model performance [25].
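The three metrics can be computed directly; the sketch below uses one common convention for the sMAPE denominator (conventions vary across the literature).

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def smape(y_true, y_pred):
    # Symmetric MAPE in percent; assumes non-zero actual/predicted pairs.
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

y_true = np.array([10.0, 12.0, 9.5])
y_pred = np.array([9.0, 12.5, 10.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), smape(y_true, y_pred))
```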

**Figure 2.** Space–time Cascade Neural Network with genetic algorithm, adapted from [22,36].


**Table 2.** Model comparison based on pollution.

**Note:** The best simulation, with the lowest error, is marked (\*); yellow highlighting indicates the lowest value for each pollutant, accuracy measure, and elapsed time.

**Figure 3.** PM2.5 data training of the CFNN using a genetic algorithm and backpropagation.

In the results, the cascade neural network genetic algorithm with a 90:10 ratio provided the lowest RMSE, MAE, and sMAPE values for all variables. The optimal number of hidden neurons, showing good performance in the test and validation results, could then be selected. Using this model, future air pollution was predicted. In this study area, the air pollution levels in the four Taiwanese cities influence each other. However, the prediction accuracy did not improve when we set the training:testing ratio to 80:20, 70:30, 60:40, or 50:50 in the same section. Several training algorithms are available, such as backpropagation, Conjugate Gradient Powell–Beale (CGB), Broyden–Fletcher–Goldfarb (BFG), Levenberg–Marquardt (LM), and Scaled Conjugate Gradient (SCG). The rate of change in the error with respect to the connection weights, i.e., the error gradient, is used as the search direction for training.
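For concreteness, a minimal sketch of a gradient-based weight update for a single linear neuron is given below; it only illustrates the error gradient serving as the search direction, not any specific algorithm from the list above.

```python
import numpy as np

def sgd_step(w, x, y, lr=0.01):
    """One gradient-descent step on a squared-error loss for a linear neuron."""
    err = w @ x - y            # prediction error
    grad = 2 * err * x         # dE/dw: error gradient w.r.t. the weights
    return w - lr * grad       # move along the negative gradient

w = np.zeros(3)
x, y = np.array([1.0, 0.5, -0.2]), 0.7
for _ in range(100):
    w = sgd_step(w, x, y)
```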

**Figure 4.** PM10 data training of the CFNN using a genetic algorithm and backpropagation.

To determine the step size that optimizes performance, we used backpropagation and conducted a search along the conjugate or orthogonal direction. We found this to be the simplest way to train moderately sized feedforward networks. That said, some matrix multiplication is involved in processing problems such as air pollution over time; since the network in this research is very wide, backpropagation is a suitable choice. When overfitting occurs, the generalization ability of the model decreases significantly. To suppress overfitting, regularization methods are often used. L1 (L2) regularization adds the sum of the absolute (squared) values of the weights to the loss function, as in Equations (13) and (14), where Γ is the loss function and $w_{jk}^{i}$ denotes the weights in the network. In addition, α is the scaling factor for the summation, *N<sub>H</sub>* denotes the number of layers, and *N<sub>i</sub>* denotes the number of nodes in the *i*th layer.

$$
\Gamma_{L1} = \Gamma + \alpha \sum_{i=1}^{N_H - 1} \sum_{j=1}^{N_i} \sum_{k=1}^{N_{i+1}} \left| w_{jk}^{i} \right| \tag{13}
$$

$$
\Gamma_{L2} = \Gamma + \alpha \sum_{i=1}^{N_H - 1} \sum_{j=1}^{N_i} \sum_{k=1}^{N_{i+1}} \left( w_{jk}^{i} \right)^2 \tag{14}
$$
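In code, the penalties of Equations (13) and (14) amount to adding the summed absolute or squared weights to the base loss; a minimal sketch over a list of layer weight matrices:

```python
import numpy as np

def regularized_loss(base_loss, weights, alpha=1e-4, norm="l1"):
    """Add an L1 or L2 penalty over all layer weight matrices (Eqs. (13)-(14))."""
    if norm == "l1":
        penalty = sum(np.abs(w).sum() for w in weights)   # Eq. (13)
    else:
        penalty = sum((w ** 2).sum() for w in weights)    # Eq. (14)
    return base_loss + alpha * penalty

W = [np.ones((4, 5)), np.ones((5, 3))]       # placeholder layer weights
print(regularized_loss(1.25, W, norm="l2"))
```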

**Figure 5.** NOx data training of the CFNN using a genetic algorithm and backpropagation.

#### *3.4. Does the Activation Function Provide High Accuracy and Reduce the Elapsed Time?*

Linear regression models work well for short-term predictions based on daily or weekly measurements in time series forecasting, but they cannot properly handle nonlinearity among variables, even for long-term predictions from seasonal or annual data series. Various machine learning methodologies have been introduced and used to simulate problems and provide predictions in environmental research, as computational efficiency has evolved rapidly in the last decade. Despite its prominence and strong accuracy, the Artificial Neural Network has critical issues: a propensity to overfit the training data and inconsistency when the training history is short. Several strategies for more effective and efficient training of NNs have been recommended; however, these are not simple and can also have markedly poor accuracy.

**Figure 6.** SO2 data training of the CFNN using a genetic algorithm and backpropagation.

After the training and testing comparisons discussed in Section 3.3, we evaluated the performance of the hybrid cascade neural network genetic algorithm with other activation functions. Computational capabilities are increasing in the era of big data, high-performance computing, parallel processing, and cloud computing. In line with this, we address whether the activation function can improve accuracy and reduce the elapsed time. Over recent decades, machine learning, a branch of artificial intelligence, has gained popularity, and researchers have extended it to various areas of human life. Machine learning is a field of research that employs statistics and computer science concepts to develop mathematical models for tasks such as estimation and inference [55]. These models are collections of mathematical relationships between a system's inputs and outputs. The learning process entails estimating the model parameters so that the task can be executed effectively. To improve accuracy, researchers have conducted simulated comparisons using various activation functions. The most popular activation functions are SoftMax, tanh, ReLU, Leaky ReLU, sigmoid, and logsig [56–59].

The activation function can be defined and applied in an ANN to help the network capture various patterns in the data. By analogy with the neuron-based design of the human brain, the activation function is essentially responsible for determining which neurons fire [60]. Inside an ANN, the activation function does the same thing: it receives the output signal of the previous cell and transforms it into a form that can be used as input to the next cell. In this simulation, we used logsig in Equation (15), radbas in Equation (16), SoftMax in Equation (17), and tribas in Equation (18).

$$z_{o} = \frac{1}{1 + \exp\left(-X_{o}\right)} \tag{15}$$

$$z(\mathbf{x}) = \sum_{i=1}^{N} w_i \, \varphi\left(\|\mathbf{x} - \mathbf{x}_i\|\right) \tag{16}$$

$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \tag{17}$$

$$\operatorname{tri}(\mathbf{x}) = \Lambda(\mathbf{x}) \stackrel{\text{def}}{=} \max(1 - |\mathbf{x}|, 0) \tag{18}$$
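For reference, the four activations of Equations (15)–(18) can be written directly; the elementwise kernel forms below follow MATLAB's conventions for logsig, radbas, and tribas, where radbas is the Gaussian kernel used inside the radial-basis expansion of Equation (16).

```python
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))            # Eq. (15)

def radbas(x):
    return np.exp(-x ** 2)                     # Gaussian radial-basis kernel, cf. Eq. (16)

def softmax(z):
    e = np.exp(z - z.max())                    # shift for numerical stability
    return e / e.sum()                         # Eq. (17)

def tribas(x):
    return np.maximum(1.0 - np.abs(x), 0.0)    # Eq. (18)
```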

Table 3 shows that the best activation function for PM10 was logsig; for PM2.5, SoftMax; for NOx, radbas; and for SO2, tribas. The SoftMax activation function also provided a shorter elapsed time than the other activation functions.

**Table 3.** Combining activation functions with the Cascade Neural Network.


**Note:** Yellow highlighting indicates the lowest value for each pollutant, accuracy measure, and elapsed time.

The cascade feed-forward neural network model differs only in how the input variables are determined. During the simulation, we constructed the input by vector autoregression, taking as inputs the lag variables of each predicted variable, in this case the air pollution data at the four locations of Taichung, Taipei, Hsinchu, and Kaohsiung. In the CFNN model for the four locations, the signal passes from the input layer to the hidden layer and then to the output layer, with the cascade connection also feeding the inputs directly to the output layer. The general equation for forecasting pollution data at the four locations, given in Equation (19), was used for prediction in these study areas. Each fitted model has four input neurons *Y<sub>t−1</sub>* (lag 1) and five neurons in the hidden layer *Z<sub>t</sub>*. To perform the forecasting, we used Equation (20) for NOx with the radial basis activation function, Equation (21) for PM2.5 with the SoftMax activation function, Equation (22) for PM10 with the logsig activation function, and Equation (23) for SO2 with the tribas activation function. We provide the forecasting results for the next 30 steps in Figure 7. The results show Taichung consistently leading with the highest pollutant levels compared to the other cities in Taiwan.

$$
\hat{Y}_t = \psi_2\left( \begin{bmatrix} w_{bo} & w_{ho} & w_{io} \end{bmatrix} \begin{bmatrix} 1 \\ Z_t \\ Y_{t-1} \end{bmatrix} \right), \qquad Z_t = \psi_1\left( \begin{bmatrix} w_{bh} & w_{ih} \end{bmatrix} \begin{bmatrix} 1 \\ Y_{t-1} \end{bmatrix} \right) \tag{19}
$$
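Equation (19) translates directly into code; the sketch below assumes five hidden neurons, four inputs, a logsig hidden activation ψ1, and a linear output ψ2, with placeholder weight values rather than the fitted ones.

```python
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def cfnn_step(y_prev, W_h, W_o, act=logsig):
    """One CFNN prediction per Equation (19): the cascade connection feeds
    the lagged inputs Y_{t-1} to the output layer alongside the hidden Z_t."""
    z = act(W_h @ np.concatenate(([1.0], y_prev)))       # hidden layer Z_t
    return W_o @ np.concatenate(([1.0], z, y_prev))      # linear output psi_2

rng = np.random.default_rng(0)
y_hat = cfnn_step(rng.random(4), rng.normal(size=(5, 5)), rng.normal(size=(4, 10)))
```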

**Figure 7.** Forecasting all pollution datasets using the CFNN with a genetic algorithm and backpropagation.

Cascade Neural Network Genetic Algorithm for NOx using the radial basis activation function (same structure as Equation (19), with radbas as the hidden activation):

$$
\hat{Y}_t = \begin{bmatrix} w_{bo} & w_{ho} & w_{io} \end{bmatrix} \begin{bmatrix} 1 \\ Z_t \\ Y_{t-1} \end{bmatrix}, \qquad Z_t = \operatorname{radbas}\left( \begin{bmatrix} w_{bh} & w_{ih} \end{bmatrix} \begin{bmatrix} 1 \\ Y_{t-1} \end{bmatrix} \right) \tag{20}
$$

Cascade Neural Network Genetic Algorithm for PM2.5 using the SoftMax activation function:

$$
\hat{Y}_t = \begin{bmatrix}
0.9653 & -1.0657 & 3.2999 & -2.4456 & -0.7936 & -1.1990 & 0.7359 & 0.0491 & -0.0690 & 0.1124 \\
1.2965 & -1.5045 & 2.3489 & 2.0493 & -1.9046 & -1.6001 & 0.0225 & 0.2826 & 0.3253 & 0.0500 \\
6.0285 & -6.1880 & 0.3716 & 2.3998 & -6.8009 & -6.3181 & 0.1929 & -0.0279 & 0.4639 & 0.1170 \\
0.9319 & -0.9456 & 10.3730 & 3.4200 & -1.1032 & -0.9099 & 0.0890 & 0.0227 & 0.0463 & 0.7191
\end{bmatrix}
\begin{bmatrix} 1 \\ Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \\ Z_5 \\ Y_{1,t-1} \\ Y_{2,t-1} \\ Y_{3,t-1} \\ Y_{4,t-1} \end{bmatrix},
\quad
Z_t = \operatorname{softmax}\left( \begin{bmatrix}
6.2714 & -7.3511 & -7.8139 & -1.9507 & 4.3787 \\
-5.6864 & -7.7599 & -8.3507 & -2.9367 & -2.3266 \\
-6.4463 & -6.6875 & 5.0926 & 1.5264 & -1.1062 \\
-5.0961 & -3.3109 & 5.3424 & 6.5753 & -3.4388 \\
3.6740 & -2.6485 & -6.2284 & -6.7208 & 4.4362
\end{bmatrix}
\begin{bmatrix} 1 \\ Y_{1,t-1} \\ Y_{2,t-1} \\ Y_{3,t-1} \\ Y_{4,t-1} \end{bmatrix} \right) \tag{21}
$$

Cascade Neural Network Genetic Algorithm for PM10 using the logsig activation function:

$$
\hat{Y}_t = \begin{bmatrix}
4.7332 & -0.0171 & -4.7892 & -0.0294 & -0.0499 & -0.0380 & 0.5673 & 0.0698 & 0.0402 & 0.0720 \\
-4.7021 & 0.0480 & 4.5307 & -0.0609 & 0.0313 & -0.1244 & 0.1252 & 0.3677 & 0.1878 & 0.0614 \\
2.2770 & 0.0263 & -2.4010 & -0.0006 & -0.0102 & 0.0014 & 0.2351 & -0.0648 & 0.5960 & 0.0696 \\
1.5241 & -0.0534 & -1.4885 & 0.0533 & -0.1240 & 0.1097 & 0.1118 & 0.1017 & 0.1786 & 0.5507
\end{bmatrix}
\begin{bmatrix} 1 \\ Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \\ Z_5 \\ Y_{1,t-1} \\ Y_{2,t-1} \\ Y_{3,t-1} \\ Y_{4,t-1} \end{bmatrix},
\quad
Z_t = \operatorname{logsig}\left( \begin{bmatrix}
-7.6484 & -5.6012 & 5.5769 & 0.2050 & 7.2003 \\
-7.1292 & 9.0472 & -2.0909 & -7.4709 & 6.1290 \\
4.9145 & -5.3312 & -4.1385 & -9.9840 & -5.6488 \\
-3.4291 & 6.0077 & 2.4277 & -1.5068 & 5.5027 \\
-4.0147 & 3.5676 & 5.6306 & -9.0051 & 6.3822
\end{bmatrix}
\begin{bmatrix} 1 \\ Y_{1,t-1} \\ Y_{2,t-1} \\ Y_{3,t-1} \\ Y_{4,t-1} \end{bmatrix} \right) \tag{22}
$$

Cascade Neural Network Genetic Algorithm for SO2 using the tribas activation function:

$$
\hat{Y}_t = \begin{bmatrix}
-0.1116 & 5.1674 & -0.0036 & -0.0166 & -3.4727 & -0.2061 & 0.4570 & 0.0222 & 0.0743 & 0.0524 \\
-0.2789 & -1.0341 & -0.0087 & -0.0114 & 7.8267 & 0.1241 & 0.0817 & 0.3550 & 0.1215 & 0.0309 \\
-0.8104 & -7.8058 & -0.0085 & 0.0234 & 5.2240 & 0.0292 & 0.1364 & 0.0647 & 0.4179 & 0.1027 \\
-0.1772 & 6.5887 & -0.0242 & -0.0490 & 6.7444 & 0.0271 & 0.0246 & 0.0067 & 0.0693 & 0.6512
\end{bmatrix}
\begin{bmatrix} 1 \\ Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \\ Z_5 \\ Y_{1,t-1} \\ Y_{2,t-1} \\ Y_{3,t-1} \\ Y_{4,t-1} \end{bmatrix},
\quad
Z_t = \operatorname{tribas}\left( \begin{bmatrix}
7.3483 & -2.1911 & -8.1641 & 0.4300 & -3.6214 \\
-1.5728 & -2.5029 & -10.1020 & 0.5066 & 7.0828 \\
-5.3810 & -9.9016 & 7.7806 & -2.1473 & 5.0866 \\
-1.1201 & 10.2142 & 6.6401 & 6.7203 & 4.0480 \\
-2.9485 & 8.1185 & 7.2550 & 8.2102 & -8.3387
\end{bmatrix}
\begin{bmatrix} 1 \\ Y_{1,t-1} \\ Y_{2,t-1} \\ Y_{3,t-1} \\ Y_{4,t-1} \end{bmatrix} \right) \tag{23}
$$
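The 30-step forecasts in Figure 7 are obtained recursively, with each prediction fed back as the next lag input. A minimal self-contained sketch follows, using random stand-in weights rather than the fitted values of Equations (20)–(23).

```python
import numpy as np

def tribas(x):
    return np.maximum(1.0 - np.abs(x), 0.0)

def forecast(y_last, W_h, W_o, act, steps=30):
    """Recursive multi-step forecast: each prediction becomes the next lag input."""
    path, y = [], np.asarray(y_last, dtype=float)
    for _ in range(steps):
        z = act(W_h @ np.concatenate(([1.0], y)))    # hidden layer Z_t
        y = W_o @ np.concatenate(([1.0], z, y))      # next-step pollution at 4 cities
        path.append(y)
    return np.array(path)

rng = np.random.default_rng(1)
preds = forecast(rng.random(4), rng.normal(size=(5, 5)),
                 rng.normal(size=(4, 10)), tribas, steps=30)
print(preds.shape)  # (30, 4): 30 forecast steps for the four cities
```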
