*Article* **Unorganized Machines to Estimate the Number of Hospital Admissions Due to Respiratory Diseases Caused by PM10 Concentration**

**Yara de Souza Tadano 1,\* , Eduardo Tadeu Bacalhau <sup>2</sup> , Luciana Casacio <sup>2</sup> , Erickson Puchta <sup>3</sup> , Thomas Siqueira Pereira <sup>4</sup> , Thiago Antonini Alves <sup>4</sup> , Cássia Maria Lie Ugaya <sup>5</sup> and Hugo Valadares Siqueira <sup>3</sup>**


**Abstract:** Particulate matter (PM10) concentrations have been impacting hospital admissions due to respiratory diseases. Air pollution studies seek to understand how this pollutant affects the health system. Since prediction involves several variables, any disparity causes a disturbance in the overall system, increasing the difficulty of model development. Due to the complex nonlinear behavior of the problem and its influencing factors, Artificial Neural Networks are attractive approaches for solving estimation problems. This paper explores two neural network architectures denoted unorganized machines: the echo state networks and the extreme learning machines. Beyond the standard forms, model variations are also proposed: the regularization parameter (RP) to increase the generalization capability, and the Volterra filter to explore nonlinear patterns of the hidden layers. To evaluate the proposed models' performance in estimating hospital admissions due to respiratory diseases, three cities of São Paulo state, Brazil, are investigated: Cubatão, Campinas and São Paulo. Numerical results show the standard models' superior performance for most scenarios. Nevertheless, considering the divergent intensity of hospital admissions, the RP models present the best results in terms of data dispersion. Finally, an overall analysis highlights the models' efficiency in assisting hospital admissions management during high air pollution episodes.

**Keywords:** PM10; health risks; extreme learning machine; echo state network; neural networks

#### **1. Introduction**

The World Health Organization (WHO) estimates that 91% of the world's population lives in places where air pollution levels exceed the advised limits. This exposure results in 4.2 million deaths per year due to stroke, heart disease, lung cancer and chronic respiratory illness [1].

In the last decades, the consequences of air pollution for the environment and health have been the subject of extensive research [2–4], including the relation between air pollution and human health [5–8] and, specifically, the study of particulate matter (PM) impacts on respiratory diseases [9–11]. The public health system is currently a main concern for most governments, receiving large investments and boosting research in

**Citation:** Tadano, Y.d.S.; Bacalhau, E.T.; Casacio, L.; Puchta, E.; Pereira, T.S.; Antonini Alves, T.; Ugaya, C.M.L.; Siqueira, H.V. Unorganized Machines to Estimate the Number of Hospital Admissions Due to Respiratory Diseases Caused by PM10 Concentration. *Atmosphere* **2021**, *12*, 1345. https://doi.org/ 10.3390/atmos12101345

Academic Editors: Hsiao-Chi Chuang and Alina Barbulescu

Received: 17 July 2021 Accepted: 6 October 2021 Published: 14 October 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

operational areas. Therefore, several works have developed mathematical models to improve the prediction of diseases caused by airborne PM concentrations.

Generalized Linear Models (GLM) [10–14] and Generalized Additive Models (GAM) [15,16] are statistical regression models usually used to assess air pollution consequences on human health. However, a minimum amount of data is required to ensure that regression models can capture the relationship between the inputs (predictors) and the output (response variable) [17]. For developing countries, where lack of data is a reality, solving the problem using regression models is challenging [18]. For this reason, other models and methods have been applied; since the problem can be seen as a nonlinear mapping task, Artificial Neural Networks (ANN) are an attractive approach for solving such estimation problems. ANN have been used to solve air pollution mapping tasks [19–22], and they have become increasingly popular over the past decade for predicting the impact of air pollutants on human health [10,17,18,23–25]. Araujo et al. [17] and Kassomenos et al. [24] have shown that ANN perform better than linear approaches such as the GLM when dealing with nonlinear mapping problems. In this context, Tadano et al. [26] proposed using two models known as Unorganized Machines (UM), the echo state network (ESN) and the extreme learning machine (ELM), to predict hospital admissions. Building on that work, this paper presents a full extension of these models, adding several neural network variations applied to an enlarged and updated set of instances.

ELM and ESN are ANN architectures used to deal with static nonlinear mapping problems, and are reliable when applied to multiclass classification and, mainly, time series forecasting [27–31]. Thus, the main contribution of this research is a study that predicts the impact of PM10 (particulate matter with an aerodynamic diameter smaller than 10 μm) daily mean concentrations on hospital admissions due to respiratory diseases using variations of the UM: the addition of a regularization parameter to increase the generalization capability of the models [32], and the use of the Volterra filter to capture nonlinear patterns of the neural information [33]. To evaluate the performance of the proposed methods, three cities of São Paulo State, Brazil (Campinas, Cubatão and São Paulo city) were considered.

Based on the overall analysis produced, we expect to understand how air pollution affects the health system, especially during global health crises, helping to avoid hospital collapse.

This work is organized as follows: Section 2 presents the ELM and ESN standard models, the regularization parameter and nonlinear output layer strategies; Section 3 describes the addressed databases; Section 4 shows the computational results and critical analysis regarding the models' performances; Section 5 presents the main conclusions and future works.

#### **2. Unorganized Machines**

Unorganized machines are a designation used as a general term to classify the modern neural network paradigms that unify two kinds of ANN: the echo state networks (ESNs) and the extreme learning machines (ELMs) [27].

In this work, these two architectures are employed to predict the hospital admissions due to respiratory diseases caused by air pollution. Moreover, other models based on the variations and extensions of these models are used [33,34].

#### *2.1. Extreme Learning Machines*

The extreme learning machine (ELM) is a feedforward neural network composed of a single hidden layer, similar to the structure of multilayer perceptron (MLP) [28]. Figure 1 illustrates the architecture.

**Figure 1.** Extreme Learning Machine.

According to Figure 1, the vector **u***<sup>n</sup>* represents all input information: PM10 concentration, relative humidity, ambient temperature, the day of the week, and holidays. This vector is associated with the hidden layer through the weight matrix **W***<sup>h</sup>*, whose entries can be randomly determined. The single output layer (readout) **W***out* is composed of the parameters of a linear combiner, calculated using the Moore-Penrose generalized inverse operator defined below. Finally, as in a single-hidden-layer MLP, *yn* is the network output, here indicating the number of hospital admissions.

The activation of the artificial neurons within the hidden layer is given by Equation (1):

$$\mathbf{x}\_n^h = f\_n^h(\mathbf{W}^h \mathbf{u}\_n + \mathbf{b}),\tag{1}$$

where $\mathbf{u}_n = [u_n, u_{n-1}, \ldots, u_{n-K+1}]^T$ is the vector containing the $K$ input signals, $\mathbf{W}^h \in \mathbb{R}^{N \times K}$ holds the linear input coefficients, $\mathbf{b}$ is the vector of biases of the hidden units, and $f^h(\cdot) = (f^h_1(\cdot), f^h_2(\cdot), \ldots, f^h_N(\cdot))$ are the activation functions of the hidden neurons. Then, Equation (2) presents the network output calculation:

$$y\_n = \mathbf{W}^{out} \mathbf{x}\_n^h, \tag{2}$$

where **W***out* is the output matrix.

The output layer (readout) adjustment is the main advantage of ELM models: it is applied only once, considering the error signal [35,36]. Moreover, in contrast with traditional feedforward neural networks, when the intermediate activation functions are continuously differentiable, these models can choose the weights of the hidden layer randomly [36–38]. Huang et al. demonstrated that ELMs are universal approximators [39].

These structures have a simple training process, mainly requiring the calculation of the parameters of a linear combiner using the Moore-Penrose generalized inverse operator, as in Equation (3) [36,37,40,41]:

$$\mathbf{W}^{out} = \left(\mathbf{X}\_h^T \mathbf{X}\_h\right)^{-1} \mathbf{X}\_h^T \mathbf{d}, \tag{3}$$

where $\mathbf{X}_h \in \mathbb{R}^{T_s \times N}$ is the matrix composed of the intermediate layer outputs, $T_s$ is the number of training samples, $(\mathbf{X}_h^T \mathbf{X}_h)^{-1}\mathbf{X}_h^T$ is the pseudoinverse of $\mathbf{X}_h$, and $\mathbf{d} \in \mathbb{R}^{T_s \times 1}$ is the vector of desired outputs.
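As a concrete illustration of Equations (1)–(3), the sketch below trains a minimal ELM on synthetic data. The dimensions, the tanh activation and the target function are assumptions made for the example, not values from the paper (whose models were coded in MATLAB):

```python
import numpy as np

# Minimal ELM sketch following Eqs. (1)-(3); names mirror the text:
# K inputs, N hidden neurons, Ts training samples (all illustrative).
rng = np.random.default_rng(42)

K, N, Ts = 5, 20, 200
U = rng.uniform(-1, 1, size=(Ts, K))      # input vectors u_n as rows
d = np.sin(U.sum(axis=1, keepdims=True))  # toy desired outputs d

# Hidden layer: random weights W_h and biases b, never trained (Eq. 1)
W_h = rng.uniform(-1, 1, size=(N, K))
b = rng.uniform(-1, 1, size=(N,))
X_h = np.tanh(U @ W_h.T + b)              # hidden activations, Ts x N

# Readout: least squares via the Moore-Penrose pseudoinverse (Eq. 3)
W_out = np.linalg.pinv(X_h) @ d           # equals (X^T X)^-1 X^T d here

y = X_h @ W_out                           # network outputs (Eq. 2)
print("training RMSE:", np.sqrt(np.mean((d - y) ** 2)))
```

Because only `W_out` is solved for, training reduces to a single linear least-squares problem, which is the source of the ELM's speed.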

#### *2.2. Echo State Networks*

Echo state networks (ESN) are recurrent neural models known for an effortless training process: the dynamical reservoir (intermediate layer) is fixed, i.e., there is no iterative adjustment. In this sense, the synaptic weights of the reservoir do not use the error function derivatives; only the output layer is effectively adapted [42]. The adaptation applies a linear regression scheme similar to the ELM training process, since a linear combiner is often applied to the output layer. The neural network structure of the ESN can be seen as a general case of the ELM, because the reservoir presents recurrent loops. Figure 2 illustrates the structure.

**Figure 2.** Echo state networks.

Figure 2 shows that the network structure is similar to the ELM model presented in Figure 1, except for the additional input layer (**W***in*), defined as a linear matrix, and the feedback loops in the intermediate (hidden) layer.

Equation (4) expresses the activation of the internal neurons. This activation represents the network states which are influenced by the previous state and the present input:

$$\mathbf{x}\_{n+1} = f\left(\mathbf{W}^{in}\mathbf{u}\_{n+1} + \mathbf{W}\mathbf{x}\_n\right), \tag{4}$$

where $f(\cdot) = (f_1(\cdot), f_2(\cdot), \ldots, f_N(\cdot))$ gives the activation functions of all neurons within the reservoir, $\mathbf{W}^{in} \in \mathbb{R}^{N \times K}$ is the input weight matrix and $\mathbf{W} \in \mathbb{R}^{N \times N}$ is the recurrent weight matrix.

The linear combinations of the reservoir signals produce the ESN outputs by (5):

$$\mathbf{y}\_{n+1} = \mathbf{W}^{out} \mathbf{x}\_{n+1} \tag{5}$$

where $\mathbf{W}^{out} \in \mathbb{R}^{O \times N}$ is the output weight matrix and $O$ is the number of outputs. The parameters of $\mathbf{W}^{out}$ are determined by the Moore-Penrose generalized inverse described in Section 2.1.

Fundamentally, besides stable behavior, the network model should present an internal memory that preserves the history of the input signals within the dynamical reservoir [29,35,43]. Both features are contemplated by the echo state property (ESP) [29,35,43].

Jaeger [29] suggests simplifying the weight matrix **W** by drawing each *wij* from the values 0, 0.4 and −0.4 with probabilities 0.95, 0.025 and 0.025, respectively. On the other hand, Ozturk et al. [44] suggest a new design for the dynamical reservoir, in which the eigenvalues are uniformly spread in the weight matrix. Both approaches are applied in this work.
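The state update of Equation (4), the readout of Equation (5) and Jaeger's simplified reservoir design can be sketched as follows; the dimensions, input scaling and toy target are assumptions for illustration only:

```python
import numpy as np

# ESN sketch with Jaeger's simplified reservoir [29]: each w_ij is
# 0, 0.4 or -0.4 with probabilities 0.95, 0.025, 0.025.
rng = np.random.default_rng(0)

K, N, Ts = 5, 50, 100
W_in = rng.uniform(-0.1, 0.1, size=(N, K))   # input scaling: assumption
W = rng.choice([0.0, 0.4, -0.4], size=(N, N), p=[0.95, 0.025, 0.025])
# In practice W is often rescaled so its spectral radius stays below 1,
# which helps satisfy the echo state property (ESP).

U = rng.uniform(-1, 1, size=(Ts, K))

# State update (Eq. 4): x_{n+1} = f(W_in u_{n+1} + W x_n)
x = np.zeros(N)
states = np.empty((Ts, N))
for n in range(Ts):
    x = np.tanh(W_in @ U[n] + W @ x)
    states[n] = x

# Readout (Eq. 5), trained exactly as in the ELM via the pseudoinverse
d = U[:, :1]                      # toy target: reproduce the first input
W_out = np.linalg.pinv(states) @ d
```

Note that only the readout `W_out` is adapted; `W_in` and `W` stay fixed after their random generation, which is what makes the ESN an "unorganized" machine.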

Having described the unorganized machines in their standard forms, the following subsections present the variations and extensions that define the new models also applied to the proposed problem.

#### *2.3. Regularization Parameter*

First proposed by Huang et al. [32], the regularization strategy aims to improve the model's generalization capability by penalizing the solutions through a parameter added to the Mean Square Error (MSE) cost function. The parameter *C* is chosen using a validation set of samples, assuming $C = 2^{\lambda}$, with $\lambda$ discretized in the interval [−25, 26] [32]. The strategy is performed through an iterative process in which all parameter values are tested and the one with the best validation MSE is selected, via Expression (6):

$$\mathbf{W}^{out} = \left(\frac{\mathbf{I}}{C} + \mathbf{X}\_h^T \mathbf{X}\_h\right)^{-1} \mathbf{X}\_h^T \mathbf{d}, \tag{6}$$

where *C* is the regularization parameter and **I** is the identity matrix.
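The grid search over $C = 2^{\lambda}$ and the regularized readout of Equation (6) can be sketched as below; the hidden-layer outputs and the train/validation split sizes are synthetic stand-ins, not values from the paper:

```python
import numpy as np

# Regularized readout (Eq. 6): W_out is computed for C = 2^lambda over
# the grid lambda in [-25, 26], keeping the best validation MSE [32].
rng = np.random.default_rng(1)

N, Ts = 30, 150
X = rng.normal(size=(Ts, N))                   # stand-in for X_h
d = X @ rng.normal(size=(N, 1)) + 0.1 * rng.normal(size=(Ts, 1))

X_tr, d_tr = X[:100], d[:100]                  # training split
X_va, d_va = X[100:], d[100:]                  # validation split

best_C, best_mse, best_W = None, np.inf, None
for lam in range(-25, 27):
    C = 2.0 ** lam
    # Eq. (6): W_out = (I/C + X^T X)^-1 X^T d
    W_out = np.linalg.solve(np.eye(N) / C + X_tr.T @ X_tr, X_tr.T @ d_tr)
    mse = np.mean((d_va - X_va @ W_out) ** 2)
    if mse < best_mse:
        best_C, best_mse, best_W = C, mse, W_out
print("selected C = 2^%d" % np.log2(best_C))
```

The term $\mathbf{I}/C$ shrinks the readout weights, which is what improves generalization: large $C$ approaches the unregularized solution of Equation (3), while small $C$ enforces heavier shrinkage.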

To further improve the generalization capability provided by the parameter *C*, Kulaif et al. [45] developed a local search, denoted golden search, to determine better values for *C*. The strategy is grounded in two main observations: small variations of the parameter can produce significant modifications in the final solutions; and the function relating each small interval of *C* to the validation error can be assumed quasi-convex [45]. This strategy is also applied in this work.

#### *2.4. Nonlinear Output Layer*

Boccato et al. [46] proposed a variation with a nonlinear output layer for ESNs, the Volterra filtering structure. The aim is to explore nonlinear combinations of the dynamical echo states while preserving the simplicity of the training process. The output signals can be computed through linear combinations of polynomial terms, as in Equation (7) [27]:

$$y\_{i,n} = h^0 + \sum\_{p=1}^{M} h\_p^1 x\_{p,n} + \sum\_{p=1}^{M} \sum\_{q=1}^{M} h\_{p,q}^2 x\_{p,n} x\_{q,n} + \sum\_{p=1}^{M} \sum\_{q=1}^{M} \sum\_{r=1}^{M} h\_{p,q,r}^3 x\_{p,n} x\_{q,n} x\_{r,n} + \dots \tag{7}$$

where $x_{i,n}$ is the output of the $i$-th neuron of the reservoir (or the $i$-th echo state) at the $n$-th time instant, $h^m$ are the linear combiner coefficients of order $m = 1, \ldots, M$, and $M$ is the polynomial expansion order.

Similar to Equation (3), the simplicity of the training process is preserved due to the linear dependence of the outputs on the filter parameters. In terms of least squares, Equation (7) guarantees a closed-form solution, allowing the Moore-Penrose inverse operation [47].

However, according to Boccato et al. [46], the application of a Volterra filter may lead to an uncontrollable growth in the number of free parameters and inputs. To prevent this problem, a compression technique known as Principal Component Analysis (PCA) must be applied. Interestingly, the use of PCA is also suitable to avoid redundancy between the echo states [29,48]. In recent years, Chen et al. extended this idea to the ELMs, considering the same premises of the former work [48,49].
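A second-order truncation of Equation (7) with PCA compression can be sketched as follows; the state matrix, the number of retained components `M` and the use of an SVD-based PCA are assumptions for the example, in the spirit of [46,47] rather than a reproduction of them:

```python
import numpy as np

# Sketch: PCA-compressed echo states followed by a degree-2 Volterra
# expansion (Eq. 7 truncated at order 2). The readout stays linear in
# the coefficients h, so least squares still applies.
rng = np.random.default_rng(2)

Ts, N, M = 200, 40, 6
states = rng.normal(size=(Ts, N))        # stand-in for echo states
d = rng.normal(size=(Ts, 1))             # stand-in for desired outputs

# PCA via SVD: keep the M leading components to curb parameter growth
Xc = states - states.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:M].T                        # Ts x M compressed states

# Polynomial expansion: [1, z_p, z_p * z_q for p <= q]
cols = [np.ones((Ts, 1)), Z]
for p in range(M):
    for q in range(p, M):
        cols.append((Z[:, p] * Z[:, q])[:, None])
Phi = np.hstack(cols)                    # design matrix of the filter

# Output linear in h: the Moore-Penrose solution remains closed-form
h = np.linalg.pinv(Phi) @ d
```

With `M = 6`, the design matrix has only $1 + 6 + 21 = 28$ columns; without compression, the same expansion over all 40 states would need hundreds of coefficients, which is exactly the growth PCA is meant to curb.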

All parameters associated with the proposed models (the number of neurons, the Volterra filter orders, the weight values, and the number of simulations) are described in Section 4.

#### **3. Case Studies**

To evaluate the approach, three cities of São Paulo state, Brazil, with different characteristics, were considered: São Paulo, Campinas and Cubatão. The data sets of daily PM10 concentration [μg/m³], relative humidity [%], and ambient temperature [°C] were obtained from the Environmental Sanitation Technology Company website [50].

The Brazilian National Health System provides data on the daily hospital admissions due to respiratory diseases (RD). The data set considered in this study, available in [51], comprises the International Classification of Diseases 10 (ICD-10) codes J00 to J99. In this work, the database was organized in a daily format and separated by ICD-10 diagnosis.

According to the Brazilian Institute of Geography and Statistics (IBGE) [52], São Paulo city, the largest city in Brazil, has almost 12 million people (2010 data) in 1500 km², i.e., 7398.26 inhabitants per km². The climate is tropical, averaging about 28 °C in summer and 12 °C in winter [50]. This study considers the period from January 2014 until December 2016, during which the total number of hospital admissions for respiratory diseases in São Paulo city was 159,683. Regarding the PM10 concentration, only four of the twelve air quality monitoring stations had PM10 data, and only one station had fewer than 100 days of missing data. To deal with this problem, data from another, similar station were used to fill the gaps.

Campinas city is the third most populous city of São Paulo State, with approximately 1.1 million people (2010 data) spread over 795.7 km², a demographic density of 1359.6 inh/km² [52]. The climate is tropical, with dry winters and rainy summers averaging 37 °C. For this city, the data set covers January 2017 to December 2019, comprising 15,464 hospital admissions for respiratory diseases. In this case, two of the three air quality monitoring stations provided PM10 data; however, one had no data for 2019, so the station with fewer missing data (145 missing days) was used.

Cubatão has an estimated 118,720 inhabitants in 142.8 km², or 831 inh/km² [52]. In the past, it was one of the most polluted cities in the world because of its large industrial park and for being surrounded by mountains, which hinders air dispersion. In the 1980s, the United Nations considered Cubatão the most polluted city in the world. After that, a joint effort by government, industries and the community brought 98% of the air pollutant levels in the city under control [53]. The current experiments considered data from January 2017 to December 2019, totaling 802 hospital occurrences. For this city, all three air quality monitoring stations had PM10 data available; however, only the station with the most available data was used, with 158 missing days.

Hospital admissions usually tend to decrease on weekends and holidays. For this reason, the day of the week and holidays were considered as two categorical variables [54]. Thus, in addition to the PM10 daily mean concentrations, ambient temperature (*T*) and relative humidity (*RH*), the day of the week (1 for Sunday to 7 for Saturday) and a binary flag (*h*) indicating whether the day is a holiday were used.

Another important feature is the lag effect of air pollution on human health [10,17,26,55]. A common practice is to consider the effect up to seven days after exposure, where lag 0 is the effect on the same day of the exposure and lag 7 is the effect seven days after the exposure [54].

Table 1 presents the descriptive statistics for the target (respiratory diseases, RD) and the inputs (PM10 concentration, temperature and relative humidity) for each city. All variables are described by average, standard deviation, and minimum and maximum values.


**Table 1.** Descriptive statistics for the variables.

Note that the cities have different patterns for the target. São Paulo hospitalizations show a wide dispersion, with 9 to 409 daily hospital admissions. Campinas ranges from 3 to 37, while Cubatão, the smallest studied city, has a maximum of eight hospitalizations. It should be highlighted that the databases comprise only data from the public health system, not considering data from health insurance and private units.

The maximum daily PM10 concentration for Cubatão (148 μg/m³) draws attention, because it is almost three times the WHO 24-h average limit of 50 μg/m³ (Table 1) [56]. Despite that, hospital admissions are very low (a daily maximum of eight occurrences), since a significant part of Cubatão's workers live in São Paulo, around 63 km away. Hospital admissions may also depend on the air pollutant dispersion pattern and the local population. The maximum daily PM10 concentrations of São Paulo and Campinas are lower than Cubatão's, but also above the WHO limit of 50 μg/m³ (São Paulo: daily maximum of 97 μg/m³; Campinas: daily maximum of 84 μg/m³) [56].

Since the data set is large and highly variable, it may contain multicollinearity, or near-linear dependence, among the variables. Multicollinearity occurs when two or more inputs (independent variables) are highly correlated, affecting the precision of the estimates [57]. To evaluate the data set, the Variance Inflation Factor (VIF) is used to diagnose multicollinearity. The VIF quantifies how much the variance of a regression coefficient is inflated by the correlation of its independent variable with the other independent variables. The VIF for the *j*-th factor is calculated as:

$$\text{VIF}\_j = \frac{1}{1 - R\_j^2}, \tag{8}$$

where $R_j^2$ is the multiple coefficient of determination obtained by regressing the $j$-th independent variable on the others. A VIF exceeding 5 is an indicator of multicollinearity [57].
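Equation (8) amounts to one auxiliary regression per input. The sketch below implements it directly on synthetic data (the paper itself used R for this step; the variables and correlation structure here are invented for illustration):

```python
import numpy as np

# VIF sketch (Eq. 8): regress each input on the remaining inputs by
# least squares and compute VIF_j = 1 / (1 - R_j^2). Synthetic data.
rng = np.random.default_rng(3)

n = 300
pm10 = rng.normal(50, 15, n)
temp = rng.normal(22, 5, n)
rh = 70 - 0.3 * temp + rng.normal(0, 5, n)   # mildly correlated with temp
X = np.column_stack([pm10, temp, rh])

def vif(X):
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

print(vif(X))   # values above 5 would flag multicollinearity [57]
```

Since the three toy inputs are only weakly correlated, all VIFs stay close to 1, mirroring the "no multicollinearity" conclusion of Table 2.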

In this work, R Studio (R version 4.1.0 (2021-05-18), "Camp Pontanezen", The R Foundation for Statistical Computing, platform x86_64-w64-mingw32/x64 (64-bit)) was used to calculate the VIF. The results are presented in Table 2, showing no multicollinearity between the inputs of each case study.

**Table 2.** VIF test results for multicollinearity.


In the next section, the proposed models are applied to the presented data, producing a thorough analysis of the numerical results obtained.

#### **4. Results and Critical Analysis**

The following items describe all models developed to obtain the numerical results and to evaluate the approach's effectiveness:


The experimental procedure follows the steps summarized in Figure 3:

**Figure 3.** Neural network application steps.

The process begins by collecting the data from the mentioned repositories. Before inserting the samples into the neural networks, a normalization procedure is performed due to the saturation limits of the activation functions [58]. Next, the training samples are inserted into the model to adjust its free parameters, observing the decrease of the output error. During this process, cross-validation is performed to increase the generalization capability of the system.

When training ends, the test samples are normalized and inserted into the ANN. The neural response is stored, the normalization is reversed and, finally, the model output is available, which allows the calculation of the models' errors. In this work, all model codes were developed in the MATLAB language.
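The normalize/denormalize pair described above can be sketched as follows. The min-max scheme and the target interval [−1, 1] (compatible with tanh saturation) are assumptions for illustration; the paper does not state which normalization it used:

```python
import numpy as np

# Min-max normalization sketch: scale inputs to [-1, 1] before training
# and reverse the transform on the network output afterwards.
def fit_minmax(X, lo=-1.0, hi=1.0):
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    scale = (hi - lo) / (xmax - xmin)    # per-column scaling factors
    return xmin, scale, lo

def normalize(X, params):
    xmin, scale, lo = params
    return (X - xmin) * scale + lo

def denormalize(Xn, params):
    xmin, scale, lo = params
    return (Xn - lo) / scale + xmin

X = np.array([[10., 200.], [20., 400.], [30., 300.]])  # toy data
p = fit_minmax(X)
Xn = normalize(X, p)
```

Fitting the parameters on the training set and reusing them on the test set (rather than refitting) is what keeps the train and test pipelines consistent.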

In the training step, the parameters were defined as follows:


This work adopted three error metrics to evaluate the quality of the solutions: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), given by (9)–(11), respectively:

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum\_{n=1}^{N} (d\_n - y\_n)^2}, \tag{9}$$

$$\text{MAE} = \frac{1}{N} \sum\_{n=1}^{N} |d\_n - y\_n|, \tag{10}$$

$$\text{MAPE} = \frac{1}{N} \sum\_{n=1}^{N} \left| \frac{d\_n - y\_n}{d\_n} \right| \times 100, \tag{11}$$

where *dn* is the actual value, *yn* is the neural model response and *N* is the total number of samples.
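Equations (9)–(11) translate directly into code; the small vectors below are invented purely to exercise the functions:

```python
import numpy as np

# Direct implementations of Eqs. (9)-(11): d is the actual series and
# y the model response. MAPE is undefined when d contains zeros, which
# is why it is dropped for Cubatão below.
def rmse(d, y):
    return np.sqrt(np.mean((d - y) ** 2))

def mae(d, y):
    return np.mean(np.abs(d - y))

def mape(d, y):
    return np.mean(np.abs((d - y) / d)) * 100

d = np.array([10.0, 20.0, 30.0])   # toy actual values
y = np.array([12.0, 18.0, 33.0])   # toy model responses
print(rmse(d, y), mae(d, y), mape(d, y))
```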

Tables 3–5 present the computational performance achieved by the nine proposed models for each lag and each city. The results present the number of neurons (NN) used in the best performance and the error metrics RMSE, MAE and MAPE. However, as can be seen in Table 3, the MAPE metric was not considered for Cubatão due to the expressive number of zeros among the actual values (*dn*). In the tables, the best results obtained for each error metric and the best model are highlighted in purple. Furthermore, the models highlighted in italic bold with stars obtained results statistically similar to the best one, according to the statistical test described below.

A specific analysis of the results shows that ELM(RP) had the best results for all calculated metrics for Cubatão at lag 2. For Campinas, ELM obtained the best results for RMSE and MAPE at lag 3, but for MAE, ELM(RP) achieved the best result at lag 0. For São Paulo, ELM obtained the smallest error values at different lags: RMSE at lag 2 and MAE at lag 1. Finally, ELM(RP) presented the smallest MAPE at lag 3.

Note that the best results obtained by the models were sometimes not replicated across all error metrics. This behavior was evident for São Paulo and Campinas, since the best lag and the best model were not always the same. Similar behavior can be observed in [17,59].

The pairwise Wilcoxon test was applied to evaluate whether the results are statistically different, considering the RMSE of 30 independent simulations [60]. In Tables 3 and 4, the models highlighted in bold with a star achieved a *p*-value higher than 0.05, meaning there is no statistical difference between their results and the best one. For this reason, these models can be considered similar, in terms of performance, to the models that obtained the best results. For Campinas, the standard ELM and all ESNs presented equivalent performances, despite the contrasting numerical values. For Cubatão, the ELM and ELM(RP) results were also similar. Finally, for São Paulo, the test did not show any statistical similarity among the models.
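A pairwise comparison of this kind can be sketched with SciPy's signed-rank test; the two sets of 30 RMSE values below are synthetic, built so that the models behave similarly:

```python
import numpy as np
from scipy.stats import wilcoxon

# Sketch of the pairwise Wilcoxon signed-rank test: the RMSEs of 30
# independent runs of two models are compared; p > 0.05 suggests no
# statistically significant difference. Values here are synthetic.
rng = np.random.default_rng(4)
rmse_model_a = 5.0 + 0.3 * rng.normal(size=30)
rmse_model_b = rmse_model_a + 0.05 * rng.normal(size=30)  # similar model

stat, p = wilcoxon(rmse_model_a, rmse_model_b)
print("p-value:", p, "-> equivalent" if p > 0.05 else "-> different")
```

The test is paired (run *i* of model A against run *i* of model B) and makes no normality assumption, which suits RMSE distributions with long tails like those seen in the boxplots.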

Figures 4–6 show the boxplot graphic regarding the RMSE values for each city and the lag associated with the best result.


**Table 3.** Results for Cubatão (Number of neurons-NN, RMSE and MAE for each model and lag).


**Table 4.** Results for Campinas (Number of neurons-NN, RMSE, MAE and MAPE for each model and lag).


**Table 5.** Results for São Paulo (Number of neurons-NN, RMSE, MAE and MAPE for each model and lag).

**Figure 4.** Boxplot graphic regarding the RMSE values for Cubatão-Lag 2.

Considering Cubatão, observe that the smallest dispersion was obtained by the ELM(RP) model, which also presented the smallest average value, corroborating the observation from Table 3. The inclusion of the Volterra filter increases the dispersion and the average values for all standard models, representing a significant deterioration in performance.

**Figure 5.** Boxplot graphic regarding the RMSE values for Campinas-Lag 3.

In Campinas' case, only the ELM performances are considered in this specific analysis, since all ESNs obtained similar results according to the Wilcoxon test; Figure 5 nevertheless illustrates all models for completeness. The RP inclusion decreases the dispersion, while the Volterra filter shows the opposite behavior. Despite that, the best single result over the 30 simulations was favorable to the Volterra filter rather than the RP (note the bottom value in the boxplot). Since the neurons' weights are generated randomly, the algorithms must run at least 30 times, which directly implies the long tails of the boxplots, as can be seen for the Volterra models. Moreover, the best result obtained by the ELM does not imply the best performance in terms of dispersion.

**Figure 6.** Boxplot graphic regarding the RMSE values for São Paulo-Lag 2.

For São Paulo, the general behavior of the standard models was similar to Campinas. The standard ELM achieved better overall errors, even considering the median value. The inclusion of the RP reduced the dispersion, but decreased the probability of obtaining better results for the error metrics. On the other hand, the Volterra filter showed worse performance in terms of dispersion. Finally, despite achieving the best error metric results, the ELM presented poor dispersion, including an outlier.

Table 6 presents a ranking of the best error metric results considering all neural models, in ascending order of development. Note that ties among the winners mean that there was no statistical difference between the models. The last column represents the final ranking considering the three cities' results.


**Table 6.** Ranking of the models' performance.

The standard ELM was the best estimator in all cases in terms of error metric results, although for Cubatão the results obtained by ELM(RP) were equivalent. The second and third positions belong to the ESN O. and ESN J. models, respectively. Although the main contribution of the RP is to increase the models' generalization capability, its use also reduced the dispersion of the results, i.e., the models' predictability increased, except for Cubatão. Moreover, the ELM(RP) ranking position was harmed by the Campinas results, since all ESNs presented the same statistical performance; setting aside these aspects, the model could be the second best.

Although the inclusion of the Volterra filter did not improve the performances, the idea behind its application was to capture nonlinear patterns among the signals from the hidden layer. Although the literature reports good performances for this method in related tasks [33], its use is not recommended in this case. Similarly, the reservoir designs of Jaeger and Ozturk et al. proved not adequate for the problem.

Regarding the number of neurons in the hidden layers (dynamic reservoir), the values vary widely. For Cubatão, the models used hundreds of neurons in most cases; interestingly, ESN J. and ESN O. often used up to 70 neurons. In Campinas' case, the RP models used up to 35 neurons in all cases. Considering São Paulo, the ELM versions tended to use fewer than 25 neurons, similar to ESN J. Volt, ESN O. and ESN O. Volt, while the other models used hundreds of neurons. This is a strong indication that a sweep over the number of neurons is needed, because no clear pattern regarding this parameter was found. Even considering the results of the models with a *p*-value larger than 0.05, the number of neurons was variable.

In summary, unorganized machines are particular cases of classic neural models in which the hidden weights are not adjusted. On the one hand, the user may lose part of the approximation capability due to this characteristic; on the other hand, there are gains in training effort and in the stability of the output values during training, avoiding discrepancies. An important aspect is that these methodologies can be outperformed, depending on the problem. Regarding the use of the RP or the Volterra filter, the literature indicates that these strategies may increase the mapping capability of the neural models; however, this work showed that, in these specific cases, they were not efficient.

Figures 7–9 compare the output responses of the best models with the observed values.

**Figure 7.** The number of hospital admissions by day of the test set for ELM lag 2-Cubatão (observed versus estimated values).

Figure 7 shows that the prediction task seems to be more difficult when the output has a small range and many "zero" observations. In this case, since the overestimation was small and the observed values were zero, it did not interfere with hospital management. In contrast, in Figure 8, since there were no "zero" observations, the ELM estimates can be considered suitable, except for abrupt variations.

Finally, in Figure 9, the ELM reached the smallest RMSE but, compared with the observed data, had more difficulty predicting the abrupt decrease in hospital admissions that occurred around day 70. On the other hand, the ELM(RP) could follow this tendency, although it over- and underestimated the number of hospital admissions in many cases. These behaviors are directly related to the number of neurons used by each model, since a reduction in this number limits the model's approximation capability.

Regarding the best error metric to be used, RMSE seems to be a good choice, since it is the metric reduced during the neural models' training (adjustment) [17,18,61].
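For completeness, the RMSE used throughout can be computed as below; the function name is illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and estimated
    daily hospital admissions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```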

**Figure 8.** The number of hospital admissions by day of the test set for ELM lag 3-Campinas (observed versus estimated values).

**Figure 9.** The number of hospital admissions by day of the test set for ELM lag 2-São Paulo (observed versus estimated values).

Table 7 presents a summary of notable studies on the association between air pollutant concentrations, morbidity (hospital admissions or hospital emergencies), and mortality. This brief description lists the authors, geographic area, considered inputs and predicted variables, applied methods, metrics, time base, and the best MAPE and RMSE observed for each study. Although these studies present suitable estimations and relevant contributions, they proposed different models, applied them to diverse places worldwide, and used specific inputs to predict health effects. For this reason, a direct comparison of these studies' performances would be unfair, as Khatri and Tamil [62] previously observed. However, some important aspects can be highlighted.

Two studies [62,63] did not use MAPE or RMSE as error metrics. Khatri and Tamil [62] aimed to compare the performance of peak and non-peak class prediction; the authors used the percentage difference and applied an MLP, without any comparison with other methods. Shakerkhatibi et al. [64] used another metric (the DeLong method) to compare the predictions obtained with an MLP and with Conditional Logistic Regression.


**Table 7.** Summary of studies presenting air pollutant's associations with morbidity and mortality using ANN.


AUC: area under the curve.

Considering the variety of applied methods (Table 7), and emphasizing the use of the MLP, the performance comparisons between ANN and regression models have demonstrated the superior performance of ANNs. Inspired by all these aspects, the authors believe that the present work, which explores the ELM and ESN models, with variations based on the RP and the Volterra filter, to estimate hospital admissions due to respiratory diseases caused by air pollutant concentrations, is a relevant contribution. However, given the harmful effects of PM on human health, and comparing the input variables used in the other studies, this work has some limitations, such as the use of only one air pollutant (PM10) and the lack of comparison with statistical regression models.

#### **5. Conclusions**

This work predicted hospital admissions due to respiratory diseases caused by particulate matter (PM10) concentrations using extreme learning machines (ELM) and echo state networks (ESN), both in their standard forms and with variations based on the regularization parameter (RP) and the Volterra filter. The models considered daily PM10 concentration, relative humidity, and ambient temperature as inputs and predicted the daily hospital admissions for respiratory diseases.

Numerical results indicated the superior performance of the standard models, pointing to the ELM as the best predictor in most scenarios. However, for Campinas and the RMSE error metric, a statistical test demonstrated that the ESN models were statistically similar to the best one. In addition, a graphical analysis showed that the models including the RP strategy presented reduced dispersion under abrupt variations in hospital admissions, while the Volterra filter showed the opposite behavior, indicating that its application was not suitable for this specific problem. Finally, completing the critical analysis, a performance ranking classified the models according to the error metrics for each city. This ranking rewarded models with statistical similarity rather than models with good dispersion, placing the standard models in the first positions.

The application of unorganized machines to three different cities was essential to evaluate their performance in predicting air pollution impacts on human health. An additional graphical analysis comparing the output responses of the best models with the actual values evidenced the good performance of the neural networks in estimating hospital admissions. This contribution may help governmental bodies and policymakers in the management of hospital planning, mainly during climate periods unfavorable to air pollutant dispersion. Moreover, the good performance of the models confirms the link between all input variables and the output values, verifying that particulate matter, temperature, and relative humidity are fundamental to obtaining a good estimation.

A limitation of this study is the lack of larger data sets, which could yield more uniform performances across the studied cities. As a consequence of the lack of monitoring data, variations involving other pollutants, such as PM2.5, could not be studied.

Considering the continental dimensions of Brazil and the climatic characteristics of its different regions, it would be paramount to study all regions (states), a hard task due to the lack of monitoring throughout the country. Further works should consider hybrid modeling or ensembles, the use of deseasonalization techniques, and the application of other artificial neural networks. Since the ELM is admittedly susceptible to changes in the number of hidden neurons, while the ESN model is considered robust in this regard, a comparative study of the training time required by these models should also be conducted.

**Author Contributions:** Conceptualization, Y.d.S.T., E.T.B., L.C., and H.V.S.; methodology, H.V.S. and Y.d.S.T.; software, H.V.S., T.S.P., E.P., and T.A.A.; formal analysis, Y.d.S.T., E.T.B., and L.C.; investigation, H.V.S., E.P., T.A.A., and Y.d.S.T.; resources, H.V.S.; data curation, E.T.B. and L.C.; writing—original draft preparation, E.T.B., L.C., and H.V.S.; writing—review and editing, Y.d.S.T. and C.M.L.U.; visualization, H.V.S. and T.A.A.; supervision, Y.d.S.T. and H.V.S.; project administration, H.V.S.; funding acquisition, H.V.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Council for Scientific and Technological Development (CNPq), grant number 405580/2018-5, and the APC was funded by DIRPPG/UTFPR/PG.

**Acknowledgments:** The authors thank the Brazilian agencies Coordination for the Improvement of Higher Education Personnel (CAPES)-Financing Code 001, Brazilian National Council for Scientific and Technological Development (CNPq), processes number 40558/2018-5, 315298/2020-0, and Araucaria Foundation, process number 51497, and Federal University of Technology-Parana (UTFPR) for their financial support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

