### 2.2.1. Classical Analysis

The classical monitoring strategy for air quality uses individual time series, descriptive statistics, box plots, autocorrelation analysis, etc., to determine if any of the values fall outside of the limits and to analyse trends [35,36]. In general, classical statistical analysis seeks to describe the distribution of a measurable variable (descriptive statistics) and to determine the consistency of a sample drawn from an initial population (inferential statistics). In addition, classical analysis is based on repetition; one must measure properties of objects and try to predict the frequency of occurrence of results when the measuring operation is repeated at random or stochastically.

This type of analysis determines the empirical frequency distribution, which yields the absolute or relative frequency of occurrence of the different possible results of the repeated measurement of a property of an object (discrete case). If, instead, the measurement is infinitely repeated and arbitrarily precise, so that every outcome is different, the relative frequency of a single outcome is not very instructive; in that case, the distribution function is used, which, for every numerical value *x* of the measured variable, yields the absolute (or relative) frequency of occurrence of all values smaller than *x* [37].

### 2.2.2. Statistical Process Control

By applying statistical process control (SPC) methods to the monitoring of a system, it is possible to detect outliers. This study concentrates on significantly high and low measurements, even in situations where the values do not exceed the established limit. These methodologies can be used to study individual observations, using charts of individual values or of averages.

The dataset should be partitioned into rational subgroups, minimising the probability of large differences within subgroups [38]. The formation of rational subgroups is important because, when the within-subgroup variation captures only the inherent noise of the process, the presence of special causes of variability between subgroups can be easily detected. However, sometimes it is not practical to use rational subgroups; for example, when repeated measurements of the process differ only by laboratory or analytical error, or when automated inspection is used and every unit manufactured is analysed.

Rational subgroups represent the way the data are collected. They should be gathered so that each subgroup shows only the inherent variation that is natural to the process (*common cause variation*). Because the subgroups are used to identify any other source of variation (*special cause variation*) that may adversely affect the process, they should avoid special-cause variation where possible. Moreover, the limits on a control chart, which mark the boundary beyond which a process is considered too volatile, are calculated from the variability within each subgroup. For this reason, only subgroups that reproduce the common-cause variation of the process should be selected.

Once the data are correctly structured, a normality test has to be performed. If the hypothesis of normality is rejected, there are two possibilities: use classical techniques modified for non-normal distributions [39], or transform the data to normalise the dataset [40]. The second option is usually implemented with the Box-Cox transformation [41], the most widely used and best-known transformation, defined as follows:

$$X\_j^{(\lambda)} = \begin{cases} \frac{X\_j^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0\\ \log(X\_j), & \text{if } \lambda = 0 \end{cases}$$

where *λ* maximises the profile likelihood function of the data *Xj*.
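As an illustration of this step, the following Python sketch (using SciPy) tests the normality of a sample and, when the hypothesis is rejected, applies the Box-Cox transformation with *λ* estimated by maximum likelihood; the data and variable names are invented for the example and do not come from the study.

```python
# Minimal sketch: normality test followed by a Box-Cox transformation (SciPy).
# The skewed synthetic data stand in for a generic pollutant series.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
series = rng.lognormal(mean=3.0, sigma=0.4, size=500)  # strictly positive, skewed

# Normality test on the raw data (Shapiro-Wilk).
_, p_raw = stats.shapiro(series)
print(f"Shapiro-Wilk p-value (raw data): {p_raw:.4f}")

if p_raw < 0.05:
    # Box-Cox transformation; lambda maximises the (profile) log-likelihood.
    transformed, lam = stats.boxcox(series)
    _, p_bc = stats.shapiro(transformed)
    print(f"Estimated lambda: {lam:.3f}")
    print(f"Shapiro-Wilk p-value (transformed data): {p_bc:.4f}")
```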

A classical analysis process can be divided into two main stages: the learning stage, in which a test of normality is performed and atypical measurements are deleted from the data; and the control stage, in which trends are analysed to detect *out-of-control* situations. In the first stage, the centreline (CL) and control limits are defined. Specifically, the CL is computed from the control sample and represents the objective value. In addition, the warning limits are set at a distance of ±2*σ* from it, and the control limits at ±3*σ*, with *σ* being the standard deviation of the process [42].
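A minimal sketch of these two stages, assuming an in-control learning sample of individual observations is available (the data and variable names are illustrative), could look as follows.

```python
# Minimal sketch: learning stage (CL, warning and control limits) followed by a
# control stage that flags observations outside the +/-3-sigma limits.
import numpy as np

rng = np.random.default_rng(1)
learning_sample = rng.normal(loc=40.0, scale=5.0, size=200)  # in-control data

cl = learning_sample.mean()           # centreline: objective value of the process
sigma = learning_sample.std(ddof=1)   # estimated process standard deviation

warning_limits = (cl - 2 * sigma, cl + 2 * sigma)  # +/-2-sigma warning limits
control_limits = (cl - 3 * sigma, cl + 3 * sigma)  # +/-3-sigma control limits

# Control stage: monitor new observations against the control limits.
new_obs = rng.normal(loc=40.0, scale=5.0, size=50)
out_of_control = (new_obs < control_limits[0]) | (new_obs > control_limits[1])
print(f"CL = {cl:.2f}, LCL = {control_limits[0]:.2f}, UCL = {control_limits[1]:.2f}")
print(f"Out-of-control indices: {np.flatnonzero(out_of_control)}")
```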

Shewhart control charts have been the most widely used owing to their good performance in detecting large changes in a process. However, because these charts use only the most recent sample, they do not efficiently detect small or progressive changes in a process. In this regard, complementary rules are needed; multiple authors have defined different rules to detect specific deviations [43,44] and to complement the initial control rules [45]. The use of these supplementary rules makes Shewhart control charts more sensitive and improves their capacity to detect non-random patterns.
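As an illustration only, the sketch below implements two widely known run rules of the Western Electric type; the specific supplementary rules proposed in [43–45] may differ from these.

```python
# Illustrative supplementary run rules for a chart of individual values.
import numpy as np

def run_rule_alarms(x, cl, sigma):
    """Return the indices flagged by two example run rules."""
    x = np.asarray(x)
    flags = set()
    # Rule A: two out of three consecutive points beyond the same 2-sigma warning limit.
    for i in range(2, len(x)):
        w = x[i - 2:i + 1]
        if np.sum(w > cl + 2 * sigma) >= 2 or np.sum(w < cl - 2 * sigma) >= 2:
            flags.add(i)
    # Rule B: eight consecutive points on the same side of the centreline.
    for i in range(7, len(x)):
        w = x[i - 7:i + 1]
        if np.all(w > cl) or np.all(w < cl):
            flags.add(i)
    return sorted(flags)
```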

A widely used way to quantify the performance of a control chart with supplementary rules is the average run length (ARL). The ARL is the average number of points that have to be plotted before the chart signals an alert warning that the process is not under control. When the process is truly out of control, this situation should be detected as soon as possible; conversely, when the process is statistically stable, there should be few false alarms. The ARL is directly related to the Type I error (also known as *α*) and the Type II error (also known as *β*), which describe the sensitivity of the method and the number of false alarms it produces. For that reason, it must be kept in mind that if the capacity of this methodology to detect out-of-control situations is high, there will also be many false alarms [43].
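For a Shewhart chart applied to independent, normally distributed observations, this trade-off can be made explicit: the in-control ARL is the reciprocal of the false-alarm probability *α*, so with the usual ±3*σ* limits (*α* ≈ 0.0027) a false alarm is expected, on average, roughly every 370 in-control points, whereas the out-of-control ARL is the reciprocal of the detection probability 1 − *β*:

$$ARL\_0 = \frac{1}{\alpha} \approx \frac{1}{0.0027} \approx 370, \qquad ARL\_1 = \frac{1}{1 - \beta}$$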

### 2.2.3. Functional Data Analysis

Functional data analysis works with observations that come from a continuous random process evaluated at discrete points [46]. Starting from vector samples, the dataset is transformed into a functional sample. The first step is to construct the most appropriate curves from the initial points, which come from the discrete values measured in the study. This process, known as smoothing, converts the vector values into continuous functions over time. This data structure is essential in the air pollution context because it takes into account all the values of a day as a single set. For example, a day in which the NO2 values show a lot of variability but whose average is similar to that of the other days is not detected by any vectorial approach; a functional analysis would identify these types of days as candidates for outliers. Furthermore, in several similar studies that also tried to detect outliers in data from certain gases (see [23,47]), the superiority of functional approaches was demonstrated.

Consider a situation where the initial observations *x*(*tj*) are contained in a set of *np* points, where *tj* ∈ R represents the time steps and *np* is the number of observations (*j* = 1, 2, . . . , *np*). They can be viewed as the individual values of the function *x*(*t*) ∈ X ⊂ *F*, *F* being a functional space. The estimation of *x*(*t*) considers a functional space *F* = *span*{*φ*1,..., *φnb*}, where {*φk*} is the set of basis functions (*k* = 1, 2, . . . , *nb*) and *nb* is the number of basis functions required to build a functional sample. There are several types of bases in statistics, but the most used one is the Fourier basis [26,48]. Moreover, with periodic data such as those in this study, Fourier bases are the most appropriate [49]. Smoothing consists of finding a solution to the regularisation problem [46],

$$\min\_{x\in\mathcal{F}}\sum\_{j=1}^{n\_p} \{z\_j - x(t\_j)\}^2 + \lambda\Gamma(x)\tag{1}$$

with *zj* = *x*(*tj*) + *εj* being the result of observing *x* at the point *tj*; *εj* being random noise with zero mean; Γ being a penalising operator focused on obtaining the simplest possible solutions; and *λ* being a parameter that defines the level of regularisation. The first step is the following expansion

$$\mathbf{x}(t) = \sum\_{k=1}^{n\_b} c\_k \phi\_k(t) \tag{2}$$

where {*ck*}, *k* = 1, ... , *nb*, are the coefficients that multiply the selected basis functions. The smoothing problem can be written as follows:

$$\min\_{\mathbf{c}} \left\{ (\mathbf{z} - \Phi \mathbf{c})^T (\mathbf{z} - \Phi \mathbf{c}) + \lambda \mathbf{c}^T \mathbf{R} \mathbf{c} \right\} \tag{3}$$

with a vector of observations *z* = (*z*1, ... , *znp*)*T*; a vector of coefficients of the expansion *c* = (*c*1, ... , *cnb*)*T*; a (*np*, *nb*)-matrix Φ whose elements are Φ*jk* = *φk*(*tj*); and a (*nb*, *nb*)-matrix **R** whose elements are:

$$R\_{kl} = \langle D^2 \phi\_k, D^2 \phi\_l \rangle\_{L\_2(T)} = \int\_T D^2 \phi\_k(t) \, D^2 \phi\_l(t) \, dt \tag{4}$$

Finally, the problem is solved [46] according to:

$$\hat{\mathbf{c}} = (\Phi^T \Phi + \lambda \mathbf{R})^{-1} \Phi^T \mathbf{z} \tag{5}$$
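The following NumPy sketch reproduces this smoothing step for one day of hourly values with a Fourier basis; the grid, the number of basis functions and *λ* are illustrative choices, not values taken from the study.

```python
# Minimal sketch of Equations (2)-(5): Fourier basis, roughness penalty and
# penalised least-squares coefficients.
import numpy as np

def fourier_basis(t, n_b, period):
    """Evaluate n_b Fourier basis functions and their 2nd derivatives at times t."""
    omega = 2.0 * np.pi / period
    phi = np.ones((len(t), n_b))    # column 0 is the constant function
    d2phi = np.zeros((len(t), n_b))
    for k in range(1, (n_b - 1) // 2 + 1):  # n_b is assumed odd
        s, c = np.sin(k * omega * t), np.cos(k * omega * t)
        phi[:, 2 * k - 1], phi[:, 2 * k] = s, c
        d2phi[:, 2 * k - 1] = -(k * omega) ** 2 * s
        d2phi[:, 2 * k] = -(k * omega) ** 2 * c
    return phi, d2phi

# One day of hourly observations z_j (synthetic example).
t_obs = np.arange(24.0)
rng = np.random.default_rng(2)
z = 40 + 10 * np.sin(2 * np.pi * t_obs / 24) + rng.normal(0, 2, size=24)

n_b, period, lam = 7, 24.0, 1e-2
phi, _ = fourier_basis(t_obs, n_b, period)

# Penalty matrix R_kl (Equation (4)), approximated on a fine grid.
t_fine = np.linspace(0.0, period, 2001)
_, d2phi_fine = fourier_basis(t_fine, n_b, period)
R = (d2phi_fine.T @ d2phi_fine) * (t_fine[1] - t_fine[0])

# Coefficients of the expansion (Equation (5)) and smoothed curve (Equation (2)).
c_hat = np.linalg.solve(phi.T @ phi + lam * R, phi.T @ z)
x_smooth = phi @ c_hat
```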

As soon as the data are in functional form, they can be analysed to identify pollution episodes and detect outliers. The functional data allow us to identify whether different periods of time, such as days, weeks or months, are above the mean function and by how much they deviate. Moreover, it permits the elimination of outliers that are not real but are due to system failures. The depth concept provides a way of ordering a set of data, contained in a Euclidean space, according to their proximity to the centre of the sample.

The concept of depth appeared in multivariate analysis, and was created to measure the centrality of a point within a cloud of points [50,51]. Over the years, this concept began to be introduced into functional data analysis [52]. In this field, the depth represents the centrality of a certain curve **x***i*, and the mean curve is the centre of the sample. Two depth measures that are very common in the context of functional data are the Fraiman-Muniz depth (FMD) [25] and the H-modal depth (HMD) [52].

Therefore, the identification of outliers with a functional approach is possible through the calculation of depths. In this case, elements that show behavioural patterns different from the rest are the ones of interest. The concept of depth makes it possible to work with observations defined over a given time interval in the form of curves, instead of having to summarise the observations of the curve into a single value, such as the mean. This method of outlier detection is based on depth measures and centrality: an element that is far from the centre of the sample will have a low depth. Thus, the curves with the least depth are the functional outliers.

Firstly, the empirical cumulative distribution function *Fn,t*(*xi*(*t*)) of the values of the curves {*xi*(*t*)}, *i* = 1, ... , *n*, at a certain time *t* ∈ [*a*, *b*] is considered. It can be defined as

$$F\_{n,t}(\mathbf{x}\_i(t)) = \frac{1}{n} \sum\_{k=1}^n I(\mathbf{x}\_k(t) \le \mathbf{x}\_i(t)) \tag{6}$$

with *I*(·) being an indicator function. Subsequently, the FMD for any curve *xi* within a set of curves *x*1,..., *xn* is calculated as

$$FMD\_n(\mathbf{x}\_i(t)) = \int\_a^b D\_n(\mathbf{x}\_i(t))dt\tag{7}$$

where *Dn*(*xi*(*t*)) is the depth of the curve *xi*(*t*), ∀*t* ∈ [*a*, *b*], expressed as

$$D\_n(\mathbf{x}\_i(t)) = 1 - \left| \frac{1}{2} - F\_{n,t}(\mathbf{x}\_i(t)) \right| \tag{8}$$
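A minimal sketch of the FMD computation, assuming the *n* curves are discretised on a common time grid (the integral in Equation (7) is approximated by the trapezoidal rule):

```python
# Minimal sketch of the Fraiman-Muniz depth (Equations (6)-(8)).
import numpy as np

def fraiman_muniz_depth(curves, t_grid):
    """curves: (n, n_p) array of curve values; t_grid: (n_p,) array of time points."""
    # Empirical CDF of the curve values at each time point (Equation (6)):
    # F[i, t] = (1/n) * sum_k I(x_k(t) <= x_i(t)).
    F = (curves[None, :, :] <= curves[:, None, :]).mean(axis=1)
    # Pointwise univariate depth (Equation (8)).
    D = 1.0 - np.abs(0.5 - F)
    # Integrated depth over [a, b] (Equation (7)), trapezoidal rule.
    dt = np.diff(t_grid)
    return np.sum(0.5 * (D[:, 1:] + D[:, :-1]) * dt, axis=1)
```

Curves with a low FMD are far from the centre of the sample and are therefore candidate outliers.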

On the other hand, for the HMD, the functional mode is defined as the element or curve most densely surrounded by the other curves in the dataset. The HMD is defined as

$$HMD\_n(\mathbf{x}\_i, h) = \sum\_{k=1}^{n} K\left(\frac{\|\mathbf{x}\_i - \mathbf{x}\_k\|}{h}\right) \tag{9}$$

with *K* : *R*+ → *R*+ being a kernel function, *h* a bandwidth parameter and ‖·‖ the norm in a functional space. Among all the norms, the *L*2 norm is used in most cases, expressed as

$$\|\mathbf{x}\_{i}(t) - \mathbf{x}\_{j}(t)\|\_{2} = \left(\int\_{a}^{b} (\mathbf{x}\_{i}(t) - \mathbf{x}\_{j}(t))^{2} dt\right)^{1/2} \tag{10}$$

In addition, several options exist for the kernel function *K*(·). A widely used one is the truncated Gaussian kernel, expressed as

$$K(t) = \frac{2}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right), \ t > 0 \tag{11}$$

In this paper, the depth chosen to identify outliers is the HMD. The bandwidth *h* is the value that leaves below it 15% of the data coming from the distribution of {‖*xi*(*t*) − *xj*(*t*)‖2, *i*, *j* = 1, ... , *n*} [15], and the cut-off *C* is selected, specifically, to achieve a 1% Type I error [50], according to

$$\Pr(HMD\_n(\mathbf{x}\_i(t)) < C) = 0.01, \ i = 1, \ldots, n \tag{12}$$
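A minimal sketch of the HMD computation, under the same assumption of curves discretised on a common grid, with the bandwidth taken as the 15th percentile of the pairwise *L*2 distances; the bootstrap estimation of the cut-off *C*, described next, is not shown.

```python
# Minimal sketch of the H-modal depth (Equations (9)-(11)) with the bandwidth
# chosen as the 15th percentile of the pairwise L2 distances between curves.
import numpy as np

def h_modal_depth(curves, t_grid):
    """curves: (n, n_p) array of curve values; t_grid: (n_p,) array of time points."""
    n = curves.shape[0]
    # Pairwise L2 distances (Equation (10)), trapezoidal rule.
    sq = (curves[:, None, :] - curves[None, :, :]) ** 2
    dt = np.diff(t_grid)
    dist = np.sqrt(np.sum(0.5 * (sq[:, :, 1:] + sq[:, :, :-1]) * dt, axis=2))
    # Bandwidth: 15th percentile of the distances between distinct curves (i != j).
    h = np.percentile(dist[~np.eye(n, dtype=bool)], 15)
    # Truncated Gaussian kernel (Equation (11)) applied to the scaled distances (Equation (9)).
    K = (2.0 / np.sqrt(2.0 * np.pi)) * np.exp(-0.5 * (dist / h) ** 2)
    return K.sum(axis=1), h

# Curves whose depth falls below the cut-off C of Equation (12) are flagged as outliers.
```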

The cut-off C has to be estimated because the distribution of the functional depth is not known. There are several ways to carry out this estimation; however, the bootstrapping method is the most appropriate for the purpose of this research [52,53]. The steps to follow are:

