### *3.1. Classical Analysis*

To apply the classical air quality monitoring strategy, the individual time series, descriptive statistics, box plots and autocorrelation analysis were computed to determine whether any values fell outside the limits and to analyse trends. The descriptive statistical parameters of the dataset are shown in Table 1:

**Table 1.** Summary descriptive statistics of hourly NO2 concentrations from Blanchardstown air quality monitoring station in Dublin, Ireland. The statistical quartiles (Q1, Q2, Q3) and the interquartile range (IQR) are also displayed. Note that 0 μg/m<sup>3</sup> represents a missing or invalid value.


The descriptive statistical parameters in Table 1 show that the limit values are not exceeded. The next step of the classical data analysis is to present the time series of the hourly data for 2013 (Figure 1), which ranges from a minimum of 0 μg/m<sup>3</sup> to a maximum of 153.48 μg/m<sup>3</sup>. It follows that the hourly upper limit (200 μg/m<sup>3</sup>) is never exceeded and that the data are highly variable.

**Figure 1.** Individual time series of hourly NO2 concentrations from Blanchardstown air quality monitoring station in Dublin, Ireland. Software: Python [57].
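The exceedance check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `summarise_hourly`, the sample values and the choice to drop zeros before summarising are assumptions, the last one based on the note in Table 1 that 0 μg/m<sup>3</sup> marks a missing or invalid value.

```python
HOURLY_LIMIT = 200.0  # hourly upper limit in ug/m3, as quoted in the text


def summarise_hourly(values, limit=HOURLY_LIMIT):
    """Summarise an hourly series, treating 0 as a missing-value marker."""
    valid = [v for v in values if v > 0]
    return {
        "n_missing": len(values) - len(valid),
        "min_valid": min(valid),
        "max_valid": max(valid),
        "n_exceedances": sum(1 for v in valid if v > limit),
    }


# Hypothetical sample, not the Blanchardstown data.
sample = [0.0, 12.4, 153.48, 37.9, 0.0, 88.1]
summary = summarise_hourly(sample)
print(summary)
```

Dropping the zeros before summarising keeps the placeholder values from being reported as the series minimum.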

Figure 2 presents a box plot that characterises the distribution of the NO2 concentrations by quartiles. The diagram displays the first quartile (9.56 μg/m<sup>3</sup>), the third quartile (42.33 μg/m<sup>3</sup>), the interquartile range (32.77 μg/m<sup>3</sup>) and, in red, the values that are considered atypical.

**Figure 2.** Box-plot of hourly NO2 concentrations from Blanchardstown air quality monitoring station. The central blue line represents the median, and the edges of the box are the quartiles (25% for the lower edge and 75% for the upper edge). The red dots represent the outliers. Software: Python [57].
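As an illustration of how the quartiles, the IQR and the atypical values in a box plot relate to each other, the sketch below uses Tukey's rule, which flags values further than 1.5 × IQR from the quartiles. The text does not state which rule was used for Figure 2, so the factor 1.5, the function name `tukey_fences` and the sample data are assumptions.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return Q1, Q3, IQR and the lower/upper fences for outlier flagging."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, iqr, q1 - k * iqr, q3 + k * iqr


# Hypothetical sample, not the Blanchardstown data.
sample = [0.0, 5.2, 9.6, 14.1, 20.3, 27.8, 35.0, 42.3, 60.9, 153.5]
q1, q3, iqr, low, high = tukey_fences(sample)
outliers = [v for v in sample if v < low or v > high]
print(q1, q3, iqr, outliers)

# Upper fence implied by the quartiles reported in the text (assuming k = 1.5).
reported_upper_fence = 42.33 + 1.5 * (42.33 - 9.56)
print(round(reported_upper_fence, 3))
```

Under that assumption, the reported quartiles would place the upper fence at about 91.5 μg/m<sup>3</sup>, so hourly values above that level would be drawn as red points.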

Figure 3 presents the frequency distribution of hourly NO2 concentrations, which, as can be seen, is biased by the 0 values. Another weakness of this analysis is that, when data are poorly collected or unavailable, only two options remain: either delete these observations (losing data) or replace them with 0 values.

Figure 4 shows the normal probability plot of the data, again affected by the 0 values. A Kolmogorov-Smirnov test and an Anderson-Darling test were applied to compare the NO2 concentrations with a standard normal distribution [54]. The null hypothesis is that the values follow a standard normal distribution; the alternative hypothesis is that they do not. Both tests returned p-values very close to 0, so, at the 5% significance level, there is statistical evidence that the data are not normal. The Kolmogorov-Smirnov test statistic is max<sub>*x*</sub> |*F*(*x*) − *G*(*x*)|, where *F*(*x*) is the empirical cumulative distribution function and *G*(*x*) is the standard normal cumulative distribution function.

**Figure 3.** Frequency of hourly NO2 concentrations from Blanchardstown air quality monitoring station. Comparison of data distribution with normal. Software: Python [57].

**Figure 4.** Normal probability plot (QQ-plot) of hourly NO2 concentrations from Blanchardstown air quality monitoring station. Software: Python [57].
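The Kolmogorov-Smirnov comparison against the standard normal distribution can be sketched with the Python standard library alone. This is an illustrative implementation, not the routine used by the authors: the function names and sample data are hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is only an approximation for small samples.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function G(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(values):
    """Largest absolute gap between the empirical CDF F and G."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        g = normal_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - g, g - i / n)
    return d


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam = n * d * d
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical sample, not the Blanchardstown data.
sample = [0.0, 5.2, 9.6, 14.1, 20.3, 27.8, 35.0, 42.3, 60.9, 153.5]
d_stat = ks_statistic(sample)
p_value = kolmogorov_pvalue(d_stat, len(sample))
print(d_stat, p_value, p_value < 0.05)
```

Because concentrations are non-negative and mostly far above 1, the empirical distribution sits well to the right of the standard normal one, which is why the statistic is large and the p-value is close to 0.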

Further tests were performed to check whether the data follow any of several distributions (normal, generalised extreme value, Weibull and Rayleigh), but in every case the null hypothesis was rejected at the 5% significance level. The classical analysis leads to the conclusion that no data fall outside the limit values. With regard to the assessment of trends, this classical method is limited to a time series analysis (Figure 1). Although it identifies the main parameters of the data and how the data are distributed, it is an incomplete method because the information it provides is too simple and it does not take into account the correlation between hourly observations.

### *3.2. Statistical Process Control*

### 3.2.1. Control I-MR Charts with Individual Mean

To analyse the data using the SPC method, an individual-moving range chart (I-MR chart) of hourly NO2 concentrations was constructed. The results in Figure 5 show that the number of false alarms, i.e., outliers, is significant. This problem is attributable to the non-normality and the autocorrelation of the data, both of which are examined below.


**Figure 5.** Individual-moving range chart (I-MR chart), X/R with moving range, of hourly NO2 concentrations from Blanchardstown air quality monitoring station. Software: R-programming [56].
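For readers unfamiliar with I-MR charts, the following sketch shows how the control limits are commonly obtained. It assumes the textbook constants for a moving range of two consecutive points (2.66 for the individuals chart and 3.267 for the moving range chart); the text does not state which settings the R software used, and the function name and sample data are hypothetical.

```python
import statistics


def imr_limits(values):
    """Control limits of an individuals and moving range (I-MR) chart."""
    # Moving range of span 2: absolute difference of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    centre = statistics.fmean(values)
    return {
        "i_centre": centre,
        "i_lcl": centre - 2.66 * mr_bar,   # 2.66 = 3 / d2, with d2 = 1.128
        "i_ucl": centre + 2.66 * mr_bar,
        "mr_centre": mr_bar,
        "mr_ucl": 3.267 * mr_bar,          # D4 for subgroups of size 2
    }


# Hypothetical sample, not the Blanchardstown data.
sample = [10.0, 12.0, 11.0, 13.0, 12.0, 30.0]
limits = imr_limits(sample)
alarms = [v for v in sample if v < limits["i_lcl"] or v > limits["i_ucl"]]
print(limits, alarms)
```

These limits rest on the assumption of independent, approximately normal observations, which is why skewed and autocorrelated hourly concentrations tend to trigger many alarms.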

An autocorrelation analysis (Figure 6) shows that the data are strongly autocorrelated, which is very common in environmental data. The autocorrelation follows 24-hour cycles and decays as the lag increases.

**Figure 6.** Sample autocorrelation function of hourly values for NO2 concentrations from Blanchardstown air quality monitoring station. Software: Python [57].

Figure 6 shows the autocorrelation of all data for the year, while Figure 7 shows only the first 86 h so that the 24 h cycles can be seen in more detail. Because of the non-normality and the autocorrelation of the data, the control chart produces a large number of false alarms. Therefore, SPC is not a very suitable method for detecting outliers in NO2 concentrations.

**Figure 7.** Sample autocorrelation function of hourly values for NO2 concentrations from Blanchardstown air quality monitoring station over the initial 86 h period. Software: R-programming [56].
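The sample autocorrelation function behind Figures 6 and 7 can be sketched as follows. The series here is synthetic (a pure 24 h sinusoid standing in for hourly concentrations), and the function name `sample_acf` is hypothetical; the ±1.96/√n band is the usual approximate 95% bound for an uncorrelated series.

```python
import math


def sample_acf(values, max_lag):
    """Sample autocorrelation r_k for lags k = 0 .. max_lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Synthetic hourly series with a 24 h cycle over 10 days (illustration only).
hours = range(24 * 10)
series = [20.0 + 10.0 * math.sin(2.0 * math.pi * h / 24.0) for h in hours]

acf = sample_acf(series, max_lag=48)
bound = 1.96 / math.sqrt(len(series))  # approximate 95% significance band
print(round(acf[12], 3), round(acf[24], 3), round(bound, 3))
```

A daily cycle appears as peaks at lags of 24 h and 48 h and troughs at 12 h and 36 h, with amplitudes that shrink as the lag grows.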

### 3.2.2. Control Charts with Daily Rational Subgroups

When days are chosen as the rational subgroup for the X/s chart (each day is summarised by one point), the process is not under control, owing to the non-normality and the autocorrelation of the data (see Figures 8 and 9). Although the chart is not under control and there is considerable variability, none of the 365 days exceeds the limit value (100 μg/m<sup>3</sup>).

**Figure 8.** Xbar-chart of hourly NO2 concentrations with the daily rational subgroup of data. Software: R-programming [56].

**Figure 9.** S-chart of hourly NO2 concentrations with the daily rational subgroup of data. Software: R-programming [56].
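A minimal sketch of how Xbar and S chart limits can be derived for daily subgroups of 24 hourly values is given below. It uses the standard three-sigma formulas based on the unbiasing constant c4, computed from the gamma function rather than read from a table. The function names and the synthetic data are assumptions, and the R software used for Figures 8 and 9 may apply different defaults.

```python
import math
import statistics


def c4(n):
    """Unbiasing constant for the sample standard deviation, subgroup size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    )


def xbar_s_limits(subgroups):
    """Three-sigma limits for Xbar and S charts with equal-size subgroups."""
    n = len(subgroups[0])
    means = [statistics.fmean(g) for g in subgroups]
    sds = [statistics.stdev(g) for g in subgroups]
    grand_mean = statistics.fmean(means)
    s_bar = statistics.fmean(sds)
    c = c4(n)
    a3 = 3.0 / (c * math.sqrt(n))
    spread = 3.0 * math.sqrt(1.0 - c * c) / c
    return {
        "a3": a3,
        "xbar_lcl": grand_mean - a3 * s_bar,
        "xbar_centre": grand_mean,
        "xbar_ucl": grand_mean + a3 * s_bar,
        "s_lcl": max(0.0, 1.0 - spread) * s_bar,
        "s_centre": s_bar,
        "s_ucl": (1.0 + spread) * s_bar,
    }


# Synthetic hourly series for 3 days (illustration only), split into days.
hourly = [20.0 + d + 10.0 * math.sin(2.0 * math.pi * h / 24.0)
          for d in range(3) for h in range(24)]
days = [hourly[i:i + 24] for i in range(0, len(hourly), 24)]
limits = xbar_s_limits(days)
print(limits)
```

With 24 values per day, each day contributes one mean to the Xbar chart and one standard deviation to the S chart, which matches the one point per day described in the text.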
