Article

Robust Estimation of Carbon Monoxide Measurements

1 Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito 170125, Ecuador
2 Departamento de Matemática Aplicada a las Tecnologías de la Información y las Comunicaciones, ETS de Ingeniería y Sistemas de Telecomunicación, Universidad Politécnica de Madrid, 28031 Madrid, Spain
* Author to whom correspondence should be addressed.
Sensors 2020, 20(17), 4958; https://doi.org/10.3390/s20174958
Submission received: 20 July 2020 / Revised: 14 August 2020 / Accepted: 19 August 2020 / Published: 2 September 2020

Abstract

This paper presents a robust analysis of carbon monoxide (CO) concentration measurements conducted at the Belisario air-quality monitoring station (Quito, Ecuador). For the analysis, the data collected from 1 January 2008 to 31 December 2019 were considered. Additionally, each of the twelve years analyzed was considered as a random variable, and robust location and scale estimators were used to estimate the central tendency and dispersion of the data. Furthermore, classic, nonparametric, bootstrap, and robust confidence intervals were used to group the variables into categories. Then, differences between categories were quantified using confidence intervals and it was shown that the trend of CO concentration at the Belisario station in the last twelve years is downward. The latter was proven with the precision provided by both nonparametric and robust statistical methods. The results of the research work robustly proved that the CO concentration at Belisario station in the last twelve years is not considered a health risk, according to the criteria established by the Quito Air Quality Index.

1. Introduction

Carbon monoxide (CO) is a colorless and odorless gas that, when found in the air in large concentrations, is harmful to both humans and animals. This gas is produced when fossil fuels are burned and, therefore, internal combustion engines and vehicles or machinery whose operating principle is based on burning fossil fuels are among the greatest sources of CO, which is an air pollutant of concern worldwide [1]. Additionally, gasoline-powered pressure washers, propane-powered forklifts, propane-powered resurfacing machines, and gasoline-powered appliances, among others, can cause CO poisoning when they are not used correctly in some applications [2]. The most important source of CO is motor vehicle exhaust [3]. However, catalytic converters have reduced automobile exhaust emissions of CO [4].
In addition, detonation of explosives employed in blasting can produce CO and people living near a blast site can be affected [2]. Furthermore, smoke-polluted environments and hookah smoking are also sources of CO exposure [2].
CO is a silent killer [4] and the type of CO poisoning that a human being can suffer will vary depending on the level of CO concentration to which they are exposed and the length of time that such exposure lasts. For example, a human being who is exposed to high levels of CO concentration for a long time may go into a coma or even die [4]. Moreover, the most common symptoms of CO poisoning include the following: headaches, dizziness, vomiting, nausea, dyspnea, chest pain, tachycardia, blurred vision, confusion, palpitations, dysrhythmias, cardiac arrest, myocardial ischemia, seizures, respiratory arrest, and coma, among others [4,5,6].
The aforementioned justifies the need to robustly analyze the information from CO measurement devices. In this sense, inferential statistical analysis plays a fundamental role, because this type of analysis makes it possible to estimate the central tendency and dispersion of the data and to determine confidence intervals for location estimates of the variables under study.
The main objective of this paper is to carry out the robust estimation [7,8,9] of a set of twelve years of CO concentration measurements performed at Belisario air-quality monitoring station in Quito (Ecuador) [10]. The time interval in which these measurements were performed is from 1 January 2008 to 31 December 2019.
Some previous research works in which the statistical analysis of CO concentrations has been carried out are the following. In order to study the quality of air in underground mines, a time series describing CO concentration in a copper ore mine in Poland, from 28 October 2014 to 28 December 2014, was analyzed [11]. To carry out the aforementioned analysis, statistical models were used. Also, several parametric distribution functions were considered, the least squares method was used to estimate parameters, the K-means algorithm was used to classify the CO concentration, and the missing observations were either filled in by interpolation based on adjacent values or the time periods corresponding to missing information were not taken into account.
The kernel principal component analysis was used [12] to extract the nonlinear mixed gas characteristics of different components, and the K-nearest neighbor algorithm was used to recognize the target gas. In [12], a gas identification and concentration detection method was presented and a multivariable relevance vector machine was used to detect the concentration of the hybrid gas. The aforementioned method was validated by using CO and methane (CH4).
An example of using probabilistic models for the analysis of air pollution variables can be found in [13]. In short, the Rasch probabilistic model was used in [13] to define a measure of atmospheric pollution, integrating pollutants such as CO, among others, and several climatic factors. The study presented in [13] was carried out in Southwest Spain and, for the analysis, data of pollutants were collected from 1 January 2016 to 31 December 2016. Furthermore, the mean value, standard deviation, and minimum and maximum values were used to assess the proposed probabilistic model.
Another example of statistical analysis of CO concentration in urban cities can be found in [14], where the study was performed in seven locations in Los Angeles Basin (California, USA), from 1955 to 1972. In this case, the statistical analysis of the trend of CO concentrations was carried out by using time series analysis, and the relationship between meteorological variables and CO concentrations was assessed.
Veterans are also affected by CO poisoning and [15] was aimed at describing the distribution and determinant factors of CO poisoning in veterans. In short, in [15] it is said that the U.S. Veterans Health Administration (VHA) provides care to over 9 million veterans and that there is a great need to study in depth the trend of CO poisoning among them. In [15], demographic variables were analyzed and compared to users of VHA care from 2010 to 2017, and the results were supported by 95% confidence intervals. Moreover, in order to test for statistical significance, the two-tailed z test for proportions was used.
Descriptive statistics can also play a key role in the preliminary analysis of CO concentration measurement data, because they can be a quick indicator of trends in deaths from poisoning. For example, in [16] descriptive statistics were used to analyze the trend in deaths due to CO poisoning in Turkey from 2008 to 2017.
Another study of the association between CO poisoning and mortality can be found in [17]. Specifically, short-term associations between CO and daily mortality due to cardiovascular diseases in China, from 2013 to 2015, were analyzed [17]. Additionally, overdispersed generalized linear models were used in [17] to estimate associations between the concentration of CO and daily mortality due to strokes and to cardiovascular and coronary heart diseases. Moreover, in [17] Bayesian hierarchical models were used to obtain national and regional average associations.
Furthermore, the robustness of the effects that CO poisoning has on cardiovascular mortality was evaluated in [17] using fitted two-pollutant models. However, the concept of robustness introduced in [17] was not in the sense of [7,8,9]. Specifically, the authors of [17] said that the association between CO and mortality was robust if the significance of the predictor variable in a meta-regression model was very low.
An uncertainty analysis of CO measurements performed at the Izaña mountain station (Tenerife, Spain) was developed in [18]. Additionally, time series analysis was used to study the daily nighttime mean of CO concentration and, in order to perform the study, a least-squares fitting to a nonlinear function was carried out. This function consisted of a quadratic year-on-year component plus four Fourier harmonics that represented an annual cycle.
Sometimes observations of CO concentration do not follow a Gaussian distribution. Therefore, these observations cannot be analyzed using classical statistical inference methods. This is the case in the present research work, and it has also happened in research works carried out by other authors. For example, in [19] a statistical analysis of measurements of gas emissions from gasoline-powered vehicles in Irbid Directorate (Jordan) was carried out. In that paper, in order to analyze vehicle emissions of CO and other pollutants, 1000 vehicles were tested. In summary, the study performed in [19] was aimed at determining whether there were significant differences in the mean value of several pollutant emissions that came from vehicles with different characteristics. To conduct the above-mentioned study, nonparametric tests such as the Kruskal-Wallis test and the Mann-Whitney U test [20,21] were used [19]. Other examples of recent publications in which nonparametric tests have been used to analyze measurements of air pollution variables can be found in [22,23,24,25,26,27,28,29].
In the present paper, robust statistics [7,8,9] is used to analyze 12 years of measurement results of CO concentration at Belisario station [10], which is one of the most important stations of Quito Metropolitan Atmospheric Monitoring Network (QMAMN) [30]. QMAMN is part of the Ministry of the Environment of Ecuador and in Quito this network has nine air-quality monitoring stations, which are located in very important parts of the city.
The statistical analysis of different variables of air pollution in Quito was also carried out superficially in [30]. In fact, in [30] a robust analysis of the air pollution variables was not carried out, and the statistical tools used to analyze CO concentration were only the mean and maximum values. Therefore, it is necessary to complete what appears in [30] with a formal and rigorous study of the numerical results of the measurements of CO concentration levels in Quito. In this sense, the research work presented here could serve as a reference material to comprehensively analyze the results of the CO concentration measurements that have been carried out at the Belisario air quality monitoring station, in the last twelve years.
The objectives of this paper are the following:
(1) Construct sets of variables that represent the 12 years of CO concentration measurements under study, the months of the year, and the hours of the day, to determine statistical parameters that establish similarities and differences between the elements of these sets of variables.
(2) Classify CO concentration measurements by using different methods of estimating the central tendency and dispersion of the data. Specifically, classic, nonparametric, resampling, and robust methods are used.
(3) Categorize and discriminate CO concentration measurements using confidence intervals. These confidence intervals are constructed at the 95% confidence level and are of the following types: classic, nonparametric, bootstrap, and robust confidence intervals.
(4) Find periodicities in the sets of variables that represent the repetition of certain behaviors each time a certain time interval elapses.
Previous research papers that have been entirely focused on the use of robust statistics to analyze the behavior of air pollution variables are those shown in [31,32,33]. In addition, other research papers in which robust estimators have been used to analyze air pollution variables are [34,35].
The rest of the paper is organized as follows: Section 2 gives information about the study site and shows summary statistics. Section 3 is devoted to carrying out the data analysis by using nonparametric statistical inference techniques. Section 4 is aimed at performing the robust estimation of the CO concentration measurements. The aim of Section 5 is to perform a discussion of the results. Finally, the conclusions are given in Section 6.

2. Study Site and Summary Statistics

The study site was the Belisario station and information about this monitoring station can be found in [10,30]. According to [30], the data were collected using CO analyzers from Thermo Fisher Scientific, model 48i [36], which is a reference-level instrument that serves as a measurement standard in many countries (e.g., it is designated as a Federal Equivalent Method by the US EPA).
In this paper, each data point represents a CO concentration value for one hour, and it is the arithmetic mean of the CO concentrations measured every 10 min during that hour [30]. According to [37], in order to calculate the averages, 75% of the valid records were covered.
For the analysis, the data collected from 1 January 2008 to 31 December 2019 were considered, and the results of the analysis refer to most of the data collected. Here, it will be analyzed whether the oscillations of the measurements are due to random variations or whether they indicate that the measurements are different from each other. This will be carried out using nonparametric and robust statistics tools.
Since the data collected begin on 1 January 2008, with a sampling rate of one value per hour, and cover a full 12 years (three of which are leap years), this would mean 105,192 data points. However, since some data are missing, others have negative values, and one has an exceptionally high value compared to the rest, the analysis has been carried out with more than 96% of all the data; that is, less than 4% of the total data has been lost. Negative values were removed because they cannot be valid. Nevertheless, the values equal to zero were taken into account, because these could represent valid measurements that were carried out at certain time instants. On the other hand, there was one excessively large value that was also removed, because it clearly could not be valid and had no relationship with the rest of the values of the data set.
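The size of a complete hourly record and the cleaning rules described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample readings are hypothetical, missing values are assumed to be represented by `None`, and the removal of the single excessively large value is not shown because the text gives no numerical criterion for it.

```python
from datetime import datetime


def expected_hourly_records(start, end):
    """Number of one-hour slots in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)


def clean(values):
    """Drop missing (None) and negative readings; zeros are kept as valid."""
    return [v for v in values if v is not None and v >= 0]


# Full study period: 1 January 2008 00:00 up to (not including) 1 January 2020 00:00.
n_expected = expected_hourly_records(datetime(2008, 1, 1), datetime(2020, 1, 1))

# Hypothetical readings in mg/m^3, used only to exercise the cleaning rule.
sample = [0.8, None, -0.1, 0.0, 1.2]
kept = clean(sample)
share_kept = len(kept) / len(sample)
```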
In this research work, there were no data scarcity problems and the time instants corresponding to missing information were not taken into account, because the robust analysis was carried out based on the information that was actually provided by the measurement instrument without need to perform any kind of interpolation, which represents one of the strengths of robust statistical inference.
With the data available, consisting of year, month, day, hour, and amount of CO concentration in milligrams per cubic meter ($\mathrm{mg/m^3}$), the analysis and interpretation of these will be carried out with the aim of finding relationships between said data. The variables under analysis are $X_k$, $k = 1, \ldots, 12$, which are the CO concentrations in 2008, 2009, and so on until 2019. That is, $X_1 = 2008,\ X_2 = 2009,\ \ldots,\ X_{12} = 2019$.
Figure 1 shows the box plot diagram of the variable CO classified by years, and Figure 2 and Figure 3 show three graphs of moving averages (MAs): one graph shows the MA of all the years, and the other two each show the MA of half of the years. This smoothing technique is used in time series studies [38,39] and will be used here to analyze the trends of the variables. Although there are different types of MA smoothing, the simplest will be used in this paper. This type of smoothing consists of the following: given a value $m$ less than the total number of data, the mean of the data set $x_h, x_{h-1}, \ldots, x_{h-m+1}$ is found for each $h \geq m$. In this way, each data point loses its individual influence, although $m - 1$ observations are lost. In this paper, an MA of size 720 has been considered, since 720 is the number of data points in a full 30-day month.
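The simple moving average described above can be sketched in a few lines. This is an illustrative implementation under the stated definition (the mean of the `m` most recent values at each position); the function name is an assumption, and the paper does not say which software was used.

```python
def moving_average(x, m):
    """Simple moving average of window size m.

    Returns one value per position h >= m (1-based), so the output has
    len(x) - m + 1 elements: the first m - 1 observations are lost.
    """
    if not 1 <= m <= len(x):
        raise ValueError("window size must be between 1 and len(x)")
    window_sum = sum(x[:m])
    out = [window_sum / m]
    for h in range(m, len(x)):
        # Slide the window: add the newest value, drop the oldest one.
        window_sum += x[h] - x[h - m]
        out.append(window_sum / m)
    return out
```

For the hourly series in this paper, the window would be `m = 720` (24 hourly values times 30 days).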
The boxplot and moving average graphs shown in Figure 1, Figure 2 and Figure 3 show that all variables (years) appear to behave similarly to each other except 2009. In addition, Figure 1 shows that the number of observations that are extremely high, compared to the available values, decreases as the years pass. Moreover, Figure 2 indicates that the CO concentration tends to decrease continuously as time passes.
In order to provide information that quickly supplies a simple description of the measurement results, Table 1 shows a statistical summary of the data. From Table 1, it can be seen that for each year there are approximately between 94% and 97% of all possible data. Also, this table shows that for all the variables the mean is higher than the median, that the skewness is positive and that the kurtosis is higher than 5, reaching values higher than 7 in some years.
The aforementioned indicates that it is very likely that all the variables under study come from heavy-tailed distributions [8,40], because, based on the information provided in Table 1, the medians are less than the means, the skewness values are greater than zero, the kurtosis values are greater than 3, and the standard deviations are not small when compared with the means. Furthermore, from Figure 1 it can be seen that there are many outliers.
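The indicators used in this argument can be computed as in the following sketch. It assumes the moment-based definitions of skewness and kurtosis (with kurtosis equal to 3 for a Gaussian variable, which matches the threshold quoted above); the paper does not state which estimator variants were used for Table 1, and the sample values are made up.

```python
from statistics import mean, median


def shape_summary(x):
    """Mean, median, moment skewness and (non-excess) moment kurtosis."""
    n = len(x)
    mu = mean(x)
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return {
        "mean": mu,
        "median": median(x),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a Gaussian variable
    }


# Hypothetical right-skewed sample with one large value.
summary = shape_summary([0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.9, 2.5])
```

In this toy sample the mean exceeds the median, the skewness is positive, and the kurtosis exceeds 3, which is the same pattern the text reports for every year.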
This idea is confirmed with the boxplot graph shown in Figure 1, where abnormally high observations are presented every year. Therefore, these observations do not come from Gaussian variables [41]. Furthermore, none of the observations exceeds the desirable level of air pollution that is established by the Quito Air Quality Index (QAQI) [30]. QAQI establishes that the maximum value of the desirable level of air pollution is equal to $5\ \mathrm{mg/m^3}$. In any case, CO concentrations below $5\ \mathrm{mg/m^3}$ may be considered safe or low risk for human beings. Therefore, for the case under study, it can be said that the CO concentration at Belisario station is not considered a health risk.
Finally, because there are many observations for each of the variables, the first step was to try to carry out the statistical analysis using classical inference techniques. Therefore, attempts were made to implement different variable transformations that would allow the variables under study to fit a normal distribution [41]. In this sense, the following variable transformations were performed: sum of constants, logarithms, nth roots, and inverse functions, among others. However, the results were not as expected, because it was not possible to adequately fit the data for any one year to known random variables that were not heavy-tailed, and a fundamental characteristic of heavy-tailed distributions is that the central limit theorem does not work for them. Therefore, there was no way to fit any of the variables to a normal distribution. In fact, the fits that at some point seemed visually appropriate had P-values [21] less than 0.005. All this justified the use of nonparametric statistics and robust statistics in this research paper.

3. Nonparametric Statistical Inference

This section is aimed at determining whether the samples of the variables came from the same population and had a common median. To do this, a comparison was made between all the variables aimed at testing whether the differences between the medians were due to the variability of measurements or due to random causes. With respect to the aforementioned, the variability of the observations could be produced by particular characteristics of the instants of time in which the measurements were conducted. However, random causes could be produced by weather conditions or noise introduced by measuring instruments, among other things.
In this paper, observations were made on different groups of variables and these variables were considered to be linearly independent, because the linear correlations between the variables were close to zero. In other words, the linear dependence between the variables was not strong. However, it is important to mention that in this research work the existence of nonlinear dependencies between variables was not studied, because this is out of the scope of the paper.
In this paper, in order to study whether the distributions of the variables were the same or not, the Wilcoxon rank sum test [20,21] was used to test whether the data collected in the variables under study comes from distributions with equal medians, as was also done in [22,23,31].
To carry out the hypothesis test, the null hypothesis was considered to be $H_0: \mathrm{Median} = M_0$, and the alternative hypothesis was $H_1: \mathrm{Median} \neq M_0$. Therefore, if the null hypothesis is assumed to be true and also that the quantities observed during all the years are stable, then half of the observations of each year will be less than $M_0$ and the rest of the observations will be greater than that amount. Here, the significance level was $\alpha = 0.05$ and the confidence level was $(1 - \alpha)$. Lastly, the nonparametric bilateral confidence intervals for the median were calculated as in [31,33].
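A two-sample Wilcoxon rank sum comparison of two yearly samples can be sketched as follows. This is a minimal standard-library illustration using the large-sample normal approximation, with average ranks for ties and no tie correction of the variance; the function names are hypothetical and the authors' actual software and options are not stated in the text.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p_value). The variance is not corrected for ties.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

With $\alpha = 0.05$, two samples would be treated as having homogeneous medians when the returned p-value is at least 0.05.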
The limits of the confidence intervals found in this paper are shown in Table 2, with the confidence level equal to 95%. Furthermore, the graphs of the confidence intervals that were found are shown in Figure 4.
From the information provided in Table 2 and Figure 4, it can be seen again that the amount of CO concentration per year at Belisario station tends to decrease, because as the median decreases the interval shifts to lower values. At this point, it is important to mention that the lengths of the intervals are very small due to the large number of samples available.
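One standard way to obtain a distribution-free confidence interval for the median uses order statistics and the Binomial(n, 1/2) distribution. The sketch below shows that construction as an assumption; the text only says the intervals were computed as in [31,33], so the authors' exact procedure may differ. The function name is hypothetical.

```python
from math import comb


def median_ci(x, alpha=0.05):
    """Order-statistic confidence interval for the median.

    Picks the largest j such that P(B <= j - 1) <= alpha / 2 for
    B ~ Binomial(n, 1/2) and returns (x_(j), x_(n-j+1)). Coverage is at
    least 1 - alpha for continuous data and conservative with ties.
    """
    xs = sorted(x)
    n = len(xs)
    total = 2 ** n
    cumulative = 0  # running count, so cumulative / total = P(B <= j - 1)
    j = 0
    while j < n // 2:
        candidate = cumulative + comb(n, j)
        if candidate / total > alpha / 2:
            break
        cumulative = candidate
        j += 1
    if j == 0:
        raise ValueError("sample too small for the requested confidence level")
    return xs[j - 1], xs[n - j]
```

Because the interval endpoints are order statistics whose ranks move closer to n/2 (in relative terms) as n grows, very large samples give very short intervals, which is the behavior noted above.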
In addition, once the Wilcoxon rank sum test was performed, the following was verified:
(1) The medians of the variables $X_1$ and $X_3$ are homogeneous.
(2) The medians of the variables $X_6$, $X_8$ and $X_{11}$ are homogeneous.
(3) The medians of the variables $X_{10}$ and $X_{12}$ are homogeneous.
(4) The medians of the variables $X_2$, $X_4$, $X_5$, $X_7$ and $X_9$ do not coincide with any other.
Therefore, the amount of CO concentration per year can be grouped into four categories, which are indicated in Figure 4, separated by the black horizontal dashed lines. Specifically, the years 2008 ($X_1$) and 2010 ($X_3$) are in one category, the years 2009 ($X_2$), 2011 ($X_4$), 2012 ($X_5$), 2014 ($X_7$) and 2016 ($X_9$) are in another category, the years 2013 ($X_6$), 2015 ($X_8$) and 2018 ($X_{11}$) are in a third category, and the years 2017 ($X_{10}$) and 2019 ($X_{12}$) are in the fourth category.
Before concluding this section, it is important to mention that the fact that the CO concentration has been decreasing over the years could be explained by the environmental policies that have been carried out in the city of Quito in recent years. These results could indicate that these policies, among other things, could be part of the reasons why better results have been obtained.

4. Robust Estimation

In this paper, robust methods [7,8,9] were used to carry out the estimation of the central tendency and dispersion of the data in such a way that the results of the analysis were not affected by extreme values [31,32,33].
A useful technique for characterizing robust statistics is the influence curve [42]. This technique aims to measure the influence that an observation has against all other observations. In fact, if the estimators are not robust, then it may happen that the influence curves are not bounded. Therefore, when this happens, the estimator can be greatly affected by an observation that is very far from the rest of the data. With robust estimators, the influence curves are bounded and the estimators are practically insensitive to observations that deviate from the data set.
In this paper, robust estimators were applied to sample order statistics [21]. In short, the ordered sample of $X_1, \ldots, X_n$ is given by $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$, where the observations with the lowest value and the highest value are $X_{(1)}$ and $X_{(n)}$, respectively.

4.1. Central Tendency Estimators

According to [7,8,9], location statistics indicate the value around which most of the data used for inference are grouped, that is, the center of the distribution. In this paper, in addition to the mean and median, other statistics will be used.
The L-location estimators used in this paper were the following:
(1) Trimean ($TM$) [7,43].
(2) $\alpha$-trimmed mean ($T(\alpha)$) [7,8,9].
(3) $\alpha$-winsorized mean ($W(\alpha)$) [7].
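The three L-location estimators listed above can be sketched with the standard library as follows. The definitions used here are common textbook ones and are assumptions as far as this paper is concerned: the trimean is computed from quartiles obtained with `statistics.quantiles(..., method="inclusive")` (other quartile or hinge conventions give slightly different values), and both the trimmed and winsorized means act on `floor(alpha * n)` observations at each end.

```python
from math import floor
from statistics import mean, quantiles


def trimean(x):
    """(Q1 + 2 * median + Q3) / 4, using inclusive quartiles."""
    q1, q2, q3 = quantiles(x, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def trimmed_mean(x, alpha):
    """Mean after discarding floor(alpha * n) values from each end."""
    xs = sorted(x)
    g = floor(alpha * len(xs))
    return mean(xs[g:len(xs) - g])


def winsorized_sample(x, alpha):
    """Replace the g lowest and g highest values by their nearest kept value."""
    xs = sorted(x)
    n = len(xs)
    g = floor(alpha * n)
    return [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g


def winsorized_mean(x, alpha):
    return mean(winsorized_sample(x, alpha))


# Hypothetical sample with one extreme value; alpha must be below 0.5.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
estimates = (mean(data), trimean(data), trimmed_mean(data, 0.2), winsorized_mean(data, 0.2))
```

In this toy sample the ordinary mean is pulled toward the extreme value, whereas the three L-estimators stay near the bulk of the data.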
Also, the M-location estimators [7,8,9] used in this paper were the following:
(1) Andrew’s wave ($T_{wa}(c)$) [7,9].
(2) Biweight ($T_{bi}(c)$) [7,8].
The estimates of the above-mentioned statistics are shown in Table 3. In addition, this table also shows the following estimates: 0.2-trimmed mean, 0.3-trimmed mean, 0.2-winsorized mean, and 0.3-winsorized mean. Furthermore, Figure 5 shows classic and robust statistics of the variables, which correspond to those shown in Table 3. Figure 5 shows that there is a pronounced decrease from 2008 to 2012 and, from 2012 onwards, a stabilization is observed in all the estimates found, with a slight decrease. Note that, in general, all measures of centrality for each year fluctuate between the median and the mean.
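As an illustration of how an M-location estimate is obtained, the following sketch computes a biweight location by iteratively reweighted averaging. It is only one common formulation and rests on several assumptions not confirmed by the text: the scale is the (unscaled) median absolute deviation computed once around the median, the weights are $(1 - u^2)^2$ for $|u| < 1$ with $u = (x - T)/(c \cdot MAD)$, and the tuning constant defaults to $c = 9$. The function name is hypothetical.

```python
from statistics import median


def biweight_location(x, c=9.0, tol=1e-9, max_iter=100):
    """Biweight M-estimate of location via iterative reweighting."""
    t = median(x)
    mad = median([abs(v - t) for v in x])
    if mad == 0:
        return t  # degenerate scale: fall back to the median
    for _ in range(max_iter):
        num = den = 0.0
        for v in x:
            u = (v - t) / (c * mad)
            if abs(u) < 1:  # observations with |u| >= 1 get zero weight
                w = (1 - u * u) ** 2
                num += w * v
                den += w
        t_new = num / den
        if abs(t_new - t) <= tol:
            return t_new
        t = t_new
    return t


# Hypothetical sample: the value 100 receives zero weight.
estimate = biweight_location([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

An Andrew's wave estimate would follow the same loop with a sine-based weight function in place of the biweight weights.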

4.2. Scale Estimators

In this paper, the variability of the data is going to be formalized through scale estimators. In accordance with [8], any statistic satisfying both the shift invariance condition and the scale equivariance condition is a dispersion estimate. The scale estimators that will be used in this paper are the following:
(1) Sample standard deviation ($s_x$) [7,8].
(2) Mean absolute deviation ($MAD_{mean}$) [7,8].
(3) Median absolute deviation ($MAD$) [7,8].
(4) One-half of the fourth-spread ($SRH$) [7,44].
(5) Least median of squares ($LMS$) [8].
(6) Winsorized standard error ($s_W(\alpha)$) [9].
(7) Andrew’s wave ($s_{wa}(c)$) [7].
(8) Biweight ($s_{bi}(c)$) [7,8].
(9) Estimator based on a subrange ($C_n^{\alpha}$) [45].
The point estimates of scale are shown in Table 4. Furthermore, Figure 6 shows the graphical representation of the point estimates of scale of the variables, which correspond to those included in Table 4.
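Some of the simpler scale estimators in this list can be sketched as follows. The definitions are common textbook versions and are assumptions here: no consistency constants are applied (for example, the MAD is not multiplied by 1.4826), and the fourth-spread is approximated by the interquartile range from `statistics.quantiles(..., method="inclusive")`. The remaining estimators in the list are not shown.

```python
from statistics import mean, median, quantiles, stdev


def mean_abs_deviation(x):
    """Mean of absolute deviations from the mean."""
    m = mean(x)
    return mean([abs(v - m) for v in x])


def median_abs_deviation(x):
    """Median of absolute deviations from the median (unscaled)."""
    m = median(x)
    return median([abs(v - m) for v in x])


def half_fourth_spread(x):
    """One-half of the spread between the upper and lower quartiles."""
    q1, _, q3 = quantiles(x, n=4, method="inclusive")
    return (q3 - q1) / 2


# Hypothetical sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
scales = {
    "s_x": stdev(data),
    "MAD_mean": mean_abs_deviation(data),
    "MAD": median_abs_deviation(data),
    "SRH": half_fourth_spread(data),
}
```

In this toy sample the standard deviation is by far the largest value, which mirrors the ordering of the estimates described for Figure 6.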
In Figure 6, it can be seen that all the estimates are bounded above by the standard deviation and bounded below by the least median of squares point estimator. In addition, it is observed that the estimators of scale based on the Andrew’s wave and the biweight are very similar to each other, as was the case with the estimators of location based on the Andrew’s wave and the biweight. Moreover, there is a slight drop in the value of all the estimates from 2008 to 2012 and then they stabilize, which could indicate that the decrease in the amount of CO concentration produced a decrease in its variability, since the lower limit is always zero.

4.3. Confidence Intervals

In this section, following the methodology used in [32] and suggested in [7,8], the confidence intervals were established with a location parameter, a scale parameter, and a constant related to the Student’s t distribution. Furthermore, said constant was selected following the indications given in [46,47]. In what follows, $t_{\nu, q}$ means the q-th quantile of the Student’s t distribution with $\nu$ degrees of freedom (DOF). In this paper, the estimators shown in Section 4.1 and Section 4.2 were used to build confidence intervals. The pairs of estimators were as follows [32,33]:
(1) $(\bar{X}, s_x)$, where $\bar{X}$ is the mean.
(2) $(Me, MAD)$, where $Me$ is the median.
(3) $(Me, IQR)$, where $IQR$ is the interquartile range.
(4) $(T(\alpha), s_W(\alpha))$.
(5) $(T_{wa}(c), s_{wa}(c))$.
(6) $(T_{bi}(c), s_{bi}(c))$.
In addition, confidence intervals based on a bootstrap method were built [9,32,33]. With all of the above, eight confidence intervals were constructed for each of the twelve variables: one classic, one nonparametric, one using the bootstrap method, and five robust. In Figure 7, Figure 8 and Figure 9, these intervals are shown for three of the twelve variables that have been analyzed, specifically those corresponding to the leap years included in the study. Showing more figures would not provide relevant information.
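A bootstrap interval for the median can be sketched as follows. This uses the plain percentile method, which is an assumption: the text does not say which bootstrap variant, number of resamples, or statistic the authors used. The function name, the default of 2000 resamples, and the fixed seed are illustrative choices.

```python
import random
from statistics import median


def bootstrap_median_ci(x, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    # Resample with replacement and record the median of each resample.
    medians = sorted(median(rng.choices(x, k=n)) for _ in range(n_boot))
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

The same loop works for any of the location estimators of Section 4.1 by replacing `median` with the corresponding function.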
Figure 7, Figure 8 and Figure 9 show that, in general terms, the variables present similar characteristics regarding the confidence intervals. For example, it can be seen that the classic confidence intervals are the ones that are pushed furthest towards high values, while the median-based confidence intervals are those that are shifted towards the lowest values. Note that this result is consistent with what was said in Section 2, in that it is very likely that the distributions of the variables are heavy-tailed distributions.
Furthermore, Figure 7, Figure 8 and Figure 9 show that among the median-based confidence intervals, the nonparametric intervals and the bootstrap-based intervals are very similar. Also, these figures reflect that the intervals based on the median and the median absolute deviation are the narrowest.
With respect to the confidence intervals based on Andrew’s wave and biweight, it can be said that these have similar characteristics in all the variables and that they are located to the right of the intervals based on the median and to the left of the intervals based on the 0.2-trimmed mean.
Finally, Figure 7, Figure 8 and Figure 9 also show that in all the variables the intervals based on the 0.2-trimmed mean location estimators are the second most displaced towards high values, and these are the intervals closest to the classic intervals.
Due to all the above, the confidence intervals based on the estimators $(T(\alpha), s_W(\alpha))$ and $(T_{bi}(c), s_{bi}(c))$ were used to compare the given variables. The reasons for this decision are as follows: first, the classic intervals are unfounded because the underlying distribution is assumed to be approximately normal, which is not true; second, the results obtained with bootstrap estimators and with the point estimators $(Me, MAD)$ and $(Me, IQR)$ are analogous to the results obtained with the nonparametric estimators seen in Section 3; and, third, the results obtained with the estimators based on the Andrew’s wave and on the biweight are similar, so either of the two estimators could have been chosen.
Table 5, similar to Table 2, includes the limits of the confidence intervals, with a confidence level of 95%, and their length for the estimators $(T(\alpha), s_W(\alpha))$ and $(T_{bi}(c), s_{bi}(c))$, for $\alpha = 0.2$ and $c = 9$.
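For the trimmed-mean pair, a widely used interval has the form trimmed mean plus or minus a quantile times $s_w / ((1 - 2\gamma)\sqrt{n})$, where $s_w$ is the standard deviation of the winsorized sample and $\gamma$ is the trimming proportion. The sketch below implements that form as an assumption; the text does not spell out the authors' exact formula. A second assumption is that the Student's t quantile is replaced by the normal quantile, which is reasonable only for large samples such as the yearly series here (the standard library has no t distribution). The trimming proportion is called `gamma` in the code to avoid a clash with the significance level.

```python
from math import floor, sqrt
from statistics import NormalDist, mean, stdev


def trimmed_mean_ci(x, gamma=0.2, level=0.95):
    """Large-sample confidence interval for the gamma-trimmed mean."""
    xs = sorted(x)
    n = len(xs)
    g = floor(gamma * n)
    trimmed = mean(xs[g:n - g])
    # Winsorized sample: the g lowest/highest values are pulled inward.
    winsorized = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    s_w = stdev(winsorized)
    standard_error = s_w / ((1 - 2 * gamma) * sqrt(n))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # normal stand-in for t
    return trimmed - z * standard_error, trimmed + z * standard_error
```

The interval length is the difference between the two returned limits, which is the quantity that would be tabulated alongside the limits.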
The above-mentioned confidence intervals are shown in Figure 10 and Figure 11. In addition, lines have been included in these figures to try to classify the variables, analogous to the classification provided by the Wilcoxon rank sum test for the medians in Section 3. With the biweight estimators, the classification of the variables is similar to that obtained with nonparametric estimators; the only difference is that variable $X_2$ is grouped with variables $X_1$ and $X_3$.
The first observation is that between 2008 and 2012 the tendency towards lower CO concentration values is notable, and that from 2012 to 2019 there are fluctuations with a slight downward trend. Regarding the interval lengths, it can be concluded that the confidence intervals found with biweight estimators are narrower than the confidence intervals found with the $\alpha$-trimmed mean and the winsorized standard error.

4.4. Additional Confidence Intervals

In view of the results found in Section 4.3, the same data were analyzed with two different groupings. Specifically, variables $Y_1, \ldots, Y_{12}$ were defined as the CO concentration in each month of the year, with $Y_1$ being the CO concentration in January of all years, $Y_2$ the CO concentration in February of all years, and so on. In addition, variables $Z_1, \ldots, Z_{12}$ were defined as the CO concentration grouped in two-hour blocks for every day of the year, with $Z_1$ being the CO concentration at 0:00 h and at 1:00 h, $Z_2$ the CO concentration at 2:00 h and at 3:00 h, and so on.
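The two groupings just described can be sketched in Python as follows. The input layout (an iterable of `(timestamp, co_value)` pairs with hourly timestamps) and the function name are assumptions made for illustration; the actual data format of the monitoring station is not specified here.

```python
from collections import defaultdict
from datetime import datetime


def group_by_month_and_two_hour_block(records):
    """Split (timestamp, co_value) pairs into monthly and two-hour groups.

    Keys 1..12 of the first dict play the role of Y_1..Y_12 (January..December);
    keys 1..12 of the second dict play the role of Z_1..Z_12
    (0:00-1:00 h, 2:00-3:00 h, and so on).
    """
    monthly = defaultdict(list)
    two_hourly = defaultdict(list)
    for timestamp, co_value in records:
        monthly[timestamp.month].append(co_value)
        two_hourly[timestamp.hour // 2 + 1].append(co_value)
    return dict(monthly), dict(two_hourly)


# Hypothetical readings, only to show the call.
sample = [
    (datetime(2008, 1, 1, 0), 0.9),
    (datetime(2008, 1, 1, 1), 0.8),
    (datetime(2008, 2, 3, 2), 0.7),
    (datetime(2009, 1, 5, 23), 1.1),
]
y_groups, z_groups = group_by_month_and_two_hour_block(sample)
```

Each resulting list can then be passed to any of the location, scale, or interval estimators used in the paper.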
The confidence intervals found for the variables $Y_1, \ldots, Y_{12}$ and $Z_1, \ldots, Z_{12}$ are shown in Figure 12 and Figure 13. Some comments can be made about these figures. For example, Figure 12a shows that the highest values are reached in April, followed by a second tier formed by March, May and November. In addition, the variables corresponding to January, February, October and December behave similarly to one another, with lower values than those previously mentioned. Furthermore, the reduction in CO concentration is clearly noticeable in June and September. Moreover, the CO concentration values in the central summer months, that is, July and August, are the lowest of the year. Finally, the confidence intervals corresponding to these last two months appear to be, in general, narrower than the rest.
The aforementioned for Figure 12a can be applied quite well to Figure 12b,c, with small differences due to the fact that different estimators were used.
With respect to Figure 13, the only medians that can be considered equal are those of the variables $Z_4$, $Z_5$ and $Z_{10}$, on the one hand, and those of $Z_7$ and $Z_8$, on the other. The rest of the variables can be assumed to differ from one another. In addition, the hours with the lowest CO concentration correspond to the time interval that begins at 0:00 h and ends at 5:00 h. Also, the highest CO concentration values occur between 6:00 h and 9:00 h, and between 18:00 h and 19:00 h. For the rest of the day, the CO concentration decreases from 9:00 h to 15:00 h, after which it increases again. Finally, the CO concentration begins to decrease from 21:00 h until the next morning.
These observations suggest a periodic behavior, which also seems to occur when the CO concentration is studied by month of the year (see Figure 12). However, this periodicity did not emerge when the data were analyzed by year.

5. Discussion

From an initial statistical summary, it was observed that the values of the CO concentration at the Belisario station are at a desirable level of air pollution, according to the criteria established in QAQI [30]. In addition, it was observed that all the variables present many extreme observations, all of them on the right, that is, for high CO concentration values. Furthermore, all the variables present characteristics that are compatible with heavy-tailed distributions. Specifically, the medians are clearly lower than the means, the skewness is greater than zero, the kurtosis is greater than three, and the standard deviation is not small compared to the mean.
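The descriptive checks listed above can be sketched in Python as follows. This is only a minimal illustration: the function names are hypothetical, the moments use the population (divide by n) convention, which may differ from the convention used in the paper, and these checks are descriptive indications rather than a formal test for heavy tails.

```python
import math
from statistics import mean, median


def shape_summary(data):
    """Mean, median, standard deviation, skewness and (non-excess) kurtosis."""
    n = len(data)
    m = mean(data)
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    return {
        "mean": m,
        "median": median(data),
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
    }


def looks_heavy_right_tailed(summary):
    """True when median < mean, skewness > 0 and kurtosis > 3."""
    return (
        summary["median"] < summary["mean"]
        and summary["skewness"] > 0
        and summary["kurtosis"] > 3
    )
```

For example, `looks_heavy_right_tailed(shape_summary([1, 1, 1, 2, 2, 3, 10]))` is true, whereas a symmetric sample such as `[1, 2, 3, 4, 5]` fails the check.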
Subsequently, the data were smoothed to reduce the influence of individual observations and to highlight possible trends in the data set. This smoothing was applied to the sequence formed by the data of all the years and to the sequences formed by each year separately. This revealed a tendency for the CO concentration values to decrease as the years go by. Likewise, it was observed that the lowest values are reached in the third quarter of the year and that the highest values occur in the second and fourth quarters. These results are in agreement with the general comments made in [28] about the CO concentration in Quito.
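A simple moving average is one way to carry out this kind of smoothing. The sketch below is an assumption-laden illustration: the window length used for the figures is not stated in this section, so `window` is a free parameter, and the function name is hypothetical.

```python
def moving_average(values, window):
    """Return the averages of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed
```

A single spike is damped by a factor equal to the window length, which is why isolated extreme observations have less influence on the smoothed series.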
Once the smoothing of the data was performed, an attempt was made to fit all the variables to parametric distributions, through different transformations. This was done with the aim of being able to carry out a statistical analysis applying classical inference techniques, since many observations are available for each variable. However, adequate results were not obtained.
Therefore, due to the impossibility of using classical inference, the study had to be carried out using hypothesis testing and both nonparametric confidence intervals and robust confidence intervals.
This type of exhaustive preliminary analysis, with respect to CO concentration data, is not very frequent, because in general the authors tend to assume independence between the variables, to eliminate outliers and to approximate the remaining data with known parametric distributions. For example, in [11] independence between observations was assumed, peaks in signal amplitude were detected, empirical tails of these peaks were calculated, and theoretical tails of known distributions were fitted to the empirical tails. All of this was done in [11] using classical methods. In [12], kernel principal component analysis was applied to raw data to extract non-linear characteristics from it, and then the K-nearest neighbor algorithm was used for recognition tasks.
On the other hand, in [13] statistical summaries of the data were shown, where the measures of central tendency and dispersion were the mean, the standard deviation, the minimum value, and the maximum value. That work then focused on the use of the Rasch model to define a coherent variable and the interrelation between variables. However, in [15] a statistical summary of the data was not shown; the analysis was performed using classic confidence intervals at the 95% confidence level and the two-tailed Z test for proportions. Additionally, in [16] the preliminary analysis of the data was not shown either, and the descriptive analysis of the data was done in terms of frequency and percentage.
Nevertheless, in [17] the statistical summary of the data was shown, where the measures of central tendency and dispersion used were the mean, standard deviation, range, median, and interquartile range. In addition, for the analysis, the posterior mean and the 95% posterior interval were included. Moreover, Bayesian hierarchical models were used to obtain national-average associations.
In [18], although an initial statistical summary of the data was not shown, the authors explained in detail the methodology used to discard the data that were not significant for the type of uncertainty analysis of CO concentration performed in that paper. However, the statistical analysis presented in [19] did include statistical summaries of the data, where the authors relied on the mean and standard deviation. Furthermore, in [19] the authors demonstrated that the distribution of the raw data was not normal and, on that basis, justified the use of nonparametric statistical methods to analyze the CO concentration and other air pollutants.
Regarding the type of nonparametric analysis carried out in this research, the Wilcoxon rank sum test was used to compare the medians of the distributions of the variables under study, basing this test on the order statistics of the samples and on the sign test. Another added value of the present study is that, with the nonparametric confidence intervals constructed, the variables could be grouped into different categories, establishing similarities and differences between the data.
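For reference, a minimal Python sketch of a two-sided Wilcoxon rank sum test is given below. It uses midranks for ties and the large-sample normal approximation with the usual tie correction, without a continuity correction. These choices, and the function name, are assumptions made for illustration and may differ from the exact procedure followed in Section 3.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Return (W, z, p): rank sum of x, normal score, two-sided p-value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:  # every observation is identical
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p
```

A small p-value suggests that the two samples are shifted with respect to each other; with samples as large as one year of measurements, the normal approximation is usually adequate.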
Once the nonparametric analysis was carried out, the categorization and discrimination of the data was conducted robustly, because this provides a more in-depth analysis of the characteristics of the CO concentration at the place where the measurements were performed. For this analysis, different robust location and scale statistics were found, and some of them were used to determine robust confidence intervals. Specifically, the following point estimators were used: the mean, the median, and the trimean. In addition, families of α-trimmed mean estimators, α-winsorized mean estimators, Andrews wave-based estimators, and biweight-based estimators were used, which are defined by the proportion of values not taken into account for the estimation. It was observed that, for all the years, the point estimates of location were practically bounded between the median and the mean. In addition, it was observed that the CO concentration, although all its values were in the range of desirable values according to QAQI, decreased markedly between 2008 and 2012, and that from 2012 onwards the decrease was, in general, much smaller, with year-on-year rises and falls.
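Three of the location estimators mentioned above can be sketched in Python as follows. The quartiles come from `statistics.quantiles` with its default method, which is an assumption: Tukey's hinges, or the definition used in the paper, may give slightly different values. The function names are hypothetical.

```python
import math
from statistics import quantiles


def trimmed_mean(data, alpha):
    """Mean after discarding a proportion alpha of the data at each end."""
    x = sorted(data)
    g = math.floor(alpha * len(x))
    core = x[g:len(x) - g]
    return sum(core) / len(core)


def winsorized_mean(data, alpha):
    """Mean after replacing a proportion alpha at each end by the nearest kept value."""
    x = sorted(data)
    n = len(x)
    g = math.floor(alpha * n)
    wins = [x[g]] * g + x[g:n - g] + [x[n - g - 1]] * g
    return sum(wins) / n


def trimean(data):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = quantiles(data, n=4)
    return (q1 + 2 * q2 + q3) / 4
```

For the sample `[1, 2, ..., 9, 100]`, the mean is 14.5, while the 0.2-trimmed mean and the 0.2-winsorized mean are both 5.5, which illustrates how these estimators resist a single extreme observation on the right.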
On the other hand, the point estimators of scale that were used were the following: the standard deviation, the mean absolute deviation, the median absolute deviation, one-half of the fourth-spread, and the least median squares. Likewise, regarding the families of scale estimators, biweight midvariance estimators, estimators based on subranges, estimators based on the Andrew’s wave, and estimators based on the Winsorized standard deviation were used. For the estimator families, the parameter values chosen were those recommended as suitable in the specialized literature. The graphical representation of the scale estimators showed that these were bounded below by the least median squares and above by the standard deviation. Additionally, there is a decrease from 2008 to 2012 and a stabilization from that year until 2019. This decrease is due to the reduction in the number and magnitude of extreme observations.
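Some of these scale estimators can be sketched in Python as follows. This is a hedged illustration: the fourths are approximated with `statistics.quantiles`, the biweight scale follows the standard biweight midvariance formula with tuning constant `c`, and no consistency constants are applied, so the values are not rescaled to match the standard deviation under normality. The function names are hypothetical and the definitions may differ in detail from those used in the paper.

```python
import math
from statistics import mean, median, quantiles


def mean_abs_dev(data):
    """Mean absolute deviation about the mean."""
    m = mean(data)
    return mean([abs(v - m) for v in data])


def median_abs_dev(data):
    """Median absolute deviation about the median (MAD)."""
    m = median(data)
    return median([abs(v - m) for v in data])


def half_fourth_spread(data):
    """One-half of the fourth-spread, with quartiles standing in for the fourths."""
    q1, _, q3 = quantiles(data, n=4)
    return (q3 - q1) / 2


def biweight_scale(data, c=9.0):
    """Square root of the biweight midvariance with tuning constant c."""
    m = median(data)
    mad = median_abs_dev(data)
    if mad == 0:
        return 0.0
    num = 0.0
    den = 0.0
    for v in data:
        u = (v - m) / (c * mad)
        if abs(u) < 1:  # observations with |u| >= 1 receive zero weight
            num += (v - m) ** 2 * (1 - u * u) ** 4
            den += (1 - u * u) * (1 - 5 * u * u)
    return math.sqrt(len(data) * num) / abs(den)
```

For a sample with one large extreme value, such as `[1, 2, ..., 9, 100]`, the robust estimators stay close to the spread of the bulk of the data, whereas the standard deviation is dominated by the extreme observation.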
The exhaustive robust analysis that has been carried out here on the CO concentration constitutes another added value of the study. Specifically, the technical report presented in [30] does not make an in-depth statistical analysis of the variables of air pollution in Quito. In fact, in [30] only the mean and maximum values are used to analyze the behavior of the CO concentration. Therefore, the research carried out here can be used as reference material to explain how the behavior of the CO concentration in Quito has been from 1 January 2008 to 31 December 2019.
Similar research papers in which robust analysis of other air pollution variables has been performed are those shown in [31,32,33]. The results obtained in this paper are in agreement with those obtained in [31,32,33]. Examples of other research papers that are further from the topic discussed here, but that also have employed some of the robust analysis tools used in this paper are [34,35]. In all cases, the importance of the use of robust methods in the analysis of air pollution variables was highlighted.
The robust two-sided confidence intervals were found using six pairs of robust estimators, three with point estimates and three others with families of estimators from which particular values were selected. In addition, bootstrap confidence intervals were found. It was observed that the 95% confidence intervals most displaced towards higher values were the classic intervals, because they are centered on the mean, while the confidence intervals most displaced to the left were those centered on the median. Likewise, among the confidence intervals centered on the median, the nonparametric intervals and those found by the bootstrap method were wider than those found with robust estimators.
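As an illustration of a bootstrap interval centered on the median, the following Python sketch computes a percentile bootstrap confidence interval. The percentile method, the number of resamples, the seed handling, and the function name are assumptions made here; this section does not specify which bootstrap variant was used in the paper.

```python
import math
import random
from statistics import median


def bootstrap_median_ci(data, level=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    medians = sorted(
        median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = medians[math.floor(tail * (n_resamples - 1))]
    upper = medians[math.ceil((1 - tail) * (n_resamples - 1))]
    return lower, upper
```

Because the interval limits are empirical quantiles of resampled medians, the interval need not be symmetric about the sample median, which suits skewed data such as the CO measurements.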
The confidence intervals based on the Andrew’s wave and the biweight were very similar, because the location and scale estimators found in these families were also very similar. On the other hand, the confidence intervals based on α -trimmed mean location estimators, which have the Winsorized variance as variance, produced intervals between the biweight intervals and the intervals based on the Andrew’s wave, on the left, and the classic intervals, on the right.
For these reasons, the confidence intervals based on the estimators $(T(\alpha), s_W(\alpha))$ and $(T_{bi}, s_{bi})$ were used to compare the given variables. Again, when the variables were compared using confidence intervals, a downward trend in the CO concentration was observed between 2008 and 2012. Moreover, from 2012 onwards, fluctuations were observed with a slight tendency towards lower CO concentration values. Overall, the biweight-based confidence intervals were somewhat narrower than those found with the α-trimmed mean. Furthermore, the classifications of the variables found with the biweight were similar to those found with nonparametric estimators; the difference was that the variable $X_2$ (2009) was added to the category formed by $X_1$ (2008) and $X_3$ (2010).
To complete the study, the proposed robust confidence interval analysis technique was also applied to clusters consisting of CO concentration measurements of months and clusters of CO concentration measurements in groups of two hours. Here, the variables were classified and it was noted that there was a certain periodicity in both the months and the hours of the day. In this sense, it is observed that the lowest confidence intervals corresponding to the analysis of the months are in the third quarter. Moreover, with respect to the hours of the day, it is observed that there is a certain periodicity, showing minimums in the early morning hours and maximums in the early hours of the working day and in the early hours of the night.
An additional contribution of this study is that the observed periodicities have been shown in terms of robust confidence intervals at the 95% confidence level, categorizing the range of values of the possible periodic wave and measuring differences between categories with the precision provided by robust statistical methods. In addition, it is important to mention that these periodicities are not fixed, but are subject to seasonal variations and even to the character of the particular day. For example, the CO concentration between 2:00 and 3:00, which is when the lowest CO concentration of the day occurs, will be different in different months. Specifically, the amplitude of the possible periodic signal measured in April, when the CO concentration is higher, is not the same as the amplitude measured in August, when the concentration is lower.
It is possible that the aforementioned variations in amplitude are due to the time periods in which different activities are carried out in the city. Therefore, the highest concentration of CO when analyzed for the hours is not the same if the day is a holiday or a working day. Furthermore, this means that the signal frequency is not fixed, but is also modulated.
Before concluding this section, it is important to mention that the in-depth analysis of the possible periodic waveform that the CO concentration could have, both for the months and for the hours of the day, has not been included. Therefore, this is a task that remains pending to be carried out in future research work.

6. Conclusions

This paper was aimed at performing the robust statistical analysis of CO concentration measurements taken at the Belisario air quality monitoring station (Quito, Ecuador) from 1 January 2008 to 31 December 2019. This is the first time that this type of analysis has been carried out at this monitoring station and its results show that said concentration tends to decrease year after year. Therefore, the measures that the city authorities have been taking in the last twelve years are giving satisfactory results.
The analysis carried out in [30] is an analysis focused on general environmental issues in the city of Quito, which could be strengthened by the in-depth study carried out in this paper. This highlights some of the possible uses of the results obtained in this research work. In this sense, it is important to highlight that in this paper the measurements were classified according to the criteria established by the Quito Air Quality Index to classify air pollution. Additionally, sets of variables were constructed, the variables were categorized, and similarities and differences were also established between the variables. All of this was performed with the precision provided by both nonparametric and robust statistical methods. In this sense, the robust analysis methodology of the CO concentration developed in this paper presents an exhaustive way of carrying out the analysis of measurements of this air pollution variable. Furthermore, one of the advantages of this methodology is that it does not require a large amount of data to carry out the analysis, as has already been demonstrated in [31,32].
In [30], it is mentioned that the main sources of air pollution in Quito are the means of vehicular transportation, which is aggravated by large traffic jams and all the industries that use bunker and fuel oil, highlighting thermo-electric power plants. Moreover, in [30] it is also mentioned that Quito is a narrow and long city, whose central part is located on the slopes of the Pichincha volcano and all travelers who have to travel from one side of the city to the other have to pass through the center of the city, generating traffic jams and consequent air pollution. On the other hand, volcanic eruptions are also sources of air pollution.
The previous paragraph shows that, although the exhaustive robust analysis carried out in this paper showed that air pollution due to CO has been decreasing in recent years, it is necessary to improve the urban dynamics of the city. For example, it is proposed that the city comply with quality standards designed specifically for each of its most critical points in terms of air pollution. In addition, although the quality of the means of transport in Quito has improved significantly, it is proposed to look for more efficient and less polluting means. Likewise, it is proposed to design elements that protect citizens from air pollution while walking on the sidewalks, to build more urban parks as air pollution filters, and to keep citizens informed at all times about the level of air pollution in the city, both in the region through which they travel every day and in the area where they live. All this is in total agreement with the research work presented in [31].
Finally, based on the time intervals chosen to perform the analysis and represent the results of the research, it was observed that there is a certain periodicity in the CO concentration, both for the months and for the hours of the day. Nevertheless, this periodicity does not appear when the analysis is carried out by year for the twelve years under study. This implies that modeling the possible periodicity of this type of signal is a very complex research topic, in which behavior patterns that vary in amplitude, duration and time of occurrence come to light. Modeling this behavior of the CO concentration with mathematical tools is part of the authors' future research work.

Author Contributions

W.H. and A.M. created the methodology of formal data analysis and the tools to implement this methodology. In addition, W.H. and A.M. performed the statistical analysis of the data, the validation of the results, and the writing of the article. It is important to say that the authorship was limited to those who have contributed substantially to the work reported. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by CEDIA-Ecuador (under the research project CEPRA XII-2018-13), Universidad de Las Américas, Quito, Ecuador (under the research project ERa.ERI.WHP.18.01), and Universidad Politécnica de Madrid, Spain.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. United States Environmental Protection Agency. “Basic Information about Carbon Monoxide (CO) Outdoor Air Pollution” Carbon Monoxide (CO) Pollution in Outdoor Air. Available online: https://www.epa.gov/co-pollution/basic-information-about-carbon-monoxide-co-outdoor-air-pollution (accessed on 10 March 2020).
  2. Bleecker, M.L. Carbon Monoxide Intoxication. In Handbook of Clinical Neurology, 3rd ed.; Lotti, M., Bleecker, M.L., Eds.; Occupational Neurology; Elsevier: Amsterdam, The Netherlands, 2015.
  3. WHO. Environmental Health Criteria 213. Available online: https://apps.who.int/iris/bitstream/handle/10665/42180/WHO_EHC_213.pdf;jsessionid=EDD9973CA052DEDE6B3AB10501FA5E18?sequence=1 (accessed on 11 March 2020).
  4. Blumenthal, I. Carbon monoxide poisoning. J. R. Soc. Med. 2001, 94, 270–272.
  5. Kao, L.W.; Nañagas, K.A. Toxicity associated with carbon monoxide. Clin. Lab. Med. 2006, 26, 99–125.
  6. Omaye, S.T. Metabolic modulation of carbon monoxide toxicity. Toxicology 2002, 180, 139–150.
  7. Hoaglin, D.C.; Mosteller, F.; Tukey, J.W. Understanding Robust and Exploratory Data Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2000.
  8. Maronna, R.A.; Martin, R.D.; Yohai, V.J. Robust Statistics: Theory and Methods; John Wiley & Sons: England, UK, 2006.
  9. Wilcox, R. Introduction to Robust Estimation and Hypothesis Testing, 3rd ed.; Academic Press: Waltham, MA, USA, 2012.
  10. Belisario. Secretaría de Ambiente del Municipio del Distrito Metropolitano Quito. Available online: http://www.quitoambiente.gob.ec/ambiente/index.php/belisario (accessed on 23 March 2020).
  11. Hebda-Sobkowicz, J.; Gola, S.; Zimroz, R.; Wyłomanska, A. Identification and statistical analysis of impulse-like patterns of carbon monoxide variation in deep underground mines associated with the blasting procedure. Sensors 2019, 19, 2757.
  12. Xu, Y.; Zhao, X.; Chen, Y.; Zhao, W. Research on a mixed gas recognition and concentration detection algorithm based on a metal oxide semiconductor olfactory system sensor array. Sensors 2018, 18, 3264.
  13. Moral, F.J.; Rebollo, F.J.; Valiente, P.; López, F. Modeling of atmospheric pollution in urban and rural sites using a probabilistic and objective approach. Appl. Sci. 2019, 9, 4009.
  14. Tiao, G.C.; Box, G.E.P.; Hamming, W.J. A statistical analysis of the Los Angeles ambient carbon monoxide data 1955–1972. J. Air Pollut. Control Assoc. 1975, 25, 1129–1136.
  15. Oda, G.; Ryono, R.; Lucero-Obusan, C.; Schirmer, P.; Holodniy, M. Carbon monoxide poisoning surveillance in the veterans health administration. BMC Public Health 2019, 19, 190.
  16. Can, O.; Sayili, U.; Aksu Sayman, O.; Faruk Kuyumcu, O.; Yilmaz, D.; Esen, E.; Yurtseven, E.; Erginoz, E. Mapping of carbon monoxide related death risk in Turkey: A ten-year analysis based on news agency records. BMC Public Health 2019, 19, 9.
  17. Liu, C.; Yin, P.; Chen, R.; Meng, X.; Wang, L.; Niu, Y.; Ling, Z.; Liu, Y.; Liu, J.; Qi, J.; et al. Ambient carbon monoxide and cardiovascular mortality: A nationwide time-series analysis in 272 cities in China. Lancet Planet. Health 2018, 2, e12–e18.
  18. Gomez-Pelaez, A.J.; Ramos, R.; Gomez-Trueba, V.; Novelli, P.C.; Campo-Hernandez, R. A statistical approach to quantify uncertainty in carbon monoxide measurements at the Izaña global GAW station: 2008–2011. Atmos. Meas. Tech. 2013, 6, 787–799.
  19. All-Momani, T.M.; Al-Nasser, A.D. Statistical analysis of air pollution caused by exhaust gases emitted from gasoline vehicles. Dirasat Pure Sci. 2006, 33, 93–102.
  20. Hollander, M.; Wolfe, D.A.; Chicken, E. Nonparametric Statistical Methods, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2014.
  21. Gibbons, J.D.; Chakraborti, S. Nonparametric Statistical Inference, 5th ed.; Chapman & Hall/CRC Press: Boca Raton, FL, USA, 2011.
  22. Hernandez, W.; Mendez, A.; Diaz-Marquez, A.M.; Zalakeviciute, R. PM2.5 concentration measurement analysis by using nonparametric statistical inference. IEEE Sens. J. 2020, 20, 1084–1094.
  23. Hernandez, W.; Mendez, A.; Zalakeviciute, R.; Diaz-Marquez, A.M. Analysis of the information obtained from PM2.5 concentration measurements in an urban park. IEEE Trans. Instrum. Meas. 2020, 69, 6296–6311.
  24. Mukherjee, A.; Brown, S.G.; McCarthy, M.C.; Pavlovic, N.R.; Stanton, L.G.; Lam Snyder, J.; D′Andrea, S.; Hafner, H.R. Measuring spatial and temporal PM2.5 variations in Sacramento, California, communities using a network of low-cost sensors. Sensors 2019, 19, 4701.
  25. Borghi, F.; Spinazzè, A.; Campagnolo, D.; Rovelli, S.; Cattaneo, A.; Cavallo, D.M. Precision and accuracy of a direct-reading miniaturized monitor in PM2.5 exposure assessment. Sensors 2018, 18, 3089.
  26. Wang, S.; Van der, A.R.J.; Stammes, P.; Wang, W.; Zhang, P.; Lu, N.; Fang, L. Carbon dioxide retrieval from TanSat observations and validation with TCCON measurements. Remote Sens. 2020, 12, 2204.
  27. Shokr, M.; El-Tahan, M.; Ibrahim, A.; Steiner, A.; Gad, N. Long-term, high-resolution survey of atmospheric aerosols over Egypt with NASA’s MODIS data. Remote Sens. 2017, 9, 1027.
  28. Baire, M.; Melis, A.; Lodi, M.B.; Tuveri, P.; Dachena, C.; Simone, M.; Fanti, A.; Fumera, G.; Pisanu, T.; Mazzarella, G. A wireless sensors network for monitoring the carasau bread manufacturing process. Electronics 2019, 8, 1541.
  29. Tang, C.-S.; Wu, T.-Y.; Chuang, K.-J.; Chang, T.-Y.; Chuang, H.-C.; Candice Lung, S.-C.; Chang, L.-T. Impacts of in-cabin exposure to size-fractionated particulate matters and carbon monoxide on changes in heart rate variability for healthy public transit commuters. Atmosphere 2019, 10, 409.
  30. Díaz, V. Informe Calidad del Aire 2017, Secretaría de Ambiente del Distrito Metropolitano de Quito. Available online: http://www.quitoambiente.gob.ec/ambiente/index.php/informes#informecalidad-del-aire-017 (accessed on 26 March 2020).
  31. Hernandez, W.; Mendez, A.; Diaz-Marquez, A.M.; Zalakeviciute, R. Robust analysis of PM2.5 concentration measurements in the Ecuadorian park La Carolina. Sensors 2019, 19, 4648.
  32. Hernandez, W.; Mendez, A.; Zalakeviciute, R.; Diaz-Marquez, A.M. Robust confidence intervals for PM2.5 concentration measurements in the Ecuadorian park La Carolina. Sensors 2020, 20, 654.
  33. Hernandez, W.; Mendez, A.; Gonzalez-Posadas, V.; Jiménez-Martín, J.L. Robust analysis of the information obtained from a set of 12 years of SO2 concentration measurements. IEEE Access 2020.
  34. Cavaliere, A.; Carotenuto, F.; Di Gennaro, F.; Gioli, B.; Gualtieri, G.; Martelli, F.; Matese, A.; Toscano, P.; Vagnoli, C.; Zaldei, A. Development of low-cost air quality stations for next generation monitoring networks: Calibration and validation of PM2.5 and PM10 sensors. Sensors 2018, 18, 2843.
  35. Munir, S. Analysing temporal trends in the ratios of PM2.5/PM10 in the UK. Aerosol Air Qual. Res. 2017, 17, 34–48.
  36. Thermo Scientific™ Model 48i CO Analyzer, Thermo Fisher Scientific. Available online: https://www.thermofisher.com/order/catalog/product/48I#/48I (accessed on 26 March 2020).
  37. EPA-454/B-17-001. Quality Assurance Handbook for Air Pollution Measurement Systems; Ambient Air Quality Monitoring Program; U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Air Quality Assessment Division, RTP: Research Triangle Park, NC, USA, 2017; Volume II.
  38. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2016.
  39. Brockwell, P.J.; Davis, R.A. Introduction to Time Series and Forecasting, 2nd ed.; Springer: New York, NY, USA, 2002.
  40. Bryson, M.C. Heavy-tailed distributions: Properties and tests. Technometrics 1974, 16, 61–68.
  41. Papoulis, A.; Unnikrishna Pillai, S. Probability, Random Variables, and Stochastic Processes, 4th ed.; McGraw-Hill Higher Education: New York, NY, USA, 2002.
  42. Hampel, F.R. The influence curve and its role in robust estimation. J. Am. Stat. Assoc. 1974, 69, 383–393.
  43. Tukey, J.W. Exploratory Data Analysis; Addison-Wesley: Reading, MA, USA, 1977.
  44. Rock, N.M.S. ROBUST: An interactive FORTRAN-77 package for exploratory data analysis using parametric, robust and nonparametric location and scale estimates, data transformations, normality tests, and outlier assessment. Comput. Geosci. 1987, 13, 463–494.
  45. Croux, C.; Rousseeuw, P.J. A class of high-breakdown scale estimators based on subranges. Commun. Stat. Theory Methods 1992, 21, 1935–1951.
  46. Dixon, W.J.; Tukey, J.W. Approximate behavior of the distribution of winsorized t (Trimming/Winsorization 2). Technometrics 1968, 10, 83–98.
  47. Mosteller, F.; Tukey, J.W. Data Analysis and Regression: A Second Course in Statistics; Addison-Wesley: Reading, MA, USA, 1977.
Figure 1. Box plot diagram of all years. The red circles represent the outliers.
Figure 2. Moving average of the time series consisting of the values of the CO concentration from 1 January 2008 to 31 December 2019.
Figure 3. Moving averages of half of the years.
Figure 4. Confidence intervals for the median of each variable. The dashed lines are used to establish the separations between the different categories in which the variables under study are grouped.
Figure 5. Graphical representation of the location estimates for the twelve years under study. Location estimators: mean, median, trimean, 0.2-trimmed mean, 0.3-trimmed mean, 0.2-winsorized mean, 0.3-winsorized mean, Andrew’s wave, and biweight.
Figure 6. Graphical representation of the scale estimates for the twelve years under study. Scale estimators: sample standard deviation ($S_x$), mean absolute deviation ($MAD_{mean}$), median absolute deviation ($MAD$), one-half of the fourth-spread ($SRH$), least median squares ($LMS$), estimator based on a subrange ($C_n^{\alpha}$), winsorized standard error ($s_W(0.2)$), Andrew’s wave ($s_{\omega a}(2.4\pi)$), and biweight ($S_{bi}(c)$).
Figure 7. 95% confidence intervals ($CI_{0.95}$) for $X_1$ (2008): classic, nonparametric, bootstrap, and robust confidence intervals.
Figure 8. 95% confidence intervals ($CI_{0.95}$) for $X_5$ (2012): classic, nonparametric, bootstrap, and robust confidence intervals.
Figure 9. 95% confidence intervals ($CI_{0.95}$) for $X_9$ (2016): classic, nonparametric, bootstrap, and robust confidence intervals.
Figure 10. ($T(0.2)$, $s_W(0.2)$) 95% confidence intervals: $X_1$ (2008), $X_2$ (2009), $X_3$ (2010), $X_4$ (2011), $X_5$ (2012), $X_6$ (2013), $X_7$ (2014), $X_8$ (2015), $X_9$ (2016), $X_{10}$ (2017), $X_{11}$ (2018), and $X_{12}$ (2019).
Figure 11. ($T_{bi}(9)$, $s_{bi}(9)$) 95% confidence intervals: $X_1$ (2008), $X_2$ (2009), $X_3$ (2010), $X_4$ (2011), $X_5$ (2012), $X_6$ (2013), $X_7$ (2014), $X_8$ (2015), $X_9$ (2016), $X_{10}$ (2017), $X_{11}$ (2018), and $X_{12}$ (2019).
Figure 12. 95% confidence intervals for the months: $Y_1$ (January), $Y_2$ (February), $Y_3$ (March), $Y_4$ (April), $Y_5$ (May), $Y_6$ (June), $Y_7$ (July), $Y_8$ (August), $Y_9$ (September), $Y_{10}$ (October), $Y_{11}$ (November), and $Y_{12}$ (December).
Figure 13. 95% confidence intervals for the groups of every two hours of the day: $Z_1$ (0:00–1:00), $Z_2$ (2:00–3:00), $Z_3$ (4:00–5:00), $Z_4$ (6:00–7:00), $Z_5$ (8:00–9:00), $Z_6$ (10:00–11:00), $Z_7$ (12:00–13:00), $Z_8$ (14:00–15:00), $Z_9$ (16:00–17:00), $Z_{10}$ (18:00–19:00), $Z_{11}$ (20:00–21:00), and $Z_{12}$ (22:00–23:00).
Table 1. Summary statistics of the CO concentration measurements.

| Year | Count | Mean (mg/m³) | Median (mg/m³) | Standard Deviation (mg/m³) | Skewness | Kurtosis | Minimum (mg/m³) | Maximum (mg/m³) |
|---|---|---|---|---|---|---|---|---|
| 2008 ($X_1$) | 8483 | 0.9859 | 0.8600 | 0.5640 | 1.2331 | 5.0445 | 0 | 4.6000 |
| 2009 ($X_2$) | 8373 | 0.9468 | 0.8300 | 0.5297 | 1.3394 | 5.6238 | 0 | 4.6200 |
| 2010 ($X_3$) | 8500 | 0.9494 | 0.8600 | 0.4709 | 1.3400 | 5.8497 | 0 | 3.7200 |
| 2011 ($X_4$) | 8398 | 0.8126 | 0.7200 | 0.4358 | 1.2553 | 5.4553 | 0 | 4.1700 |
| 2012 ($X_5$) | 8485 | 0.6881 | 0.6000 | 0.3622 | 1.3782 | 5.6295 | 0 | 2.9900 |
| 2013 ($X_6$) | 8266 | 0.6589 | 0.5800 | 0.3620 | 1.5447 | 7.0157 | 0 | 3.2300 |
| 2014 ($X_7$) | 8477 | 0.7038 | 0.6200 | 0.3668 | 1.5500 | 6.7817 | 0 | 3.3000 |
| 2015 ($X_8$) | 8467 | 0.6667 | 0.5800 | 0.3661 | 1.6497 | 7.4224 | 0.0100 | 4.4200 |
| 2016 ($X_9$) | 8462 | 0.7317 | 0.6400 | 0.4106 | 1.4815 | 6.5752 | 0 | 3.4500 |
| 2017 ($X_{10}$) | 8408 | 0.6352 | 0.5500 | 0.3945 | 1.4058 | 6.0686 | 0 | 3.1400 |
| 2018 ($X_{11}$) | 8393 | 0.6600 | 0.5900 | 0.3620 | 1.3027 | 5.8100 | 0 | 3.2500 |
| 2019 ($X_{12}$) | 8457 | 0.6374 | 0.5500 | 0.3654 | 1.3281 | 5.4734 | 0 | 2.8500 |
| Total | 101,169 | 0.7566 | 0.6500 | 0.4401 | 1.5245 | 6.6785 | 0 | 4.6200 |
Table 2. Confidence interval limits for the median of each variable ($\alpha = 0.05$).

| Variable | Lower Limit (mg/m³) | Upper Limit (mg/m³) |
|---|---|---|
| $X_1$ (2008) | 0.85 | 0.88 |
| $X_2$ (2009) | 0.82 | 0.84 |
| $X_3$ (2010) | 0.85 | 0.87 |
| $X_4$ (2011) | 0.71 | 0.73 |
| $X_5$ (2012) | 0.60 | 0.61 |
| $X_6$ (2013) | 0.57 | 0.58 |
| $X_7$ (2014) | 0.61 | 0.63 |
| $X_8$ (2015) | 0.57 | 0.59 |
| $X_9$ (2016) | 0.63 | 0.65 |
| $X_{10}$ (2017) | 0.54 | 0.56 |
| $X_{11}$ (2018) | 0.58 | 0.60 |
| $X_{12}$ (2019) | 0.54 | 0.56 |
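
Intervals for the median such as those in Table 2 can be obtained without distributional assumptions from order statistics. The sketch below shows one common construction, based on the Binomial(n, 1/2) distribution of the number of observations falling below the median. It is assumed here for illustration and is not necessarily the exact procedure behind Table 2; the function name `median_ci` and the toy sample are hypothetical.

```python
from fractions import Fraction
from math import comb


def median_ci(sample, alpha=0.05):
    """Distribution-free CI for the median from order statistics.

    Returns (x_(k), x_(n-k+1)), where k is the largest rank such that
    P(B <= k - 1) <= alpha / 2 for B ~ Binomial(n, 1/2).
    """
    xs = sorted(sample)
    n = len(xs)
    total = 2 ** n
    # Exact rational threshold avoids float overflow for large n.
    threshold = Fraction(str(alpha)) / 2 * total
    cum = 0  # running count of outcomes with B <= k - 1
    k = 0
    while cum + comb(n, k) <= threshold:
        cum += comb(n, k)
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    return xs[k - 1], xs[n - k]


# Toy sample (not data from the study).
toy = [1, 2, 2, 3, 4, 5, 7, 9, 12, 50]
lower, upper = median_ci(toy)
```

Because the coverage of such an interval is at least $1-\alpha$ rather than exactly $1-\alpha$, the interval tends to be slightly conservative for small samples; with samples of roughly 8400 observations per year, as in Table 1, the difference is negligible.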
Table 3. Point estimates of location (all values in mg/m³).

| Year | Mean | Median $Me$ | Trimean $TM$ | 0.2-Trimmed Mean $T(0.2)$ | 0.3-Trimmed Mean $T(0.3)$ | 0.2-Winsorized Mean $W(0.2)$ | 0.3-Winsorized Mean $W(0.3)$ | Andrew’s Wave $T_{wa}(2.4\pi)$ | Biweight $T_{bi}(9)$ |
|---|---|---|---|---|---|---|---|---|---|
| 2008 ($X_1$) | 0.9859 | 0.8600 | 0.8950 | 0.9689 | 0.9388 | 0.9203 | 0.8885 | 0.9292 | 0.9261 |
| 2009 ($X_2$) | 0.9468 | 0.8300 | 0.8600 | 0.8905 | 0.8650 | 0.8835 | 0.8540 | 0.8878 | 0.8859 |
| 2010 ($X_3$) | 0.9494 | 0.8600 | 0.8800 | 0.9719 | 0.9461 | 0.8974 | 0.8760 | 0.8991 | 0.8988 |
| 2011 ($X_4$) | 0.8126 | 0.7200 | 0.7450 | 0.7923 | 0.7348 | 0.7648 | 0.7377 | 0.7708 | 0.7686 |
| 2012 ($X_5$) | 0.6881 | 0.6000 | 0.6250 | 0.6534 | 0.6170 | 0.6401 | 0.6228 | 0.6465 | 0.6446 |
| 2013 ($X_6$) | 0.6589 | 0.5800 | 0.6000 | 0.6565 | 0.6314 | 0.6118 | 0.5970 | 0.6150 | 0.6142 |
| 2014 ($X_7$) | 0.7038 | 0.6200 | 0.6400 | 0.6934 | 0.6716 | 0.6566 | 0.6345 | 0.6601 | 0.6590 |
| 2015 ($X_8$) | 0.6667 | 0.5800 | 0.5975 | 0.6280 | 0.5786 | 0.6135 | 0.5943 | 0.6120 | 0.6122 |
| 2016 ($X_9$) | 0.7317 | 0.6400 | 0.6650 | 0.7199 | 0.6445 | 0.6800 | 0.6623 | 0.6836 | 0.6822 |
| 2017 ($X_{10}$) | 0.6352 | 0.5500 | 0.5725 | 0.6212 | 0.6111 | 0.5854 | 0.5705 | 0.5921 | 0.5909 |
| 2018 ($X_{11}$) | 0.6600 | 0.5900 | 0.6075 | 0.6714 | 0.6789 | 0.6204 | 0.6035 | 0.6228 | 0.6221 |
| 2019 ($X_{12}$) | 0.6374 | 0.5500 | 0.5750 | 0.6077 | 0.5318 | 0.5932 | 0.5711 | 0.5925 | 0.5907 |
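
To make the simpler location estimators of Table 3 concrete, the following minimal sketch computes a trimmed mean, a winsorized mean, and the trimean on a small toy sample. The conventions used (trimming $g = \lfloor \gamma n \rfloor$ observations per tail, and Tukey's hinges for the quartiles of the trimean) are assumptions made for illustration; other quartile conventions give slightly different values. All function names and the sample are hypothetical.

```python
from statistics import mean, median


def trimmed_mean(sample, gamma):
    """Mean after dropping floor(gamma * n) observations from each tail."""
    xs = sorted(sample)
    g = int(gamma * len(xs))
    return mean(xs[g:len(xs) - g])


def winsorized_mean(sample, gamma):
    """Mean after replacing each tail by the nearest retained value."""
    xs = sorted(sample)
    n = len(xs)
    g = int(gamma * n)
    winsorized = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    return mean(winsorized)


def trimean(sample):
    """(lower hinge + 2 * median + upper hinge) / 4, using Tukey's hinges."""
    xs = sorted(sample)
    n = len(xs)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    return (lower_hinge + 2 * median(xs) + upper_hinge) / 4


# Toy right-skewed sample with one gross outlier (not data from the study).
toy = [1, 2, 2, 3, 4, 5, 7, 9, 12, 50]
estimates = {
    "mean": mean(toy),
    "median": median(toy),
    "trimean": trimean(toy),
    "T(0.2)": trimmed_mean(toy, 0.2),
    "W(0.2)": winsorized_mean(toy, 0.2),
}
```

On this sample the mean is pulled to 9.5 by the single outlier, while the robust estimators remain close to 5, which mirrors the pattern in Table 3, where the mean exceeds every robust estimate in each year.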
Table 4. Point estimates of scale (all values in mg/m³).

| Year | $s_x$ | $MAD_{mean}$ | $MAD$ | $SRH$ | $LMS$ | $s_W(0.2)$ | $s_{wa}(2.4\pi)$ | $s_{bi}(9)$ | $C_n^{0.2}$ |
|---|---|---|---|---|---|---|---|---|---|
| 2008 ($X_1$) | 0.5640 | 0.4339 | 0.3200 | 0.3400 | 0.2950 | 0.3309 | 0.5089 | 0.5137 | 0.4555 |
| 2009 ($X_2$) | 0.5297 | 0.4043 | 0.2900 | 0.3200 | 0.2700 | 0.3083 | 0.4660 | 0.4711 | 0.4191 |
| 2010 ($X_3$) | 0.4709 | 0.3548 | 0.2600 | 0.2700 | 0.2450 | 0.2658 | 0.4102 | 0.4152 | 0.3826 |
| 2011 ($X_4$) | 0.4358 | 0.3362 | 0.2500 | 0.2700 | 0.2350 | 0.2602 | 0.3940 | 0.3970 | 0.3644 |
| 2012 ($X_5$) | 0.3622 | 0.2750 | 0.2000 | 0.2100 | 0.1800 | 0.2027 | 0.3141 | 0.3181 | 0.2733 |
| 2013 ($X_6$) | 0.3620 | 0.2702 | 0.1900 | 0.2100 | 0.1750 | 0.1957 | 0.3047 | 0.3094 | 0.2733 |
| 2014 ($X_7$) | 0.3668 | 0.2746 | 0.2000 | 0.2100 | 0.1800 | 0.2025 | 0.3093 | 0.3141 | 0.2733 |
| 2015 ($X_8$) | 0.3661 | 0.2697 | 0.1800 | 0.1950 | 0.1700 | 0.1903 | 0.2925 | 0.2995 | 0.2551 |
| 2016 ($X_9$) | 0.4106 | 0.3072 | 0.2200 | 0.2400 | 0.2050 | 0.2255 | 0.3486 | 0.3532 | 0.3098 |
| 2017 ($X_{10}$) | 0.3945 | 0.2982 | 0.2200 | 0.2350 | 0.1950 | 0.2219 | 0.3416 | 0.3461 | 0.2915 |
| 2018 ($X_{11}$) | 0.3620 | 0.2744 | 0.2000 | 0.2150 | 0.1900 | 0.2083 | 0.3214 | 0.3244 | 0.2915 |
| 2019 ($X_{12}$) | 0.3654 | 0.2772 | 0.1900 | 0.2100 | 0.1750 | 0.2033 | 0.3159 | 0.3204 | 0.2733 |
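
Several of the scale estimators in Table 4 have short closed-form definitions. The sketch below implements four of them on a toy sample: the mean absolute deviation about the mean, the median absolute deviation about the median, one-half of the fourth-spread, and a winsorized standard deviation. The conventions (no consistency constants, Tukey's hinges for the fourths, and an $n-1$ denominator for the winsorized variance) are assumptions for illustration and may differ from those used to produce Table 4. Function names and the sample are hypothetical.

```python
from statistics import mean, median, stdev


def mad_mean(sample):
    """Mean absolute deviation about the sample mean."""
    m = mean(sample)
    return mean([abs(x - m) for x in sample])


def mad_median(sample):
    """Median absolute deviation about the sample median (unscaled)."""
    m = median(sample)
    return median([abs(x - m) for x in sample])


def half_fourth_spread(sample):
    """Half the distance between the upper and lower hinges."""
    xs = sorted(sample)
    n = len(xs)
    half = (n + 1) // 2  # each half includes the median when n is odd
    return (median(xs[n - half:]) - median(xs[:half])) / 2


def winsorized_sd(sample, gamma):
    """Standard deviation (n - 1 denominator) of the winsorized sample."""
    xs = sorted(sample)
    n = len(xs)
    g = int(gamma * n)
    winsorized = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    return stdev(winsorized)


# Toy right-skewed sample with one gross outlier (not data from the study).
toy = [1, 2, 2, 3, 4, 5, 7, 9, 12, 50]
scales = {
    "s_x": stdev(toy),
    "MAD_mean": mad_mean(toy),
    "MAD": mad_median(toy),
    "SRH": half_fourth_spread(toy),
    "s_W(0.2)": winsorized_sd(toy, 0.2),
}
```

The single outlier inflates the sample standard deviation and the mean absolute deviation far more than the median-based and winsorized estimators, which is the behaviour the robust columns of Table 4 are designed to exploit.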
Table 5. 95% confidence intervals ($CI_{0.95}$) and confidence interval lengths: ($T(0.2)$, $s_W(0.2)$) and ($T_{bi}(9)$, $s_{bi}(9)$).

| Variable | $CI_{0.95}$ | Lower Limit | Upper Limit | Length |
|---|---|---|---|---|
| $X_1$ | ($T(0.2)$, $s_W(0.2)$) | 0.9571 | 0.9806 | 0.0235 |
| $X_1$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.9151 | 0.9370 | 0.0219 |
| $X_2$ | ($T(0.2)$, $s_W(0.2)$) | 0.8795 | 0.9015 | 0.0220 |
| $X_2$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.8758 | 0.8960 | 0.0202 |
| $X_3$ | ($T(0.2)$, $s_W(0.2)$) | 0.9625 | 0.9813 | 0.0188 |
| $X_3$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.8900 | 0.9076 | 0.0177 |
| $X_4$ | ($T(0.2)$, $s_W(0.2)$) | 0.7830 | 0.8016 | 0.0186 |
| $X_4$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.7601 | 0.7771 | 0.0170 |
| $X_5$ | ($T(0.2)$, $s_W(0.2)$) | 0.6462 | 0.6605 | 0.0144 |
| $X_5$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.6378 | 0.6513 | 0.0135 |
| $X_6$ | ($T(0.2)$, $s_W(0.2)$) | 0.6495 | 0.6635 | 0.0141 |
| $X_6$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.6075 | 0.6209 | 0.0133 |
| $X_7$ | ($T(0.2)$, $s_W(0.2)$) | 0.6863 | 0.7006 | 0.0144 |
| $X_7$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.6523 | 0.6657 | 0.0134 |
| $X_8$ | ($T(0.2)$, $s_W(0.2)$) | 0.6212 | 0.6347 | 0.0135 |
| $X_8$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.6058 | 0.6186 | 0.0128 |
| $X_9$ | ($T(0.2)$, $s_W(0.2)$) | 0.7119 | 0.7279 | 0.0160 |
| $X_9$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.6746 | 0.6897 | 0.0151 |
| $X_{10}$ | ($T(0.2)$, $s_W(0.2)$) | 0.6133 | 0.6291 | 0.0158 |
| $X_{10}$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.5835 | 0.5983 | 0.0148 |
| $X_{11}$ | ($T(0.2)$, $s_W(0.2)$) | 0.6640 | 0.6789 | 0.0149 |
| $X_{11}$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.6151 | 0.6290 | 0.0139 |
| $X_{12}$ | ($T(0.2)$, $s_W(0.2)$) | 0.6005 | 0.6149 | 0.0144 |
| $X_{12}$ | ($T_{bi}(9)$, $s_{bi}(9)$) | 0.5838 | 0.5975 | 0.0137 |
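
As a consistency check, the $X_1$ (2008) rows of Table 5 can be approximately recovered from the published point estimates in Tables 1, 3, and 4. The sketch below assumes the usual forms of the two intervals, $T(\gamma) \pm q \, s_W(\gamma) / ((1-2\gamma)\sqrt{n})$ for the trimmed mean and $T_{bi} \pm q \, s_{bi}/\sqrt{n}$ for the biweight, and replaces the Student's t quantile $q$ by the standard normal quantile, which is a negligible change for $n = 8483$. These formulas are assumptions stated for illustration, not quotations from the paper; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Published estimates for X1 (2008).
n = 8483          # Table 1, count
t_trim = 0.9689   # Table 3, T(0.2)
s_wins = 0.3309   # Table 4, s_W(0.2)
t_bi = 0.9261     # Table 3, T_bi(9)
s_bi = 0.5137     # Table 4, s_bi(9)
gamma = 0.2       # trimming proportion per tail

# Normal quantile used in place of the t quantile (assumption; large n).
q = NormalDist().inv_cdf(0.975)

# Half-widths of the two 95% intervals under the assumed formulas.
half_trim = q * s_wins / ((1 - 2 * gamma) * sqrt(n))
half_bi = q * s_bi / sqrt(n)

ci_trim = (t_trim - half_trim, t_trim + half_trim)
ci_bi = (t_bi - half_bi, t_bi + half_bi)

print(f"(T(0.2), s_W(0.2)): [{ci_trim[0]:.4f}, {ci_trim[1]:.4f}], "
      f"length {2 * half_trim:.4f}")
print(f"(T_bi(9), s_bi(9)): [{ci_bi[0]:.4f}, {ci_bi[1]:.4f}], "
      f"length {2 * half_bi:.4f}")
```

Under these assumptions the computed limits agree with the tabulated ones to within about $10^{-4}$ mg/m³, a discrepancy attributable to the four-decimal rounding of the inputs.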

Share and Cite

MDPI and ACS Style

Hernandez, W.; Mendez, A. Robust Estimation of Carbon Monoxide Measurements. Sensors 2020, 20, 4958. https://doi.org/10.3390/s20174958
