1. Introduction
Ambient air pollution is a major environmental stressor, posing a large but modifiable health burden, particularly in urban environments. In a recent study, particulate matter with a diameter less than 2.5 μm (PM2.5) was estimated to result in 8.9 million (95% confidence interval (CI): 7.5–10.3) premature deaths globally; more than the number of deaths from cigarette smoking [1]. These deaths were categorized into five cause categories, which have been convincingly associated with air pollution: ischemic heart disease (IHD), stroke, chronic obstructive pulmonary disease (COPD), lung cancer, and lower respiratory infections (LRIs). Other research has also associated elemental carbon (EC), nitrogen dioxide (NO2), and ozone (O3), amongst other pollutants, with premature mortality and a wide spectrum of diseases, including adverse birth outcomes, respiratory outcomes in children and adults, and cardiometabolic outcomes [2]. The health burden of air pollution is not fully elucidated and could increase as more evidence emerges on the adverse effects of various pollutants on new health outcomes such as autism, cognitive decline, neurodegenerative diseases (dementia, Alzheimer's, and Parkinson's disease), diabetes, and obesity [3]. Higher air pollution exposures and the associated adverse health effects are unequally distributed, with lower socioeconomic classes and ethnic minorities suffering the most overall [4,5].
To study the myriad health effects of air pollution and devise adequate air quality guidelines, standards, and exposure mitigation strategies, human exposure to air pollution must first be assessed. The assessment of ambient air pollution, and subsequently the assignment of human exposures, can be done using a wide variety of methods, broadly classified as measurement, modeling, or the use of air pollution exposure surrogates, such as proximity to major roadways or traffic density within certain distances from a residence [2]. Air pollution measurements from fixed-site reference-grade and regulatory monitors may be considered the gold standard, as they offer direct observations (rather than estimations) of pollutant concentrations and subsequent exposures and, importantly, undergo stringent quality assessment and control. These regulatory monitors can measure multiple pollutants with a high degree of accuracy and high temporal resolution. However, due to their high costs and maintenance requirements, they are only present in limited quantities and have low spatial coverage [6]. In addition, their locations are selected for regulatory purposes, rather than scientific ones [7] (e.g., siting based on random selection may be considered the gold standard for data collection and analysis). This hinders the ability to characterize the profile of urban air pollution (exposures) and its well-established spatial variability, which can vary five-fold within a single city block [8,9]. Moreover, due to the lack of an established study design when installing these devices, the measurements collected from these sources may not be adequate for estimating exposures and scaling them to larger geographies. Capturing the large variability of air pollution within urban areas is critical, as it underpins the ability of epidemiological studies to pinpoint adverse health effects and of risk assessment studies to identify hotspots and the distribution of pollutants and attributable health burdens by, for example, socioeconomic class or ethnicity. Despite these limitations, numerous studies have had to rely on insufficient spatial data from fixed-site and regulatory monitors over the years, due to the lack of feasible alternative methods.
In recent years, an important advancement in air pollution measurement technology has occurred: the development and deployment of low-cost and portable air quality sensors. Low-cost air quality sensors are flooding the market and are being used for an ever-expanding range of applications [10], some of which mimic the applications of regulatory and reference-grade monitors. The emergence of low-cost sensors has generated great interest among researchers and community members hoping to better understand the intersection of local air quality and health through broader geographic deployment. Low-cost sensors were also identified as an integral aspect of a 'changing paradigm of air pollution monitoring', consisting of a shift from reliance on government and regulatory monitoring to the use of low-cost, portable, easy-to-operate sensors, made possible by advances in microfabrication techniques, microelectromechanical systems, and energy-efficient radios and sensor circuits [11]. Low-cost sensors can provide spatially denser and better-resolved pollution measurements, which can advance research, policy, and practice, and more effectively direct programs and resources to address local pollution, health, and environmental justice. Other potential applications include air pollution warnings, epidemiological studies, and model validation. Some studies investigated data fusion methods, combining low-cost sensor measurements with well-established air quality models and evaluating the resulting air pollution maps [12,13]. Low-cost sensors also offer data at fine temporal resolution, which may be used to study the associations between air pollution concentrations and acute or short-term health effects. Despite the increasing popularity of these sensors, only a few studies have examined their performance (i.e., accuracy, precision, reliability, and reproducibility) under real-world conditions and against regulatory air quality monitors. This is a crucial and under-researched field, given the proliferation of low-cost sensors and their data, and the associated interpretation challenges [14]. The validity and uncertainty of low-cost sensor measurements over a range of meteorological and aerosol loading environments need to be better quantified, and this has been flagged as a research gap [15].
To understand the state of the research in this field, we conducted a literature review including twenty-five recent studies, as summarized in Table S1. As shown in Table S1, low-cost sensors, and the corresponding reference monitors used for comparison, measured either one or multiple gaseous and particulate pollutants, most commonly nitric oxide (NO), NO2, O3, carbon monoxide (CO), carbon dioxide (CO2), sulfur dioxide (SO2), PM2.5, particulate matter with a diameter of less than 10 μm (PM10), and particulate matter with a diameter of less than 1 μm (PM1). Many studies also measured meteorological parameters that may impact the performance of the low-cost sensors, such as relative humidity (RH), temperature (Temp), dew point (DP), atmospheric pressure (P), wind speed (WS), and wind direction (WD). One study measured ambient light (AL) [16]. Most studies evaluated low-cost sensor performance through co-location with a regulatory or a reference-grade monitor, without applying additional calibration methods. Some studies suggested that RH can hinder the accuracy of optical particle sensors, because of the detection of water droplets in addition to particulate matter [17], and cautioned against using low-cost sensors for measuring particulate matter in high RH conditions [10,18,19,20]. Many sensors failed in periods of high and sustained RH [21]. Sensor performance seemed to be negatively affected at higher air pollution concentrations [19], including during sand and dust storms, with a trend toward underestimation at those levels and when RH was >75%. Moreover, low-cost air quality sensors can suffer from a degraded response over time, leading to drift in the measured concentrations, while gaseous sensors for a specific pollutant can suffer from cross-sensitivities to other pollutants, generating false sensor responses [22,23]. Research also suggests that the r² values for low-cost sensors span a wide range when evaluated against field reference monitors; for example, between 0.4 and 0.8 [22]. These variabilities, uncertainties, and unknowns can result in a lack of confidence in data quality and in users not knowing if, and which, low-cost sensors may fit their intended applications. Quantifying the performance of sensors in real-world conditions is critical to ensure sensors will be used in a manner commensurate with their data quality [15].
In the face of their limitations and remaining knowledge gaps, low-cost sensors present new opportunities for ubiquitous monitoring and hold a lot of promise, due to their small size, low cost, and ease of use. Their deployment can provide a more complete assessment of the spatiotemporal variability of urban pollution than traditional monitoring, and identify hotspots and concentrations affecting personal and community exposures. They may also allow communicating the state of air quality through, for example, established air quality categories and indices with health implications for the public [24,25,26], such as the Air Quality Index (AQI), used by the U.S. Environmental Protection Agency (USEPA) to relay information about air quality and its health impacts to the public, so they can avoid harmful situations [27].
In this work, we explore the performance and calibration of 12 commercial low-cost sensors co-located at a regulatory (reference) air quality monitoring site in Dallas, Texas, for 18 continuous months; to the best of the authors' knowledge, the longest assessment duration to date. AQY version 1 (AQY1) sensors, manufactured by Aeroqual, New Zealand, were selected for this study. We assessed how well the raw and calibrated low-cost air quality sensors' readings matched readings from the reference monitor, and whether meteorological factors impacted the sensors' performance. This work adds to a growing, but limited, body of literature assessing the performance of low-cost sensors in real-world environments. As shown in Table S1, only a handful of studies have assessed the performance of AQY sensors, and particulate matter was the pollutant most studied. In this study, our data spanned a realistic range of concentrations and meteorological variables, captured by 12 sensors operated at the same location. We used the well-established co-location method and inspected precision, bias, and mean error for four criteria pollutants with adverse health effects: O3, NO2, PM2.5, and PM10. We also systematically investigated the impact of the meteorological factors Temp, RH, WS, and WD on accuracy parameters. Finally, we compared the low-cost sensor readings with the reference monitor's readings using AQI categories, thereby investigating this potential option for utilizing and communicating data from low-cost sensors.
2. Materials and Methods
2.1. Low-Cost Sensors and Pollutants Evaluated
We evaluated 12 low-cost air quality sensors of the same type: AQY1. These units were first released in June 2018 [28], and at the time of purchase (August 2018), they were the latest version on the market. The research team considered several different 'low-cost' air quality sensors, and the Aeroqual AQY1 devices were selected for the project based on multiple criteria. First, the cost of the sensors, approximately USD 4000/device, was in the mid-range for low-cost sensor options, with some options costing as little as a few hundred dollars and others over USD 10,000. The units also monitored multiple pollutants, including PM and gases, which was a positive factor that led to the final selection of the AQY1 units.
The AQY1 units report the minute-by-minute concentrations of four criteria pollutants: NO2 and O3, both measured in parts per billion (ppb), and PM2.5 and PM10, both measured in micrograms per cubic meter (µg/m³). The AQY1 units contain two separate sensor boards to measure pollutants: one for measuring gases, and the other for measuring particles. The units also collect information on RH and Temp at the same resolution as the pollutant data. The AQY1 units come fully assembled out of the box and are ready to be plugged into a power source and used.
To calibrate and evaluate the performance of the low-cost sensors, the 12 units were all co-located at the same reference site in Hinton, Dallas, where a high-cost regulatory air quality monitor was operating continuously. The Hinton monitor is operated by the City of Dallas for the Texas Commission on Environmental Quality (TCEQ) [29] and records all pollutants measured by the AQY1 units, in addition to Temp, RH, and wind data (WS and WD) [30], which we used to investigate the impact of meteorological variables on the low-cost sensor performance. All data from the Hinton, Dallas reference site are available from the TCEQ's website at https://www.tceq.texas.gov/cgi-bin/compliance/monops/daily_summary.pl?cams=401, accessed on 14 September 2019. We obtained and used the RH, Temp, WS, and WD data to investigate whether the performance of the low-cost sensors was affected by meteorology. This co-location procedure follows the co-location calibration protocols recommended by the USEPA and the AQY1 user guide [28]. By locating the low-cost air quality sensors at the same site as the reference (regulatory) monitor, the data from the two could be compared under real-world conditions, to assess the performance of the low-cost sensors. In addition, calibration factors can be calculated for the low-cost sensors to increase the accuracy of their data and achieve a better fit with the data measured by the regulatory air quality monitor. Furthermore, the AQY1 user guide recommends that, at the regulatory co-location site, there should ideally be some hourly values of O3 > 60 ppb, NO2 > 40 ppb, and PM2.5 > 50 µg/m³. These conditions were met in this study.
Except as noted below, both the low-cost sensors and the reference air quality monitor operated continuously for a period of 18 months, and their measurements were timestamped and later matched using the timestamp to conduct the analyses. As the reference air quality monitor only reported hourly averages for the four pollutants, the minute-by-minute readings of the AQY1 monitors were converted into hourly averages, making them comparable to the regulatory monitor; this averaging also helped manage the noise in the minute-by-minute data. No further assessment or correction for noise was undertaken in this study.
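This averaging step can be sketched with pandas, assuming the minute-level readings sit in a DataFrame with a timestamp index (the column name pm25 and the values below are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical minute-by-minute PM2.5 readings from one AQY1 unit
idx = pd.date_range("2019-02-11 00:00", periods=120, freq="min")
minute = pd.DataFrame({"pm25": np.linspace(10.0, 12.0, 120)}, index=idx)

# Average to hourly means so the sensor data align with the reference
# monitor, which reports hourly averages only; averaging also damps
# minute-to-minute noise
hourly = minute["pm25"].resample("h").mean()
```

The hourly series can then be joined to the reference monitor's hourly records on the timestamp.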
Measurements for this study were taken continuously between 11 February 2019 and 31 August 2020, which we used as the start and end dates for all our analyses. Each AQY1 unit had both Wi-Fi and cellular capabilities, to allow for connection and periodic data transfer to a proprietary Aeroqual Cloud system.
There are essentially three options for secure data retrieval by users. First, the user can log into the Cloud system, choose a date range, and instantly download data from single or multiple AQY1 units over the selected time frame; this may be considered a manual user process. Second, the user can choose to have the Cloud auto-generate daily, weekly, or monthly reporting emails for each AQY1 unit; this is an automated process. The third option, which we used, is an application programming interface (API). Through the API, the user sends a request to the Cloud specifying a beginning and end date (the selected time frame) and the device IDs of interest. The response to this request contains the requested data, which can then be reformatted as needed for storage and analysis. We did not use the option of auto-generating periodic reports by e-mail, and although we were downloading data periodically, we did not conduct periodic checks or analysis of the data. In hindsight, doing so would have helped in better managing data quality control and assurance, and we recommend that users utilize the automated reporting option and have a plan in place to check the data. Each AQY1 unit also has an onboard computer, which includes a memory card where the data are automatically stored if the connection to the Aeroqual Cloud system is lost. Once a connection has been re-established, the unit uploads all saved data to the Aeroqual Cloud system.
Beyond the cellular capabilities mentioned above, the AQY1 units have Wi-Fi capabilities, which are only relevant if the user is in close proximity to the unit. Through the Wi-Fi capability, the data from an AQY1 unit can be retrieved directly from the device itself via the device's internal web server; any Wi-Fi-enabled device can connect directly to the AQY1 unit and download the data. For this project, all data were collected using the Cloud system's API feature and code developed in Python, which downloaded the data to a local database for analysis.
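Our retrieval code is not reproduced here; a minimal sketch of this kind of API-based retrieval might look as follows. The URL, query-parameter names, and response field names below are illustrative assumptions, not Aeroqual's actual API.

```python
import datetime as dt
import sqlite3

API_BASE = "https://cloud.example-aeroqual.com/api/data"  # hypothetical URL


def build_request_params(device_id, start, end):
    """Assemble the query parameters for one Cloud API request
    (parameter names are illustrative)."""
    return {
        "deviceId": device_id,
        "from": start.strftime("%Y-%m-%dT%H:%M:%S"),
        "to": end.strftime("%Y-%m-%dT%H:%M:%S"),
    }


def store_rows(con, device_id, rows):
    """Append one device's readings to a local SQLite table for analysis."""
    con.execute("CREATE TABLE IF NOT EXISTS readings "
                "(device TEXT, ts TEXT, pollutant TEXT, value REAL)")
    con.executemany(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        [(device_id, r["ts"], r["pollutant"], r["value"]) for r in rows])
    con.commit()


def download_to_db(device_ids, start, end, db_path):
    """Request each device's data over the time frame and store it locally."""
    import requests  # third-party HTTP client
    con = sqlite3.connect(db_path)
    for dev in device_ids:
        resp = requests.get(API_BASE,
                            params=build_request_params(dev, start, end))
        resp.raise_for_status()
        store_rows(con, dev, resp.json())  # assumed: list of reading dicts
    con.close()
```

A local database of this shape makes it straightforward to match sensor and reference readings by timestamp later on.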
The Cloud system also includes a function to calibrate each unit, which in our case was done through co-location at the reference monitor site (Hinton). This function allows a user to upload data from the reference monitor, for comparison with the AQY1 unit's data, and automatically calculates new calibration factors (as discussed in Section 2.3). The user then needs to manually apply the new calibration factors to the AQY1 unit's data in the Cloud system. From that point onwards (after manually applying the new calibration factors), these factors are implemented in the Cloud system, until the next time new reference monitor data is uploaded or new calibration factors are manually entered by the user.
We recommend that future users utilize the automated daily, weekly, or monthly reporting emails for each AQY1 unit, periodically check the quality of the data, and replace sensors or recalibrate as needed. In addition, there is no clear guidance on how much data are needed for calibration, and it took us some time to decide on 1 month of data for calibration, as will be discussed next. The process described above, in addition to needing 1 month of data for each calibration, led to relatively large amounts of missing data in the calibrated dataset.
Over the course of data collection for this study, a total of 11 sensor boards required replacement, due to failure and their limited lifetime. Data collected during these events (between the time a sensor was reported as faulty and its replacement) were excluded from the comparison with the reference air quality monitor. Additionally, some data were missing due to power loss at the site, which sometimes required an in-person power reset that did not happen immediately (data collection was in Dallas, Texas, while the research team was in College Station, Texas, ≈180 miles away). Data lost to power losses, together with data collected between the time a sensor was reported as faulty and its replacement and calibration (which was sometimes not immediate, as described above), amounted to approximately 20% of the overall data and were not included in the analysis. When conducting the data analysis, both the data unavailable due to faulty sensor replacement and the need for a new calibration, and the data missing due to power loss, were treated as 'Not Available' (N/A).
2.2. Site Set-Up and Instrumentation
The reference air quality monitor had the following parameters: the USEPA site number was 481130069, located at 1415 Hinton Street, 75235, at latitude 32°49′12″ N (32.8200660°) and longitude 96°51′36″ W (−96.8601230°). The low-cost air quality sensors were placed approximately 7 inches apart, sensor to sensor, near the regulatory monitoring station's inlet for gases, which is shown circled in red in Figure 1. The location of each sensor's inlet is shown in Figure S1. The distance from the regulatory monitoring station's inlet for gases to the AQY1 monitors was between 15 and 25 feet, and both inlets were at approximately the same height of 10 feet. However, the regulatory monitoring station's instruments for PM2.5 and PM10 were mounted on a ground-level cement pad, approximately 29 feet from the AQY1 monitors, and at a height difference of 7 feet 5 inches (Figure S2). This was the only possible installation, due to the site's set-up and space availability. Table 1 shows the details of the instruments used in the AQY1 units and the reference monitor at Hinton, highlighting the different gas and particle measurement methods, which contributed to the difference in readings.
2.3. Linear Calibration
Each of the 12 units had a unique ID and was investigated separately. We expected that each sensor would have a different performance, and as such, we planned on conducting the calibration and performance assessment separately for each sensor (i.e., the calibration factors were calculated separately for each sensor and not for all sensors together). The device IDs for the 12 units were: AQY1-BA-479A; AQY1-BA-480A; AQY1-WilburSpare-07; AQY1-WilburSpare-08; AQY1-WilburSpare-09; AQY1-WilburSpare-10; AQY-BA-353; AQY-BA-431; AQY-BA-432; AQY-BA-464; AQY-BA-480; and AQY-BA-481.
The raw and calibrated data from the low-cost sensors were compared against the reference monitor's data, separately for each pollutant and each AQY1 unit, at 1-h intervals. The data calibration was conducted as follows. The two data sets (the low-cost sensor's versus the reference monitor's readings) were plotted against each other in a scatter plot. The slope and offset of the linear least squares fit line were then calculated, and these parameters were used to derive the new gain and offset calibration factors for each pollutant and each AQY1 unit (as distinct from the default gain of 1 and offset of 0). These factors were then entered into the Aeroqual Cloud system [31], which applies them to the raw values as

calibrated value = gain × (raw value + offset)

As this equation outlines, the offset represents a shift in each raw data point, either positive or negative, and the gain is a multiplier applied to the value after the shift from the offset. When first installed, a new unit has default calibration values of 1 (gain) and 0 (offset). Applying the calibration equation with these default values does not alter the reported data, which are therefore treated as raw data in our analyses.
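A small sketch of this calibration step follows. It assumes the fitting convention used in our regression analysis (reference readings on the x-axis, sensor readings on the y-axis), under which the new factors follow from the fitted slope and intercept as gain = 1/slope and offset = −intercept; this derivation is our reading of the procedure, not a quotation of the manufacturer's formulas.

```python
import numpy as np


def calibration_factors(reference, raw):
    """Least-squares fit of raw sensor readings (y) against the
    reference monitor (x); returns (gain, offset) such that
    gain * (raw + offset) recovers the reference scale."""
    slope, intercept = np.polyfit(reference, raw, 1)
    return 1.0 / slope, -intercept


def apply_calibration(raw, gain, offset):
    """Offset shifts each raw point; gain multiplies the shifted value."""
    return gain * (np.asarray(raw) + offset)
```

For example, a sensor that reads exactly twice the reference plus 5 units would receive a gain of 0.5 and an offset of −5.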
We calibrated each unit after it had collected a minimum of 1 month's worth of data, with the first calibration occurring in February 2019, by which time all units had been at the Hinton site and collecting data for one month. The literature does not establish firm recommendations as to the amount of data needed before a calibration can be conducted, or between maintenance calibrations; however, more data may provide for a better calibration. The manufacturer (Aeroqual) suggests a minimum of 3 days when using hourly data [31]. After the calibration interval was complete, the new calibration values (both gain and offset) were calculated from the slopes and intercepts of the data from the AQY1 unit and the co-located reference monitor. The new calibration values were then entered into the Aeroqual Cloud system and applied, and the data from that point forward were considered calibrated data. In September 2019, the units were calibrated again using the same methodology, as we had originally planned to move them to different sites across Dallas for another field study; this plan was halted due to delays in that field study (not discussed here, but part of the bigger project). As such, every unit was calibrated twice, in February and September 2019, and again if a sensor had to be replaced due to its limited lifetime. Units AQY-BA-480 and AQY-BA-481 were only calibrated in September 2019, as they were purchased later than the other units (in June 2019), as back-ups for the other planned field study.
Due to the nature of the study, and the question that was being asked, we did not repeat calibrations of the low-cost sensors on a set schedule, and did not specifically conduct maintenance calibrations (i.e., regularly re-calibrating on a set schedule, for example, every three months). Therefore, in this study, we did not address the drift or change in low-cost sensor performance over time and do not make recommendations as to how calibration should be conducted or maintained, as this was not the objective of our work. Researchers interested in accessing the data for further analysis can reach out with their request to HK or KJ on the research team.
2.4. Data Analysis
To assess the performance of the AQY1 units against the reference monitor, we conducted an exploratory data analysis, regression analysis, and analysis of covariance (ANCOVA). The reference air quality monitor's data were considered the 'True Values' for pollutants, free from error. Data obtained from the AQY1 units were labeled 'Raw Data', while calibrated data were labeled 'Calibrated Data'. We briefly describe these analyses next. All analyses were conducted using R and JMP (a SAS product).
The exploratory data analysis was conducted using multiple summary statistics and graphics, where we used both the raw and the calibrated data for each pollutant and each AQY1 unit separately, to assess the performance of the low-cost sensors. We calculated and present the descriptive (summary) statistics for the whole datasets using the raw data from all low-cost sensors, the calibrated data from all low-cost sensors, the reference monitor's readings, and the differences between the low-cost sensors' and the reference monitor's readings. We also plotted the time series to visually elucidate the trends over time. Finally, we compared and assessed the differences between the readings from the low-cost sensors and the reference monitor using the mean absolute percentage error (MAPE), selected because it is easy to interpret. It is calculated as

MAPE = (100%/N) × Σ |x_i − x̂_i| / x_i

where N is the number of observations, x_i is the reference monitor's reading, and x̂_i is the raw or calibrated reading from the low-cost sensor.
Measurements from the reference air quality monitor (assumed to be free from measurement error) were plotted on the x-axis, and measurements from the low-cost sensors (which are subject to measurement error) were plotted on the y-axis in scatter plots. As the performance of each AQY1 unit was expected to be different, the accuracy (regression) analysis was carried out separately for each sensor and each pollutant. We assessed systematic bias by inspecting the slope and intercept of the estimated regression line. A deviation of the slope from 1 indicates a proportional discrepancy between the reference monitor and a low-cost monitor, i.e., a proportional systematic error. A non-zero intercept represents an absolute discrepancy, or an absolute systematic error. We also calculated the root mean square error (RMSE) for the regression line, as

RMSE = √[ (1/N) × Σ (x̂_i − x_i)² ]

where N is the number of observations, x_i is the reference monitor's reading, and x̂_i is the raw or calibrated reading from the low-cost sensor.
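The two error metrics follow directly from their definitions; a minimal sketch (reference readings assumed non-zero for MAPE):

```python
import numpy as np


def mape(reference, sensor):
    """Mean absolute percentage error, in percent, of sensor readings
    relative to the reference monitor's (non-zero) readings."""
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    return 100.0 * np.mean(np.abs(sensor - reference) / reference)


def rmse(reference, sensor):
    """Root mean square error between sensor and reference readings."""
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    return float(np.sqrt(np.mean((sensor - reference) ** 2)))
```

In practice, hours with missing sensor or reference values (the N/A periods described above) would be dropped before computing either metric.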
We also obtained meteorological data from the Hinton site and investigated how the performance of each unit depended on meteorological conditions: Temp, RH, WD, and WS. The differences between the low-cost sensor data and the reference monitor's data were analyzed using an ANCOVA model, with the device ID as a categorical factor and the meteorological variables as continuous covariates, to assess the effects of meteorology on measurement errors in the low-cost sensor data.
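The core of such an ANCOVA fit is an ordinary least squares regression of the sensor-minus-reference error on device dummies plus the meteorological covariates; a minimal numpy sketch of that core (the significance tests a full ANCOVA would add, as provided by R or JMP, are omitted here):

```python
import numpy as np


def ancova_fit(errors, device_ids, covariates):
    """OLS fit of error ~ device (categorical) + covariates.
    Returns coefficients: intercept, then one dummy per non-baseline
    device, then one slope per covariate."""
    devices = sorted(set(device_ids))
    n = len(errors)
    # Dummy-code devices; the first device is the baseline level
    dummies = np.zeros((n, len(devices) - 1))
    for i, d in enumerate(device_ids):
        j = devices.index(d)
        if j > 0:
            dummies[i, j - 1] = 1.0
    X = np.column_stack([np.ones(n), dummies]
                        + [np.asarray(c, dtype=float) for c in covariates])
    beta, *_ = np.linalg.lstsq(X, np.asarray(errors, dtype=float), rcond=None)
    return beta
```

A covariate slope far from zero then signals that the corresponding meteorological variable shifts the sensor's measurement error.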
2.5. Comparison with the United States Environmental Protection Agency’s Air Quality Index Categories
In addition to the above analysis, which relied on comparing the absolute air pollutant readings from the low-cost sensors and the reference air quality monitor, we also analyzed the performance of the low-cost sensors using the AQI categories put forward by the USEPA [27]. The AQI is an index value, running from 0 to 500, calculated from the concentration measurements of the pollutant of interest. It is applicable to the four pollutants measured by the low-cost sensors: NO2, O3, PM2.5, and PM10. The AQI is split into six categories, each with a different level of health concern, ranging from 'Good', which corresponds to little or no health-related risk, to 'Hazardous', which corresponds to an emergency-level health concern.
The equation used to calculate the AQI uses a time-averaged value of the measured concentration of each pollutant, with the averaging time varying by pollutant: for NO2, a 1-h average is used; for O3, either a 1-h or an 8-h average; and for both PM2.5 and PM10, a 24-h average [32]. The AQI is calculated as

AQI = ((AQI_hi − AQI_lo) / (CONC_hi − CONC_lo)) × (CONC_i − CONC_lo) + AQI_lo

where CONC_i is the average value of the pollutant over the corresponding period of time as above; CONC_lo is the concentration value at the low end of the given AQI level; CONC_hi is the concentration value at the high end of the given AQI level; AQI_hi is the maximum AQI index value for the given CONC_i; and AQI_lo is the minimum AQI index value for the given CONC_i (United States Environmental Protection Agency, 2020b).
The AQI_hi, AQI_lo, CONC_hi, and CONC_lo values required in the above equation were provided by the USEPA and are presented in Table S2. We used the equation above, and the values shown in Table S2, to calculate the AQI levels from the readings of both the low-cost air quality sensors and the reference monitor and compared the two.
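As an illustration, the piecewise-linear AQI computation can be sketched as follows. The PM2.5 breakpoints below are our rendering of the USEPA 24-h values in effect at the time; Table S2 of the paper lists the authoritative set for all four pollutants.

```python
def aqi_from_conc(conc, breakpoints):
    """Linear interpolation within the AQI category containing conc:
    AQI = (AQI_hi - AQI_lo)/(CONC_hi - CONC_lo) * (conc - CONC_lo) + AQI_lo"""
    for conc_lo, conc_hi, aqi_lo, aqi_hi in breakpoints:
        if conc_lo <= conc <= conc_hi:
            return round((aqi_hi - aqi_lo) / (conc_hi - conc_lo)
                         * (conc - conc_lo) + aqi_lo)
    raise ValueError("concentration outside AQI breakpoint table")


# 24-h PM2.5 breakpoints (µg/m³) with their AQI index ranges
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),        # Good
    (12.1, 35.4, 51, 100),     # Moderate
    (35.5, 55.4, 101, 150),    # Unhealthy for Sensitive Groups
    (55.5, 150.4, 151, 200),   # Unhealthy
    (150.5, 250.4, 201, 300),  # Very Unhealthy
    (250.5, 500.4, 301, 500),  # Hazardous
]
```

Applying this function to the sensor and reference readings over matched averaging windows yields the paired AQI categories that we compared.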
4. Discussion
The need for low-cost air quality monitors arose in response to the fixed nature, high calibration requirements, and purchase and maintenance costs of traditional air pollution monitors, as well as the high spatial variability of urban air pollution, which is not captured by reference monitors. Despite the proliferation of low-cost sensors and their data, there is still a lack of clarity and consistency about how these sensors perform in comparison to regulatory monitors, and how their performance might be affected by meteorological factors. The literature calls for further research in this area, in real-world conditions as opposed to laboratories, and over long periods of time that include realistic ranges of air pollutant concentrations and meteorological conditions. This study responded to this call and adds to a growing body of evidence assessing the performance of low-cost air quality sensors.
In this study, our collected data spanned a realistic range of pollution concentrations and meteorological variables over a period of 18 months, captured by 12 sensors from the same manufacturer and of the same type (model), operated at the same location. The assessment of the performance of the AQY1 low-cost sensors conducted here is important as researchers and practitioners determine their overall usefulness in specific research projects and for specific applications. Overall, our findings showed that the performance of the AQY1 monitors varied greatly by device and pollutant and, to a minor extent, was affected by temperature, relative humidity, and wind speed, as will be discussed next. The AQY1 sensors seemed to perform best when measuring O3 (e.g., R² from 0.36 to 0.97, generally improving with calibration), followed by PM10 (e.g., R² from 0.36 to 0.54, with mixed results after calibration), while they performed poorly when measuring NO2 (e.g., R² from 0.00 to 0.58, with mixed results after calibration) and PM2.5 (e.g., R² from 0.20 to 0.39, with mixed results after calibration and generally deteriorating performance). Studies that specifically investigated the sensors manufactured by Aeroqual (AQY) in the past also suggested that the best performance was for O3, followed by PM2.5 and PM10, and lastly NO2 [33,34,35], in line with our findings. We can only comment on this specific low-cost sensor type and expect different performances for different sensors, as shown in Table S1.
The wide range of R² values and the varying performance seen in our study are also reflected in the literature. As shown in Table S1, the degree of accuracy against reference-grade or regulatory monitor readings was highly variable (large ranges for, e.g., R²) and is not directly comparable from study to study, owing to the different pollutants, concentration ranges, sensor types, field locations, and context-specific factors, such as meteorological conditions and calibration methods. Performance even varied from unit to unit of the same make, and this high heterogeneity is problematic when interpreting and comparing findings across the body of evidence. This finding was replicated in our study and in more recent studies, which suggested that calibration models improve when individual sensor performance is accounted for.
In this study, we used a simple linear regression calibration method, which improved the performance of the low-cost sensors across certain parameters but not others. We explored the performance difference before and after calibration by inspecting the slope of the estimated regression line, the intercept, the coefficient of variation, and the RMSE. Overall, the reviewed studies found that data calibration improved the performance of low-cost sensors [19], but sometimes only certain performance parameters were inspected and reported. Our inspection of different parameters suggests that the picture is mixed, and for PM, general deteriorations in performance were seen, which may be partly because of the set-up of the site (discussed next). Our calibration method, however, was also basic. Some novel calibration methods, such as the artificial neural network (ANN) calibration used by Spinelle et al. [36,37], further lowered the bias and seemed to help resolve the cross-sensitivity issues from which many sensors suffer, such as when measuring O3 and NO2. While beyond the scope of our study, future research could investigate the difference in performance when calibration is conducted using simple linear regression versus more complex methods, such as multivariate linear regression (MLR), machine learning and artificial neural network techniques, and other novel methods such as segmented model and residual treatment calibration (SMART) (see Table S1).
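To illustrate the simple co-location calibration idea described above, the following is a minimal sketch, with hypothetical readings rather than the study's data, of fitting a linear correction against a reference monitor and checking that the RMSE shrinks:

```python
from statistics import mean

def fit_linear_calibration(sensor, reference):
    """Ordinary least squares fit of reference = slope * sensor + intercept,
    estimated from co-located sensor and reference-monitor readings."""
    mx, my = mean(sensor), mean(reference)
    sxx = sum((x - mx) ** 2 for x in sensor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sensor, reference))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def rmse(predicted, reference):
    """Root-mean-square error between two paired series."""
    return (sum((p - r) ** 2 for p, r in zip(predicted, reference))
            / len(reference)) ** 0.5

# Hypothetical co-located hourly O3 readings (ppb); the sensor reads low.
sensor = [10.0, 18.0, 25.0, 33.0, 41.0]
reference = [14.0, 22.0, 30.0, 38.0, 46.0]

slope, intercept = fit_linear_calibration(sensor, reference)
calibrated = [slope * x + intercept for x in sensor]

# The linear correction should shrink the RMSE against the reference.
print(f"raw RMSE={rmse(sensor, reference):.2f}, "
      f"calibrated RMSE={rmse(calibrated, reference):.2f}")
```

This is the same slope/intercept inspection used in our performance comparison, reduced to its simplest form; in practice the fit would be computed per sensor and per pollutant over the full co-location window.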
A particular issue was the many negative raw NO2 values reported by the sensors (76,046 records, or 46%), which were corrected in the calibrated datasets but adversely affected the calibrated values and the agreement between the calibrated and the reference monitors' data. It is, however, important to note that the AQY1 monitors do not directly measure NO2. Instead, the NO2 data are calculated from the difference between the O3 and Ox sensors in the monitors, using the equation NO2 = Ox − 1.1 × O3, as per the manufacturer [31]. These results highlight the importance of recording and assessing both the raw and calibrated low-cost sensor measurements, as the added value of the calibration is complex and varies by pollutant and device. Owing to logistical limitations at the Hinton site, the reference monitor's instruments for PM2.5 and PM10 were mounted on a ground-level cement pad, approximately 29 feet from the AQY1 monitors and with a height difference of 7 feet 5 inches. We think this difference in sampling location may have contributed to the worse performance in the PM assessment and perhaps the worse calibration results. We recommend that co-location be as exact as possible in future studies, but we were not able to achieve this in the current study.
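The manufacturer's NO2 derivation also makes the source of the negative raw values easy to see: whenever the Ox reading falls below 1.1 × O3, the computed NO2 is negative. A minimal sketch, using hypothetical ppb values:

```python
def derived_no2(ox_ppb, o3_ppb):
    """AQY1 NO2 is not measured directly: per the manufacturer [31], it is
    derived from the Ox and O3 sensor readings as NO2 = Ox - 1.1 * O3."""
    return ox_ppb - 1.1 * o3_ppb

# Hypothetical readings (ppb). Whenever Ox < 1.1 * O3, for example due to
# noise or cross-sensitivity in either sensor, the derived NO2 turns
# negative, consistent with the many negative raw NO2 records we observed.
print(derived_no2(60.0, 40.0))  # positive, plausible NO2
print(derived_no2(40.0, 40.0))  # negative artefact
```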
In ancillary analyses reported in the literature, some results showed that good performance of a low-cost sensor in the laboratory is not indicative of good performance in the real world, and some authors suggested that it is necessary to perform a field calibration of each individual sensor, and to do so periodically (at least once every ~3 months) [6,13,38]. Some studies showed a gradual drift in the sensor readings over time: for example, a drift towards lower PM concentrations, which may be attributed to dust accumulating on a fan and reducing the flow rate [15], and a drift towards higher O3 concentrations, which varied in magnitude depending on the calibration [37]. We did not investigate these issues in the present study, as our focus was to assess and better understand the performance of the AQY1 sensors for a later application, but these are important issues and constitute weaknesses of our work.
The effects of meteorological variables on performance were not uniform, and the results again varied across monitors. O3 measurements seemed to be affected by temperature and RH, with a negative trend except for a handful of sensors, while the effect of WS was more mixed. There was also an indication of an effect of Temp on the NO2 errors, which was more prominent in the calibrated data analysis (note, as above, that the calibration removed the many negative NO2 values). While the effect was mixed, for some monitors the error was higher (after calibration) at lower temperatures. The observation that the errors follow these trends in meteorological parameters is problematic, as NO2 is expected to be higher at lower temperatures, which could be associated with restricted atmospheric dispersion and/or changes in traffic exhaust emission characteristics and emission source strength at low temperatures [39]. On the other hand, we expected O3 to be higher at higher temperatures, as its formation requires sunlight intensity and solar radiation, and higher temperatures may be indicative of more sunlight. As for PM, we observed a negative bias in general (i.e., the low-cost monitors seemed to underestimate PM2.5 concentrations on average).
Although the bias did not seem to vary significantly over the range of meteorological variables, the precision of the low-cost monitors seemed to decrease as the Temp or RH increased. The low-cost sensors have no climate controls, while the reference monitor is climate-controlled (e.g., humidity control), and as such, direct comparisons are challenging. As the AQY1 monitors measure PM using an optical particle counter and a light-scattering method, humid conditions might have impacted the measurements. According to the AQY1 user guide, 'light scattering is susceptible to humidity artefacts which over-report particulate levels due to "fogging" where the particles are encapsulated by moisture and appear larger to the sensor than they actually are'. The AQY1 user guide states that this effect is corrected for by a humidity correction algorithm; however, we still observed that the precision of the low-cost sensors decreased as RH increased, an effect that was more prominent in the calibrated datasets and that may have been larger had the manufacturer not controlled for it. Other studies [40] suggested an overestimation of particle concentrations when RH is high, potentially explained by the operational nature of optical particle counters and the detection and interpretation of water droplets as PM [17,20,41,42]. There are, however, studies showing negligible effects of meteorological variables on PM readings [10,16], and that the biases with respect to RH and Temp varied across each sensor model and node, demonstrating that each sensor's response is unique [10], as we found.
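The bias and precision summaries discussed above amount to grouping the sensor-minus-reference errors into meteorological bins and taking the mean (bias) and standard deviation (precision) per bin. A minimal sketch, with hypothetical PM2.5 and RH values rather than our study data, in which the error spread widens at high humidity:

```python
from collections import defaultdict
from statistics import mean, stdev

def error_by_rh_bin(sensor, reference, rh, bin_width=20):
    """Group paired readings into RH bins and summarize the error
    (sensor - reference) as bias (mean) and precision (standard deviation)."""
    bins = defaultdict(list)
    for s, r, h in zip(sensor, reference, rh):
        bins[int(h // bin_width) * bin_width].append(s - r)
    return {lo: (mean(e), stdev(e) if len(e) > 1 else 0.0)
            for lo, e in sorted(bins.items())}

# Hypothetical hourly PM2.5 (ug/m3) and RH (%) values: the error spread
# widens at high humidity while the mean bias stays comparatively small.
sensor = [8.0, 9.5, 7.0, 12.0, 5.0, 15.0]
reference = [9.0, 10.0, 8.0, 10.0, 9.0, 11.0]
rh = [35, 38, 55, 58, 82, 85]

summary = error_by_rh_bin(sensor, reference, rh)
for lo, (bias, sd) in summary.items():
    print(f"RH {lo}-{lo + 19}%: bias={bias:+.2f}, sd={sd:.2f}")
```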
Overall, the time series patterns of pollutant concentrations measured by the AQY1 monitors followed the trends from the traditional reference monitor, although no formal analysis was conducted; this deduction was based on a simple visual inspection of time series plots, such as those shown in Figures S3–S6, and others not shown here. Similarly, other air quality sensor performance evaluations [33,34,35] showed that low-cost sensors seem to track diurnal variations well. As with the general performance of the AQY1 monitors, the time series from the low-cost sensors seemed to follow the reference monitor's data best in the case of O3, followed by NO2 and the PMs. As such, although the absolute air pollution readings from the AQY1 monitors deviated from the reference monitor's readings, the AQY1 readings tracked the reference monitor's air quality trends over time rather well. Future research should formally assess this by conducting Granger causality and cointegration tests and dynamic regression analyses. Other potential comparison analyses could use relative differences (instead of absolute differences) or machine learning tools, but this was outside the scope and the resources available for this study.
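Short of the formal time series tests suggested above, the visual deduction that the sensors track trends despite an absolute offset could be quantified with a simple correlation measure. A minimal sketch, with hypothetical diurnal values of our choosing:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient; values near 1 indicate that the
    sensor tracks the reference monitor's temporal trend, even when the
    absolute readings are offset."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical diurnal O3 series (ppb): the sensor is biased low by ~6 ppb
# but rises and falls with the reference over the day.
reference = [20.0, 28.0, 40.0, 52.0, 46.0, 30.0]
sensor = [13.0, 22.0, 35.0, 45.0, 41.0, 23.0]

trend_r = pearson_r(sensor, reference)
bias = mean(s - r for s, r in zip(sensor, reference))
print(f"r={trend_r:.2f}, mean bias={bias:+.1f} ppb")
```

A high r alongside a non-trivial bias is exactly the pattern described above: good trend tracking with deviating absolute readings.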
The strengths of our study are the long assessment duration compared to the literature (see Table S1); the assessment of 12 identical sensors, which elucidated the uniqueness of each of their responses; the assessment of four criteria pollutants, across both raw and calibrated data, using an established and commonly employed co-location and calibration methodology; and the systematic investigation of the effects of meteorological variables on performance, inspecting both bias and precision to determine accuracy, and reporting these parameters.
The weaknesses of our study include the amount of missing NO2 data from the reference monitor and the negative value recordings, which are not practically sensible. Another limitation was the missing data from the AQY1 monitors, mainly due to sensor failure and the consequent need to replace those sensors, reinstall the units in the field, manually upload the reference monitor's data, and recalculate and reapply the calibration factors. After a sensor had been replaced, the calibration factors were reset, and the co-location calibration had to be conducted again for the new sensor. Two logistical and planning issues increased the amount of missing data from the AQY1 monitors: first, new sensors were shipped from overseas, which delayed the replacement of old sensors; second, the project researchers needed to travel 180 miles to the measurement site for each replacement, which was complicated by COVID-19 protocols. These issues occurred frequently, and we learned the importance of having back-up sensors and, where possible, a local, on-the-ground contact for maintenance. We are applying these lessons in our next study phase. Our data also show that it is important for users to properly calibrate the low-cost sensors and to continuously monitor the data once the sensors are installed. Monitoring can lead to early detection of low-performing or faulty monitors, which can then be replaced for better performance. While we did not investigate sensor drift in this particular study, research suggests that calibration must be conducted periodically, because the sensitivity of sensors changes over time [15,19,36,37]. This is a limitation of our study, in which calibration was conducted at the outset (once the sensors had collected a full month of data), at an intermediate point (when we expected to move the sensors in the field for another study), and whenever calibration was needed again, for example due to sensor failure and replacement.
Future studies should evaluate drift over time and the re-calibration frequency needed for optimal results, depending on that drift. We also recommend that calibration be done in real-world conditions, as laboratory calibrations may not be transferable to real-world applications [6,38], and on a set schedule, to be determined based on the drift. A good guide to thinking through issues of drift and regular re-calibration can be found in Williams et al. [43]. Our study is limited in this regard, partly due to time and financial constraints, but we welcome collaboration and data exchange with other researchers to investigate these issues.
In terms of application, the deviations between the low-cost sensors' readings and the reference monitor's readings may be deemed important, suggesting that considerable work and development are still needed to improve this emerging technology. However, the general tracking of diurnal patterns is promising. In addition, the low-cost sensors seemed to perform better when air pollution levels were binned into AQI categories rather than presented as absolute continuous numbers. Again, even when using categories, the performance was best for O3 and worst for PM2.5. More data are needed, especially at the higher AQI bins, which can be obtained by sampling for longer periods of time or in different locations, in order to better understand the agreement and disagreement in the different bin categories. Since the AQI is a calculated index value, which uses averaged concentrations over a specific period, the AQI comparison may provide an alternative framework for interpreting data from the low-cost sensors and improving their utility. The absolute low-cost sensor readings were not expected to match the reference monitor readings with the current state of technology. Therefore, using values such as the AQI levels for comparison allowed us to assess how the AQY1 monitors performed in categorizing air pollution levels into well-established categories, which have long been used for public and stakeholder communication. This can be done after installing and calibrating the low-cost sensors, depending on the amount of data deemed appropriate before conducting a calibration, in addition to maintenance calibrations.
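The categorical comparison described above reduces to mapping each averaged concentration to its AQI bin and counting matching pairs. A minimal sketch; the breakpoint values and labels below are illustrative stand-ins, not the breakpoints actually used in Table S2:

```python
# Illustrative PM2.5 breakpoints (ug/m3) and labels for the sketch only;
# a real comparison would use the AQI breakpoints referenced in Table S2.
BREAKPOINTS = [(12.0, "Good"), (35.4, "Moderate"),
               (55.4, "Unhealthy for Sensitive Groups"),
               (float("inf"), "Unhealthy")]

def aqi_category(concentration):
    """Map an averaged concentration to its AQI bin label."""
    for upper, label in BREAKPOINTS:
        if concentration <= upper:
            return label

def category_agreement(sensor, reference):
    """Fraction of paired averaged readings placed in the same AQI bin."""
    matches = sum(aqi_category(s) == aqi_category(r)
                  for s, r in zip(sensor, reference))
    return matches / len(sensor)

# Hypothetical averaged PM2.5 values (ug/m3): the absolute readings differ,
# yet most pairs still land in the same category.
sensor = [8.0, 10.0, 20.0, 30.0, 40.0, 60.0]
reference = [10.0, 14.0, 22.0, 33.0, 37.0, 58.0]

print(f"categorical agreement: {category_agreement(sensor, reference):.0%}")
```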
Overall, there seemed to be important deviations between the air pollution concentrations from the AQY1 low-cost sensors and those from the reference monitor, which should be considered carefully in low-cost sensor applications. We also noted that the performance seemed to vary by device, indicating that no overall conclusion can be made. Based on our results, we do not recommend using the AQY1 monitors to report absolute air pollution concentrations, to compare measurements directly with those from a reference monitor or with air quality guidelines (such as the World Health Organization's), or to ascertain whether air quality standards (such as the U.S. Environmental Protection Agency's) are being met, as the technology has not developed sufficiently to accurately support these applications. Low-cost sensors, however, allow broad deployment and the tracking of air quality trends, to compare air pollution levels, ideally binned into categories.