Attention-Based Distributed Deep Learning Model for Air Quality Forecasting
Abstract
1. Introduction
- Slow training speed: This issue stems from the huge amount of data that must be trained on a centralized deep learning architecture, which is the approach previous studies relied on. Slow training becomes especially troublesome when the data change, because models must then be retrained to keep predictions accurate.
- Noisy data: The second disadvantage of these methods is that they dismiss the effect of noise, which is a genuine challenge when evaluating air quality and meteorological data. Noise degrades the accuracy and performance of the forecasts because the algorithms cannot extract the optimal features and information from the pollutant and meteorological data.
- No real-time traffic data: The third challenge is that these models do not include traffic information in their feature set during the training phase. Traffic flow is widely recognized as a significant source of air pollutants that can damage air quality, so ignoring it while predicting air quality can lead to unrealistic forecasts.
- A review of recent deep learning and machine learning approaches for air quality prediction, together with the attention mechanism.
- The implementation of the YOLOv5 model with the DeepSORT algorithm to collect traffic information on the road.
- An extensive feature engineering study to identify the most important features for predicting both PM2.5 and PM10. Knowing a feature’s importance is an essential step in prediction tasks, and machine learning models have been widely used in the literature to find the most relevant features in a given dataset. In this research, we conducted a comparative study with four different machine learning algorithms to perform this task.
- The development of an innovative attention-based convolutional BiLSTM autoencoder model to predict air quality. The 1D-CNN layers extract deep spatial correlations and local patterns from the air pollutant series, the autoencoder encodes the meteorological and traffic data, and the BiLSTM layer interprets the obtained features before passing them to several attention layers that make the final optimal prediction (a minimal sketch of this architecture follows the list).
- The evaluation of the proposed framework in two specific steps. First, we trained the deep learning framework in a centralized architecture using a single training server and compared the trained model with seven state-of-the-art algorithms, including two attention-based deep learning approaches. Second, we trained the proposed model using a parallelized training approach and evaluated the resulting improvement in accuracy and reduction in training time.
- The creation of a deployment pipeline to run online inference with the proposed deep learning model. Most previous studies focused only on offline learning and inference; our research considers not only offline prediction but also online prediction based on real-time data.
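As referenced above, the following is a minimal Keras sketch of the proposed architecture under stated assumptions: the look-back window, exogenous feature count, layer widths, and the use of Keras’s built-in self-attention layer are illustrative placeholders; only the 1D-CNN kernel/filter/dropout settings are taken from the configuration table, and the autoencoder branch shows only its encoder half.

```python
# Minimal sketch of the attention-based convolutional BiLSTM autoencoder.
# Layer sizes other than the 1D-CNN settings are illustrative placeholders,
# not the paper's exact hyperparameters.
from tensorflow.keras import layers, Model

T, N_POLLUTANTS, N_EXOG = 24, 6, 8  # look-back window and feature counts (assumed)

# Branch 1: 1D-CNN stack over the pollutant series (local/spatial patterns).
pollutants = layers.Input(shape=(T, N_POLLUTANTS), name="pollutants")
x = pollutants
for k, f in [(15, 20), (10, 40), (5, 80)]:           # kernel sizes/filters from the settings table
    x = layers.Conv1D(f, k, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Dropout(0.20)(x)                       # 24 -> 12 -> 6 -> 3 time steps

# Branch 2: dense autoencoder bottleneck encoding meteorological + traffic data
# (only the encoder half is sketched here).
exog = layers.Input(shape=(T, N_EXOG), name="met_traffic")
enc = layers.TimeDistributed(layers.Dense(16, activation="relu"))(exog)
enc = layers.TimeDistributed(layers.Dense(4, activation="relu"))(enc)   # bottleneck
enc = layers.MaxPooling1D(pool_size=8)(enc)           # 24 -> 3 steps, matching the CNN branch

# Merge, interpret with BiLSTM, then apply attention over the time steps.
h = layers.Concatenate()([x, enc])
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(h)
att = layers.Attention()([h, h])                      # self-attention over BiLSTM outputs
h = layers.GlobalAveragePooling1D()(att)
out = layers.Dense(48, name="next_48h")(h)            # 48-step-ahead forecast

model = Model([pollutants, exog], out)
model.compile(optimizer="adam", loss="mae")
```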
2. Data Collection and Methods
2.1. Pollution Particles and Meteorological Data Collection
2.2. Traffic Data Collection—Car Counting
Algorithm 1: Car counting (input data verification)

Input: Car /* object tracked by the DeepSORT algorithm */
1. IF Car > diff /* diff: difference between the reference line and the object's center point */
2.   FOR each entry tl in TL do /* TL: track list */
3.     IF Car in TL
4.       THEN pass
5.     ELSE
6.       THEN Add Car to TL
7. END
8. RETURN length(TL)
END Car Counting
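A minimal Python sketch of this counting logic follows, assuming a DeepSORT-style tracker that yields a `(track_id, center_y)` pair per confirmed track each frame; the names and tracker interface are illustrative, not the Ultralytics/DeepSORT API.

```python
# Sketch of the line-crossing counter of Algorithm 1, assuming per-frame
# (track_id, center_y) tuples from a DeepSORT-style tracker. LINE_Y and
# tracks_in_frame are illustrative names, not a real library API.

LINE_Y = 400          # y-coordinate of the virtual reference line (pixels)
counted_ids = set()   # the "Track List" TL of Algorithm 1

def update_count(tracks_in_frame):
    """tracks_in_frame: iterable of (track_id, center_y) tuples."""
    for track_id, center_y in tracks_in_frame:
        # 'Car > diff' in Algorithm 1: the object's center has crossed the line.
        if center_y > LINE_Y and track_id not in counted_ids:
            counted_ids.add(track_id)   # count each vehicle exactly once
    return len(counted_ids)            # length(TL): vehicles counted so far
```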
3. Proposed Framework
3.1. Data Preprocessing
3.2. Data Correlation and Analysis
3.3. Features Importance Assessment
3.4. Deep Learning Model
3.4.1. One-Dimensional CNN for Local Features Mining
3.4.2. Deep Autoencoders Model for Traffic and Meteorological Data Encoding
3.4.3. BiLSTM Layers for Spatiotemporal Features Representation
3.4.4. Attention Layers
3.4.5. Distributed Training Approach
4. Experimental Results and Discussion
4.1. Experiments Environment Setup
Algorithm 2: Centralized training process

Input: Historical data
Output: Error rates
1. data ← MissingValuesFunc(Input)
2. data ← NormalizationFunc(data)
3. Set learning rate α
4. Initialize F(x) = proposed model for N pollutants
5. Split data into train, test, and validation sets
6. FOR epoch = 1 to N do
7.   proposed_model ← fit_model(train, epoch, α)
8. FOR t = 1 … T do
9.   Receive instance x_t
10.  ŷ_t ← forecast_model(f_model, x_t)
11.  y_t ← observed value
12.  Error_rate ← evaluate_fit(ŷ_t, y_t)
13. RETURN Error_rate
END
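A hedged Python sketch of this loop follows; `build_model` stands in for the proposed network, a Keras-style `fit`/`predict` interface is assumed, the target column and MAE-only error rate are simplifications, and the 70/20/10 split matches the train/validation/test distribution used in the experiments.

```python
# Sketch of Algorithm 2 with pandas/numpy preprocessing and a Keras-style
# model interface (assumed). The learning rate and model come from the
# proposed framework; values here are placeholders.
import numpy as np
import pandas as pd

def centralized_training(df: pd.DataFrame, build_model, epochs: int = 50):
    # 1-2. MissingValuesFunc / NormalizationFunc: linear interpolation + min-max scaling.
    df = df.interpolate(method="linear")
    df = (df - df.min()) / (df.max() - df.min())

    # 5. Split into train (70%), validation (20%), and test (10%) sets.
    n = len(df)
    train, val, test = np.split(df.to_numpy(dtype="float32"),
                                [int(0.7 * n), int(0.9 * n)])
    X_tr, y_tr = train[:, :-1], train[:, -1]   # assume the last column is the target
    X_va, y_va = val[:, :-1], val[:, -1]
    X_te, y_te = test[:, :-1], test[:, -1]

    # 4, 6-7. Initialize F(x) and fit it for N epochs.
    model = build_model(X_tr.shape[1])
    model.fit(X_tr, y_tr, validation_data=(X_va, y_va), epochs=epochs)

    # 8-13. Forecast the held-out instances and compare with the truth.
    y_hat = model.predict(X_te).ravel()
    return float(np.mean(np.abs(y_te - y_hat)))   # evaluate_fit -> Error_rate (MAE)
```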
4.2. Experiment Results
4.2.1. Step 1
- PM2.5 forecasting results analysis: The experimental results show that our proposed framework outperformed the baseline models on both short- and long-term predictions. For the forecasts at both the 1 h (short-term) and 48 h (long-term) time steps, the proposed attention-based model obtained the lowest MAE (5.02 and 22.59, respectively) among all the models, with RMSE (7.48 and 28.02) and SMAPE (17.98 and 39.81) values that were either the lowest or within 0.1 of the best baseline. This indicates strong agreement between observed and predicted values. The recurrent neural network had the lowest prediction accuracy on both horizons, with the highest MAE, RMSE, and SMAPE values: 8.33, 12.17, and 25.11, respectively, for the short-term prediction and 30.08, 37.21, and 45.78 for the long-term prediction. All hybrid LSTM-based models performed better than the standard LSTM, which recorded 7.96, 11.31, and 24.87 (MAE, RMSE, and SMAPE) for the short-term prediction and 29.11, 36.92, and 46.03 for the long-term prediction. Among the hybrid LSTM baselines, the attention-based CNN-LSTM recorded the second-lowest error rates after the proposed approach: 5.21, 7.40, and 18.15 (MAE, RMSE, and SMAPE) for the short-term prediction and 22.63, 28.05, and 39.77 for the long-term prediction. The ConvBiLSTM autoencoder came next, with 5.34, 7.92, and 19.27 for the short-term prediction and 23.96, 28.17, and 40.17 for the long-term prediction, followed by the attention-based LSTM with 5.83, 8.14, and 20.16 for the short-term prediction and 24.14, 29.10, and 41.05 for the long-term prediction.
- PM10 forecasting results analysis: As depicted in Table 7, our proposed model again performs better than the baseline models on both short- and long-term forecasting. It achieved the minimal MAE (7.38 and 25.89, respectively, for the 1 h and 48 h future time steps), RMSE (9.71 and 25.09), and SMAPE (19.57 and 34.26). At the other end, the recurrent neural network recorded the highest error rates: 12.20, 15.38, and 29.74, respectively, for the MAE, RMSE, and SMAPE metrics at the 1 h time step and 37.19, 43.28, and 48.17 at the 48 h time step. The attention CNN-LSTM model has the second-best results, recording 8.15, 9.81, and 20.48 for the MAE, RMSE, and SMAPE metrics at the 1 h time step, and 26.39 (MAE), 25.93 (RMSE), and 34.33 (SMAPE) at the 48 h time step. The ConvBiLSTM autoencoder recorded 8.29, 11.27, and 22.79 for MAE, RMSE, and SMAPE at the 1 h time step and 27.39, 31.09, and 39.83 at the 48 h time step. The other hybrid LSTM baselines produced similar results, and both outperformed the standard LSTM model at the 1 h (short-term) and 48 h (long-term) time steps. (The three evaluation metrics are defined in the sketch after this list.)
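For reference, a short Python implementation of the three evaluation metrics follows; the SMAPE here uses the common 0–100 symmetric formulation, which may differ slightly from the paper’s exact variant.

```python
# MAE, RMSE, and SMAPE as used to read Tables 6 and 7. The SMAPE variant
# (0-100 scale, symmetric denominator) is a common convention, assumed here.
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def smape(y, y_hat, eps=1e-8):
    # eps guards against division by zero when both values are zero
    return 100 * np.mean(2 * np.abs(y_hat - y) /
                         (np.abs(y) + np.abs(y_hat) + eps))
```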
4.2.2. Step 2
4.3. Proposed Model Deployment and Online Inference
- Input Stream Data: The air quality, meteorological, and traffic data are collected every hour and then preprocessed and stored inside cloud servers.
- Online Worker: Since we deploy our distributed model with data parallelism, we created several data pipelines based on Apache Spark that act as our workers. Each training worker holds a model replica and a partition of the stream data and trains its local replica on the assigned partition (a minimal sketch of this update cycle follows the list).
- Parameter Server: The parameter server is responsible for aggregating model updates and parameter requests coming from different online workers.
- User Interface: After the distributed training, the final model makes predictions every hour for the next 48 h and displays them on the web application, where the user can observe the predicted values as presented in Figure 20a. The web application also offers the option to upload a CSV file containing the meteorological and road-traffic features to predict air quality. This helps the government and citizens take the necessary precautions to either control air pollution or protect themselves from the harmful effects of poor air quality.
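The sketch below shows one synchronous round of the worker/parameter-server cycle described above, under stated assumptions: a simple linear model stands in for the forecasting network, the learning rate is a placeholder, and a real deployment would run the workers as Spark executors communicating with a networked parameter server rather than in-process functions.

```python
# Minimal in-process sketch of one data-parallel synchronization round:
# workers compute gradients on their stream partitions, the parameter
# server averages them and applies one update.
import numpy as np

class ParameterServer:
    def __init__(self, init_weights):
        self.weights = init_weights

    def pull(self):
        return self.weights

    def push(self, gradients):
        # Aggregate worker gradients and apply one averaged update.
        avg_grad = np.mean(gradients, axis=0)
        self.weights = self.weights - 0.01 * avg_grad   # assumed learning rate

def worker_gradient(weights, X_part, y_part):
    # Each worker trains its model replica on its partition; a linear model
    # stands in for the full forecasting network here.
    residual = X_part @ weights - y_part
    return X_part.T @ residual / len(y_part)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1200, 5)), rng.normal(size=1200)
ps = ParameterServer(np.zeros(5))
partitions = np.array_split(np.arange(1200), 4)          # 4 online workers
grads = [worker_gradient(ps.pull(), X[p], y[p]) for p in partitions]
ps.push(grads)
```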
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dong, M.; Yang, D.; Kuang, Y.; He, D.; Erdal, S.; Kenski, D. PM2.5 concentration prediction using hidden semi-Markov model-based times series data mining. Expert Syst. Appl. 2009, 36, 9046–9055.
- Nimesh, R.; Arora, S.; Mahajan, K.K.; Gill, A.N. Predicting air quality using ARIMA, ARFIMA and HW smoothing. Model Assist. Stat. Appl. 2014, 9, 137–149.
- Liu, B.; Jin, Y.; Xu, D.; Wang, Y.; Li, C. A data calibration method for micro air quality detectors based on a LASSO regression and NARX neural network combined model. Sci. Rep. 2021, 11, 21173.
- Sayegh, A.S.; Munir, S.; Habeebullah, T.M. Comparing the Performance of Statistical Models for Predicting PM10 Concentrations. Aerosol Air Qual. Res. 2014, 14, 653–665.
- Wang, W.; Men, C.; Lu, W.-Z. Online prediction model based on support vector machine. Neurocomputing 2008, 71, 550–558.
- Martínez-España, R.; Bueno-Crespo, A.; Timón, I.; Soto, J.; Muñoz, A.; Cecilia, J.M. Air-pollution prediction in smart cities through machine learning methods: A case of study in Murcia, Spain. J. Univers. Comput. Sci. 2018, 24, 261–276.
- Ameer, S.; Shah, M.A.; Khan, A.; Song, H.; Maple, C.; Islam, S.U.; Asghar, M.N. Comparative Analysis of Machine Learning Techniques for Predicting Air Quality in Smart Cities. IEEE Access 2019, 7, 128325–128338.
- Wang, X. Deep Learning in Object Recognition, Detection, and Segmentation. Found. Trends Signal Process. 2014, 8, 217–382.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
- Zhang, Y.; Qin, J.; Park, D.S.; Han, W.; Chiu, C.C.; Pang, R.; Le, Q.V.; Wu, Y. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition. arXiv 2020, arXiv:2010.10504. Available online: http://arxiv.org/abs/2010.10504 (accessed on 19 December 2021).
- Li, Y.; Wu, C.Y.; Fan, H.; Mangalam, K.; Xiong, B.; Malik, J.; Feichtenhofer, C. Improved Multiscale Vision Transformers for Classification and Detection. arXiv 2021, arXiv:2112.01526. Available online: http://arxiv.org/abs/2112.01526 (accessed on 22 December 2021).
- Liu, M.; Zeng, A.; Xu, Z.; Lai, Q.; Xu, Q. Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction. arXiv 2021, arXiv:2106.09305. Available online: http://arxiv.org/abs/2106.09305 (accessed on 22 December 2021).
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Mengara, A.M.; Kim, Y.; Yoo, Y.; Ahn, J. Distributed Deep Features Extraction Model for Air Quality Forecasting. Sustainability 2020, 12, 8014.
- Zhao, J.; Deng, F.; Cai, Y.; Chen, J. Long short-term memory—Fully connected (LSTM-FC) neural network for PM2.5 concentration prediction. Chemosphere 2019, 220, 486–492.
- Qi, Y.; Li, Q.; Karimian, H.; Liu, D. A hybrid model for spatiotemporal forecasting of PM2.5 based on graph convolutional neural network and long short-term memory. Sci. Total Environ. 2019, 664, 1–10.
- Wen, C.; Liu, S.; Yao, X.; Peng, L.; Li, X.; Hu, Y.; Chi, T. A novel spatiotemporal convolutional long short-term neural network for air pollution prediction. Sci. Total Environ. 2018, 654, 1091–1099.
- Huang, C.-J.; Kuo, P.-H. A Deep CNN-LSTM Model for Particulate Matter (PM2.5) Forecasting in Smart Cities. Sensors 2018, 18, 2220.
- Bai, Y.; Li, Y.; Zeng, B.; Li, C.; Zhang, J. Hourly PM2.5 concentration forecast using stacked autoencoder model with emphasis on seasonality. J. Clean. Prod. 2019, 224, 739–750.
- Heydari, A.; Majidi Nezhad, M.; Astiaso Garcia, D.; Keynia, F.; De Santoli, L. Air pollution forecasting application based on deep learning model and optimization algorithm. Clean Technol. Environ. Policy 2022, 24, 607–621.
- Wang, B.; Yan, Z.; Lu, J.; Zhang, G.; Li, T. Deep Multi-task Learning for Air Quality Prediction. Lect. Notes Comput. Sci. 2018, 11305, 93–103.
- Xiao, F.; Yang, M.; Fan, H.; Fan, G.; Al-Qaness, M.A.A. An improved deep learning model for predicting daily PM2.5 concentration. Sci. Rep. 2020, 10, 20988.
- Chang, Y.-S.; Chiao, H.-T.; Abimannan, S.; Huang, Y.-P.; Tsai, Y.-T.; Lin, K.-M. An LSTM-based aggregated model for air pollution forecasting. Atmos. Pollut. Res. 2020, 11, 1451–1463.
- Arsov, M.; Zdravevski, E.; Lameski, P.; Corizzo, R.; Koteli, N.; Gramatikov, S.; Mitreski, K.; Trajkovik, V. Multi-Horizon Air Pollution Forecasting with Deep Neural Networks. Sensors 2021, 21, 1235.
- Guo, C.; Liu, G.; Chen, C.-H. Air Pollution Concentration Forecast Method Based on the Deep Ensemble Neural Network. Wirel. Commun. Mob. Comput. 2020, 2020, 8854649.
- Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent Models of Visual Attention. Adv. Neural Inf. Process. Syst. 2014, 27, 1–12.
- Larochelle, H.; Hinton, G. Learning to combine foveal glimpses with a third-order Boltzmann machine. In Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6–9 December 2010.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; pp. 1–15.
- Cao, J.; Chen, Q.; Guo, J.; Shi, R. Attention-guided Context Feature Pyramid Network for Object Detection. arXiv 2020, arXiv:2005.11475. Available online: https://arxiv.org/abs/2005.11475 (accessed on 17 December 2021).
- Sinha, A.; Dolz, J. Multi-scale guided attention for medical image segmentation. arXiv 2019, arXiv:1906.02849. Available online: https://arxiv.org/abs/1906.02849 (accessed on 17 December 2021).
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, 2–9 February 2021; pp. 1–15.
- Lin, J.; Su, Q.; Yang, P.; Ma, S.; Sun, X. Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018.
- Xiong, C.; Merity, S.; Socher, R. Dynamic Memory Networks for Visual and Textual Question Answering. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PMLR 48, pp. 2397–2406.
- Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An Attentive Survey of Attention Models. ACM Trans. Intell. Syst. Technol. 2021, 12, 1–32.
- Lee, J.B.; Rossi, R.A.; Kim, S.; Ahmed, N.K.; Koh, E. Attention Models in Graphs: A Survey. ACM Trans. Knowl. Discov. Data 2019, 13, 1–25.
- Wang, F.; Tax, D.M.J. Survey on the Attention Based RNN Model and Its Applications in Computer Vision. arXiv 2016, arXiv:1601.06823. Available online: https://doi.org/10.48550/arXiv.1601.06823 (accessed on 16 December 2021).
- Dairi, A.; Harrou, F.; Khadraoui, S.; Sun, Y. Integrated Multiple Directed Attention-Based Deep Learning for Improved Air Pollution Forecasting. IEEE Trans. Instrum. Meas. 2021, 70, 1–15.
- Zou, X.; Zhao, J.; Zhao, D.; Sun, B.; He, Y.; Fuentes, S. Air Quality Prediction Based on a Spatiotemporal Attention Mechanism. Mob. Inf. Syst. 2021, 2021, 6630944.
- Liu, B.; Yan, S.; Li, J.; Qu, G.; Li, Y.; Lang, J.; Gu, R. An Attention-Based Air Quality Forecasting Method. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 728–733.
- Chen, H.; Guan, M.; Li, H. Air Quality Prediction Based on Integrated Dual LSTM Model. IEEE Access 2021, 9, 93285–93297.
- Chen, Z.; Yu, H.; Geng, Y.-A.; Li, Q.; Zhang, Y. EvaNet: An Extreme Value Attention Network for Long-Term Air Quality Prediction. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 4545–4552.
- AIR KOREA. Available online: https://www.airkorea.or.kr/web (accessed on 30 July 2020).
- Korea Meteorological Administration. Available online: https://web.kma.go.kr/eng/index.jsp (accessed on 30 July 2020).
- Jocher, G. YOLOv5: A family of object detection architectures and models pretrained on the COCO dataset. Available online: https://github.com/ultralytics/yolov5 (accessed on 30 November 2021).
- Huang, G. Missing data filling method based on linear interpolation and lightgbm. J. Phys. Conf. Ser. 2021, 1754, 012187.
- Normalization | Data Preparation and Feature Engineering for Machine Learning | Google Developers. Available online: https://developers.google.com/machine-learning/data-prep/transform/normalization (accessed on 10 February 2022).
- Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson Correlation Coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4.
- Banfield, R.E.; Hall, L.O.; Bowyer, K.; Kegelmeyer, W. A Comparison of Decision Tree Ensemble Creation Techniques. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 29, 173–180.
- Dietterich, T.G. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Mach. Learn. 2000, 40, 139–157.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Brauwers, G.; Frasincar, F. A General Survey on Attention Mechanisms in Deep Learning. IEEE Trans. Knowl. Data Eng. 2021, 1–20.
- Athira, V.; Geetha, P.; Vinayakumar, R.; Soman, K.P. DeepAirNet: Applying Recurrent Networks for Air Quality Prediction. Procedia Comput. Sci. 2018, 132, 1394–1403.
- Cai, J.; Dai, X.; Hong, L.; Gao, Z.; Qiu, Z. An Air Quality Prediction Model Based on a Noise Reduction Self-Coding Deep Network. Math. Probl. Eng. 2020, 2020, 3507197.
- Li, S.; Xie, G.; Ren, J.; Guo, L.; Yang, Y.; Xu, X. Urban PM2.5 Concentration Prediction via Attention-Based CNN–LSTM. Appl. Sci. 2020, 10, 1953.
- Pan, Q.; Darabos, C.; Moore, J. Performance Evaluation: Metrics, Models and Benchmarks; Springer: Berlin/Heidelberg, Germany, 2008; Volume 7246.
Reference | Method | Dataset | Result |
---|---|---|---|
[1] | Hidden Semi-Markov models | EPA Air Quality System in Cook County, Illinois (PM2.5 and meteorological data, 2000–2001 period) | Prediction accuracy of 100% with solar radiation, cloudiness, temperature, pressure, humidity, wind speed, dewpoint as input parameters |
[2] | ARIMA, ARFIMA and HW smoothing | AQI from Chandigarh including RSPM, SPM, SO2, NO2 | ARIMA (RMSE: 18.20; MAE: 15.69; MAPE: 26.86) HW (RMSE: 30.12; MAE: 25.52; MAPE: 37.00) |
[3] | LASSO regression combined with a nonlinear autoregressive model with exogenous inputs (NARX) | 1-AQI data from a national monitoring station in Nanjing (from 14 November 2018 to 11 June 2019) 2-Self data collection containing 234,717 rows. The data includes PM2.5, PM10, CO, NO2, SO2, O3. | R-square: PM2.5 (0.933); PM10 (0.918); CO (0.899); NO2 (0.90); SO2 (0.941); O3 (0.936) RMSE: PM2.5 (8.687); PM10 (13.208); CO (0.156); NO2 (7.715); SO2 (4.874); O3 (12.190) MAE: PM2.5 (5.951); PM10 (8.981); CO (0.098); NO2 (4.806); SO2 (2.464); O3 (7.788) MAPE: PM2.5 (0.146); PM10 (0.146); CO (0.095); NO2 (0.177); SO2 (0.131); O3 (0.397) |
[4] | Multiple Linear Regression Model (MLRM) Quantile Regression Model (QRM) Generalized Additive Model (GAM) Boosted Regression Trees 1way and 2way | Air quality pollutants data from the city of Makkah (PM10, CO, SO2, NO2, humidity, temperature, wind speed) | Mean Bias Error (MBE): GAM (−39.9); MLRM (−29.3); QRM (−1.4); BRT1 (−43.9); BRT2 (−41.1) MAE: GAM (74.3); MLRM (80.0); QRM (61.0); BRT1 (75.6); BRT2 (80.4) MAPE: GAM (33.1); MLRM (35.7); QRM (27.2); BRT1 (33.7); BRT2 (35.8) RMSE: GAM (120.1); MLRM (123.8); QRM (95.6); BRT1 (121.1); BRT2 (125.6) |
[5] | Online SVM | Air pollutant data in Hong Kong | Testing set MAE: 19.2902 RMSE: 25.8993 WIA: 0.7880 |
[6] | Bagging algorithm, Random Committee, Random Forest, KNN | Air quality data in the region of Murcia (NO, NO2, SO2, NOX, PM10, Benzeno, Toluene, and Xileno) | Alcantarilla city results: Random Forest Year 2013: MAE (7.65); RMSE (10.20) Year 2014: MAE (7.33); RMSE (9.77) |
[7] | Decision tree regression, Random forest regression, Gradient boosting regression, ANN multi-layer perceptron regression | Dataset covering five Chinese cities: Guangzhou, Chengdu, Beijing, Shanghai, and Shenyang | MLP Results: Shanghai: MAE (13.84); RMSE (0.03) Guangzhou: MAE (12.2); RMSE (0.045) Chengdu: MAE (9.8); RMSE (0.108) Shenyang: MAE (13.65); RMSE (0.062) Beijing: MAE (21.79); RMSE (0.0806) |
[14] | Convolutional Bi-Directional LSTM autoencoder model | Air quality data from South Korea (PM2.5, PM10, NO2, CO, O3, SO2, temperature, humidity, and wind speed) | PM2.5 MAE: 5.07 RMSE: 6.93 SMAPE: 18.27 PM10 MAE: 5.83 RMSE: 7.22 SMAPE: 17.27 |
[15] | LSTM-Fully connected neural network | AQI dataset with 36 monitoring stations in Beijing (from 1 May 2014 to 3 April 2015) | 1–6 h Prediction: MAE: 23.97 RMSE: 35.82 7–12 h: MAE: 38.34 RMSE: 56.03 13–24 h: MAE: 47.13 RMSE: 65.60 25–48 h: MAE: 50.13 RMSE: 69.84 |
[16] | Hybrid learning framework based on a Graph Convolutional network and a LSTM model | Hourly scaled dataset of pollutants (PM2.5, PM10, NO2, CO, O3, SO2) from 76 stations over Beijing, Tianjin and Hebei. | +1 h prediction: IA: 0.98 MAE: 13.72 RMSE: 22.41 +72 h prediction IA: 0.92 MAE: 24.21 RMSE: 38.83 |
[17] | A spatiotemporal convolutional LSTM extended model | Hourly PM2.5 concentration data collected at 1233 air quality monitoring stations in Beijing and the whole China from 1 January 2016 to 31 December 2017. | RMSE: 12.08 MAE: 5.82 MAPE: 17.09 |
[18] | Deep neural network model that integrates the CNN and LSTM | PM2.5 dataset of Beijing | MAE: 14.63446 RMSE: 24.22874 Pearson Correlation: 0.959986 IA: 0.97831 |
[19] | Stacked autoencoder model | Data collected from three monitoring stations in Beijing (Station I, Wangshouxigong; Station II, Nongzhanguan; Station III, Shunyixincheng) | Station 1 (Spring): MAE: 8.01 RMSE: 10.28 R-square: 0.880 |
[20] | LSTM model combined with multi-verse optimization algorithm | AQI data from Iran, composed of wind speed, air temperature, NO2, and SO2 for five months. The data was collected from May–September 2019 with a time step of 3 h | With data type (1): Month of September for NO2 prediction RMSE: 0.0545 MAE: 0.0465 MAPE: 17.4011 |
[21] | Deep multi-task learning framework based on residual GRU | KDD CUP of Fresh Air | RMSE: 1.85 MAE: 1.15 |
[22] | Weighted LSTM extended model | Daily pollutants concentration and meteorological data from Beijing–Tianjin–Hebei | RMSE: 40.67 MAE: 26.10 Total accuracy index: 0.59 Spatial anomaly correlation: 0.9524 Temporal correlation coefficient: 0.9930 |
[23] | Aggregated LSTM neural network | The data was collected in Taiwan from 2012 to 2017. It contains 17 attributes based on air pollutants and meteorological information. | RMSE: 0.44 MAE: 0.91 MAPE: 16.3 |
[24] | RNN and LSTM models | Air quality data collected at Skopje | 6 h prediction SimpleRNN + Dense with ReLU: MSE: 0.0007 RMSE: 0.0273 |
[25] | Deep ensemble model which combines RNN, LSTM, and GRU networks | PM2.5 concentration and meteorological data collected at 3 stations in Shanghai (From 1 January 2010 to 31 December 2015) | Group 1: MAE: 6.72 MAPE: 19.60% |
[37] | VAE with multiple directed attention mechanism | Data from the United States Environmental Protection Agency which includes NO2, SO2, CO, and O3 | Case of NO2 prediction in Pennsylvania: RMSE: 12.373 MAE: 10.370 R-square: 0.831 Explained variance: 0.955 MAPE: 13 Mean bias error (MBE): −2.47718 Relative MBE: −3.21039 |
[38] | LSTM model based on a spatiotemporal attention mechanism | Beijing air quality dataset (from 1 January 2018 to 31 December 2018) | 1 h-Prediction: RMSE: 12.23 R-square with different set of features: R-square: 0.78 |
[39] | Seq2Seq model with attention mechanism | Beijing air quality data from April 2017 to March 2018 | Olympic Center: RMSE: 38.119 R-square: 0.493 Dongsi: RMSE: 59.508 R-square: 0.337 |
[40] | Integrated dual LSTM with attention mechanism | Beijing air quality data from 2013 to 2018 | RMSE: 14.36 MAE: 8.39 MAPE: 35.78 R-square: 0.89 IA: 0.93 |
[41] | Extreme value attention network based on encoder and decoder framework | Air quality data from Fuzhou (2 November 2017 to 2 October 2018) and Beijing (1 January 2018 to 31 December 2018) | Fuzhou dataset: RMSE: 17.9914 MAE: 9.3818 Beijing dataset RMSE: 3.2606 MAE: 2.1867 |
1D-Convolution | Settings |
---|---|
Convolution Layer | Kernel Size = (15, 1), Filter = 20, Stride = 1 |
Max Pooling Layer | Pool-size = (2, 1), Stride = 2 |
Dropout | 0.20 |
Convolution Layer | Kernel Size = (10, 1), Filter = 40, Stride = 1 |
Max Pooling Layer | Pool-size = (2, 1), Stride = 2 |
Dropout | 0.20 |
Convolution Layer | Kernel Size = (5, 1), Filter = 80, Stride = 1 |
Max Pooling Layer | Pool-size = (2, 1), Stride = 2 |
Dropout | 0.20 |
Components | Specs |
---|---|
Server CPU | AMD Ryzen 7 2700X 3.7 GHz 8 Cores |
Graphic Cards | NVIDIA GeForce RTX 2080 |
Memory | 32 GB |
Data Split | Distribution |
---|---|
Training | 70% |
Validation | 20% |
Testing | 10% |
Algorithms | Description |
---|---|
RNN [53] | Recurrent neural networks are a type of neural network that use previous outputs as inputs while maintaining hidden states. They are mostly used in time series forecasting, speech recognition, and natural language processing. |
LSTM models [15] | LSTM models are a special type of RNN that overcome the long-term dependency problem faced by standard RNN. |
Autoencoder LSTM [54] | The autoencoder LSTM is a hybrid model implemented with an LSTM encoder and decoder for sequence data. It has the same structure as an autoencoder but is composed of several LSTM layers. |
Convolutional LSTM [17] | This model is a hybrid framework based on CNN and LSTM models. |
ConvBiLSTM autoencoder [14] | This model is based on a convolutional BiLSTM concatenated with an autoencoder model. |
Attention LSTM [40] | This algorithm is a dual LSTM model with attention mechanism. |
Attention CNN-LSTM [55] | This model combines a one-dimensional convolutional neural network, LSTM network, and attention-based network. |
Models | Metrics | +1 h | +2 h | +4 h | +8 h | +10 h | +12 h | +24 h | +48 h |
---|---|---|---|---|---|---|---|---|---|
RNN | MAE | 8.33 | 9.49 | 11.61 | 15.87 | 20.67 | 23.50 | 26.41 | 30.08 |
| RMSE | 12.17 | 13.94 | 16.78 | 21.18 | 25.43 | 30.08 | 34.59 | 37.21 |
| SMAPE | 25.11 | 28.54 | 29.91 | 36.11 | 39.09 | 41.17 | 44.96 | 45.78 |
LSTM | MAE | 7.96 | 9.15 | 10.78 | 14.38 | 18.21 | 19.98 | 25.51 | 29.11 |
| RMSE | 11.31 | 13.90 | 15.18 | 16.87 | 23.14 | 26.71 | 31.18 | 36.92 |
| SMAPE | 24.87 | 26.47 | 28.39 | 34.17 | 35.29 | 39.49 | 43.42 | 46.03 |
Autoencoder LSTM | MAE | 6.83 | 8.96 | 9.97 | 12.24 | 15.86 | 18.76 | 22.29 | 25.87 |
| RMSE | 8.94 | 11.87 | 13.83 | 15.79 | 18.48 | 23.44 | 27.53 | 30.21 |
| SMAPE | 22.39 | 25.57 | 28.09 | 32.93 | 34.28 | 35.96 | 39.49 | 44.27 |
CNN+LSTM | MAE | 6.03 | 8.21 | 10.01 | 12.18 | 14.76 | 17.38 | 20.53 | 24.79 |
| RMSE | 8.21 | 10.75 | 12.64 | 15.37 | 17.98 | 21.89 | 25.27 | 29.63 |
| SMAPE | 21.08 | 24.12 | 27.72 | 30.33 | 33.09 | 35.09 | 37.89 | 40.99 |
Attention LSTM | MAE | 5.83 | 8.10 | 9.91 | 12.05 | 14.53 | 17.30 | 20.45 | 24.14 |
| RMSE | 8.14 | 10.43 | 12.37 | 15.14 | 17.51 | 21.38 | 24.80 | 29.10 |
| SMAPE | 20.16 | 23.53 | 26.19 | 30.02 | 33.05 | 35.57 | 37.48 | 41.05 |
ConvBiLSTM autoencoder | MAE | 5.34 | 7.97 | 9.87 | 11.62 | 14.21 | 16.99 | 19.84 | 23.96 |
| RMSE | 7.92 | 9.90 | 12.18 | 15.11 | 16.89 | 20.34 | 24.12 | 28.17 |
| SMAPE | 19.27 | 23.37 | 25.33 | 29.74 | 33.14 | 36.15 | 38.90 | 40.17 |
Attention CNN-LSTM | MAE | 5.21 | 7.15 | 9.27 | 10.99 | 14.10 | 16.13 | 19.07 | 22.63 |
| RMSE | 7.40 | 9.61 | 11.94 | 14.92 | 16.32 | 19.89 | 23.93 | 28.05 |
| SMAPE | 18.15 | 21.24 | 25.09 | 28.13 | 30.24 | 33.98 | 38.22 | 39.77 |
Proposed Model | MAE | 5.02 | 6.96 | 8.59 | 10.97 | 13.87 | 15.87 | 18.75 | 22.59 |
| RMSE | 7.48 | 9.53 | 11.89 | 14.28 | 16.21 | 19.53 | 23.61 | 28.02 |
| SMAPE | 17.98 | 20.91 | 24.57 | 27.91 | 30.28 | 34.49 | 37.18 | 39.81 |
Models | Metrics | +1 h | +2 h | +4 h | +8 h | +10 h | +12 h | +24 h | +48 h |
---|---|---|---|---|---|---|---|---|---|
RNN | MAE | 12.20 | 15.75 | 19.63 | 24.73 | 26.21 | 29.87 | 35.59 | 37.19 |
| RMSE | 15.38 | 19.32 | 22.68 | 25.41 | 29.37 | 34.63 | 39.44 | 43.28 |
| SMAPE | 29.74 | 33.52 | 36.37 | 40.08 | 42.19 | 45.60 | 46.32 | 48.17 |
LSTM | MAE | 11.86 | 13.09 | 16.75 | 18.67 | 21.49 | 24.84 | 27.18 | 31.49 |
| RMSE | 14.56 | 17.56 | 19.36 | 22.49 | 25.68 | 29.09 | 33.18 | 35.85 |
| SMAPE | 27.41 | 30.25 | 34.34 | 37.58 | 40.18 | 45.02 | 46.82 | 47.32 |
Autoencoder LSTM | MAE | 11.21 | 13.25 | 16.54 | 19.05 | 21.53 | 23.31 | 26.54 | 30.97 |
| RMSE | 13.67 | 16.53 | 18.11 | 21.25 | 24.57 | 27.33 | 31.85 | 34.37 |
| SMAPE | 25.89 | 28.09 | 31.52 | 35.47 | 37.69 | 40.08 | 44.16 | 45.99 |
CNN+LSTM | MAE | 9.95 | 12.38 | 15.49 | 18.96 | 21.78 | 23.07 | 25.46 | 29.60 |
| RMSE | 11.53 | 13.56 | 17.18 | 20.31 | 23.45 | 27.27 | 30.79 | 32.40 |
| SMAPE | 23.96 | 26.54 | 28.49 | 30.08 | 33.17 | 35.53 | 38.28 | 41.72 |
Attention LSTM | MAE | 9.17 | 12.05 | 14.21 | 17.31 | 21.11 | 22.35 | 25.30 | 28.37 |
| RMSE | 11.36 | 13.33 | 16.47 | 19.87 | 22.69 | 27.04 | 29.76 | 32.11 |
| SMAPE | 22.84 | 26.27 | 27.81 | 30.93 | 32.92 | 35.29 | 38.05 | 40.84 |
ConvBiLSTM autoencoder | MAE | 8.29 | 11.25 | 13.85 | 16.67 | 19.07 | 21.48 | 25.07 | 27.39 |
| RMSE | 11.27 | 13.96 | 16.34 | 18.75 | 21.49 | 26.07 | 28.96 | 31.09 |
| SMAPE | 22.79 | 25.38 | 27.75 | 31.96 | 32.51 | 34.09 | 37.69 | 39.83 |
Attention CNN-LSTM | MAE | 8.15 | 10.19 | 11.69 | 15.23 | 17.43 | 19.39 | 23.42 | 26.39 |
| RMSE | 9.81 | 12.93 | 14.70 | 17.82 | 20.28 | 22.71 | 24.80 | 25.93 |
| SMAPE | 20.48 | 23.37 | 25.97 | 29.05 | 30.97 | 30.69 | 33.13 | 34.33 |
Proposed Model | MAE | 7.38 | 9.57 | 11.19 | 13.96 | 14.37 | 18.56 | 20.74 | 25.89 |
| RMSE | 9.71 | 12.27 | 14.68 | 16.74 | 17.82 | 20.33 | 23.66 | 25.09 |
| SMAPE | 19.57 | 23.17 | 24.99 | 27.07 | 29.49 | 30.28 | 32.99 | 34.26 |