Enhancing Influenza Epidemics Forecasting Accuracy in China with Both Official and Unofficial Online News Articles, 2019–2020
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection and Processing
2.1.1. Influenza-Related Online News Articles
2.1.2. Influenza-Related Microblogs
2.1.3. Influenza-Like Illness Rates in China
2.2. Statistical Analysis
2.2.1. Descriptive Analysis
2.2.2. Model Formulation
2.2.3. Parameters Estimation and Baseline Models Comparison
- Autoregression model based on official ILI rates and fraction of flu-related microblogs [8], denoted as AR() + Mblog(),
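The baseline above can be sketched as an autoregression on the ILI rate with one exogenous regressor, the fraction of flu-related microblogs. This is a minimal illustration under stated assumptions, not the authors' code. It assumes weekly series and reads the number in parentheses as a lag in weeks, so AR(p) uses the ILI rate at weeks t-1 to t-p and Mblog(q) adds the microblog fraction at week t-q. It also assumes the coefficients are estimated by ordinary least squares through the normal equations. The function names `regressor_row`, `fit_ar_mblog` and `predict` are hypothetical.

```python
from typing import List, Sequence


def regressor_row(y: Sequence[float], m: Sequence[float], p: int, q: int, t: int) -> List[float]:
    """Intercept, y[t-1], ..., y[t-p], and the microblog fraction lagged q weeks."""
    return [1.0] + [y[t - i] for i in range(1, p + 1)] + [m[t - q]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_ar_mblog(y: Sequence[float], m: Sequence[float], p: int, q: int) -> List[float]:
    """OLS fit of y[t] = b0 + sum_i phi_i * y[t-i] + gamma * m[t-q].

    Returns [b0, phi_1, ..., phi_p, gamma].
    """
    start = max(p, q)  # first week with a complete set of lagged regressors
    rows = [regressor_row(y, m, p, q, t) for t in range(start, len(y))]
    targets = [y[t] for t in range(start, len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef: Sequence[float], y: Sequence[float], m: Sequence[float], p: int, q: int, t: int) -> float:
    """Predicted y[t]; t may equal len(y) for a one-step-ahead forecast if m[t-q] exists."""
    return sum(c * v for c, v in zip(coef, regressor_row(y, m, p, q, t)))
```

Under the lag reading assumed here, `fit_ar_mblog(ili, mblog, p=1, q=2)` would correspond to the AR(1) + Mblog(2) row of the results tables. A news-based term could be added the same way, as one more lagged column in `regressor_row`.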
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- World Health Organization. Global Influenza Strategy 2019–2030; World Health Organization: Geneva, Switzerland, 2019.
- Iuliano, A.D.; Roguski, K.M.; Chang, H.H.; Muscatello, D.J.; Palekar, R.; Tempia, S.; Cohen, C.; Gran, J.M.; Schanzer, D.; Cowling, B.J. Estimates of global seasonal influenza-associated respiratory mortality: A modelling study. Lancet 2018, 391, 1285–1300.
- Li, L.; Liu, Y.; Wu, P.; Peng, Z.; Wang, X.; Chen, T.; Wong, J.Y.; Yang, J.; Bond, H.S.; Wang, L.; et al. Influenza-associated excess respiratory mortality in China, 2010–2015: A population-based study. Lancet Public Health 2019, 4, e473–e481.
- Yang, X.; Liu, D.; Wei, K.; Liu, X.; Meng, L.; Yu, D.; Li, H.; Li, B.; He, J.; Hu, W. Comparing the similarity and difference of three influenza surveillance systems in China. Sci. Rep. 2018, 8, 2840.
- Yang, S.; Santillana, M.; Kou, S.C. Accurate estimation of influenza epidemics using Google search data via ARGO. Proc. Natl. Acad. Sci. USA 2015, 112, 14473–14478.
- Hswen, Y.; Brownstein, J.S.; Liu, J.; Hawkins, J.B. Use of a digital health application for influenza surveillance in China. Am. J. Public Health 2017, 107, 1130–1136.
- Ginsberg, J.; Mohebbi, M.H.; Patel, R.S.; Brammer, L.; Smolinski, M.S.; Brilliant, L. Detecting influenza epidemics using search engine query data. Nature 2009, 457, 1012–1014.
- Achrekar, H.; Gandhe, A.; Lazarus, R.; Yu, S.-H.; Liu, B. Twitter Improves Seasonal Influenza Prediction. In Proceedings of HEALTHINF, Algarve, Portugal, 1–4 February 2012; SciTePress: Setúbal, Portugal, 2012; pp. 61–70.
- Nsoesie, E.O.; Brownstein, J.S. Computational approaches to influenza surveillance: Beyond timeliness. Cell Host Microbe 2015, 17, 275–278.
- Gupta, A.; Katarya, R. Social media based surveillance systems for healthcare using machine learning: A systematic review. J. Biomed. Inform. 2020, 108, 103500.
- Rees, E.; Ng, V.; Gachon, P.; Mawudeku, A.; McKenney, D.; Pedlar, J.; Yemshanov, D.; Parmely, J.; Knox, J. Early detection and prediction of infectious disease outbreaks. CCDR 2019, 45, 5.
- Yan, S.; Chughtai, A.; Macintyre, C. Utility and potential of rapid epidemic intelligence from internet-based sources. Int. J. Infect. Dis. 2017, 63, 77–87.
- Bernardo, T.M.; Rajic, A.; Young, I.; Robiadek, K.; Pham, M.T.; Funk, J.A. Scoping review on search queries and social media for disease surveillance: A chronology of innovation. J. Med. Internet Res. 2013, 15, e147.
- Allam, Z.; Dey, G.; Jones, D.S. Artificial intelligence (AI) provided early detection of the coronavirus (COVID-19) in China and will influence future Urban health policy internationally. AI 2020, 1, 156–165.
- Wilson, K.; Brownstein, J.S. Early detection of disease outbreaks using the Internet. CMAJ 2009, 180, 829–831.
- He, G.; Chen, Y.; Chen, B.; Wang, H.; Shen, L.; Liu, L.; Suolang, D.; Zhang, B.; Ju, G.; Zhang, L.; et al. Using the Baidu search index to predict the incidence of HIV/AIDS in China. Sci. Rep. 2018, 8, 9038.
- Liu, D.; Clemente, L.; Poirier, C.; Ding, X.; Chinazzi, M.; Davis, J.; Vespignani, A.; Santillana, M. Real-time forecasting of the COVID-19 outbreak in Chinese provinces: Machine learning approach using novel digital data and estimates from mechanistic models. J. Med. Internet Res. 2020, 22, e20285.
- Wang, Y.F.; Xu, K.; Kang, Y.; Wang, H.Y.; Wang, F.; Avram, A. Regional Influenza Prediction with Sampling Twitter Data and PDE Model. Int. J. Environ. Res. Public Health 2020, 17, 678.
- Hickmann, K.S.; Fairchild, G.; Priedhorsky, R.; Generous, N.; Hyman, J.M.; Deshpande, A.; Del Valle, S.Y. Forecasting the 2013–2014 influenza season using Wikipedia. PLoS Comput. Biol. 2015, 11, e1004239.
- Smolinski, M.S.; Crawley, A.W.; Baltrusaitis, K.; Chunara, R.; Olsen, J.M.; Wojcik, O.; Santillana, M.; Nguyen, A.; Brownstein, J.S. Flu Near You: Crowdsourced Symptom Reporting Spanning 2 Influenza Seasons. Am. J. Public Health 2015, 105, 2124–2130.
- Barros, J.M.; Duggan, J.; Rebholz-Schuhmann, D. The application of internet-based sources for public health surveillance (infoveillance): Systematic review. J. Med. Internet Res. 2020, 22, e13680.
- Lazer, D.; Kennedy, R.; King, G.; Vespignani, A. The parable of Google Flu: Traps in big data analysis. Science 2014, 343, 1203–1205.
- Derczynski, L.; Ritter, A.; Clark, S.; Bontcheva, K. Twitter part-of-speech tagging for all: Overcoming sparse and noisy data. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, Hissar, Bulgaria, 9–11 September 2013; pp. 198–206.
- Gu, J.; Wu, Y.; Xu, Y. Linguistic Feature and Temporal Pattern of User-Generated News: Evidence from an Online News Portal in China. In Proceedings of PACIS 2018, Yokohama, Japan, 26–30 June 2018; p. 19.
- Ghosh, S.; Chakraborty, P.; Nsoesie, E.O.; Cohn, E.; Mekaru, S.R.; Brownstein, J.S.; Ramakrishnan, N. Temporal topic modeling to assess associations between news trends and infectious disease outbreaks. Sci. Rep. 2017, 7, 40841.
- McGough, S.F.; Brownstein, J.S.; Hawkins, J.B.; Santillana, M. Forecasting Zika incidence in the 2016 Latin America outbreak combining traditional disease surveillance with search, social media, and news report data. PLoS Negl. Trop. Dis. 2017, 11, e0005295.
- Kim, J.; Ahn, I. Weekly ILI patient ratio change prediction using news articles with support vector machine. BMC Bioinform. 2019, 20, 259.
- Liu, N.; Chen, Z.; Bao, G. Role of media coverage in mitigating COVID-19 transmission: Evidence from China. Technol. Forecast. Soc. Chang. 2020, 163, 120435.
- Lamb, A.; Paul, M.; Dredze, M. Separating fact from fear: Tracking flu infections on twitter. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; pp. 789–795.
- World Health Organization. Global Epidemiological Surveillance Standards for Influenza; World Health Organization: Geneva, Switzerland, 2013.
- Centers for Disease Control and Prevention. The Flu Season. Available online: https://www.cdc.gov/flu/about/season/flu-season.htm (accessed on 29 May 2021).
- Public Health England. Annual Flu Reports. Available online: https://www.gov.uk/government/statistics/annual-flu-reports (accessed on 29 May 2021).
- European Centre for Disease Prevention and Control. Indicators of Influenza Activity. Available online: https://www.ecdc.europa.eu/en/seasonal-influenza/surveillance-and-disease-data/facts-indicators (accessed on 29 May 2021).
- Ramos, J. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Piscataway, NJ, USA, 3–8 December 2003; pp. 133–142.
- Salton, G.; Buckley, C. Term Weighting Approaches in Automatic Text Retrieval; Cornell University: Ithaca, NY, USA, 1987.
- The Writing Committee of the World Health Organization (WHO) Consultation on Human Influenza A/H5. Avian influenza A (H5N1) infection in humans. N. Engl. J. Med. 2005, 353, 1374–1385.
- Zhang, Y.; Yakob, L.; Bonsall, M.B.; Hu, W. Predicting seasonal influenza epidemics using cross-hemisphere influenza surveillance data and local internet query data. Sci. Rep. 2019, 9, 3262.
- Broniatowski, D.A.; Paul, M.J.; Dredze, M. National and local influenza surveillance through Twitter: An analysis of the 2012–2013 influenza epidemic. PLoS ONE 2013, 8, e83672.
- Doan, S.; Ohno-Machado, L.; Collier, N. Enhancing Twitter data analysis with simple semantic filtering: Example in tracking influenza-like illnesses. In Proceedings of the 2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology, La Jolla, CA, USA, 27–28 September 2012; pp. 62–71.
- Ljung, L. System Identification: Theory for the User; PTR Prentice Hall: Upper Saddle River, NJ, USA, 1999; Volume 28, pp. 1–14.
- Paul, M.J.; Dredze, M.; Broniatowski, D. Twitter improves influenza forecasting. PLoS Curr. 2014, 6.
- Zou, H.; Yang, Y. Combining time series models for forecasting. Int. J. Forecast. 2004, 20, 69–84.
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013; Volume 112.
- Maindonald, J.H.; Braun, W.J.; Braun, M.W.J. Package ‘DAAG’. Data Analysis and Graphics Data and Functions. 2015. Available online: https://cran.r-project.org/package=DAAG (accessed on 17 June 2021).
- Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 2008, 28, 1–26.
- Dietterich, T.G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 1998, 10, 1895–1923.
- Kreft, J. Instant Articles (Facebook): The Impact of Trust and Relations Among the Partners Pursuing the Strategy of Coopetition. In Eurasian Business Perspectives; Springer: Berlin/Heidelberg, Germany, 2019; pp. 243–253.
- Zhang, Y.; Ibaraki, M.; Schwartz, F.W. Disease surveillance using online news: Dengue and Zika in tropical countries. J. Biomed. Inform. 2020, 102, 103374.
- CNNIC. The 46th China Statistical Report on Internet Development; CNNIC: Beijing, China, 2020.
| Model (lag) | RMSE | R² | MAE | Correlation | Correlation of Increment | t (p-value) |
|---|---|---|---|---|---|---|
| AR(1) + News(2) | 0.087 | 0.938 | 0.072 | 0.980 | 0.824 | |
| AR(2) | 0.126 | 0.918 | 0.098 | 0.903 | 0.286 | −3.164 *** (0.002) |
| AR(1) + Mblog(2) | 0.150 | 0.872 | 0.113 | 0.946 | 0.517 | −2.212 ** (0.029) |
| AR(1) + News(2) + Mblog(0) | 0.107 | 0.905 | 0.083 | 0.985 | 0.839 | −1.047 (0.297) |

| Model (lag) | RMSE | R² | MAE | Correlation | Correlation of Increment | t (p-value) |
|---|---|---|---|---|---|---|
| AR(1) + News(2) | 0.119 | 0.943 | 0.099 | 0.978 | 0.834 | |
| AR(2) | 0.151 | 0.898 | 0.122 | 0.910 | 0.317 | −4.196 *** (0.000) |
| AR(1) + Mblog(2) | 0.142 | 0.908 | 0.116 | 0.967 | 0.703 | −5.721 *** (0.000) |
| AR(1) + News(2) + Mblog(0) | 0.129 | 0.922 | 0.107 | 0.980 | 0.820 | −1.878 * (0.063) |
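The evaluation columns in the tables above can be computed from a series of observed ILI rates and a series of model predictions. The sketch below is an illustration under explicit assumptions, not the authors' evaluation code. R² is assumed to be 1 − SSE/SST. "Correlation of Increment" is assumed to be the Pearson correlation between the predicted change and the observed change, both measured from the previous week's observed value. The t column is assumed to come from a paired t-test on per-week errors of two models; the error type used is not stated in this section, so the caller chooses it. Only the t statistic is computed here; a p-value would need the Student t distribution (for example via `scipy.stats.ttest_rel`). All function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return mean(abs(o - p) for o, p in zip(obs, pred))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SSE/SST (assumed definition)."""
    mu = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mu) ** 2 for o in obs)
    return 1.0 - sse / sst


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def increment_correlation(obs: Sequence[float], pred: Sequence[float]) -> float:
    """corr(pred[t] - obs[t-1], obs[t] - obs[t-1]) over t >= 1 (assumed definition)."""
    actual = [obs[t] - obs[t - 1] for t in range(1, len(obs))]
    predicted = [pred[t] - obs[t - 1] for t in range(1, len(obs))]
    return pearson(predicted, actual)


def paired_t(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """Paired t statistic of errors_a - errors_b; negative when errors_a are smaller on average."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))
```

With this sign convention, passing the per-week errors of AR(1) + News(2) as `errors_a` and those of a baseline as `errors_b` yields a negative statistic when the news-based model has the smaller errors, which matches the direction of the signs reported in the tables.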
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, J.; Sia, C.-L.; Chen, Z.; Huang, W. Enhancing Influenza Epidemics Forecasting Accuracy in China with Both Official and Unofficial Online News Articles, 2019–2020. Int. J. Environ. Res. Public Health 2021, 18, 6591. https://doi.org/10.3390/ijerph18126591
Chicago/Turabian Style: Li, Jingwei, Choon-Ling Sia, Zhuo Chen, and Wei Huang. 2021. "Enhancing Influenza Epidemics Forecasting Accuracy in China with Both Official and Unofficial Online News Articles, 2019–2020" International Journal of Environmental Research and Public Health 18, no. 12: 6591. https://doi.org/10.3390/ijerph18126591