The Role of Natural Language Processing during the COVID-19 Pandemic: Health Applications, Opportunities, and Challenges
Abstract
1. Introduction
1.1. Motivation
1.2. Significance
1.3. Contributions
- A review of the various applications of NLP that can improve pandemic preparedness and response, and their potential use in future pandemics.
- A discussion of the lessons learned in each NLP application area, followed by comparisons and a summary of the reviewed studies.
- A detailed presentation of research challenges and potential future directions. The challenges we present can be used as a guide for future studies that seek to advance the present health and social response systems and pandemic preparedness.
2. NLP for Electronic Health Records (EHRs)
3. NLP for Mental Health
4. NLP for Understanding Health Behaviors
5. NLP for Surveillance and Outbreak Prediction Systems
6. NLP for Fighting Misinformation
7. NLP for COVID-19 Question-Answering Systems
- I. COVID-19 Open Research Dataset (CORD-19) [80]: An initiative established by the Allen Institute for AI that aggregates COVID-19-related publications. The CORD-19 dataset is updated daily to include the latest relevant papers from various sources (such as arXiv, bioRxiv, medRxiv, Medline, and PubMed Central) [80,81]. CORD-19 has more than 160,000 articles, of which more than 70,000 are full text [5]. The motive behind releasing this dataset is “to mobilize researchers to apply recent advances in NLP to produce new insights in support of the fight against this infectious disease” [80].
- II. COVID-QA dataset [82]: This dataset was created from scientific articles related to COVID-19 and annotated by volunteer biomedical experts. COVID-QA contains 2019 question-answer pairs.
- III. COVID-QA dataset by [83]: This dataset contains 124 question-article pairs annotated from the CORD-19 dataset.
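To make the notion of a question-answer pair concrete, the following is a minimal sketch of how such pairs could be read, assuming they are stored in a SQuAD-style JSON layout. The layout, the field names, and the hand-written sample record are assumptions made for illustration; they are not taken from the datasets listed above, and the helper names `iter_qa_pairs` and `answer_span_is_consistent` are hypothetical.

```python
import json

# Hypothetical, hand-written sample in a SQuAD-style layout (an assumption);
# it is not drawn from CORD-19 or either COVID-QA dataset.
SAMPLE = """
{
  "data": [
    {
      "title": "example-article",
      "paragraphs": [
        {
          "context": "Fever and cough are commonly reported symptoms.",
          "qas": [
            {
              "id": "q1",
              "question": "Which symptoms are commonly reported?",
              "answers": [{"text": "Fever and cough", "answer_start": 0}]
            }
          ]
        }
      ]
    }
  ]
}
"""


def iter_qa_pairs(dataset):
    """Yield (question, answer_text, context) for every annotated answer."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield qa["question"], answer["text"], context


def answer_span_is_consistent(context, answer):
    """Check that answer_start really points at the answer text in the context."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])] == answer["text"]


dataset = json.loads(SAMPLE)
pairs = list(iter_qa_pairs(dataset))
print(len(pairs), "question-answer pair(s) loaded")
```

Checking that each answer offset matches the context is a cheap sanity test worth running before training an extractive QA model on annotated pairs.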
8. NLP for Knowledge Transfer
9. Opportunities and Challenges for NLP Applications during the COVID-19 Pandemic
9.1. The Nature of a Pandemic
9.2. Characteristics of Health Misinformation
9.3. Designing Clinically Applicable NLP Models
9.4. Synergic Implementation and Deployment
9.5. Sampling Bias on Social Media
9.6. Data Analysis Challenge
10. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Legido-Quigley, H.; Asgari, N.; Teo, Y.Y.; Leung, G.M.; Oshitani, H.; Fukuda, K.; Cook, A.R.; Hsu, L.Y.; Shibuya, K.; Heymann, D. Are high-performing health systems resilient against the COVID-19 epidemic? Lancet 2020, 395, 848–850. [Google Scholar] [CrossRef] [Green Version]
- El Bcheraoui, C.; Weishaar, H.; Pozo-Martin, F.; Hanefeld, J. Assessing COVID-19 through the lens of health systems’ preparedness: Time for a change. Glob. Health 2020, 16, 112. [Google Scholar] [CrossRef] [PubMed]
- Budd, J.; Miller, B.S.; Manning, E.M.; Lampos, V.; Zhuang, M.; Edelstein, M.; Rees, G.; Emery, V.C.; Stevens, M.M.; Keegan, N. Digital technologies in the public-health response to COVID-19. Nat. Med. 2020, 26, 1183–1192. [Google Scholar] [CrossRef] [PubMed]
- Venkatakrishnan, A.; Pawlowski, C.; Zemmour, D.; Hughes, T.; Anand, A.; Berner, G.; Kayal, N.; Puranik, A.; Conrad, I.; Bade, S. Mapping each pre-existing condition’s association to short-term and long-term COVID-19 complications. Npj Digit. Med. 2021, 4, 117. [Google Scholar] [CrossRef]
- Zarocostas, J. How to fight an infodemic. Lancet 2020, 395, 676. [Google Scholar] [CrossRef]
- Yan, R.; Liao, W.; Cui, J.; Zhang, H.; Hu, Y.; Zhao, D. Multilingual COVID-QA: Learning towards global information sharing via web question answering in multiple languages. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2590–2600. [Google Scholar]
- Liu, H.; Liu, W.; Yoganathan, V.; Osburg, V.-S. COVID-19 information overload and generation Z’s social media discontinuance intention during the pandemic lockdown. Technol. Forecast. Soc. Chang. 2021, 166, 120600. [Google Scholar] [CrossRef]
- Poonia, S.K.; Rajasekaran, K. Information overload: A method to share updates among frontline staff during the COVID-19 pandemic. Otolaryngol.–Head Neck Surg. 2020, 163, 60–62. [Google Scholar] [CrossRef]
- Grabar, N.; Grouin, C. Year 2020 (with COVID): Observation of Scientific Literature on Clinical Natural Language Processing. Yearb. Med. Inform. 2021, 30, 257–263. [Google Scholar] [CrossRef]
- Guo, Y.; Zhang, Y.; Lyu, T.; Prosperi, M.; Wang, F.; Xu, H.; Bian, J. The application of artificial intelligence and data integration in COVID-19 studies: A scoping review. J. Am. Med. Inform. Assoc. 2021, 28, 2050–2067. [Google Scholar] [CrossRef]
- Chen, Q.; Leaman, R.; Allot, A.; Luo, L.; Wei, C.-H.; Yan, S.; Lu, Z. Artificial intelligence in action: Addressing the COVID-19 pandemic with natural language processing. Annu. Rev. Biomed. Data Sci. 2021, 4, 313–339. [Google Scholar] [CrossRef]
- Hallak, J.A.; Scanzera, A.; Azar, D.T.; Chan, R.P. Artificial intelligence in ophthalmology during COVID-19 and in the post COVID-19 era. Curr. Opin. Ophthalmol. 2020, 31, 447. [Google Scholar] [CrossRef] [PubMed]
- Chatterjee, A.; Nardi, C.; Oberije, C.; Lambin, P. Knowledge Graphs for COVID-19: An Exploratory Review of the Current Landscape. J. Pers. Med. 2021, 11, 300. [Google Scholar] [CrossRef] [PubMed]
- Abd-Alrazaq, A.; Alajlani, M.; Alhuwail, D.; Schneider, J.; Al-Kuwari, S.; Shah, Z.; Hamdi, M.; Househ, M. Artificial intelligence in the fight against COVID-19: Scoping review. J. Med. Internet Res. 2020, 22, e20756. [Google Scholar] [CrossRef] [PubMed]
- Tsao, S.-F.; Chen, H.; Tisseverasinghe, T.; Yang, Y.; Li, L.; Butt, Z.A. What social media told us in the time of COVID-19: A scoping review. Lancet Digit. Health 2021, 3, e175–e194. [Google Scholar] [CrossRef]
- Chen, J.; Wang, Y. Social Media Use for Health Purposes: Systematic Review. J. Med. Internet Res. 2021, 23, e17917. [Google Scholar] [CrossRef]
- Shorten, C.; Khoshgoftaar, T.M.; Furht, B. Deep Learning applications for COVID-19. J. Big Data 2021, 8, 18. [Google Scholar] [CrossRef]
- Lalmuanawma, S.; Hussain, J.; Chhakchhuak, L. Applications of machine learning and artificial intelligence for COVID-19 (SARS-CoV-2) pandemic: A review. Chaos Solitons Fractals 2020, 139, 110059. [Google Scholar] [CrossRef]
- Islam, M.M.; Karray, F.; Alhajj, R.; Zeng, J. A review on deep learning techniques for the diagnosis of novel coronavirus (COVID-19). IEEE Access 2021, 9, 30551–30572. [Google Scholar]
- De Felice, F.; Polimeni, A. Coronavirus disease (COVID-19): A machine learning bibliometric analysis. In Vivo 2020, 34, 1613–1617. [Google Scholar]
- Alzubaidi, M.; Zubaydi, H.D.; Bin-Salem, A.A.; Abd-Alrazaq, A.A.; Ahmed, A.; Househ, M. Role of deep learning in early detection of COVID-19: Scoping review. Comput. Methods Programs Biomed. Update 2021, 1, 100025. [Google Scholar]
- Hall, K.; Chang, V.; Jayne, C. A review on Natural Language Processing Models for COVID-19 research. Healthc. Anal. 2022, 2, 100078. [Google Scholar] [CrossRef]
- Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56. [Google Scholar] [CrossRef] [PubMed]
- Esteva, A.; Chou, K.; Yeung, S.; Naik, N.; Madani, A.; Mottaghi, A.; Liu, Y.; Topol, E.; Dean, J.; Socher, R. Deep learning-enabled medical computer vision. Npj Digit. Med. 2021, 4, 5. [Google Scholar] [CrossRef] [PubMed]
- Wang, F.; Casalino, L.P.; Khullar, D. Deep learning in medicine—Promise, progress, and challenges. JAMA Intern. Med. 2019, 179, 293–294. [Google Scholar] [CrossRef] [PubMed]
- Locke, S.; Bashall, A.; Al-Adely, S.; Moore, J.; Wilson, A.; Kitchen, G.B. Natural language processing in medicine: A review. Trends Anaesth. Crit. Care 2021, 38, 4–9. [Google Scholar] [CrossRef]
- Ching, T.; Himmelstein, D.S.; Beaulieu-Jones, B.K.; Kalinin, A.A.; Do, B.T.; Way, G.P.; Ferrero, E.; Agapow, P.-M.; Zietz, M.; Hoffman, M.M. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 2018, 15, 20170387. [Google Scholar] [CrossRef] [Green Version]
- Van der Laak, J.; Litjens, G.; Ciompi, F. Deep learning in histopathology: The path to the clinic. Nat. Med. 2021, 27, 775–784. [Google Scholar] [CrossRef]
- Ohno-Machado, L. Realizing the full potential of electronic health records: The role of natural language processing. J. Am. Med. Inform. Assoc. 2011, 18, 539. [Google Scholar] [CrossRef]
- Neuraz, A.; Lerner, I.; Digan, W.; Paris, N.; Tsopra, R.; Rogier, A.; Baudoin, D.; Cohen, K.B.; Burgun, A.; Garcelon, N. Natural language processing for rapid response to emergent diseases: Case study of calcium channel blockers and hypertension in the COVID-19 pandemic. J. Med. Internet Res. 2020, 22, e20773. [Google Scholar] [CrossRef]
- Elkin, P.L.; Froehling, D.A.; Wahner-Roedler, D.L.; Brown, S.H.; Bailey, K.R. Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Ann. Intern. Med. 2012, 156, 11–18. [Google Scholar] [CrossRef]
- Barr, P.J.; Ryan, J.; Jacobson, N.C. Precision Assessment of COVID-19 Phenotypes Using Large-Scale Clinic Visit Audio Recordings: Harnessing the Power of Patient Voice. J. Med. Internet Res. 2021, 23, e20545. [Google Scholar] [CrossRef] [PubMed]
- Li, M.; Lang, M.; Deng, F.; Chang, K.; Buch, K.; Rincon, S.; Mehan, W.; Leslie-Mazwi, T.; Kalpathy-Cramer, J. Analysis of stroke detection during the COVID-19 pandemic using natural language processing of radiology reports. Am. J. Neuroradiol. 2021, 42, 429–434. [Google Scholar] [CrossRef] [PubMed]
- Schoening, V.; Liakoni, E.; Drewe, J.; Hammann, F. Automatic identification of risk factors for SARS-CoV-2 positivity and severe clinical outcomes of COVID-19 using Data Mining and Natural Language Processing. medRxiv 2021. [Google Scholar] [CrossRef]
- Wang, J.; Abu-el-Rub, N.; Gray, J.; Pham, H.A.; Zhou, Y.; Manion, F.J.; Liu, M.; Song, X.; Xu, H.; Rouhizadeh, M. COVID-19 SignSym: A fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model. J. Am. Med. Inform. Assoc. 2021, 28, 1275–1283. [Google Scholar] [CrossRef]
- Lybarger, K.; Ostendorf, M.; Thompson, M.; Yetisgen, M. Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework. J. Biomed. Inform. 2021, 117, 103761. [Google Scholar] [CrossRef]
- Izquierdo, J.L.; Ancochea, J.; Soriano, J.B.; Group, S.C.-R. Clinical characteristics and prognostic factors for intensive care unit admission of patients With COVID-19: Retrospective study using machine learning and natural language processing. J. Med. Internet Res. 2020, 22, e21801. [Google Scholar] [CrossRef]
- Fernandes, M.; Sun, H.; Jain, A.; Alabsi, H.S.; Brenner, L.N.; Ye, E.; Ge, W.; Collens, S.I.; Leone, M.J.; Das, S. Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing. JMIR Med. Inform. 2021, 9, e25457. [Google Scholar] [CrossRef]
- Chapman, A.B.; Peterson, K.S.; Turano, A.; Box, T.L.; Wallace, K.S.; Jones, M. A Natural Language Processing System for National COVID-19 Surveillance in the US Department of Veterans Affairs. Openreview 2020, 7, 1–7. [Google Scholar]
- Pfefferbaum, B.; North, C.S. Mental health and the COVID-19 pandemic. N. Engl. J. Med. 2020, 383, 510–512. [Google Scholar] [CrossRef]
- Xiong, J.; Lipsitz, O.; Nasri, F.; Lui, L.M.; Gill, H.; Phan, L.; Chen-Li, D.; Iacobucci, M.; Ho, R.; Majeed, A. Impact of COVID-19 pandemic on mental health in the general population: A systematic review. J. Affect. Disord. 2020, 277, 55–64. [Google Scholar] [CrossRef]
- Calvo, R.A.; Milne, D.N.; Hussain, M.S.; Christensen, H. Natural language processing in mental health applications using non-clinical texts. Nat. Lang. Eng. 2017, 23, 649–685. [Google Scholar] [CrossRef] [Green Version]
- Abd Rahman, R.; Omar, K.; Noah, S.A.M.; Danuri, M.S.N.M.; Al-Garadi, M.A. Application of machine learning methods in mental health detection: A systematic review. IEEE Access 2020, 8, 183952–183964. [Google Scholar] [CrossRef]
- Low, D.M.; Rumker, L.; Talkar, T.; Torous, J.; Cecchi, G.; Ghosh, S.S. Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. J. Med. Internet Res. 2020, 22, e22635. [Google Scholar] [CrossRef]
- Li, I.; Li, Y.; Li, T.; Alvarez-Napagao, S.; Garcia-Gasulla, D.; Suzumura, T. What Are We Depressed About When We Talk About COVID-19: Mental Health Analysis on Tweets Using Natural Language Processing. In Artificial Intelligence XXXVII, Proceedings of the 40th SGAI International Conference on Artificial Intelligence, AI 2020, Cambridge, UK, 15–17 December 2020; Lecture Notes in Computer Science; Bramer, M., Ellis, R., Eds.; Springer: Cham, Switzerland; Volume 12498. [CrossRef]
- Lwin, M.O.; Lu, J.; Sheldenkar, A.; Schulz, P.J.; Shin, W.; Gupta, R.; Yang, Y. Global sentiments surrounding the COVID-19 pandemic on Twitter: Analysis of Twitter trends. JMIR Public Health Surveill. 2020, 6, e19447. [Google Scholar] [CrossRef]
- Oyebode, O.; Ndulue, C.; Adib, A.; Mulchandani, D.; Suruliraj, B.; Orji, F.A.; Chambers, C.; Meier, S.; Orji, R. Health, Psychosocial, and Social issues emanating from COVID-19 pandemic based on Social Media Comments using Text Mining and Thematic Analysis. JMIR Med. Inform. 2021, 9, e22734. [Google Scholar] [CrossRef] [PubMed]
- Sharma, R.; Pagadala, S.D.; Bharti, P.; Chellappan, S.; Schmidt, T.; Goyal, R. Assessing COVID-19 Impacts on College Students via Automated Processing of Free-form Text. arXiv 2020, arXiv:2012.09369. [Google Scholar]
- Olteanu, A.; Castillo, C.; Diaz, F.; Kıcıman, E. Social data: Biases, methodological pitfalls, and ethical boundaries. Front. Big Data 2019, 2, 13. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Howison, J.; Wiggins, A.; Crowston, K. Validity issues in the use of social network analysis with digital trace data. J. Assoc. Inf. Syst. 2011, 12, 2. [Google Scholar] [CrossRef] [Green Version]
- Chancellor, S.; De Choudhury, M. Methods in predictive techniques for mental health status on social media: A critical review. Npj Digit. Med. 2020, 3, 43. [Google Scholar] [CrossRef] [Green Version]
- Verspoor, K.; Cohen, K.B.; Conway, M.; De Bruijn, B.; Dredze, M.; Mihalcea, R.; Wallace, B.C. Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Virtual Meeting, 20 November 2020; Available online: https://aclanthology.org/2020.nlpcovid19-2.0.pdf (accessed on 1 October 2022).
- Kwon, J.; Grady, C.; Feliciano, J.T.; Fodeh, S.J. Defining facets of social distancing during the COVID-19 pandemic: Twitter analysis. J. Biomed. Inform. 2020, 111, 103601. [Google Scholar] [CrossRef]
- Sanders, A.C.; White, R.C.; Severson, L.S.; Ma, R.; McQueen, R.; Paulo, H.C.A.; Zhang, Y.; Erickson, J.S.; Bennett, K.P. Unmasking the conversation on masks: Natural language processing for topical sentiment analysis of COVID-19 Twitter discourse. AMIA Summits Transl. Sci. Proc. 2021, 2021, 555–564. [Google Scholar]
- He, L.; He, C.; Reynolds, T.L.; Bai, Q.; Huang, Y.; Li, C.; Zheng, K.; Chen, Y. Why do people oppose mask wearing? A comprehensive analysis of US tweets during the COVID-19 pandemic. J. Am. Med. Inform. Assoc. 2021, 28, 1564–1573. [Google Scholar] [CrossRef] [PubMed]
- Jang, H.; Rempel, E.; Roth, D.; Carenini, G.; Janjua, N.Z. Tracking COVID-19 Discourse on Twitter in North America: Infodemiology Study Using Topic Modeling and Aspect-Based Sentiment Analysis. J. Med. Internet Res. 2021, 23, e25431. [Google Scholar] [CrossRef] [PubMed]
- Cotfas, L.-A.; Delcea, C.; Roxin, I.; Ioanăş, C.; Gherai, D.S.; Tajariol, F. The Longest Month: Analyzing COVID-19 Vaccination Opinions Dynamics From Tweets in the Month Following the First Vaccine Announcement. IEEE Access 2021, 9, 33203–33223. [Google Scholar] [CrossRef]
- Eysenbach, G. Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance. In AMIA Annual Symposium Proceedings; American Medical Informatics Association: Bethesda, MD, USA; p. 244.
- Velardi, P.; Stilo, G.; Tozzi, A.E.; Gesualdo, F. Twitter mining for fine-grained syndromic surveillance. Artif. Intell. Med. 2014, 61, 153–163. [Google Scholar] [CrossRef]
- Eysenbach, G. Infodemiology and infoveillance: Framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J. Med. Internet Res. 2009, 11, e1157. [Google Scholar] [CrossRef]
- Brownstein, J.S.; Freifeld, C.C.; Madoff, L.C. Digital disease detection—Harnessing the Web for public health surveillance. N. Engl. J. Med. 2009, 360, 2153–2157. [Google Scholar] [CrossRef] [Green Version]
- Chew, C.; Eysenbach, G. Pandemics in the age of Twitter: Content analysis of Tweets during the 2009 H1N1 outbreak. PLoS ONE 2010, 5, e14118. [Google Scholar] [CrossRef]
- Broniatowski, D.A.; Paul, M.J.; Dredze, M. National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic. PLoS ONE 2013, 8, e83672. [Google Scholar] [CrossRef] [Green Version]
- Lampos, V.; Cristianini, N. Tracking the Flu Pandemic by Monitoring the Social Web. In Proceedings of the 2010 2nd International Workshop on Cognitive Information Processing, Elba, Italy, 14–16 June 2010; pp. 411–416. [Google Scholar]
- Neumann, G.; Kawaoka, Y. Predicting the next influenza pandemics. J. Infect. Dis. 2019, 219, S14–S20. [Google Scholar] [CrossRef]
- Al-Garadi, M.A.; Khan, M.S.; Varathan, K.D.; Mujtaba, G.; Al-Kabsi, A.M. Using online social networks to track a pandemic: A systematic review. J. Biomed. Inform. 2016, 62, 1–11. [Google Scholar] [CrossRef]
- Lopreite, M.; Panzarasa, P.; Puliga, M.; Riccaboni, M. Early warnings of COVID-19 outbreaks across Europe from social media. Sci. Rep. 2021, 11, 2147. [Google Scholar] [CrossRef] [PubMed]
- Cinelli, M.; Quattrociocchi, W.; Galeazzi, A.; Valensise, C.M.; Brugnoli, E.; Schmidt, A.L.; Zola, P.; Zollo, F.; Scala, A. The COVID-19 social media infodemic. Sci. Rep. 2020, 10, 16598. [Google Scholar] [CrossRef] [PubMed]
- WHO. Novel Coronavirus (2019-nCoV) Situation Report—13; World Health Organization: Geneva, Switzerland, 2020. [Google Scholar]
- Tasnim, S.; Hossain, M.M.; Mazumder, H. Impact of rumors and misinformation on COVID-19 in social media. J. Prev. Med. Public Health 2020, 53, 171–174. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhou, X.; Wu, J.; Zafarani, R. (SAFE): Similarity-Aware Multi-modal Fake News Detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Cham, Switzerland, 2020; pp. 354–367. [Google Scholar]
- Zhou, X.; Mulay, A.; Ferrara, E.; Zafarani, R. Recovery: A Multimodal Repository for COVID-19 News Credibility Research. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, New York, NY, USA, 19–23 October 2020; pp. 3205–3212. [Google Scholar]
- Patwa, P.; Sharma, S.; PYKL, S.; Guptha, V.; Kumari, G.; Akhtar, M.S.; Ekbal, A.; Das, A.; Chakraborty, T. Fighting an infodemic: COVID-19 fake news dataset. arXiv 2020, arXiv:2011.03327. [Google Scholar]
- Cui, L.; Lee, D. Coaid: COVID-19 healthcare misinformation dataset. arXiv 2020, arXiv:2006.00885. [Google Scholar]
- Dharawat, A.; Lourentzou, I.; Morales, A.; Zhai, C. Drink bleach or do what now? Covid-HeRA: A dataset for risk-informed health decision making in the presence of COVID19 misinformation. arXiv 2020, arXiv:2010.08743. [Google Scholar] [CrossRef]
- Memon, S.A.; Carley, K.M. Characterizing COVID-19 misinformation communities using a novel twitter dataset. arXiv 2020, arXiv:2008.00791. [Google Scholar]
- Vijjali, R.; Potluri, P.; Kumar, S.; Teki, S. Two stage transformer model for COVID-19 fake news detection and fact checking. arXiv 2020, arXiv:2011.13253. [Google Scholar]
- Pennycook, G.; Rand, D.G. Fighting misinformation on social media using crowdsourced judgments of news source quality. Proc. Natl. Acad. Sci. USA 2019, 116, 2521–2526. [Google Scholar] [CrossRef] [Green Version]
- Rathore, F.A.; Farooq, F. Information overload and infodemic in the COVID-19 pandemic. J. Pak. Med. Assoc. 2020, 70, 162–165. [Google Scholar] [CrossRef] [PubMed]
- Colavizza, G.; Costas, R.; Traag, V.A.; Van Eck, N.J.; Van Leeuwen, T.; Waltman, L. A scientometric overview of CORD-19. PLoS ONE 2021, 16, e0244839. [Google Scholar] [CrossRef] [PubMed]
- Wang, L.L.; Lo, K.; Chandrasekhar, Y.; Reas, R.; Yang, J.; Eide, D.; Funk, K.; Kinney, R.; Liu, Z.; Merrill, W. Cord-19: The COVID-19 open research dataset. arXiv 2020, arXiv:2004.10706v2. [Google Scholar]
- Möller, T.; Reina, A.; Jayakumar, R.; Pietsch, M. COVID-QA: A Question Answering Dataset for COVID-19. In Proceedings of the ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), Seattle, WA, USA, 9 July 2020. [Google Scholar]
- Tang, R.; Nogueira, R.; Zhang, E.; Gupta, N.; Cam, P.; Cho, K.; Lin, J. Rapidly bootstrapping a question answering dataset for COVID-19. arXiv 2020, arXiv:2004.11339. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv 2019, arXiv:1910.10683. [Google Scholar]
- Nogueira, R.; Jiang, Z.; Lin, J. Document ranking with a pretrained sequence-to-sequence model. arXiv 2020, arXiv:2003.06713. [Google Scholar]
- Su, D.; Xu, Y.; Winata, G.I.; Xu, P.; Kim, H.; Liu, Z.; Fung, P. Generalizing Question Answering System with Pre-Trained Language Model Fine-Tuning. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, Hong Kong, China, 4 November 2019; pp. 203–211. [Google Scholar]
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240. [Google Scholar]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
- Venkataram, H.S.; Mattmann, C.A.; Penberthy, S. TopiQAL: Topic-aware Question Answering using Scalable Domain-specific Supercomputers. In Proceedings of the 2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS), Atlanta, GA, USA, 11 November 2020; pp. 48–55. [Google Scholar]
- Lee, J.; Yi, S.S.; Jeong, M.; Sung, M.; Yoon, W.; Choi, Y.; Ko, M.; Kang, J. Answering questions on COVID-19 in real-time. arXiv 2020, arXiv:2006.15830. [Google Scholar]
- Reddy, R.G.; Iyer, B.; Sultan, M.A.; Zhang, R.; Sil, A.; Castelli, V.; Florian, R.; Roukos, S. End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training. arXiv 2020, arXiv:2012.01414. [Google Scholar]
- Zhu, F.; Lei, W.; Wang, C.; Zheng, J.; Poria, S.; Chua, T.-S. Retrieving and reading: A comprehensive survey on open-domain question answering. arXiv 2021, arXiv:2101.00774. [Google Scholar]
- Bérard, A.; Kim, Z.M.; Nikoulina, V.; Park, E.L.; Gallé, M. A Multilingual Neural Machine Translation Model for Biomedical Data. arXiv 2020, arXiv:2008.02878. [Google Scholar]
- Arora, A.; Shrivastava, A.; Mohit, M.; Lecanda, L.S.-M.; Aly, A. Cross-lingual Transfer Learning for Intent Detection of COVID-19 Utterances. Openreview 2020, 1–8. [Google Scholar]
- Kruspe, A.; Häberle, M.; Kuhn, I.; Zhu, X.X. Cross-language sentiment analysis of European Twitter messages during the COVID-19 pandemic. arXiv 2020, arXiv:2008.12172. [Google Scholar]
- Okazaki, N.; Tsujii, J.I. Simple and Efficient Algorithm for Approximate Dictionary Matching. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China, 23–27 August 2010; pp. 851–859. [Google Scholar]
- Cury, R.C.; Megyeri, I.; Lindsey, T.; Macedo, R.; Batlle, J.; Kim, S.; Baker, B.; Harris, R.; Clark, R.H. Natural language processing and machine learning for detection of respiratory illness by chest CT imaging and tracking of COVID-19 pandemic in the US. Radiol. Cardiothorac. Imaging 2021, 3, e200596. [Google Scholar] [CrossRef]
- Obeid, J.S.; Davis, M.; Turner, M.; Meystre, S.M.; Heider, P.M.; O’Bryan, E.C.; Lenert, L.A. An artificial intelligence approach to COVID-19 infection risk assessment in virtual visits: A case report. J. Am. Med. Inform. Assoc. 2020, 27, 1321–1325. [Google Scholar] [CrossRef]
- Tabak, T.; Purver, M. Temporal Mental Health Dynamics on Social Media. arXiv 2020, arXiv:2008.13121. [Google Scholar]
- Micallef, N.; He, B.; Kumar, S.; Ahamad, M.; Memon, N. The Role of the Crowd in Countering Misinformation: A Case Study of the COVID-19 Infodemic. arXiv 2020, arXiv:2011.05773. [Google Scholar]
- Dan, S.; Xu, Y.; Yu, T.; Siddique, F.B.; Barezi, E.; Fung, P. CAiRE-COVID: A question answering and query-focused multi-document summarization system for COVID-19 scholarly information management. arXiv 2020, arXiv:2005.03975. [Google Scholar]
- Yang, Y.; Cer, D.; Ahmad, A.; Guo, M.; Law, J.; Constant, N.; Abrego, G.H.; Yuan, S.; Tar, C.; Sung, Y.-H. Multilingual universal sentence encoder for semantic retrieval. arXiv 2019, arXiv:1907.04307. [Google Scholar]
- Madhav, N.; Oppenheim, B.; Gallivan, M.; Mulembakani, P.; Rubin, E.; Wolfe, N. Pandemics: Risks, Impacts, and Mitigation. In Disease Control Priorities: Improving Health and Reducing Poverty, 3rd ed.; The International Bank for Reconstruction and Development/The World Bank: Washington, DC, USA, 2017. [Google Scholar]
- Jones, K.E.; Patel, N.G.; Levy, M.A.; Storeygard, A.; Balk, D.; Gittleman, J.L.; Daszak, P. Global trends in emerging infectious diseases. Nature 2008, 451, 990–993. [Google Scholar] [CrossRef] [PubMed]
- Gates, B. Responding to COVID-19—A once-in-a-century pandemic? N. Engl. J. Med. 2020, 382, 1677–1679. [Google Scholar] [CrossRef] [PubMed]
- CDC. Delta Variant: What We Know About the Science. Cent. Dis. Control. Prev. 2021. [Google Scholar]
- de Oliveira, N.R.; Pisa, P.S.; Lopez, M.A.; de Medeiros, D.S.V.; Mattos, D.M. Identifying Fake News on Social Networks Based on Natural Language Processing: Trends and Challenges. Information 2021, 12, 38. [Google Scholar] [CrossRef]
- Southwell, B.G.; Niederdeppe, J.; Cappella, J.N.; Gaysynsky, A.; Kelley, D.E.; Oh, A.; Peterson, E.B.; Chou, W.-Y.S. Misinformation as a misunderstood challenge to public health. Am. J. Prev. Med. 2019, 57, 282–285. [Google Scholar] [CrossRef]
- Stokes, D.C.; Andy, A.; Guntuku, S.C.; Ungar, L.H.; Merchant, R.M. Public priorities and concerns regarding COVID-19 in an online discussion forum: Longitudinal topic modeling. J. Gen. Intern. Med. 2020, 35, 2244–2247. [Google Scholar] [CrossRef]
- Wu, J.T.; Dernoncourt, F.; Gehrmann, S.; Tyler, P.D.; Moseley, E.T.; Carlson, E.T.; Grant, D.W.; Li, Y.; Welt, J.; Celi, L.A. Behind the scenes: A medical natural language processing project. Int. J. Med. Inform. 2018, 112, 68–73. [Google Scholar] [CrossRef]
- Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef]
- Auxier, B.; Anderson, M. Social Media Use in 2021. Pew Research Center. 2021. Available online: https://www.pewresearch.org/internet/wp-content/uploads/sites/9/2021/04/PI_2021.04.07_Social-Media-Use_FINAL.pdf (accessed on 1 October 2022).
Study Reference | Application | Employed NLP Model | Data Source |
---|---|---|---|
[30] | EHRs | | A multi-center study involving data from 39 hospitals
[98] | EHRs | Keyword-extraction NLP that uses an unsupervised ML approach (clustering) | 450,114 patient CT comprehensive reports gathered from 1 January to October 2020 |
[99] | EHRs | Word frequency for text analytics and a CNN classifier trained on Word2vec embeddings | Data collected through telehealth visits from 6813 patients, of whom 498 tested positive and 6315 tested negative
[32] | EHRs | NLP model (medical named entity recognition) | Audio or video recordings of clinic visits |
[38] | EHRs | Multi-class logistic regression model trained on n-gram features | The study cohort includes 1737 adult COVID-19 patients discharged from two hospitals in Boston, Massachusetts, between 10 March and 30 June 2020
[39] | EHRs | NLP rule-based pipeline | Clinical data from the VA Corporate Data Warehouse (CDW) between 1 January and 15 June 2020
[33] | EHRs | Random-forest trained on N-grams | 32,555 radiology reports from brain CTs and MRIs from a comprehensive stroke center |
[34] | EHRs | NLP rule-based pipeline | 6250 patients (5664 negative and 586 positive; 46,138 non-severe and 125 severe)
[36] | EHRs | BERT and Bi-LSTM with attention | Annotated 1472 clinical notes distinguishing COVID-19 diagnoses, testing, and symptoms |
[35] | EHRs | NLP rule-based pipeline | NLP is validated on several datasets; the main one is related to COVID-19 and contains 50 posts (1162 sentences) of related dialogues |
[44] | Mental health | Supervised text classification with a stochastic gradient descent linear classifier (L1 penalty) on TF-IDF n-gram features; unsupervised clustering with principal component analysis and k-NN; LDA for topic modeling | Social media: Reddit Mental Health Dataset including posts from 826,961 unique users
[45] | Mental health | BERT (ft) | Social media: 1000 English tweets for training the model and 1 million tweets included in the analysis |
[46] | Mental health | CrystalFeel, a sentiment analysis system | Social media: Over 20 million COVID-19 tweets between 28 January and April 2020
[47] | Mental health | Key phrase extraction and sentiment score using lexicon-based technique | Social media: 47 million COVID-19-related comments extracted from Twitter, Facebook, and YouTube
[100] | Mental health | Bi-directional LSTM and a self-attention layer | Social media: The diagnosed group has approximately 900,000 tweets from several countries. The control group has approximately 14 million tweets from several countries |
[48] | Mental health | Sentence-BERT (SBERT) | 9090 English free-form texts from 1451 students between 1 February and 30 April 2020 |
[52] | Health behaviors | BERT | 1.1 million COVID-19-related tweets from 181 counties in the US |
[54] | Health behaviors | | 189,958,459 English COVID-19-related tweets posted between 17 March and 27 July 2020
[55] | Health behaviors | SVM, XGBoost, and LSTM | 771,268 tweets from the US between January and October 2020 |
[56] | Health behaviors | LDA for topic modeling and aspect-based sentiment analysis | English COVID-19 tweets: 25,595 from Canada and 293,929 from the US
[57] | Health behaviors | BERT | 2,349,659 tweets related to COVID-19 vaccination 1 month after the first vaccine announcement |
[71] | Misinformation detection | Uses the SAFE system developed in [53] | 2029 news articles on COVID-19 (between January and May 2020) and 140,820 tweets that disclose how these news articles have circulated on Twitter
[76] | Misinformation detection | NLP and network analysis method | 4573 annotated tweets comprising 3629 users |
[73] | Misinformation detection | SVM | 10,700 social media posts and articles of real and fake news on COVID-19 |
[101] | Misinformation detection | Sentence-BERT and BERTScore | 4800 expert-annotated social media posts |
[77] | Misinformation detection | BERT and ALBERT | 5500 claims and explanation pairs |
[90] | COVID QA systems | BERT and LDA | COVID-19 scientific publications: CORD-19 dataset |
[83] | COVID QA systems | T5 | COVID-19 scientific publications: CORD-19 dataset |
[102] | COVID QA systems | | COVID-19 scientific publications: CORD-19 dataset
[91] | COVID QA systems | BioBERT | COVID-19 scientific publications: CORD-19 dataset, with additional 111 QA pairs annotated for test |
[92] | COVID QA systems | Synthetically generated QA examples used to optimize QA system performance on closed domains; the machine reading comprehension component uses RoBERTa | COVID-19 scientific publications: CORD-19 dataset
[95] | Knowledge transfer | XLM-R Large | M-CID dataset containing 5271 utterances across English, Spanish, French, and Spanglish
[96] | Knowledge transfer | Multilingual Universal Sentence Encoder [103] | 4,683,226 geo-referenced tweets in 60 languages located in Europe |
[94] | Knowledge transfer | Variant of the Transformer big architecture | The model is trained on more than 350 million sentences in French, Spanish, German, Italian, and Korean (into English)
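
Several rows in the table above rely on n-gram or TF-IDF features as input to a classifier. The following is a minimal, standard-library sketch of how word n-grams can be extracted and weighted with a plain TF-IDF scheme. It is an illustration only and not the implementation used in any of the reviewed studies; the function names, the whitespace tokenizer, the unsmoothed IDF formula, and the toy notes are all assumptions.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return lowercase word n-grams for n = 1..n_max (whitespace tokenizer)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Compute one sparse {n-gram: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df), without
    smoothing, so an n-gram present in every document gets weight 0.
    """
    counts = [Counter(word_ngrams(doc, n_max)) for doc in docs]
    doc_freq = Counter()
    for count in counts:
        doc_freq.update(count.keys())
    n_docs = len(docs)
    vectors = []
    for count in counts:
        total = sum(count.values())
        vectors.append({
            gram: (k / total) * math.log(n_docs / doc_freq[gram])
            for gram, k in count.items()
        })
    return vectors


# Toy, made-up notes; they do not come from any dataset in the table.
notes = [
    "patient reports fever and dry cough",
    "patient denies fever",
    "no cough reported",
]
vectors = tfidf_vectors(notes)
for note, vector in zip(notes, vectors):
    top = sorted(vector, key=vector.get, reverse=True)[:3]
    print(note, "->", top)
```

In practice, sparse vectors of this kind would be passed to a classifier such as logistic regression or a random forest; library implementations typically add smoothing and normalization that this sketch omits.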
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Al-Garadi, M.A.; Yang, Y.-C.; Sarker, A. The Role of Natural Language Processing during the COVID-19 Pandemic: Health Applications, Opportunities, and Challenges. Healthcare 2022, 10, 2270. https://doi.org/10.3390/healthcare10112270