Comparison of Pretraining Models and Strategies for Health-Related Social Media Text Classification
Abstract
1. Introduction
- We compare the performance of six models pretrained with texts from different domains and sources on 22 social media-based health-related text classification tasks: BERT [14] and RoBERTa [15] (generic text), BERTweet [22] and Twitter BERT [20] (social media text, specifically Twitter), BioClinical_BERT [17] (clinical text), and BioBERT [16] (biomedical literature text).
- We perform topic-specific pretraining (TSPT) using the masked language modeling (MLM) objective [40] and assess its impact on classification performance, compared with other pretraining strategies, on three tasks.
- We analyze document-level embeddings at two distinct stages of processing, pretraining and fine-tuning, to study how the embeddings are shifted by domain-adaptive pretraining (DAPT), source-adaptive pretraining (SAPT), and TSPT.
- We summarize effective strategies to serve as guidance for future research in this space.
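The masked language modeling objective mentioned in the contributions above can be illustrated with a minimal sketch. It assumes the masking scheme from the original BERT recipe (select about 15% of tokens; of those, replace 80% with a mask token, 10% with a random token, and leave 10% unchanged). The rates, the `mask_tokens` helper, and the toy vocabulary are illustrative assumptions and are not taken from this paper.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol, as used by BERT-style tokenizers


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one MLM training example from a list of tokens.

    Returns (inputs, labels). labels[i] is the original token at each
    selected position and None elsewhere, so a loss would only be
    computed on the selected positions.
    """
    if rng is None:
        rng = random.Random()
    inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK_TOKEN)         # 80%: mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(token)              # 10%: unchanged
        else:
            inputs.append(token)
            labels.append(None)
    return inputs, labels


if __name__ == "__main__":
    tweet = ["i", "took", "ibuprofen", "and", "felt", "dizzy"]
    toy_vocab = ["the", "pill", "today", "headache"]  # hypothetical vocabulary
    print(mask_tokens(tweet, toy_vocab, rng=random.Random(13)))
```

In continual pretraining, examples built this way from unlabeled in-domain text would be fed to an already pretrained encoder before it is fine-tuned on the labeled classification data.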
2. Methods
2.1. Evaluation
2.2. Data Preprocessing
2.3. Model Architectures
2.3.1. Statistical Significance
2.3.2. Document Embedding Transfer Evaluation
- Default encoder without any modification
- Encoder after pretraining
- Encoder after pretraining and fine-tuning
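One way to quantify how far document embeddings move between the encoder stages listed above is sketched below. The measure used in the paper is not shown in this excerpt, so the average cosine similarity between matched document vectors is an assumption, and `mean_embedding_similarity` is a hypothetical helper name. Each input is a list of embedding vectors for the same documents, in the same order, produced by two different encoder stages.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_embedding_similarity(embeddings_a, embeddings_b):
    """Average cosine similarity over documents embedded by two encoders.

    embeddings_a[i] and embeddings_b[i] must describe the same document.
    Values near 1 suggest the stage barely moved the embeddings; lower
    values suggest a larger shift.
    """
    if not embeddings_a or len(embeddings_a) != len(embeddings_b):
        raise ValueError("need two non-empty, equally sized embedding lists")
    total = sum(cosine_similarity(u, v)
                for u, v in zip(embeddings_a, embeddings_b))
    return total / len(embeddings_a)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real encoder outputs.
    default_encoder = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
    after_pretraining = [[0.8, 0.2, 0.1], [0.1, 0.9, 0.2]]
    print(mean_embedding_similarity(default_encoder, after_pretraining))
```

Comparing the default encoder with the pretrained one, and the pretrained one with the fine-tuned one, would give one similarity score per stage transition.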
2.4. Experiments
2.4.1. Data Collection and Preparation
2.4.2. Experimental Setup
3. Results
3.1. Comparison of Pretrained Models
3.2. Pretraining Results
3.3. Document Embedding Transfer Results
4. Discussion
4.1. Performance Comparisons
4.2. Embedding Transfer
4.3. Implications for Informatics Research
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Aggarwal, C.C.; Zhai, C. A Survey of Text Classification Algorithms. In Mining Text Data; Aggarwal, C., Zhai, C., Eds.; Springer: Boston, MA, USA, 2012; Volume 9781461432, pp. 163–222.
- Shah, F.P.; Patel, V. A Review on Feature Selection and Feature Extraction for Text Classification. In Proceedings of the 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, 23–25 March 2016; pp. 2264–2268.
- Uysal, A.K.; Gunal, S. A Novel Probabilistic Feature Selection Method for Text Classification. Knowl-Based Syst. 2012, 36, 226–235.
- Yang, S.; Ding, Z.; Jian, H.; Councill, I.G.; Hongyuan, Z.; Giles, C.L. Boosting the Feature Space: Text Classification for Unstructured Data on the Web. In Proceedings of the IEEE International Conference on Data Mining, Hong Kong, China, 18–22 December 2006; pp. 1064–1069.
- Gao, L.; Zhou, S.; Guan, J. Effectively Classifying Short Texts by Structured Sparse Representation With Dictionary Filtering. Inf. Sci. 2015, 323, 130–142.
- Cortes, C.; Vapnik, V. Support-vector Networks. Mach. Learn. 1995, 20, 273–297.
- Ho, T.K. Random Decision Forests. In Proceedings of the International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; IEEE Computer Society: Washington, DC, USA, 1995; Volume 1, pp. 278–282.
- Walker, S.H.; Duncan, D.B. Estimation of the Probability of an Event as a Function of Several Independent Variables. Biometrika 1967, 54, 167.
- McCray, A.T.; Aronson, A.R.; Browne, A.C.; Rindflesch, T.C.; Razi, A.; Srinivasan, S. UMLS® Knowledge for Biomedical Language Processing. Bull. Med. Libr. Assoc. 1993, 81, 184–194.
- Ahmed, N.; Dilmaç, F.; Alpkocak, A. Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method. Healthcare 2020, 8, 392.
- Wu, S.; Roberts, K.; Datta, S.; Du, J.; Ji, Z.; Si, Y.; Soni, S.; Wang, Q.; Wei, Q.; Xiang, Y.; et al. Deep Learning in Clinical Natural Language Processing: A Methodical Review. J. Am. Med. Inform. Assoc. 2020, 27, 457–470.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS’13), Lake Tahoe, NV, USA, 5–8 December 2013.
- Pennington, J.; Socher, R.; Manning, C. GloVe: Global Vectors for Word Representation. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Doha, Qatar, 2014; pp. 1532–1543.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining. Bioinformatics 2019, 36, 1234–1240.
- Alsentzer, E.; Murphy, J.; Boag, W.; Weng, W.H.; Jindi, D.; Naumann, T.; McDermott, M. Publicly Available Clinical BERT Embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, MN, USA, 6–7 June 2019; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 72–78.
- Leroy, G.; Gu, Y.; Pettygrove, S.; Kurzius-Spencer, M. Automated Lexicon and Feature Construction Using Word Embedding and Clustering for Classification of ASD Diagnoses Using EHR. In Proceedings of the Natural Language Processing and Information Systems—22nd International Conference on Applications of Natural Language to Information Systems, Liege, Belgium, 21–23 June 2017; Frasincar, F., Ittoo, A., Metais, E., Nguyen, L., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; pp. 34–37.
- Gururangan, S.; Marasović, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 8342–8360.
- Dai, X.; Karimi, S.; Hachey, B.; Paris, C. Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP, Online, 16–20 November 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 1675–1681.
- Guo, Y.; Dong, X.; Al-Garadi, M.A.; Sarker, A.; Paris, C.; Mollá-Aliod, D. Benchmarking of Transformer-Based Pre-Trained Models on Social Media Text Classification Datasets. In Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association, Online, 13–15 January 2021; pp. 86–91.
- Nguyen, D.Q.; Vu, T.; Tuan Nguyen, A. BERTweet: A Pre-trained Language Model for English Tweets. In Proceedings of the Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 9–14.
- Qudar, M.M.A.; Mago, V. TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis. arXiv 2020, arXiv:2010.11091.
- Conway, M.; Hu, M.; Chapman, W.W. Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and Consumer-Generated Data. Yearb. Med. Inform. 2019, 28, 208–217.
- Gonzalez-Hernandez, G.; Sarker, A.; O’Connor, K.; Savova, G. Capturing the Patient’s Perspective: A Review of Advances in Natural Language Processing of Health-Related Text. Yearb. Med. Inform. 2017, 26, 214–227.
- Paul, M.J.; Sarker, A.; Brownstein, J.S.; Nikfarjam, A.; Scotch, M.; Smith, K.L.; Gonzalez, G. Social Media Mining for Public Health Monitoring and Surveillance. In Proceedings of the Pacific Symposium on Biocomputing, Waimea, HI, USA, 4–8 January 2016; World Scientific Publishing Co. Pte Ltd.: Singapore, 2016; pp. 468–479.
- Chou, W.Y.S.; Hunt, Y.M.; Beckjord, E.B.; Moser, R.P.; Hesse, B.W. Social Media Use in the United States: Implications for Health Communication. J. Med. Internet Res. 2009, 11, e1249.
- Signorini, A.; Segre, A.M.; Polgreen, P.M. The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic. PLoS ONE 2011, 6, e19467.
- Ireland, M.E.; Schwartz, H.A.; Chen, Q.; Ungar, L.H.; Albarracín, D. Future-oriented Tweets Predict Lower County-level HIV Prevalence in the United States. Health Psychol. 2015, 34S, 1252–1260.
- Nguyen, M.H.; Gruber, J.; Fuchs, J.; Marler, W.; Hunsaker, A.; Hargittai, E. Changes in Digital Communication during the COVID-19 Global Pandemic: Implications for Digital Inequality and Future Research. Soc. Media Soc. 2020, 6, 2056305120948255.
- Dillon, G.; Hussain, R.; Loxton, D.; Rahman, S. Mental and Physical Health and Intimate Partner Violence Against Women: A Review of the Literature. Int. J. Fam. Med. 2013, 2013, 313909.
- Leaman, R.; Wojtulewicz, L.; Sullivan, R.; Skariah, A.; Yang, J.; Gonzalez, G. Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts in Health-Related Social Networks. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, Uppsala, Sweden, 15 July 2010; Association for Computational Linguistics: Uppsala, Sweden, 2010; pp. 117–125.
- Sarker, A.; Ginn, R.; Nikfarjam, A.; O’Connor, K.; Smith, K.; Jayaraman, S.; Upadhaya, T.; Gonzalez, G. Utilizing Social Media Data for Pharmacovigilance: A Review. J. Biomed. Inform. 2015, 54, 202–212.
- Harpaz, R.; DuMouchel, W.; Shah, N.H.; Madigan, D.; Ryan, P.; Friedman, C. Novel Data-Mining Methodologies for Adverse Drug Event Discovery and Analysis. Clin. Pharmacol. Ther. 2012, 91, 1010–1021.
- Forster, A.J.; Jennings, A.; Chow, C.; Leeder, C.; van Walraven, C. A Systematic Review to Evaluate the Accuracy of Electronic Adverse Drug Event Detection. J. Am. Med. Inform. Assoc. 2012, 19, 31–38.
- Kumar, V. Challenges and Future Consideration for Pharmacovigilance. J. Pharmacovigil. 2013, 1, 1–3.
- Névéol, A.; Dalianis, H.; Velupillai, S.; Savova, G.; Zweigenbaum, P. Clinical Natural Language Processing in Languages Other than English: Opportunities and Challenges. J. Biomed. Semant. 2018, 9, 12.
- Perera, S.; Sheth, A.; Thirunarayan, K.; Nair, S.; Shah, N. Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help. In Proceedings of the International Conference on Information and Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 21–26.
- Sarker, A.; Gonzalez, G. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-Corpus Training. J. Biomed. Inform. 2015, 53, 196–207.
- Salazar, J.; Liang, D.; Nguyen, T.Q.; Kirchhoff, K. Masked Language Model Scoring. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 2699–2712.
- Al-Garadi, M.A.; Yang, Y.C.; Lakamana, S.; Lin, J.; Li, S.; Xie, A.; Hogg-Bremer, W.; Torres, M.; Banerjee, I.; Sarker, A. Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes. In Proceedings of the 18th International Conference on Artificial Intelligence in Medicine, Minneapolis, MN, USA, 25–28 August 2020; Michalowski, M., Moskovitch, R., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 100–110.
- Al-Garadi, M.A.; Yang, Y.C.; Cai, H.; Ruan, Y.; O’Connor, K.; Gonzalez-Hernandez, G.; Perrone, J.; Sarker, A. Text Classification Models for the Automatic Detection of Nonmedical Prescription Medication Use From Social Media. BMC Med. Inform. Decis. Mak. 2021, 21, 27.
- Nguyen, D.Q.; Vu, T.; Rahimi, A.; Dao, M.H.; Nguyen, L.T.; Doan, L. WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Online, 19 November 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 314–318.
- Sarker, A.; Belousov, M.; Friedrichs, J.; Hakala, K.; Kiritchenko, S.; Mehryary, F.; Han, S.; Tran, T.; Rios, A.; Kavuluru, R.; et al. Data and Systems for Medication-Related Text Classification and Concept Normalization From Twitter: Insights From the Social Media Mining for Health (SMM4H)-2017 Shared Task. J. Am. Med. Inform. Assoc. 2018, 25, 1274–1283.
- Klein, A.Z.; Gonzalez-Hernandez, G. An Annotated Data Set for Identifying Women Reporting Adverse Pregnancy Outcomes on Twitter. Data Brief 2020, 32, 106249.
- Magge, A.; Klein, A.Z.; Miranda-Escalada, A.; Al-Garadi, M.A.; Alimova, I.; Miftahutdinov, Z.; Lima López, S.; Flores, I.; O’Connor, K.; Weissenbacher, D.; et al. Overview of the Sixth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at NAACL 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021.
- Gaur, M.; Aribandi, V.; Alambo, A.; Kursuncu, U.; Thirunarayan, K.; Beich, J.; Pathak, J.; Sheth, A. Characterization of Time-variant and Time-invariant Assessment of Suicidality on Reddit Using C-SSRS. PLoS ONE 2021, 16, e0250448.
- Ghosh, S.; Misra, J.; Ghosh, S.; Podder, S. Utilizing Social Media for Identifying Drug Addiction and Recovery Intervention. In Proceedings of the IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 3413–3422.
- Parapar, J.; Martín-Rodilla, P.; Losada, D.E.; Crestani, F. eRisk 2021: Pathological Gambling, Self-harm and Depression Challenges. In Proceedings of the Advances in Information Retrieval—42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, 14–17 April 2020; Hiemstra, D., Moens, M.F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 650–656.
- Carrillo-de Albornoz, J.; Rodriguez Vidal, J.; Plaza, L. Feature Engineering for Sentiment Analysis in e-health Forums. PLoS ONE 2018, 13, e0207996.
- Paulus, R.; Pennington, J. Script for Preprocessing Tweets. Available online: https://nlp.stanford.edu/projects/glove/preprocess-twitter.rb (accessed on 4 July 2022).
- Koehn, P. Statistical Significance Tests for Machine Translation Evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 388–395.
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2018; Volume 1, pp. 2227–2237.
- Tenney, I.; Xia, P.; Chen, B.; Wang, A.; Poliak, A.; McCoy, R.T.; Kim, N.; Van Durme, B.; Bowman, S.R.; Das, D.; et al. What Do You Learn from Context? Probing for Sentence Structure in Contextualized Word Representations. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Hewitt, J.; Manning, C.D. A Structural Probe for Finding Syntax in Word Representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4129–4138.
- Sarker, A.; Lakamana, S.; Hogg-Bremer, W.; Xie, A.; Al-Garadi, M.A.; Yang, Y.C. Self-reported COVID-19 Symptoms on Twitter: An Analysis and A Research Resource. J. Am. Med. Inform. Assoc. 2020, 27, 1310–1315.
- Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
- Strubell, E.; Ganesh, A.; McCallum, A. Energy and Policy Considerations for Modern Deep Learning Research. In Proceedings of the AAAI Conference on Artificial Intelligence 2020, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13693–13696.
- Schwartz, R.; Dodge, J.; Smith, N.A.; Etzioni, O. Green AI. Commun. ACM 2020, 63, 54–63.
ID | Task | Source | Evaluation Metric | Training Set Size (TRN) | Test Set Size (TST) | Classes (L) | Inter-Annotator Agreement (IAA) |
---|---|---|---|---|---|---|---|
1 | ADR Detection | | | 4318 | 1152 | 2 | 0.71 |
2 | Breast Cancer | | | 3513 | 1204 | 2 | 0.85 |
3 | NPMU characterization | * | | 11,829 | 3271 | 4 | 0.86 |
4 | WNUT-20-T2 (informative COVID-19 tweet detection) | | | 6238 | 1000 | 2 | 0.80 |
5 | SMM4H-17-T1 (ADR detection) | | | 5340 | 6265 | 2 | 0.69 |
6 | SMM4H-17-T2 (medication consumption) | | | 7291 | 5929 | 3 | 0.88 |
7 | SMM4H-21-T1 (ADR detection) | | | 15,578 | 913 | 2 | - |
8 | SMM4H-21-T3a (regimen change on Twitter) | | | 5295 | 1572 | 2 | - |
9 | SMM4H-21-T3b (regimen change on WebMD) | WebMD | | 9344 | 1297 | 2 | - |
10 | SMM4H-21-T4 (adverse pregnancy outcomes) | | | 4926 | 973 | 2 | 0.90 |
11 | SMM4H-21-T5 (COVID-19 potential case) | | | 5790 | 716 | 2 | 0.77 |
12 | SMM4H-21-T6 (COVID-19 symptom) | | | 8188 | 500 | 3 | - |
13 | Suicidal Ideation Detection | | | 1695 | 553 | 6 | 0.88 |
14 | Drug Addiction and Recovery Intervention | | | 2032 | 601 | 5 | - |
15 | eRisk-21-T1 (Signs of Pathological Gambling) | | | 1511 | 481 | 2 | - |
16 | eRisk-21-T2 (Signs of Self-Harm) | | | 926 | 284 | 2 | - |
17 | Sentiment Analysis in EHF (Food Allergy Related) | MedHelp | | 618 | 191 | 3 | 0.75 |
18 | Sentiment Analysis in EHF (Crohn’s Disease Related) | MedHelp | | 1056 | 317 | 3 | 0.72 |
19 | Sentiment Analysis in EHF (Breast Cancer Related) | MedHelp | | 551 | 161 | 3 | 0.75 |
20 | Factuality Classification in EHF (Food Allergy Related) | MedHelp | | 580 | 159 | 3 | 0.73 |
21 | Factuality Classification in EHF (Crohn’s Disease Related) | MedHelp | | 1018 | 323 | 3 | 0.75 |
22 | Factuality Classification in EHF (Breast Cancer Related) | MedHelp | | 524 | 161 | 3 | 0.75 |
Task | BERT | RoBERTa | BERTweet | Twitter BERT | BioClinical BERT | BioBERT |
---|---|---|---|---|---|---|
ADR Detection | 56.3 [48.3–63.6] | 60.6 [50.7–64.5] | 64.5 [58.4–70.6] | 57.6 [50.6–64.8] | 58.9 [51.7–65.3] | 60.2 [53.4–66.9] |
Breast Cancer | 84.7 [81.4–87.7] | 88.5 [85.2–90.3] | 87.4 [84.5–90.2] | 86.3 [83.3–89.1] | 83.0 [79.4–85.8] | 83.9 [80.4–86.9] |
NPMU | 59.5 [55.9–63.0] | 61.8 [54.1–61.5] | 64.9 [61.5–68.9] | 59.5 [56.0–63.3] | 56.8 [53.3–60.6] | 52.7 [49.2–56.4] |
WNUT-20-T2 (COVID-19) | 87.8 [85.7–90.0] | 88.7 [87.0–90.9] | 88.8 [86.2–90.9] | 87.1 [84.7–89.2] | 86.1 [83.9–88.4] | 87.4 [85.1–89.6] |
SMM4H-17-T1 (ADR detection) | 48.6 [44.6–52.8] | 53.4 [47.7–55.5] | 50.7 [46.6–54.7] | 47.6 [43.3–51.3] | 45.5 [41.5–49.1] | 44.5 [40.6–48.4] |
SMM4H-17-T2 (Medication consumption) | 76.8 [75.7–77.8] | 79.2 [76.9–79.1] | 79.8 [78.8–80.8] | 77.6 [76.6–78.7] | 74.7 [73.6–75.7] | 75.2 [74.2–76.3] |
SMM4H-21-T1 (ADR detection) | 68.3 [58.3–77.4] | 71.8 [62.1–80.4] | 66.2 [55.7–74.8] | 64.9 [53.0–73.9] | 64.9 [53.2–73.6] | 62.7 [51.0–72.3] |
SMM4H-21-T3a (Regimen change on Twitter) | 55.5 [48.2–62.7] | 62.1 [55.1–68.8] | 57.6 [50.7–64.7] | 54.0 [46.4–60.9] | 53.6 [46.3–60.6] | 55.0 [48.1–61.8] |
SMM4H-21-T3b (Regimen change on WebMD) | 87.7 [86.1–89.3] | 88.6 [86.9–90.1] | 87.6 [85.8–89.2] | 87.7 [85.9–89.4] | 86.7 [84.8–88.5] | 87.1 [85.3–88.9] |
SMM4H-21-T4 (Adverse pregnancy outcomes) | 86.0 [83.4–88.4] | 89.5 [87.0–91.4] | 88.8 [86.4–91.1] | 88.4 [86.3–90.7] | 83.4 [80.4–86.0] | 83.3 [80.4–85.9] |
SMM4H-21-T5 (COVID-19 potential case) | 69.5 [61.9–75.5] | 75.5 [68.9–81.0] | 71.0 [64.6–76.8] | 70.9 [64.2–76.8] | 65.0 [57.8–71.7] | 66.4 [59.0–72.9] |
SMM4H-21-T6 (COVID-19 symptom) | 98.4 [97.2–99.4] | 98.0 [96.6–99.2] | 98.2 [97.0–99.2] | 97.8 [96.4–99.0] | 97.8 [96.4–99.0] | 98.2 [97.0–99.2] |
Suicidal Ideation Detection | 63.9 [60.0–67.9] | 64.6 [60.4–68.6] | 63.3 [59.3–67.3] | 59.8 [56.0–64.0] | 61.7 [57.4–65.7] | 61.7 [57.4–66.1] |
Drug Addiction and Recovery Intervention | 71.9 [68.2–75.5] | 74.0 [70.4–77.5] | 71.9 [68.2–75.2] | 69.9 [66.2–73.4] | 69.7 [66.2–73.4] | 69.7 [66.1–73.2] |
eRisk-21-T1 (Signs of Pathological Gambling) | 73.9 [57.1–86.2] | 75.0 [59.1–87.7] | 67.9 [52.0–81.1] | 70.2 [54.5–81.8] | 68.1 [50.0–82.1] | 62.7 [45.5–76.4] |
eRisk-21-T2 (Signs of Self-Harm) | 49.1 [32.0–63.8] | 49.3 [34.4–62.9] | 48.6 [32.8–61.8] | 49.2 [34.0–64.0] | 40.0 [25.9–53.3] | 45.2 [27.6–60.0] |
EHF Sentiment Analysis (Food Allergy) | 74.3 [68.1–80.6] | 76.4 [70.2–82.7] | 74.3 [68.1–80.6] | 71.2 [64.4–77.5] | 71.7 [65.4–77.5] | 74.9 [68.6–80.6] |
EHF Sentiment Analysis (Crohn’s Disease) | 77.3 [72.6–81.7] | 79.2 [74.4–83.6] | 78.2 [73.5–82.6] | 75.4 [70.7–79.8] | 75.7 [71.3–80.1] | 75.7 [71.0–80.1] |
EHF Sentiment Analysis (Breast Cancer) | 73.9 [67.1–80.7] | 75.2 [68.3–81.4] | 70.8 [63.4–77.6] | 72.7 [65.8–79.5] | 73.9 [67.1–80.1] | 70.2 [62.7–77.6] |
EHF Factuality Classification (Food Allergy) | 76.1 [69.8–82.4] | 78.0 [71.1–83.6] | 76.1 [69.2–82.4] | 76.1 [69.2–83.0] | 70.4 [62.9–77.4] | 76.7 [69.8–83.6] |
EHF Factuality Classification (Crohn’s Disease) | 83.0 [78.9–87.3] | 85.4 [81.7–89.2] | 84.2 [80.2–88.2] | 84.8 [81.1–88.5] | 82.4 [78.0–86.1] | 81.4 [77.1–85.4] |
EHF Factuality Classification (Breast Cancer) | 75.8 [69.6–82.6] | 75.2 [67.7–82.0] | 77.0 [70.2–83.2] | 74.5 [67.1–80.7] | 75.8 [68.9–82.0] | 72.0 [64.6–78.9] |
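
Each score in the table above is followed by a bracketed range. This excerpt does not state how the ranges were obtained. The sketch below assumes they are percentile intervals from bootstrap resampling of the test set, in the spirit of the bootstrap method cited in the reference list (Koehn, 2004). The metric (plain accuracy), the 95% level, the number of resamples, and the function names are all illustrative assumptions.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric=accuracy,
                 n_resamples=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval for a metric.

    The test set is resampled with replacement n_resamples times, the
    metric is recomputed on each resample, and the alpha/2 and
    1 - alpha/2 percentiles of those scores form the interval.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return metric(y_true, y_pred), lower, upper


if __name__ == "__main__":
    gold = [0, 1] * 50
    pred = list(gold)
    for i in range(20):          # corrupt 20 of 100 predictions
        pred[i] = 1 - pred[i]
    point, lower, upper = bootstrap_ci(gold, pred)
    print(f"{100 * point:.1f} [{100 * lower:.1f}–{100 * upper:.1f}]")
```

Under this reading, small test sets produce wide ranges, which is consistent with the wide intervals reported for the smallest tasks such as eRisk-21-T2.
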
Continual Pretraining Data | Initial Model | Breast Cancer (298K) | Breast Cancer (1M) | NPMU (586K) | NPMU (1M) | COVID-19 (272K) | COVID-19 (1M) |
---|---|---|---|---|---|---|---|
OpenWebText (generic) | RB | 87.6 [84.8–90.2] | 87.3 [84.4–90.4] | 59.5 [55.4–63.1] | 57.2 [53.5–61.1] | 89.2 [87.1–91.3] | 88.5 [86.2–90.6] |
OpenWebText (generic) | BT | 86.5 [83.3–89.2] | 87.1 [84.1–89.8] | 61.6 [57.8–65.3] | 62.1 [58.2–65.2] | 88.5 [86.4–90.7] | 87.9 [85.8–90.1] |
Twitter + off-topic (SAPT) | RB | 87.5 [84.5–90.1] | 86.4 [83.7–89.2] | 65.2 [61.5–68.6] | 64.7 [59.0–66.5] | 90.8 [88.8–92.6] | 89.2 [87.0–91.2] |
Twitter + off-topic (SAPT) | BT | 86.9 [83.9–89.4] | 87.6 [84.7–90.3] | 65.7 [62.3–69.0] | 64.7 [61.4–67.9] | 90.2 [88.0–92.1] | 90.1 [88.2–92.1] |
Twitter + on-topic (SAPT + TSPT) | RB | 89.7 [87.1–92.0] | 88.9 [86.0–91.5] | 65.8 [62.5–69.2] | 66.0 [63.2–70.0] | 90.5 [88.4–92.1] | 91.2 [89.2–92.9] |
Twitter + on-topic (SAPT + TSPT) | BT | 89.1 [86.4–91.6] | 89.5 [86.9–92.1] | 66.7 [63.5–69.9] | 68.0 [64.7–71.4] | 90.5 [88.4–92.4] | 91.1 [89.1–93.0] |
PubMed + off-topic (DAPT) | RB | 85.1 [81.9–88.1] | - | 55.8 [51.9–59.3] | - | 89.0 [87.0–91.2] | - |
PubMed + off-topic (DAPT) | BT | 85.9 [83.0–88.7] | - | 58.8 [55.2–62.1] | - | 88.8 [87.0–91.0] | - |
PubMed + on-topic (DAPT + TSPT) | RB | 85.8 [82.7–88.7] | - | 58.6 [55.1–62.4] | - | 89.8 [87.7–91.7] | - |
PubMed + on-topic (DAPT + TSPT) | BT | 86.9 [84.0–89.5] | - | 60.2 [56.6–64.0] | - | 89.2 [87.1–91.3] | - |
Data size | - | 298K | 1M | 586K | 1M | 272K | 1M |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Guo, Y.; Ge, Y.; Yang, Y.-C.; Al-Garadi, M.A.; Sarker, A. Comparison of Pretraining Models and Strategies for Health-Related Social Media Text Classification. Healthcare 2022, 10, 1478. https://doi.org/10.3390/healthcare10081478