Article

Evaluating the Coverage and Depth of Latent Dirichlet Allocation Topic Model in Comparison with Human Coding of Qualitative Data: The Case of Education Research

1
School of Engineering Technology, Purdue University, West Lafayette, IN 47906, USA
2
Center for Intercultural Learning, Mentorship, Assessment and Research (CILMAR), Purdue University, West Lafayette, IN 47906, USA
3
Department of Computer and Information Technology, Purdue University, West Lafayette, IN 47906, USA
*
Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2023, 5(2), 473-490; https://doi.org/10.3390/make5020029
Submission received: 13 March 2023 / Revised: 22 April 2023 / Accepted: 10 May 2023 / Published: 14 May 2023

Abstract

Fields in the social sciences, such as education research, have started to expand the use of computer-based research methods to supplement traditional research approaches. Natural language processing techniques, such as topic modeling, may support qualitative data analysis by providing early categories that researchers may interpret and refine. This study contributes to this body of research and answers the following research questions: (RQ1) What is the relative coverage of the latent Dirichlet allocation (LDA) topic model and human coding in terms of the breadth of the topics/themes extracted from the text collection? (RQ2) What is the relative depth or level of detail among identified topics using LDA topic models and human coding approaches? A dataset of student reflections was qualitatively analyzed using LDA topic modeling and human coding approaches, and the results were compared. The findings suggest that topic models can provide coverage and depth of themes in a textual collection comparable to human coding, but require manual interpretation of the topics. The breadth and depth of human coding output are heavily dependent on the expertise of the coders and the size of the collection; these factors are better handled by the topic modeling approach.
Keywords: topic modeling; latent Dirichlet allocation; qualitative analysis; human coding; natural language processing; unsupervised machine learning

Share and Cite

MDPI and ACS Style

Nanda, G.; Jaiswal, A.; Castellanos, H.; Zhou, Y.; Choi, A.; Magana, A.J. Evaluating the Coverage and Depth of Latent Dirichlet Allocation Topic Model in Comparison with Human Coding of Qualitative Data: The Case of Education Research. Mach. Learn. Knowl. Extr. 2023, 5, 473-490. https://doi.org/10.3390/make5020029

