The Research Interest in ChatGPT and Other Natural Language Processing Tools from a Public Health Perspective: A Bibliometric Analysis

Favara, Giuliana; Barchitta, Martina; Maugeri, Andrea; Magnano San Lio, Roberta; Agodi, Antonella

doi:10.3390/informatics11020013

Department of Medical and Surgical Sciences and Advanced Technologies “GF Ingrassia”, University of Catania, 95123 Catania, Italy

Informatics 2024, 11(2), 13; https://doi.org/10.3390/informatics11020013

Submission received: 29 December 2023 / Revised: 13 March 2024 / Accepted: 20 March 2024 / Published: 22 March 2024

Abstract

Background: Natural language processing (NLP) tools, such as ChatGPT, demonstrate growing potential across numerous research scenarios, which has raised interest in their applications in public health and epidemiology. Here, we applied a bibliometric analysis to systematically assess the current literature on the applications of ChatGPT in epidemiology and public health. Methods: A bibliometric analysis was conducted with the Biblioshiny web app on original articles indexed in the Scopus database between 2010 and 2023. Results: Of a total of 3431 original medical articles, the document types “Article” and “Conference paper” made up most of the retrieved documents, and the term “ChatGPT” emerged as a topic of interest in 2023. Annual publications increased from 39 in 2010 to 719 in 2023, an average annual growth rate of 25.1%. In terms of country production over time, the USA led with the highest overall production from 2010 to 2023. Concerning citations, the most frequently cited countries were the USA, the UK, and China. Harvard Medical School emerged as the leading contributor, accounting for 18% of all articles among the top ten affiliations. Conclusions: Our study provides an overall examination of the existing research interest in ChatGPT’s applications for public health by outlining pivotal themes and uncovering emerging trends.

Keywords:

bibliometric analysis; public health; epidemiology; ChatGPT; artificial intelligence; natural language processing

1. Introduction

In the era of technological advancement, the intersection of artificial intelligence (AI) and public health has become increasingly significant [1]. In fact, the utilization of AI in medical and epidemiological research may be crucial in enhancing precision and accuracy, as well as improving efficiency in various aspects of the healthcare system.

The synergistic relationship between AI and public health underscores both the transformative power of technology and its pivotal role in shaping the future landscape of healthcare. The nuanced capabilities of AI offer unprecedented opportunities to address healthcare challenges, paving the way for more effective solutions and improved patient outcomes [2,3].

In this context, Large Language Models (LLMs) are a category of machine learning models designed to produce text resembling human language. Within the medical domain, natural language processing (NLP) has attracted significant attention, given its potential to reshape medical research, patient care, and educational practices [4,5]. This potential arises from the ability of these models to process extensive datasets with a speed and precision surpassing human capabilities. ChatGPT (https://chat.openai.com/, accessed on 28 December 2023), a versatile language model developed by OpenAI, has been increasingly explored in diverse medical realms, including research, clinical practice, and educational settings. It overcomes the constraints of earlier Artificial Neural Network (ANN) models (e.g., Recurrent Neural Networks (RNNs) and Long Short-Term Memory Networks (LSTMs)), which struggled to capture the contextual nuances of a given language input [6].

The deployment of ChatGPT represents a notable advancement in AI, specifically because of its proficiency in capturing the subtleties and complexities of natural human conversation. This capability enables ChatGPT to produce responses tailored to the nuances of a diverse array of prompts. As one of the largest publicly available language models, ChatGPT has found applications across diverse domains [2,7,8]. It demonstrates growing potential across numerous clinical and research scenarios, including medical and epidemiological research, with uses ranging from identifying research topics to supporting professionals in clinical and laboratory diagnoses [3,6,9].

The adoption of ChatGPT in public health and epidemiology may introduce a unique avenue for transforming education, prevention, and intervention strategies.

By leveraging NLP capabilities, ChatGPT can facilitate accessible and personalized interactions, delivering timely medication reminders and lifestyle recommendations and addressing inquiries about symptoms and available treatment options. Furthermore, ChatGPT can be applied to enhance patient engagement and education: through natural language interactions, patients can communicate with ChatGPT and receive personalized responses tailored to their medical history, preferences, and specific clinical needs [10,11]. However, the integration of ChatGPT in public health poses a set of risks, including concerns about the accuracy and reliability of the health information delivered. These risks are coupled with several limitations and ethical considerations, such as privacy, data security, and the potential for reinforcing health disparities, which need careful consideration [12]. For these reasons, understanding these implications is crucial to harnessing the full potential of ChatGPT within the public health domain [7,8,13].

With the increasing number of articles focusing on the use of ChatGPT in medical contexts, bibliometric analysis may serve as a valuable tool for uncovering and mapping the accumulated body of scientific knowledge. This methodology applies quantitative methods, such as citation analysis, to bibliometric data, including the number of citations and publications and the occurrence of keywords and topics. Although bibliometric methodology was first explored in the 1950s, its widespread adoption is relatively recent, and it has become particularly popular in business and economics research [14,15,16,17].

The availability of bibliometric software has strengthened the bibliometric approach, enabling a comprehensive analysis of publications, emerging trends, and collaboration patterns, as well as the identification of gaps in the research landscape. Such software also allows the efficient management of vast scientific datasets, which plays a crucial role in unraveling and mapping cumulative scientific knowledge [17,18,19].

In brief, bibliometric analysis summarizes extensive sets of bibliometric data to elucidate the current intellectual landscape and emerging trends of a particular research topic or field, as well as nuanced developments within well-established fields, by systematically interpreting large volumes of unstructured data. In comparison, a systematic review succinctly gathers and synthesizes the insights derived from the existing literature on a research topic or field, whereas a meta-analysis scrutinizes empirical evidence, revealing relationships between variables and shedding light on connections not explored in prior studies [17].

The present paper explores the potential of employing a bibliometric approach for systematically examining the existing literature and knowledge gaps about the applications of NLP tools and ChatGPT for public health issues. To do this, the study critically assesses their application in epidemiology and public health research across diverse countries, identifying trends, assessing the geographical and disciplinary distribution of research, and uncovering potential gaps.

2. Materials and Methods

Analyses of Bibliometric Data

Bibliometric data were analyzed using Biblioshiny, a web app included in the Bibliometrix package (https://www.bibliometrix.org/home/index.php/layout/bibliometrix, accessed on 28 December 2023), which allows users without coding experience to use the software. Bibliometrix is an open-source R tool for executing a comprehensive science mapping analysis of scientific literature, and Biblioshiny combines its functionality with the ease of use of a web app.

After the scope and research criteria were defined, a literature search was conducted on 4 November 2023 to collect original articles published in English from 2010 to 2023 and to extract their essential bibliographic information (i.e., titles, authors, affiliations, abstracts, publication years, and keywords).

The Scopus database was searched for articles that explicitly referenced the terms (“ChatGPT” OR “Chatbot*” OR “Natural Language Processing”) AND (“prevent*” OR “public health” OR “epidemiolog*”) in their titles, abstracts, or keywords. Scopus was selected because it is the largest repository of peer-reviewed scientific literature, encompassing a broad spectrum of subjects [20].
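The search string combines two groups of terms: the terms within each group are joined by OR, the two groups are joined by AND, and a trailing `*` acts as a truncation wildcard. As a rough illustration of that logic, the Python sketch below applies the same Boolean structure to locally stored records. It is a simplified approximation, not how Scopus evaluates queries (Scopus applies its own phrase, stemming, and field rules), and the record format and function names are hypothetical.

```python
import re

# Term groups copied from the search string; '*' marks truncation.
GROUP_A = ["ChatGPT", "Chatbot*", "Natural Language Processing"]
GROUP_B = ["prevent*", "public health", "epidemiolog*"]


def term_to_regex(term: str) -> re.Pattern:
    """Compile a term; a trailing '*' matches any word ending (truncation)."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    # Word boundaries stop a term from matching inside a longer word.
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches_any(text: str, terms: list) -> bool:
    """OR semantics: True if at least one term occurs in the text."""
    return any(term_to_regex(t).search(text) for t in terms)


def matches_query(record: dict) -> bool:
    """(GROUP_A) AND (GROUP_B), searched across title, abstract and keywords."""
    text = " ".join([
        record.get("title", ""),
        record.get("abstract", ""),
        " ".join(record.get("keywords", [])),
    ])
    return matches_any(text, GROUP_A) and matches_any(text, GROUP_B)


example = {"title": "A chatbot for smoking prevention", "abstract": "", "keywords": []}
print(matches_query(example))  # True
```

Because of the AND between groups, a record mentioning only ChatGPT (for example, in radiology) without any prevention, public health, or epidemiology term would be excluded.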

After the bibliometric data were imported into Biblioshiny, descriptive statistics (i.e., the number of documents, authors, sources, and keywords, the timespan, and the average number of citations) were obtained [21].

Subsequently, tables and graphic visualizations were produced to illustrate the annual scientific production, the top manuscripts by citation count, the most prolific authors, the most productive countries, total citations per country, the most relevant journals, and the most significant keywords. This analytical approach also provides a nuanced understanding of the bibliometric landscape, including co-authorship networks, keyword co-occurrence maps, publication trends, and collaboration networks, and it helps identify patterns, trends, and influential publications and/or authors.
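Biblioshiny builds keyword co-occurrence maps internally. To illustrate the counting step behind such maps (this is not the Bibliometrix implementation), the Python sketch below counts, for every unordered pair of keywords, how many documents list both. These counts become the edge weights of a co-occurrence network. The function name and input format are hypothetical, and real tools additionally merge synonyms, apply frequency thresholds, and cluster the resulting network.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(documents):
    """Count how many documents contain each unordered pair of keywords."""
    pairs = Counter()
    for keywords in documents:
        # Normalise case and drop duplicate keywords within one document.
        unique = sorted({k.strip().lower() for k in keywords})
        # Sorting gives each pair a canonical (alphabetical) order.
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


docs = [
    ["ChatGPT", "public health"],
    ["chatgpt", "Public Health", "epidemiology"],
    ["epidemiology", "natural language processing"],
]
pairs = keyword_cooccurrence(docs)
print(pairs[("chatgpt", "public health")])  # 2
```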

3. Results

3.1. Descriptive Analyses of Bibliometric Data

We identified 3431 original medical articles published in English and indexed in the Scopus database between 2010 and 2023. These articles originate from 1637 distinct sources, including journals and books, with an average document age of 2.9 years. The first two sources, the “Journal of Medical Internet Research” and “Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)”, jointly account for nearly 35% of the articles published by the top ten sources. Figure 1 illustrates the most relevant sources by document count. The most represented document types were “Article” and “Conference paper”, constituting 51% and 36% of the retrieved documents, respectively. In total, the retrieved articles contained 121,089 references.

The average number of citations per document was 13.8, corresponding to an average of 3.3 citations per document per year. Table 1 lists the top ten articles by citations, with total citation counts ranging from 424 to 1094 and from 40 to 179 citations per year.
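Both citation averages can be derived from each document's publication year and total citation count. The Python sketch below assumes that a document's per-year citations equal its total citations divided by the number of years since publication, counting the publication year itself. This normalization is our assumption, and the exact Bibliometrix definition may differ. The function name, record format, and reference year are hypothetical.

```python
from statistics import mean


def citation_summary(records, ref_year):
    """Summarise (publication_year, total_citations) pairs.

    Assumes a document's citable years are ref_year - year + 1, so a paper
    published in ref_year counts as one year for the per-year ratio.
    """
    return {
        "mean_citations_per_doc": mean(tc for _, tc in records),
        "mean_citations_per_doc_per_year": mean(
            tc / (ref_year - year + 1) for year, tc in records
        ),
        "mean_document_age": mean(ref_year - year for year, _ in records),
    }


# Hypothetical records: (publication year, total citations).
summary = citation_summary([(2020, 40), (2022, 10), (2023, 1)], ref_year=2023)
print(summary)
```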

3.2. Evolving Trends in Research Interest

As shown in Figure 2, our bibliometric analysis reveals a marked increase in the volume of publications from 2010 to 2023. The annual number of publications rose from 39 in 2010 to 719 in 2023, corresponding to an average annual growth rate of 25.1% and reflecting a sustained expansion of the scholarly discourse on the subject over this period.
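The reported 25.1% is consistent with a compound annual growth rate computed from the first and last yearly counts, ((N_2023 / N_2010)^(1/13) - 1) × 100, where 13 is the number of yearly intervals between 2010 and 2023. We assume this is how the figure was derived. The short Python sketch below reproduces it from the counts given in the text; the function name is hypothetical.

```python
def annual_growth_rate(first_count, last_count, first_year, last_year):
    """Compound annual growth rate (percent) between two yearly publication counts."""
    intervals = last_year - first_year  # 2010 to 2023 gives 13 yearly intervals
    return ((last_count / first_count) ** (1 / intervals) - 1) * 100


rate = annual_growth_rate(39, 719, 2010, 2023)
print(f"{rate:.1f}%")  # 25.1%
```

Because this rate depends only on the two endpoint years, it summarizes overall growth but does not by itself show whether growth was steady from year to year; Figure 2 is needed for that.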

Figure 3 illustrates the average total citations per year from 2010 to 2023. The lower values observed after 2019 may be attributed to the different ages of the documents analyzed: because scholarly works accumulate citations over time, recent publications have had less time to be cited. This temporal consideration is crucial for interpreting citation trends. Figure 4 depicts the evolution of topics from 2010 to 2023; notably, the term “ChatGPT” emerges as a prominent topic in 2023.
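The age effect described above can be made explicit by grouping documents by publication year and dividing each year's mean citation count by the number of years the documents have had to be cited. The Python sketch below does this on hypothetical data. The normalization (reference year minus publication year plus one) is our assumption and may not match the exact computation behind Figure 3.

```python
from collections import defaultdict


def mean_citations_by_year(records, ref_year):
    """Summarise (publication_year, total_citations) records per publication year.

    Returns {year: (mean total citations, mean citations per citable year)},
    assuming a document published in `year` has ref_year - year + 1 citable years.
    """
    by_year = defaultdict(list)
    for year, citations in records:
        by_year[year].append(citations)
    summary = {}
    for year in sorted(by_year):
        mean_tc = sum(by_year[year]) / len(by_year[year])
        summary[year] = (mean_tc, mean_tc / (ref_year - year + 1))
    return summary


# Hypothetical data: the older cohort has had far longer to collect citations.
records = [(2015, 90), (2015, 30), (2023, 4)]
for year, (mean_tc, per_year) in mean_citations_by_year(records, 2023).items():
    print(year, round(mean_tc, 1), round(per_year, 1))
```

In this toy example the 2015 cohort has a raw mean 15 times that of 2023, but the gap shrinks considerably once citations are expressed per citable year.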

3.3. Leading Contributors and Collaborative Networks in Authorship

Within the selected articles, a total of 12,932 authors were identified, accounting for 16,872 authorship appearances. There were 167 single-authored documents, written by 158 distinct authors. Overall, the ratio of documents per author was 0.3, while the average number of co-authors per document was 4.9. Table 2 presents the ten most productive authors in terms of both total and fractionalized articles, with contributions ranging from 20 to 44 articles (3.8 to 8.3 fractionalized articles).
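The two ratios and the fractionalized counts follow from simple arithmetic on per-document author lists. In the Python sketch below, fractionalization is assumed to give each author of an n-author document 1/n of an article, which is a common convention. The function names and toy data are hypothetical. The final line checks the ratios against the totals reported above.

```python
from collections import Counter


def authorship_metrics(author_lists):
    """Documents per author and authors (co-authors) per document."""
    n_docs = len(author_lists)
    appearances = sum(len(authors) for authors in author_lists)
    distinct_authors = {a for authors in author_lists for a in authors}
    return {
        "documents_per_author": n_docs / len(distinct_authors),
        "coauthors_per_document": appearances / n_docs,
    }


def fractionalized_counts(author_lists):
    """Give each author of an n-author document 1/n of an article credit."""
    credit = Counter()
    for authors in author_lists:
        for author in authors:
            credit[author] += 1 / len(authors)
    return credit


# Totals reported above: 3431 documents, 12,932 authors, 16,872 appearances.
print(round(3431 / 12932, 1), round(16872 / 3431, 1))  # 0.3 4.9
```

The unrounded values are about 0.27 documents per author and 4.92 authors per document.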

3.4. Top Countries of Contribution and Global Collaborative Networks

Figure 5 provides an overview of the geographical distribution of publications, showing the top ten countries of corresponding authors and distinguishing publications originating from a single country from those involving international collaboration. The vast majority of publications originated from a single country, indicating that most research was conducted by national rather than international teams. Approximately 21.0% of the publications involved international collaboration, demonstrating a degree of global engagement and cooperation in research efforts. The USA had the highest number of both single-country and multi-country publications.

Figure 6 highlights the dominant role of the USA among the top ten countries of corresponding authors: the USA alone accounts for 44.5% of all articles included in this analysis. China and India follow as the next most notable contributors.

Among the top ten countries of corresponding authors, those with the highest proportion of multi-country publications were Australia (34.2%), Canada (31.1%), and the United Kingdom (29.0%). Conversely, the countries with the lowest proportion were South Korea (16.4%), the USA (15.2%), and India (12.7%) (Figure 7).
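These proportions distinguish single-country publications (SCP) from multi-country publications (MCP), attributed to the corresponding author's country. The Python sketch below shows one way to compute the MCP share per country. It assumes each record lists the corresponding author's country together with the countries of all authors; this record structure, the function name, and the data are hypothetical.

```python
from collections import defaultdict


def mcp_share(docs):
    """Percentage of multi-country publications (MCP) per corresponding-author country.

    Each doc is (corresponding_author_country, [countries of all authors]).
    A doc is MCP if its authors span more than one country, otherwise SCP.
    """
    totals = defaultdict(int)
    mcp = defaultdict(int)
    for corresponding, countries in docs:
        totals[corresponding] += 1
        if len(set(countries)) > 1:
            mcp[corresponding] += 1
    return {country: 100 * mcp[country] / totals[country] for country in totals}


# Hypothetical records: 1 of 4 USA-led papers is international.
docs = [
    ("USA", ["USA", "USA"]),
    ("USA", ["USA", "UK"]),
    ("USA", ["USA"]),
    ("USA", ["USA"]),
    ("Italy", ["Italy", "Spain"]),
]
print(mcp_share(docs))  # {'USA': 25.0, 'Italy': 100.0}
```

Because the share is a proportion, a country with a large output, such as the USA, can lead in absolute multi-country publications while still showing a low MCP percentage.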

In terms of country production over time, the USA led with the highest overall production from 2010 to 2023. Concerning citations, the most frequently cited countries were the USA, the UK, and China, as depicted in Figure 8.

Figure 9 provides a visual representation of the collaboration network among countries, offering valuable insights into the dynamics of global research partnerships. This visualization allows us to discern patterns of collaboration, identify key nodes in the research network, and understand the extent of international cooperation in the explored field.

Figure 10 presents the most influential affiliations in the published articles, shedding light on the key players in the research landscape. Harvard Medical School stands out as the foremost contributor, accounting for 18% of all articles within the top ten affiliations. The Mayo Clinic follows closely with 11.3%, while King’s College London contributes 11%.

4. Discussion

To the best of our knowledge, our work is the first to aim at enriching the broader understanding of ChatGPT’s impact on public health discourse by providing a comprehensive overview of the current research landscape, delineating key themes, and revealing nascent trends.

Here, we aim not only to contribute valuable insights to the ongoing discourse but also to lay the groundwork for future investigations into the interplay between ChatGPT and public health communication. Through our analysis, we also aim to foster a deeper appreciation of the potential implications and applications of ChatGPT in shaping the public health narrative.

Through this examination, we also seek to offer insights that could guide future research initiatives and inform decision-making in healthcare and academia, and to enhance our understanding of the implications, limitations, and potential avenues for leveraging ChatGPT in public health contexts. This objective arises from the fact that ChatGPT is one of the most popular LLMs, having amassed over 100 million users within two months of its release [6,22]. Its widespread use within the medical research community has generated a substantial body of literature, underscoring the need for deeper insights into the role of ChatGPT in public health [11,23].

Here, we conducted a comprehensive bibliometric analysis of ChatGPT to gain deeper insights into the response of the scientific and medical community to the utilization of ChatGPT in Public Health [20].

Our study offers insights into the growing use of NLP tools and ChatGPT in academic and medical environments, as reflected in the types of publications featured in our analysis. Of the 3431 documents published between 2010 and 2023, 1757 were articles and 1247 were conference papers, highlighting growing interest in the applicability of these tools in public health. The upward trend in publications from 2010 to 2023, together with the emergence of ChatGPT as a topic in 2023, suggests increasing recognition of its relevance. Among the studies identified, several evaluated conversational AI-based tools for epidemiological research, particularly ChatGPT’s capacity to provide pertinent answers to common questions on the prevention and control of both infectious and non-communicable diseases, thus addressing pressing public health and epidemiological concerns. These findings shed light on the adaptability and utility of ChatGPT in the epidemiological domain and underscore its potential to contribute to public health strategies. By providing contextually relevant responses to specific queries, particularly those related to disease prevention, ChatGPT showed the capacity to enhance communication and disseminate vital information in a user-friendly manner.

Furthermore, our work serves as a foundation for understanding the broader implications of incorporating conversational AI tools like ChatGPT into epidemiological research. As the applications of these tools are explored and refined, their potential impact on public health communication and intervention strategies becomes increasingly evident, paving the way for more informed decision-making and proactive health management [24,25,26,27,28,29,30,31,32,33]. For instance, Hava and colleagues examined the suitability of ChatGPT responses to frequently asked questions about breast cancer prevention and screening, as assessed by fellowship-trained breast radiologists, and found that approximately 90% of the generated responses were deemed appropriate [24]. Similarly, Hermann and colleagues conducted the first study assessing the accuracy of ChatGPT in responding to frequently asked questions on specific aspects of gynecological health: ChatGPT provided accurate answers to questions on cervical cancer prevention and survivorship but less precise responses on diagnosis and treatment [25].

A similar approach was adopted by Yeo and colleagues, who noted that ChatGPT provided only a limited number of comprehensive answers to questions about cirrhosis and hepatocellular carcinoma (HCC) [26]. In a similar vein, two studies, led by Cao and by Rahsepar, posed questions related to liver and lung cancer, respectively, and both observed that ChatGPT did not consistently deliver precise information. Its responses often contained contradictory or misleading information, if not outright inaccuracies, with potential implications for medical management and patient outcomes [27,28]. Two studies also discussed the potential role of ChatGPT in the prevention and control of infectious disease outbreaks [29,30]. Kizito and colleagues described the potential opportunities of ChatGPT to enhance the care and management of people living with HIV, for example by providing a resource for patients seeking information about antiretroviral therapy (ART) [2,30,34]. Nevertheless, a study by Cheng and colleagues revealed that ChatGPT was unable to speculate on or offer a conclusive answer regarding the origin and transmission of Monkeypox, or future trends in confirmed cases [29].

Given the critical importance of accurate information about vaccines, it may be crucial to gauge ChatGPT’s capacity to provide accurate information on vaccines and immunization. With this in mind, Deiana and colleagues explored the potential of ChatGPT to enhance health literacy and mitigate vaccine hesitancy by analyzing its answers to a list of eleven myths and misconceptions about vaccines [31]. Sohail and colleagues applied a similar approach, asking five different questions about the COVID-19 vaccine [32]. A recent study also discussed the potential applications and limitations of ChatGPT in the diagnosis, management, and prognosis of cardiovascular and cerebrovascular disease [33].

In our work, compiling the ten most cited articles provides insights into global interest in NLP applications in public health, with a specific focus on ChatGPT. This analysis offers a framework for exploring the articles that have garnered the highest number of citations, thereby enhancing the understanding of the field’s development and current research landscape. For example, the second most cited study indicates that, over the past few decades, deep neural networks (DNNs) have achieved significant success across various applications, such as computer vision and NLP, which is especially noteworthy given the high predictive accuracy demonstrated by DNNs on extensive datasets derived from Merck’s drug discovery initiatives [35]. The next most cited manuscript is a comprehensive review outlining how Deep Learning (DL) could offer a viable strategy to revolutionize the field of ophthalmology. It addresses various clinical and technical challenges, along with medicolegal concerns and patient acceptance, that still require consideration. Notably, the manuscript proposed innovative applications of DL in ocular imaging, encompassing fundus photographs, optical coherence tomography, and visual fields, as promising solutions for the screening, diagnosis, and monitoring of eye conditions [36]. The article ranked fifth in terms of citations is a comprehensive review of digital innovations aimed at bolstering the global public health response to coronavirus disease 2019 (COVID-19) and other infectious diseases. It also explores the challenges of machine learning and NLP, including legal, ethical, and privacy considerations, that may hinder the implementation of these innovations [37].
The sixth most cited work introduced NLP algorithms as a computational phenotyping method for extracting information from electronic health records. In particular, this method could aid in the early diagnosis of asthma, a heterogeneous disease with distinct phenotypes and endotypes that require comprehensive characterization [38]. Finally, the tenth most cited manuscript suggests the utility of NLP in countering the spread of misinformation on COVID-19 etiology, outcomes, and prevention; NLP can play a crucial role in detecting and removing scientifically unfounded online content across social media platforms [39].

The examination of current research on ChatGPT in public health has revealed areas that have been extensively studied, as well as significant gaps requiring further investigation. Although promising outcomes have been observed in domains such as breast cancer prevention and chronic disease management, several sectors crucial for advancing public health remain unexplored. For instance, the studies examined did not comprehensively address global public health challenges (e.g., pandemic management, maternal and child health, mental health, or access to healthcare in resource-limited settings), suggesting that expanding research in these directions could yield valuable insights. Despite some mention of chronic diseases, ChatGPT’s potential to help prevent non-communicable diseases such as diabetes, heart disease, and obesity remains poorly understood, so a thorough assessment of its impact in these contexts is warranted. Investigating how ChatGPT can be used to tackle disparities in healthcare access or to enhance communication in multilingual contexts is also a pertinent subject. Moreover, our work suggests that ChatGPT has not yet been employed to address challenges related to health equity, Infection Prevention and Control (IPC), or Antimicrobial Resistance (AMR); these domains should be explored.

While the integration of ChatGPT in public health holds promise, its downsides must also be considered. These include precision constraints, that is, the system’s limited accuracy in delivering the precise and reliable information that is crucial in the dynamic field of public health. In addition, biases inherent in the training data can be reflected in ChatGPT’s responses, potentially perpetuating or amplifying existing biases within the public health domain. Contextual deficiency refers to the model’s difficulty in grasping and incorporating context effectively, which may result in responses that lack the nuanced understanding required for complex public health scenarios. ChatGPT may also have difficulty maintaining meaningful and engaging interactions, which could limit its effectiveness in public health communication contexts where interaction quality is paramount. Moreover, the tool’s inability to interact directly with healthcare professionals restricts its potential to offer personalized, expert-guided insights tailored to specific medical queries, limiting its scope in providing comprehensive healthcare information.

Addressing these challenges is therefore imperative for the responsible and beneficial deployment of ChatGPT in public health settings, so that its full potential is realized while its drawbacks are mitigated [10].

In our work, the distribution of corresponding authors in ChatGPT-related articles reveals a notable concentration in the United States, China, and India, which contributed the highest number of publications. This pattern likely reflects a high level of research activity and engagement with ChatGPT in these countries, potentially driven by well-established research infrastructure, technological advancement, and academic interest in AI. Conversely, Italy’s position as the last of the top ten countries suggests comparatively lower involvement in producing ChatGPT-related research articles, which could be influenced by factors such as the scale of AI research initiatives, funding availability, or the prioritization of other research topics within the Italian academic and scientific community. A deeper examination of each country’s research landscape is needed to understand the factors behind these geographical variations in ChatGPT-related publications [6].

The use of ChatGPT also raises significant ethical and practical concerns, particularly in the medical field, where there are apprehensions about its potential impact on public health. Indeed, the notion of an ‘infodemic’ is gaining prominence in public health discussions. The rapid text generation capability of LLMs could amplify the dissemination of misinformation on an unprecedented scale, giving rise to what can be termed an ‘AI-driven infodemic’, that is, the use of LLMs to generate an extensive volume of human-like texts without any scientific foundation or support [12].

With this in mind, certain limitations must be considered when interpreting our findings. First, the metric employed in our bibliometric methodology relies on literature citations. Because ChatGPT has only recently been widely adopted, relevant publications have had little time to accumulate citations, so impactful recent articles may be overlooked.

Second, bibliometric data may give an incomplete representation of research interest, as a significant portion of research may remain unpublished or be disseminated through non-indexed sources that escape bibliometric analyses. Third, our analysis is restricted to publications in English, potentially introducing a language bias: relevant contributions in other languages might be overlooked, leading to a partial representation of the global research landscape. Fourth, the ethical implications of AI applications, including ChatGPT, are intricate and multifaceted and may vary across regions and cultural contexts; our analysis did not comprehensively capture the ethical dimensions of using ChatGPT in medical research.

Fifth, although we focused on ChatGPT in the medical field, given its wide public availability, other specialized tools may also make substantial contributions to the medical research community. These alternatives merit thorough consideration and exploration, as they could offer unique advantages and insights into various aspects of medical research, from natural language understanding to data analysis and disease tracking (e.g., tracking infectious disease outbreaks, mapping biomedical text, and answering biomedical questions). Lastly, we employed a broad search strategy designed not only to capture the specific contribution of ChatGPT to public health but also to examine AI chatbots and NLP applications for public health issues more generally. Our literature search spans work published from 2010 to 2023, covering the period during which interest in NLP shifted towards ChatGPT. For this reason, we explicitly included the term “ChatGPT” in our search criteria, as it emerged as a recognized term around 2023.

5. Conclusions

Our research summarizes both the current state and the potential evolution of ChatGPT’s impact on epidemiological research. Using a bibliometric approach, we quantitatively analyzed scholarly publications, citations, and trends related to ChatGPT, providing a comprehensive overview of its influence. This analysis helps identify gaps and areas of interest for future investigation and clarifies how ChatGPT intersects with diverse aspects of epidemiology and public health. Our findings can therefore serve as a roadmap for researchers, policymakers, and practitioners navigating the rapidly evolving landscape of AI in public health: they offer an overview of the scholarly discourse surrounding ChatGPT and inform decision-makers about its potential applications, challenges, and emerging opportunities. In essence, our work contributes to the academic discourse on AI language models in epidemiological research and provides practical guidance for using these technologies responsibly in the service of public health. By fostering a better understanding of this evolving landscape, we aim to help stakeholders make informed decisions at the intersection of artificial intelligence and epidemiological inquiry.

Author Contributions

Conceptualization, G.F. and A.A.; methodology, G.F., M.B., R.M.S.L. and A.M.; software, G.F., M.B., R.M.S.L. and A.M.; formal analysis, G.F., R.M.S.L. and A.M.; data curation, G.F., M.B., R.M.S.L. and A.M.; writing—original draft preparation, G.F.; writing—review and editing, all the authors; supervision, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this work is available upon reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Kahambing, J.G. ChatGPT, public health communication and ‘intelligent patient companionship’. J. Public Health 2023, 45, e590.
Dave, T.; Athaluri, S.A.; Singh, S. ChatGPT in medicine: An overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front. Artif. Intell. 2023, 6, 1169595.
Cascella, M.; Montomoli, J.; Bellini, V.; Bignami, E. Evaluating the Feasibility of ChatGPT in Healthcare: An Analysis of Multiple Clinical and Research Scenarios. J. Med. Syst. 2023, 47, 33.
Kung, T.H.; Cheatham, M.; Medenilla, A.; Sillos, C.; De Leon, L.; Elepaño, C.; Madriaga, M.; Aggabao, R.; Diaz-Candido, G.; Maningo, J.; et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS Digit. Health 2023, 2, e0000198.
Li, H.; Moon, J.T.; Purkayastha, S.; Celi, L.A.; Trivedi, H.; Gichoya, J.W. Ethics of large language models in medicine and medical research. Lancet Digit. Health 2023, 5, e333–e335.
Barrington, N.M.; Gupta, N.; Musmar, B.; Doyle, D.; Panico, N.; Godbole, N.; Reardon, T.; D’Amico, R.S. A Bibliometric Analysis of the Rise of ChatGPT in Medical Research. Med. Sci. 2023, 11, 61.
Liebrenz, M.; Schleifer, R.; Buadze, A.; Bhugra, D.; Smith, A. Generating scholarly content with ChatGPT: Ethical challenges for medical publishing. Lancet Digit. Health 2023, 5, e105–e106.
van Dis, E.A.M.; Bollen, J.; Zuidema, W.; van Rooij, R.; Bockting, C.L. ChatGPT: Five priorities for research. Nature 2023, 614, 224–226.
Baumgartner, C. The potential impact of ChatGPT in clinical and translational medicine. Clin. Transl. Med. 2023, 13, e1206.
Biswas, S.S. Role of Chat GPT in Public Health. Ann. Biomed. Eng. 2023, 51, 868–869.
Morita, P.P.; Abhari, S.; Kaur, J.; Lotto, M.; Miranda, P.A.D.S.; Oetomo, A. Applying ChatGPT in public health: A SWOT and PESTLE analysis. Front. Public Health 2023, 11, 1225861.
De Angelis, L.; Baglivo, F.; Arzilli, G.; Privitera, G.P.; Ferragina, P.; Tozzi, A.E.; Rizzo, C. ChatGPT and the rise of large language models: The new AI-driven infodemic threat in public health. Front. Public Health 2023, 11, 1166120.
King, M.R.; chatGPT. A Conversation on Artificial Intelligence, Chatbots, and Plagiarism in Higher Education. Cell. Mol. Bioeng. 2023, 16, 1–2.
Zhong, M.; Lin, M. Bibliometric analysis for economy in COVID-19 pandemic. Heliyon 2022, 8, e10757.
Prabakusuma, A.S.; Wardono, B.; Fahlevi, M.; Zulham, A.; Djoko Sunarno, M.T.; Syukur, M.; Aljuaid, M.; Saniuk, S.; Apriliani, T.; Pramoda, R. A bibliometric approach to understanding the recent development of self-sufficient fish feed production utilizing agri-food wastes and by-products towards sustainable aquaculture. Heliyon 2023, 9, e17573.
Camón Luis, E.; Celma, D. Circular Economy. A Review and Bibliometric Analysis. Sustainability 2020, 12, 6381.
Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Marc Lim, W. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296.
Sevillano-Jimenez, J.; Carrión-Chambilla, M.; Espinoza-Lecca, E.; Contreras-Pulache, H.; Moya-Salazar, J. A bibliometric analysis of 47-years of research on public health in Peru. Electron. J. Gen. Med. 2023, 20, em488.
Moura, L.K.B.; de Mesquita, R.F.; Mobin, M.; Matos, F.T.C.; Monte, T.L.; Lago, E.C.; Falcão, C.A.M.; de Arêa Leão Ferraz, M.Â.; Santos, T.C.; Sousa, L.R.M. Uses of Bibliometric Techniques in Public Health Research. Iran J. Public Health 2017, 46, 1435–1436.
Md Khudzari, J.; Kurian, J.; Tartakovsky, B.; Raghavan, G.S.V. Bibliometric analysis of global research trends on microbial fuel cells using Scopus database. Biochem. Eng. J. 2018, 136, 51–60.
Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975.
Shah, N.H.; Entwistle, D.; Pfeffer, M.A. Creation and Adoption of Large Language Models in Medicine. JAMA 2023, 330, 866–869.
Frosolini, A.; Gennaro, P.; Cascino, F.; Gabriele, G. In Reference to “Role of Chat GPT in Public Health”, to Highlight the AI’s Incorrect Reference Generation. Ann. Biomed. Eng. 2023, 51, 2120–2122.
Haver, H.L.; Ambinder, E.B.; Bahl, M.; Oluyemi, E.T.; Jeudy, J.; Yi, P.H. Appropriateness of Breast Cancer Prevention and Screening Recommendations Provided by ChatGPT. Radiology 2023, 307, e230424.
Hermann, C.E.; Patel, J.M.; Boyd, L.; Growdon, W.B.; Aviki, E.; Stasenko, M. Let’s chat about cervical cancer: Assessing the accuracy of ChatGPT responses to cervical cancer questions. Gynecol. Oncol. 2023, 179, 164–168.
Yeo, Y.H.; Samaan, J.S.; Ng, W.H.; Ting, P.S.; Trivedi, H.; Vipani, A.; Ayoub, W.; Yang, J.D.; Liran, O.; Spiegel, B.; et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin. Mol. Hepatol. 2023, 29, 721–732.
Cao, J.J.; Kwon, D.H.; Ghaziani, T.T.; Kwo, P.; Tse, G.; Kesselman, A.; Kamaya, A.; Tse, J.R. Accuracy of Information Provided by ChatGPT Regarding Liver Cancer Surveillance and Diagnosis. AJR Am. J. Roentgenol. 2023, 221, 556–559.
Rahsepar, A.A.; Tavakoli, N.; Kim, G.H.J.; Hassani, C.; Abtin, F.; Bedayat, A. How AI Responds to Common Lung Cancer Questions: ChatGPT vs Google Bard. Radiology 2023, 307, e230922.
Cheng, K.; He, Y.; Li, C.; Xie, R.; Lu, Y.; Gu, S.; Wu, H. Talk with ChatGPT About the Outbreak of Mpox in 2022: Reflections and Suggestions from AI Dimensions. Ann. Biomed. Eng. 2023, 51, 870–874.
Kizito, S. ChatGPT has the potential to enhance antiretroviral therapy adherence among adolescents with HIV in sub-Saharan Africa. Med. Educ. Online 2023, 28, 2246781.
Deiana, G.; Dettori, M.; Arghittu, A.; Azara, A.; Gabutti, G.; Castiglia, P. Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions. Vaccines 2023, 11, 1217.
Sohail, S.S.; Madsen, D.; Farhat, F.; Alam, M.A. ChatGPT and Vaccines: Can AI Chatbots Boost Awareness and Uptake? Ann. Biomed. Eng. 2023, 52, 446–450.
Chlorogiannis, D.D.; Apostolos, A.; Chlorogiannis, A.; Palaiodimos, L.; Giannakoulas, G.; Pargaonkar, S.; Xesfingi, S.; Kokkinidis, D.G. The Role of ChatGPT in the Advancement of Diagnosis, Management, and Prognosis of Cardiovascular and Cerebrovascular Disease. Healthcare 2023, 11, 2906.
Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887.
Ma, J.; Sheridan, R.P.; Liaw, A.; Dahl, G.E.; Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 2015, 55, 263–274.
Ting, D.S.W.; Pasquale, L.R.; Peng, L.; Campbell, J.P.; Lee, A.Y.; Raman, R.; Tan, G.S.W.; Schmetterer, L.; Keane, P.A.; Wong, T.Y. Artificial intelligence and deep learning in ophthalmology. Br. J. Ophthalmol. 2019, 103, 167–175.
Budd, J.; Miller, B.S.; Manning, E.M.; Lampos, V.; Zhuang, M.; Edelstein, M.; Rees, G.; Emery, V.C.; Stevens, M.M.; Keegan, N.; et al. Digital technologies in the public-health response to COVID-19. Nat. Med. 2020, 26, 1183–1192.
Dharmage, S.C.; Perret, J.L.; Custovic, A. Epidemiology of Asthma in Children and Adults. Front. Pediatr. 2019, 7, 246.
Tasnim, S.; Hossain, M.M.; Mazumder, H. Impact of Rumors and Misinformation on COVID-19 in Social Media. J. Prev. Med. Public Health 2020, 53, 171–174.

Figure 1. Most relevant sources in terms of number of documents.

Figure 2. Annual scientific production from 2010 to 2023.

Figure 3. Average citations per year.

Figure 4. Trend topics from 2010 to 2023.

Figure 5. Countries of corresponding authors by number of documents.

Figure 6. Country scientific production.

Figure 7. Country article production over time.

Figure 8. Top ten most cited countries.

Figure 9. Countries’ collaboration networks.

Figure 10. Top ten most relevant affiliations by number of published articles.
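The average annual growth rate of 25.1% reported for the annual scientific production (Figure 2) can be checked with a compound-growth formula. The sketch below is a minimal illustration, assuming the rate is computed as ((N_last / N_first)^(1/(n − 1)) − 1) × 100, where n is the number of publication years (14 for 2010 to 2023). The function name `annual_growth_rate` is hypothetical and is not part of the authors' Biblioshiny workflow.

```python
def annual_growth_rate(first_count: int, last_count: int, n_years: int) -> float:
    """Compound annual growth rate, in percent, between the first and last year.

    n_years counts both endpoints (2010..2023 -> 14 years), so growth is
    compounded over n_years - 1 yearly intervals.
    """
    if first_count <= 0 or n_years < 2:
        raise ValueError("need a positive starting count and at least two years")
    return ((last_count / first_count) ** (1 / (n_years - 1)) - 1) * 100


if __name__ == "__main__":
    # Counts taken from the abstract: 39 articles in 2010, 719 in 2023.
    rate = annual_growth_rate(39, 719, 2023 - 2010 + 1)
    print(f"{rate:.1f}%")  # -> 25.1%
```

With the counts given in the abstract, this formula yields about 25.1%, matching the reported value, which suggests the figure is a compound rather than an arithmetic-mean growth rate.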

Table 1. Top ten most cited manuscripts.

Ranking	First Author	Year	Source	DOI	Total Citations	Total Citations per Year	Normalized Total Citations
1	Socher R	2012	Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning	NA	1094	91.2	18.2
2	Ma J	2015	Journal of Chemical Information and Modeling	10.1021/ci500747n	760	84.4	25.7
3	Ting DSW	2018	The British Journal of Ophthalmology	10.1136/bjophthalmol-2018-313173	615	123.0	21.0
4	Zhu L	2019	Advances in Neural Information Processing Systems	NA	591	118.2	20.2
5	Budd J	2020	Nature Medicine	10.1038/s41591-020-1011-4	538	134.5	28.8
6	Dharmage SC	2019	Frontiers in Pediatrics	10.3389/fped.2019.00246	537	107.4	18.3
7	Xue L	2021	Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies	NA	536	178.7	49.3
8	Aramaki E	2011	Conference on Empirical Methods in Natural Language Processing	NA	517	39.8	15.6
9	Kumar V	2019	Chemosphere	10.1016/j.chemosphere.2019.124364	425	85.0	14.5
10	Tasnim S	2020	Journal of Preventive Medicine and Public Health	10.3961/jpmph.20.094	424	106.0	22.7
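Besides raw totals, Table 1 reports two derived citation metrics. The sketch below illustrates them under stated assumptions: total citations per year is taken as TC / (Y_ref − PY + 1), where PY is the publication year and Y_ref = 2023 is assumed to be the end of the citation window; normalized total citations is taken as TC divided by the mean TC of all corpus documents published in the same year. We believe these are the definitions bibliometrix applies, but they are not stated in this section, and the `Document` type and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Assumption: the citation window ends in 2023, the year the corpus was collected.
REFERENCE_YEAR = 2023


@dataclass
class Document:
    first_author: str
    year: int
    total_citations: int


def tc_per_year(doc: Document, ref_year: int = REFERENCE_YEAR) -> float:
    # Total citations divided by the number of years the document has been
    # citable, counting the publication year itself.
    return doc.total_citations / (ref_year - doc.year + 1)


def normalized_tc(doc: Document, corpus: list[Document]) -> float:
    # Total citations relative to the mean citations of all corpus documents
    # published in the same year (assumes `doc` itself is part of `corpus`).
    # A value of 1.0 means "cited as often as the average document of its year".
    same_year = [d.total_citations for d in corpus if d.year == doc.year]
    return doc.total_citations / mean(same_year)


if __name__ == "__main__":
    # Rows taken from Table 1 (first author, year, total citations).
    rows = [
        Document("Socher R", 2012, 1094),
        Document("Ma J", 2015, 760),
        Document("Budd J", 2020, 538),
        Document("Xue L", 2021, 536),
    ]
    for doc in rows:
        print(f"{doc.first_author}: {tc_per_year(doc):.1f} citations/year")
```

Normalized total citations cannot be recomputed from Table 1 alone, since it requires the citation counts of every same-year document in the corpus; normalizing by same-year peers is what allows papers of different ages to be compared on a common scale.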

Table 2. Top ten most productive authors.

Ranking	Author (by Articles)	Articles	Author (by Articles Fractionalized)	Articles Fractionalized
1	Wang Y	44	Wang Y	8.3
2	Liu H	31	Zhang Y	6.0
3	Zhang Y	27	Li Y	5.3
4	Li J	26	Li J	5.2
5	Li Y	26	Sarker A	4.8
6	Liu Y	26	Liu Y	4.8
7	Wang J	25	Liu H	4.2
8	Sarker A	23	Wang H	4.1
9	Wang H	20	Wang J	4.1
10	Wang X	20	Wang X	3.8
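Table 2 ranks authors both by raw article counts and by fractionalized counts, which is why the two rankings differ: for example, Liu H ranks second with 31 articles but has a fractionalized count of 4.2, suggesting that these articles typically have many co-authors. A minimal sketch of the two counting schemes follows, assuming the common definition in which each article contributes one unit of credit split equally among its authors. The author lists and function names are hypothetical, for illustration only.

```python
from collections import Counter, defaultdict


def full_counts(author_lists: list[list[str]]) -> Counter:
    # Full counting: every distinct author receives one credit per document.
    return Counter(a for authors in author_lists for a in set(authors))


def fractionalized_counts(author_lists: list[list[str]]) -> dict[str, float]:
    # Fractional counting: each document carries one unit of credit,
    # split equally among its distinct authors.
    credit: dict[str, float] = defaultdict(float)
    for authors in author_lists:
        distinct = set(authors)
        if not distinct:
            continue  # skip records with no author information
        share = 1 / len(distinct)
        for a in distinct:
            credit[a] += share
    return dict(credit)


if __name__ == "__main__":
    # Hypothetical author lists, for illustration only.
    docs = [
        ["Author A", "Author B"],
        ["Author A"],
        ["Author A", "Author B", "Author C", "Author D"],
    ]
    print(full_counts(docs)["Author A"])            # -> 3
    print(fractionalized_counts(docs)["Author A"])  # -> 1.75
```

Under fractional counting the credits sum to the number of documents, whereas full counting gives every co-author a full credit, which favors authors who publish in large teams.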

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

