Machine Learning Applications in Prediction Models for COVID-19: A Bibliometric Analysis

Lv, Hai; Liu, Yangyang; Yin, Huimin; Xi, Jingzhi; Wei, Pingmin

doi:10.3390/info15090575

by Hai Lv, Yangyang Liu, Huimin Yin, Jingzhi Xi and Pingmin Wei *

Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing 210009, China

* Author to whom correspondence should be addressed.

Information 2024, 15(9), 575; https://doi.org/10.3390/info15090575

Submission received: 12 August 2024 / Revised: 16 September 2024 / Accepted: 16 September 2024 / Published: 18 September 2024

(This article belongs to the Special Issue Real-World Applications of Machine Learning Techniques)


Abstract

The COVID-19 pandemic has had a profound impact on global health, inspiring the widespread use of machine learning in combating the disease, particularly in prediction models. This study aimed to assess academic publications utilizing machine learning prediction models to combat COVID-19. We analyzed 2422 original articles published between 2020 and 2023 with bibliometric tools such as HistCite Pro 2.1, Bibliometrix, CiteSpace, and VOSviewer. The United States, China, and India emerged as the most prolific countries, with Stanford University producing the most publications and Huazhong University of Science and Technology receiving the most citations. The National Natural Science Foundation of China and the National Institutes of Health have made significant contributions to this field. Scientific Reports published more of these articles than any other journal. Current research focuses on deep learning, federated learning, image classification, air pollution, mental health, sentiment analysis, and drug repurposing. In conclusion, this study provides detailed insights into the key authors, countries, institutions, funding agencies, and journals in the field, as well as the most frequently used keywords.

Keywords:

COVID-19; machine learning; artificial intelligence; prediction models; bibliometrics

1. Introduction

COVID-19, a highly contagious disease caused by SARS-CoV-2, first appeared in December 2019. SARS-CoV-2 belongs to the genus Betacoronavirus within the family Coronaviridae, and its high pathogenicity can cause severe acute respiratory symptoms and even death [1,2]. On 30 January 2020, the WHO declared COVID-19 a public health emergency of international concern (PHEIC), and on 11 March 2020 it characterized the SARS-CoV-2 outbreak as a pandemic, indicating that it was not containable [3,4]. To date, COVID-19 has caused over 770 million confirmed infections and more than 7 million reported deaths worldwide [5] and has contributed to a reduction in global life expectancy [6]. In addition to health effects, COVID-19 has also affected education, the environment, and the economy. The health emergency led to immense economic disruptions throughout the world, resulting in declines in consumption and investment [7]. Multiple studies have investigated changes in the atmospheric environment during the COVID-19 pandemic and found that restrictions implemented to control the spread of the virus led to significant decreases in nitrogen oxide (NOx) concentrations and PM2.5 levels in the air [8,9]. The pandemic also changed how schools function, for example through online teaching during quarantine periods [10]. Although the World Health Organization announced on 5 May 2023 that COVID-19 no longer constituted a PHEIC, SARS-CoV-2 has not disappeared and is still spreading worldwide due to its continued mutation. The current variant landscape is dominated by Omicron descendant lineages [11]. Compared with ancestral variants, Omicron subvariants have evolved toward decreased intrinsic pathogenicity, increased transmissibility, and enhanced immune escape [12].

To tackle this worldwide health crisis, artificial intelligence (AI) and machine learning (ML) are being used as analytical tools to combat the COVID-19 pandemic. Machine learning is a subset of artificial intelligence. Machine learning models have been shown to outperform traditional predictive models [13,14]. Many studies have applied machine learning techniques to predict other diseases, including cardiac disease prediction [15], the early detection of Parkinson’s disease [16], and breast cancer prediction [17]. In the early stages of the pandemic, Wynants et al. [18] conducted a systematic review of prediction models used for COVID-19 diagnosis and prognosis and found that prediction models entered the academic literature at an unprecedented speed. In particular, an effective prognostic model would help tailor medical strategies to the needs of individual patients, enabling precision medicine strategies that increase the probability of complete recovery [19]. These predictive systems can support decision making and help manage diseases effectively by guiding early interventions. Today, machine learning prediction models have extended their applications beyond patient identification and epidemic trend prediction to include vaccine development, drug repurposing, and molecular dynamics.

In this study, we conducted a comprehensive bibliometric analysis of the literature on machine learning methods used for COVID-19 predictive models, aiming to offer insights into the current research landscape. Bibliometric data help identify publication counts, collaborations between researchers and institutions, and the most influential journals in a field [20]. While several bibliometric studies on machine learning techniques applied to COVID-19 have been conducted [21,22,23,24], they often do not reflect the current state of research because they cover mainly the earlier literature. Our study addresses this gap by focusing on more recent publications, specifically those related to predictive models. Moreover, this analysis critically examines the limitations of existing models and the challenges facing ongoing research in this domain. The Web of Science Core Collection (WOSCC) and four bibliometric analysis tools (Bibliometrix, HistCite, VOSviewer, and CiteSpace) were used for bibliometric and visualization analysis. This research aims to carry out the following:

(1): Investigate the output and trends of publications in the field of machine learning applications in COVID-19 prediction models.
(2): Identify major contributors, including key authors, countries/regions, institutions, and journals.
(3): Identify cooperation networks between countries/regions, institutions, and authors.
(4): Explore key themes, hotspots, and research trends.
(5): Provide insights into current research directions and suggest opportunities for future research in this field.

2. Materials and Methods

2.1. Data Sources

The data were extracted from the Science Citation Index Expanded (SCI-EXPANDED) of the Web of Science Core Collection, a widely used and authoritative database that provides the record format required for bibliometric analysis. Two researchers (HL and YYL) independently conducted the search to ensure the reliability of the results. In cases of disagreement, the two researchers discussed and reached a consensus.

The search terms were set to TS = (COVID-19 OR 2019 Novel Coronavirus Disease OR coronavirus 2019 OR coronavirus disease 2019 OR 2019-nCoV OR SARS-CoV-2 OR Severe acute respiratory syndrome coronavirus 2) AND TS = (predict* OR forecast*) AND TS = (Machine Learning). The inclusion criteria were defined as follows: (1) timespan: 1 January 2020 to 31 December 2023 (publication date); (2) document type: articles; and (3) language: English. We excluded materials such as proceeding papers, review articles, editorials, and letters, as well as articles that did not directly focus on the application of machine learning to COVID-19 prediction models. Following this process, a total of 2422 articles were retrieved. To avoid changes in search results due to database updates, the search was conducted on a single day, 8 January 2024, and all retrieved documents were exported to plain text files in the form of “Full Record and Cited References”. Figure 1 presents a flowchart outlining the publication selection process for this bibliometric analysis, following a structure similar to the PRISMA 2020 guidelines.
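The three topic blocks of the search strategy can be assembled programmatically, which makes the AND/OR structure explicit. This is only an illustrative sketch: `or_block` and `build_topic_query` are hypothetical helpers, and quoting multi-word phrases is an assumption about how such a query might be entered, not something stated in the search string above.

```python
def or_block(terms):
    """Join terms with OR inside one TS=(...) topic block.

    Multi-word phrases are quoted (an assumption made for this sketch).
    """
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "TS=(" + " OR ".join(quoted) + ")"


def build_topic_query(*blocks):
    """Combine several OR blocks with AND."""
    return " AND ".join(or_block(block) for block in blocks)


# Terms taken from the search strategy described in the text.
DISEASE_TERMS = [
    "COVID-19",
    "2019 Novel Coronavirus Disease",
    "coronavirus 2019",
    "coronavirus disease 2019",
    "2019-nCoV",
    "SARS-CoV-2",
    "Severe acute respiratory syndrome coronavirus 2",
]
PREDICTION_TERMS = ["predict*", "forecast*"]  # truncation wildcards
METHOD_TERMS = ["Machine Learning"]

query = build_topic_query(DISEASE_TERMS, PREDICTION_TERMS, METHOD_TERMS)
print(query)
```

Keeping the term lists as data makes it easy for two researchers to run and compare exactly the same query.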

2.2. Data Visualization and Analysis

Bibliometrix is an R package for comprehensive science mapping analysis developed by Massimo Aria and Corrado Cuccurullo [25]. It was used for quantitative bibliometric analysis, run under R version 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria).

HistCite Pro 2.1 is a software tool for bibliometric analysis and visualization of citation data [26]. It was used to perform descriptive statistical analysis. The following indicators were analyzed: number of publications, country of publication, institution of publication, journal of publication, authors, local citation score (LCS), global citation score (GCS), and H-index. LCS is the number of times a particular document has been cited within the local collection, while GCS is its total citation count in the WOS database. The H-index is the largest number H such that H papers published by a scientist or country have each been cited at least H times. Microsoft Office Excel 2019 was used to calculate and graph the data.
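The indicator definitions above can be made concrete with a short sketch. It is not HistCite's implementation; the function names and the toy data are hypothetical and only illustrate how an H-index and a local citation score could be computed from citation counts and reference lists.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def local_citation_scores(cited_refs):
    """Count citations received from papers inside the collection only.

    cited_refs maps each paper ID in the collection to the IDs it cites.
    References to papers outside the collection are ignored here; they
    would count toward a global score (GCS) but not the local one (LCS).
    """
    scores = {paper_id: 0 for paper_id in cited_refs}
    for citing, refs in cited_refs.items():
        for ref in set(refs):
            if ref in scores and ref != citing:
                scores[ref] += 1
    return scores


# Hypothetical toy data, not taken from the study.
toy_citations = [10, 8, 5, 4, 3]
toy_collection = {"A": [], "B": ["A", "X"], "C": ["A", "B"]}  # "X" is external

print(h_index(toy_citations))                 # 4
print(local_citation_scores(toy_collection))  # {'A': 2, 'B': 1, 'C': 0}
```

In the toy collection, paper A is cited by both B and C, so its local score is 2 regardless of how often it is cited elsewhere in the database.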

VOSviewer is a computer program jointly developed by Van Eck and Waltman for constructing and visualizing bibliometric networks [27]. It is characterized by its use of visualization of similarities (VOS) to construct network graphs. VOSviewer 1.6.20 was used to visualize collaborative relationships between countries and institutions and to perform keyword co-occurrence analysis.

CiteSpace is a Java-based program developed by Professor Chaomei Chen for data analysis and visualization [28]. It allows for setting time slices, which is advantageous for temporal analysis. CiteSpace 6.1.R6 was used to generate keyword clustering maps for four one-year slices to reveal research bases and hotspots [29].

3. Results

3.1. The Research Status of ML in Prediction Related to COVID-19

The current status of machine learning techniques in prediction related to COVID-19 research was described using Bibliometrix. A total of 2422 articles were retrieved from 2020 to 2023, covering 718 journals, with an annual publication growth rate of 60.24%. There were 19,753 authors, with a single author contributing 80 articles. Authors engaged in international collaborations accounted for 37.41%. Each article had an average of 9–10 authors; 5400 keywords were provided, and 85,065 references were cited. The average life span of each paper from its initial recognition to obscurity was 2.14 years, and each article had been cited an average of 12–13 times (Figure 2). The trends of publications and citations from 2020 to 2023 are shown in Figure 3. The number of publications increased significantly in 2021 but decreased slightly in 2023, and citation counts followed the same pattern.
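An annual growth rate of this kind is commonly computed as a compound annual growth rate from the first and last yearly counts. The sketch below assumes that definition; whether Bibliometrix uses exactly this formula is an assumption here, and the yearly counts are hypothetical, not the study's data.

```python
def annual_growth_rate(first_year_count, last_year_count, n_intervals):
    """Compound annual growth rate in percent.

    n_intervals is the number of year-to-year steps, e.g. 3 for 2020-2023.
    """
    if first_year_count <= 0 or n_intervals <= 0:
        raise ValueError("need a positive starting count and at least one interval")
    ratio = last_year_count / first_year_count
    return (ratio ** (1 / n_intervals) - 1) * 100


# Hypothetical yearly publication counts for illustration only.
hypothetical_counts = {2020: 100, 2021: 250, 2022: 900, 2023: 800}
years = sorted(hypothetical_counts)
rate = annual_growth_rate(
    hypothetical_counts[years[0]],
    hypothetical_counts[years[-1]],
    len(years) - 1,
)
print(round(rate, 2))  # 100.0
```

Under this definition only the first and last years enter the calculation, so a dip in the final year lowers the rate but a large positive value can still coexist with a year-on-year decrease.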

3.2. Analysis of Top Contributing Authors

VOSviewer was used to analyze the author collaboration network, focusing on authors who had published at least two articles and were cited at least 100 times. Out of 13,888 authors, 127 met these thresholds, resulting in 35 clusters (Figure 4). The size of the circles in the network visualization indicates the number of publications, and the lines between nodes indicate that the authors have collaborated as co-authors on the same articles. The analysis revealed that only the red and yellow clusters are connected, while the remaining clusters are isolated with closely connected internal nodes. This suggests that most authors are co-authors within the same articles and have limited collaborative relationships with others.

Table 1 shows the authors with the highest number of publications. Imran Ashraf from Yeungnam University is the most prolific. The next three are Huilin Chen and Ali Asghar Heidari from Wenzhou University and Peiliang Wu from Wenzhou Medical University; together, they have co-authored seven papers.

3.3. National Research Status and International Cooperation

To construct the country cooperation network map, a threshold of at least 27 publications was established. This criterion allowed 30 out of 113 countries to be included in the subsequent analysis using VOSviewer. The selected 30 countries were then divided into three clusters based on their cooperation: Cluster 1 (red) mainly included Belgium, Brazil, Canada, the UK, France, Germany, Greece, Israel, Italy, the Netherlands, Norway, Poland, Spain, Switzerland, and the USA; Cluster 2 (green) mainly included Australia, Bangladesh, Egypt, India, Japan, Malaysia, Pakistan, Saudi Arabia, South Korea, Turkey, and the United Arab Emirates; and Cluster 3 (blue) included Iran, China, and Singapore (Figure 5). Table 2 lists the top five countries by number of publications: the United States, China, India, the UK, and Saudi Arabia. In addition to having the most publications, the USA had the highest GCS and H-index. However, in the LCS rankings, China ranked first.

3.4. Output and Collaboration Status of Institutions

Among the 2422 articles, a total of 4342 institutions were identified. Using VOSviewer, institutions with at least 10 published articles were selected for further analysis. A total of 75 institutions met this criterion and were grouped into six clusters based on their collaboration: (1) the red cluster, represented mainly by King Abdulaziz University, Princess Nourah bint Abdulrahman University, and Vellore Institute of Technology; (2) the green cluster, represented mainly by Harvard Medical School, Johns Hopkins University, and the Icahn School of Medicine at Mount Sinai; (3) the blue cluster, represented mainly by King Saud University, University of Melbourne, and University of Toronto; (4) the yellow cluster, represented by Huazhong University of Science and Technology, Shanghai Jiao Tong University, Fudan University, and the Chinese Academy of Sciences; (5) the purple cluster, represented mainly by Stanford University, University of Oxford, and Imperial College London; and (6) the cyan cluster, represented mainly by Wenzhou Medical University and the National University of Singapore (Figure 6). The five institutions with the most publications were Stanford University, Harvard Medical School, King Abdulaziz University, Huazhong University of Science and Technology, and King Saud University. Although Huazhong University of Science and Technology ranked only fourth in number of publications, it ranked first in H-index, ahead of Stanford University in second place, and led the other institutions by a wide margin in the LCS and GCS rankings (Table 3).

3.5. Analysis of Funding Sources

We analyzed the funding status of the included articles and found that 1481 out of the 2422 articles were funded. The analysis identified the five funding agencies that supported the most publications within the dataset (Table 4). The National Natural Science Foundation of China (NSFC) and the National Institutes of Health (NIH) of the United States were the leading contributors, each supporting 157 articles, or 10.6% of the funded articles. The National Science Foundation (NSF) followed with 100 articles (6.8%), while the European Union (EU) supported 96 articles (6.5%). The National Institute for Health Research (NIHR) in the UK funded 73 articles (4.9%).
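The percentages above use the 1481 funded articles as the denominator rather than all 2422 articles. A minimal sketch that reproduces them from the counts reported in the text (the variable and function names are illustrative):

```python
TOTAL_ARTICLES = 2422
FUNDED_ARTICLES = 1481

# Article counts per funding agency, as reported in the text.
ARTICLES_BY_FUNDER = {
    "NSFC": 157,
    "NIH": 157,
    "NSF": 100,
    "EU": 96,
    "NIHR": 73,
}


def share(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


funder_shares = {
    name: share(count, FUNDED_ARTICLES)
    for name, count in ARTICLES_BY_FUNDER.items()
}
funded_share = share(FUNDED_ARTICLES, TOTAL_ARTICLES)

print(funder_shares)  # {'NSFC': 10.6, 'NIH': 10.6, 'NSF': 6.8, 'EU': 6.5, 'NIHR': 4.9}
print(funded_share)   # 61.1
```

Because one article may acknowledge several agencies, these shares are not expected to sum to 100%.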

3.6. Analysis of Journals and Co-Cited Journals

The 2422 selected papers were published in 718 journals. Table 5 lists the top 10 journals and co-cited journals for machine learning applications in COVID-19 prediction. Scientific Reports published the most articles (121 documents, 5.0% of the total), followed by IEEE Access (70 documents, 2.9% of the total) and PLOS ONE (67 documents, 2.8% of the total). The journals with the highest numbers of citations were the International Journal of Environmental Research and Public Health (1500), Scientific Reports (1354), and Chaos Solitons & Fractals (1334).

3.7. Analysis of Highly Cited References

The top 10 most highly cited references are detailed in Table 6 below, which includes citation counts and a brief summary of each study’s contribution.

3.8. Analysis of Co-Occurring Keywords

Keywords encapsulate a research paper’s purpose, content, and method in the most concise manner possible. Analyzing the keywords of selected papers reveals the research hotspots within a specific field and tracks the evolution of research topics over time. First, we analyzed keyword co-occurrence in VOSviewer 1.6.20 with the minimum number of occurrences set to 42; 40 of 7457 keywords met this threshold and were drawn in a visualization map (Figure 7). The size of the circles represents the number of occurrences of the keywords, and the three largest circles are “COVID-19”, “machine learning”, and “prediction”, aligning with expectations. The keywords fall into three major categories, distinguished by the colors of the circles. The green part shows the prediction topics of COVID-19, such as mortality, infection, severity, diagnosis, and prognosis; the red part represents the algorithms mainly used in the prediction models, such as regression, neural networks, and support vector machines; and the blue part represents the technologies applied to prediction, including artificial intelligence, deep learning, and big data.
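The thresholding and co-occurrence counting behind such a map can be sketched in a few lines. This is a simplified illustration, not VOSviewer's algorithm: it omits the normalization, layout, and clustering steps, and the function name and toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists, min_occurrences):
    """Count keyword occurrences and pairwise co-occurrences.

    Only keywords appearing in at least min_occurrences papers are kept
    when building the co-occurrence links.
    """
    occurrences = Counter()
    for keywords in keyword_lists:
        occurrences.update({k.lower() for k in keywords})

    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for keywords in keyword_lists:
        present = sorted(kept & {k.lower() for k in keywords})
        links.update(combinations(present, 2))
    return occurrences, links


# Hypothetical toy data: one keyword list per paper.
toy_papers = [
    ["COVID-19", "machine learning", "prediction"],
    ["COVID-19", "machine learning", "deep learning"],
    ["COVID-19", "prediction", "mortality"],
]
occurrences, links = keyword_cooccurrence(toy_papers, min_occurrences=2)

print(occurrences["covid-19"])                     # 3
print(links[("covid-19", "machine learning")])     # 2
print(links[("machine learning", "prediction")])   # 1
```

In the study the threshold was 42 occurrences; the toy example uses 2 so that the filtering step is visible on three papers.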

Next, taking advantage of CiteSpace’s time-slicing capability, we generated keyword clustering maps for 2020, 2021, 2022, and 2023 (Figure 8). The keyword clusters from 2020 indicate an increasing focus on the application of machine learning to COVID-19, particularly in the development of models and the prediction of pandemic trends. By 2021, research hotspots had shifted toward predicting COVID-19 outcomes and addressing practical clinical challenges. In 2022, both logistic regression, rooted in traditional statistics, and neural networks, a key component of deep learning, were prominent in the development of predictive models. The emergence of cluster terms such as “molecular dynamics” and “blood biomarkers” suggests that the application of machine learning has expanded and that the range of available data sources has broadened. In 2023, cluster terms like “federated learning” and “drug repurposing” highlight new areas of focus. Additionally, mental health and air pollution have remained persistent topics throughout the pandemic.

4. Discussion

COVID-19 has imposed a huge global burden and continues to spread worldwide. The pandemic has challenged public health systems and has had profound impacts on education, the environment, and the economy. The application of machine learning techniques has contributed to tackling this challenge and continues to evolve and grow rapidly. In this study, we set out to analyze publications on machine learning applied to prediction in COVID-19. The objective was to highlight key trends, research gaps, and significant contributions, offering a valuable reference for future studies and developments in the field.

4.1. Principal Results

The number of publications effectively reflects the research trends in this field. Shortly after the emergence of COVID-19, researchers in many different fields started exploring the use of machine learning techniques to fight the pandemic globally. Due to the limited availability of early COVID-19 data, the literature in this field was initially sparse, with only 183 articles published in 2020. However, publications increased sharply from 2021 onward, reflecting a surge in scholarly interest. Although there was a minor decrease in the number of publications in 2023, the enthusiasm for research within this domain remains high.

The analysis of the collaboration network among the most cited authors reveals that many are co-authors of the same articles due to their joint efforts in publishing highly cited work. Notably, five of the most prolific authors do not appear in the network visualization of the most cited authors, indicating that a high publication count does not necessarily correlate with influence or quality. Additionally, the relatively recent publication dates of their work may limit their citation counts.

Regarding countries and institutions, the United States leads other countries in publications and citations (Table 2), and the two institutions with the most publications (Stanford University and Harvard Medical School) are both in the US (Table 3). This is partly because the US has built a strong foundation in biomedicine and AI and has maintained its focus on scientific development and funding [40]. China and India rank second and third, behind the United States, in terms of the number of publications. Despite being developing countries, both are thriving in this field. Among the top five institutions, Huazhong University of Science and Technology had the highest LCS and GCS, indicating its pioneering research in this area. Using HistCite Pro, we identified that the study by Yan et al. [32] made a significant contribution. The research developed a machine learning model based on the high-performance algorithm XGBoost, which predicts the prognosis of COVID-19 patients using three indicators: lactic dehydrogenase, lymphocytes, and high-sensitivity C-reactive protein. Moreover, Huazhong University of Science and Technology is situated in Wuhan, the city in which the COVID-19 epidemic first occurred, which confers a unique advantage in accessing first-hand medical cases and clinical data. Additionally, analysis of funding agencies shows that the NSFC, NIH, and NSF play a significant role, underscoring the contributions of the United States and China in advancing research. Combined with the analysis of the leading countries and institutions in terms of publications, these results underscore the critical role of robust funding sources in driving research innovation.

In bibliometrics, keywords represent the research theme and core content of the literature, offering a distilled overview of a paper’s content. By analyzing the co-occurrence of keywords, it is possible to provide insight into the research hotspots within an academic field. Figure 7 depicts the results of the co-occurrence of terms; “COVID-19”, “machine learning”, and “prediction” are the most frequent keywords, consistent with the theme of this study. Figure 8 shows the keyword clustering for four years based on the log-likelihood ratio (LLR) algorithm. The keyword cluster “mental health” in this study indicates that as COVID-19 has continued to spread, the influence of the virus on people’s mental health is of concern. Previous research has indicated that the COVID-19 lockdown had a significant impact on stress, depression, and anxiety [41]. Maran et al. found that a greater individual COVID-19-pandemic-related adversity index (CAI) was associated with a greater increase in depressive and anxiety symptoms, as well as loneliness [42]. Hoogendijk et al. showed that people who experienced several COVID-19-pandemic-related stressors, such as COVID-19 infection, job loss, and the death of someone close to them, were more likely to experience poorer mental health outcomes [43]. “Sentiment analysis” is a cluster word associated with mental health, indicating researchers’ interest from an alternative viewpoint. Sentiment analysis is used to decipher people’s opinions and sentiments expressed on social media concerning epidemiology, health policies, drugs, and supplements. Additionally, the association between air pollution and COVID-19 has been explored. Early in the COVID-19 pandemic, short-term exposure to air pollutants was reported to be associated with COVID-19 incidence, mortality, and lethality rates in Italy [44]. Since then, a considerable body of literature has emerged to assess the correlation between air pollution and the risk of COVID-19 [45,46,47]. 
A distinct research perspective was spurred by the impact of the lockdown on air pollutants. The lockdown, as a policy to contain COVID-19’s spread in its early stages, notably not only mitigated the pandemic but also improved air quality due to restricted human activities. Studies focusing on the consequences of COVID-19 lockdowns present a valuable opportunity to devise and implement efficient clean air strategies. The cluster words “federated learning” and “drug repurposing” in 2023 indicate current research hotspots, focusing on prediction models utilizing federated learning and the application of machine learning techniques for the repurposing of existing drugs. The training of predictive models requires extensive datasets, but healthcare institutions’ data typically encompass sensitive and confidential patient information. Federated learning emerges as a viable solution to address this challenge, eliminating the necessity for data transfer and thus protecting data privacy [48]. It employs decentralized training processes that allow models to be trained on multiple datasets independently, with only the model parameters being shared among participants. This method utilizes data from various sources efficiently while safeguarding patient confidentiality. Trends in major keyword changes indicate a sustained focus on air pollution and mental health research, highlighting an emphasis on the impact of environmental and psychological factors on health. Additionally, the COVID-19 outbreak has accelerated the application of machine learning techniques in healthcare, pushing this field toward more complex and systematic approaches. This shift underscores the necessity of interdisciplinary collaboration to address emerging health challenges effectively.
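The federated learning idea described above, in which each site trains locally and only model parameters are shared, can be illustrated with a minimal sketch. It assumes a federated averaging scheme and a one-parameter linear model; the function names and the two "hospital" datasets are hypothetical and are not drawn from any study cited here.

```python
def local_update(weight, data, lr=0.01, epochs=20):
    """Gradient descent on y ~ weight * x using one site's private data."""
    w = weight
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(site_weights, site_sizes):
    """Average site parameters, weighted by each site's sample count."""
    total = sum(site_sizes)
    return sum(w * n for w, n in zip(site_weights, site_sizes)) / total


def train_federated(sites, rounds=30):
    """Each round: sites train locally, then only their weights are averaged."""
    global_w = 0.0
    for _ in range(rounds):
        local_ws = [local_update(global_w, data) for data in sites]
        global_w = federated_average(local_ws, [len(d) for d in sites])
    return global_w


# Hypothetical private datasets; both follow y = 3x.
hospital_a = [(1.0, 3.0), (2.0, 6.0)]
hospital_b = [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]

global_weight = train_federated([hospital_a, hospital_b])
print(round(global_weight, 4))  # 3.0
```

The raw `(x, y)` records never leave `local_update`; the coordinating step in `train_federated` sees only one number per site per round, which is the property that makes the approach attractive for confidential patient data.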

4.2. Applications of COVID-19 Machine Learning

Initially, machine learning algorithms were mainly adopted in developing COVID-19 diagnostic tools based on X-ray and CT scan images. For example, Ozturk et al. [49] developed an automated tool for diagnosing COVID-19 using raw chest X-ray images. Wu et al. [50] proposed a multi-view model based on a deep learning method to assist radiologists in quickly and accurately identifying patients through the analysis of CT images. Diagnosing COVID-19 with the assistance of ML can mitigate the heavy workload of radiologists, decrease the likelihood of making mistakes, and reduce the cost compared to traditional laboratory tests.

Imaging-based ML not only supports diagnosis but also extends to disease quantification, severity assessment, and prognosis determination, and can thus be integrated into decision support systems. COVID-19 decision support systems can help policymakers formulate policies to fight the pandemic. Furthermore, beyond image data, various datasets, including those on viral transmission, confirmed cases, mortalities, and recoveries, can be integrated into decision support systems. Ayyoubzadeh et al. [51] proposed the use of data-mining models to build predictive models from Google search data. The models were designed to predict COVID-19-positive cases and the trend of the pandemic in Iran based on linear regression and long short-term memory (LSTM), which could help health managers control potential outbreaks and plan healthcare resources. Tiwari et al. [52] built a prediction model based on time series forecasting to predict the number of confirmed cases, recovered cases, and deaths in India. This epidemiologic model could advise the government on policy decisions.

In addition, machine learning techniques have been used in drug and vaccine development. The stages for the development of COVID-19 drugs include disease prediction, structural analysis, drug repurposing, and new drug development [53]. Traditional development of new drugs is expensive and time consuming, whereas drug repurposing can lower the risk and cost of drug discovery. Efficient drug repurposing can be achieved by mining existing data. Ke et al. [54] used a machine learning model to predict which drugs in a database of market-approved drugs have potential antiviral activity and could therefore treat COVID-19. Yet, most of the current drugs are administered systemically. To enhance the localized efficacy and reduce the adverse effects of drugs, the convergence of nanomedicine with drug repurposing represents a direction for future medical research, because nanoparticles can improve drug targeting [55]. For the development of COVID-19 vaccines, machine learning methods have been used for screening compounds for a potential adjuvant candidate [56], as well as for discovering markers of vaccine immunogenicity and reactogenicity [57]. However, machine learning methods cannot replace time-consuming tasks like lab experiments and clinical trials.

4.3. Limitations and Challenges of ML in Medicine

Machine learning has great potential to revolutionize healthcare, particularly in addressing challenges like the COVID-19 pandemic. However, its practical application faces several significant limitations and challenges.

One major challenge is interpretability, which refers to the ability to understand and explain which features most influence a model’s predictions. Clinicians need this transparency to trust and adopt ML models in their decision-making processes [58]. Without clear reasoning behind predictions, even highly accurate models may be disregarded in clinical practice. Roberts et al. [59] argued that none of the reviewed studies were sufficient to transition from scientific research to clinical practice due to issues such as dataset bias, inadequate model evaluation, limited generalizability, and lack of reproducibility. They also highlighted overfitting as a critical problem, where models perform well on training data but fail to generalize to new, unseen data, greatly affecting the clinical usefulness of ML models.

Moreover, there is often a disconnect between ML experts and healthcare professionals. ML practitioners may not fully understand clinical needs, while clinicians may struggle with the technical aspects of implementing and interpreting complex ML models.

To overcome these challenges, several key steps must be taken. First, external validation should become standard practice to ensure that models can be applied to real-world scenarios. Second, creating high-quality public datasets is essential to ensure that models are trained on comprehensive and representative data. Finally, fostering interdisciplinary collaboration between data scientists, engineers, and healthcare professionals is crucial.

4.4. Related Works

Several bibliometric studies have explored machine learning applications related to COVID-19. Below, we summarize key works and highlight how our study differs from them.

Mohadab et al. [20] conducted a broad bibliometric analysis of the COVID-19 literature across three major databases (Web of Science, Scopus, and PubMed) from early 2020 to May 2020. Their study provided a general overview of research trends in the early stages of the pandemic. In contrast, our work examines more recent developments in ML applications related to COVID-19, covering the period from 2020 to 2023, allowing us to capture the evolving research priorities in this field.

Chiroma et al. [21] focused on the use of ML to combat COVID-19 in the first half of 2020. They found that, at that time, research primarily focused on COVID-19 diagnostics, while the development of COVID-19 drugs and vaccines remained limited. Our study extends beyond this early focus by analyzing more recent trends, including the growing emphasis on prediction models, drug repurposing, and other emerging areas like mental health and air pollution.

Steiner et al. [22] performed a bibliometric analysis and systematic review of the 117 most-cited articles from January 2020 to June 2021, categorizing the use of ML in areas such as lung imaging, media data analysis, and general COVID-19 prediction. While their work focuses on highly cited papers, our study examines a larger dataset (2422 articles) and provides a more in-depth analysis of prediction models.

Baygül Eden et al. [23] conducted a comprehensive bibliometric analysis of 3559 ML-based COVID-19 studies published between December 2019 and December 2022 using Web of Science. Although similar in scope, our study narrows the focus to the specific application of ML in COVID-19 prediction models and provides critical insights into the field.

Ballaz et al. [24] focused on ML-based research related to early diagnosis, prognosis, and treatment, analyzing articles from the Scopus database between January 2020 and July 2022. Their study highlighted specific ML techniques like random forests and convolutional neural networks. In contrast, our work takes a broader approach, covering a variety of ML models and their applications in COVID-19 prediction.

4.5. Strengths and Limitations

Our study presents a detailed bibliometric analysis of the existing literature, providing a comprehensive overview of the current state of research on machine learning applications in prediction models for COVID-19. By examining multiple dimensions, such as authors, countries, institutions, funding agencies, journals, and keywords, we have thoroughly explored collaboration networks and research hotspots. This analysis offers valuable insights into the development trends in the field and helps clarify future research directions. However, there are some limitations. Firstly, our research excluded non-core collection journals, preprints, and gray literature, which may result in incomplete coverage of relevant publications. Secondly, since not all retrieved articles underwent full-text review, we cannot ensure that every publication is fully relevant to the topics of interest. Finally, the database is continually updated, so the most recent publications may have been missed.

5. Conclusions

In this study, we performed a comprehensive bibliometric analysis of 2422 publications on machine learning prediction models for COVID-19, covering the period from 2020 to 2023. Our findings reveal that the United States leads in terms of contributions to this field, followed by China and India, which form the core of the three primary clusters of international collaboration. Stanford University emerged as the institution with the highest number of publications, while Scientific Reports was identified as the most prolific journal. Significant contributions to the development of this domain were made by funding bodies such as the NSFC, NIH, NSF, EU, and NIHR. Present research in this area predominantly concentrates on deep learning, federated learning, image classification, air pollution, mental health, sentiment analysis, and drug repurposing.

Despite the clear advantages of applying machine learning models in healthcare, challenges such as data quality, dataset size, and data collection difficulties persist. Research in this field is moving toward more complex and systematic problems, and fostering interdisciplinary collaboration is crucial to unlocking the full potential of machine learning in the medical domain.

Author Contributions

Conceptualization, H.L. and Y.L.; data curation, H.L., Y.L., H.Y. and J.X.; methodology, H.L., Y.L., H.Y. and J.X.; project administration, P.W.; resources, H.L.; supervision, H.L., Y.L. and H.Y.; writing—original draft preparation, H.L.; writing—review and editing, H.L., Y.L., H.Y., J.X. and P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, C.; Horby, P.W.; Hayden, F.G.; Gao, G.F. A Novel Coronavirus Outbreak of Global Health Concern. Lancet 2020, 395, 470–473.
2. Ai, T.; Yang, Z.; Hou, H.; Zhan, C.; Chen, C.; Lv, W.; Tao, Q.; Sun, Z.; Xia, L. Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology 2020, 296, E32–E40.
3. IHR Emergency Committee on Novel Coronavirus (2019-nCoV). Available online: https://www.who.int/director-general/speeches/detail/who-director-general-s-statement-on-ihr-emergency-committee-on-novel-coronavirus-(2019-ncov) (accessed on 31 August 2024).
4. WHO Director-General’s Opening Remarks at the Media Briefing on COVID-19—11 March 2020. Available online: https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020 (accessed on 1 September 2024).
5. COVID-19 Cases | WHO COVID-19 Dashboard. Available online: https://data.who.int/dashboards/covid19/cases (accessed on 17 March 2024).
6. Mantovani, A.; Morrone, M.C.; Patrono, C.; Santoro, M.G.; Schiaffino, S.; Remuzzi, G.; Bussolati, G. Long Covid: Where We Stand and Challenges Ahead. Cell Death Differ. 2022, 29, 1891–1900.
7. Teitler Regev, S.; Tavor, T. Analyzing the Varied Impact of COVID-19 on Stock Markets: A Comparative Study of Low- and High-Infection-Rate Countries. PLoS ONE 2024, 19, e0296673.
8. Forster, P.M.; Forster, H.I.; Evans, M.J.; Gidden, M.J.; Jones, C.D.; Keller, C.A.; Lamboll, R.D.; Quéré, C.L.; Rogelj, J.; Rosen, D.; et al. Current and Future Global Climate Impacts Resulting from COVID-19. Nat. Clim. Change 2020, 10, 913–919.
9. Pata, U.K. How Is COVID-19 Affecting Environmental Pollution in US Cities? Evidence from Asymmetric Fourier Causality Test. Air Qual. Atmos. Health 2020, 13, 1149–1155.
10. Martín-Sánchez, M.; Cáceres-Muñoz, J.; Flores-Rodríguez, C. The Effects of the COVID-19 Pandemic on Educational Communities: Evidence of Early Childhood Education Students. Int. J. Environ. Res. Public Health 2022, 19, 4707.
11. Tracking SARS-CoV-2 Variants. Available online: https://www.who.int/activities/tracking-SARS-CoV-2-variants/ (accessed on 28 February 2024).
12. Tamura, T.; Irie, T.; Deguchi, S.; Yajima, H.; Tsuda, M.; Nasser, H.; Mizuma, K.; Plianchaisuk, A.; Suzuki, S.; Uriu, K.; et al. Virological Characteristics of the SARS-CoV-2 Omicron XBB.1.5 Variant. Nat. Commun. 2024, 15, 1176.
13. Pan, L.; Liu, G.; Lin, F.; Zhong, S.; Xia, H.; Sun, X.; Liang, H. Machine Learning Applications for Prediction of Relapse in Childhood Acute Lymphoblastic Leukemia. Sci. Rep. 2017, 7, 7402.
14. Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine Learning Applications in Cancer Prognosis and Prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17.
15. Daghistani, T.A.; Elshawi, R.; Sakr, S.; Ahmed, A.M.; Al-Thwayee, A.; Al-Mallah, M.H. Predictors of In-Hospital Length of Stay among Cardiac Patients: A Machine Learning Approach. Int. J. Cardiol. 2019, 288, 140–147.
16. Junaid, M.; Ali, S.; Eid, F.; El-Sappagh, S.; Abuhmed, T. Explainable Machine Learning Models Based on Multimodal Time-Series Data for the Early Detection of Parkinson’s Disease. Comput. Methods Programs Biomed. 2023, 234, 107495.
17. Tahmassebi, A.; Wengert, G.J.; Helbich, T.H.; Bago-Horvath, Z.; Alaei, S.; Bartsch, R.; Dubsky, P.; Baltzer, P.; Clauser, P.; Kapetas, P.; et al. Impact of Machine Learning With Multiparametric Magnetic Resonance Imaging of the Breast for Early Prediction of Response to Neoadjuvant Chemotherapy and Survival Outcomes in Breast Cancer Patients. Investig. Radiol. 2019, 54, 110–117.
18. Wynants, L.; Calster, B.V.; Collins, G.S.; Riley, R.D.; Heinze, G.; Schuit, E.; Albu, E.; Arshi, B.; Bellou, V.; Bonten, M.M.J.; et al. Prediction Models for Diagnosis and Prognosis of COVID-19: Systematic Review and Critical Appraisal. BMJ 2020, 369, m1328.
19. Casiraghi, E.; Malchiodi, D.; Trucco, G.; Frasca, M.; Cappelletti, L.; Fontana, T.; Esposito, A.A.; Avola, E.; Jachetti, A.; Reese, J.; et al. Explainable Machine Learning for Early Assessment of COVID-19 Risk Prediction in Emergency Departments. IEEE Access 2020, 8, 196299–196325.
20. Mohadab, M.E.; Bouikhalene, B.; Safi, S. Bibliometric Method for Mapping the State of the Art of Scientific Production in COVID-19. Chaos Solitons Fractals 2020, 139, 110052.
21. Chiroma, H.; Ezugwu, A.E.; Jauro, F.; Al-Garadi, M.A.; Abdullahi, I.N.; Shuib, L. Early Survey with Bibliometric Analysis on Machine Learning Approaches in Controlling COVID-19 Outbreaks. PeerJ Comput. Sci. 2020, 6, e313.
22. Steiner, M.; Franco, D.; Steiner Nieto, P. Machine Learning Techniques Applied to the Coronavirus Pandemic: A Systematic and Bibliometric Analysis from January 2020 to June 2021. RIMNI 2022, 38, 1–14.
23. Baygül Eden, A.; Bakir Kayi, A.; Erdem, M.G.; Demirci, M. COVID-19 Studies Involving Machine Learning Methods: A Bibliometric Study. Medicine 2023, 102, e35564.
24. Ballaz, S.; Pulgar-Sánchez, M.; Chamorro, K.; Fernández-Moreira, E. Scientific Pertinence of Developing Machine Learning Technologies for the Triage of COVID-19 Patients: A Bibliometric Analysis via Scopus. Inform. Med. Unlocked 2023, 41, 101312.
25. Aria, M.; Cuccurullo, C. Bibliometrix: An R-Tool for Comprehensive Science Mapping Analysis. J. Informetr. 2017, 11, 959–975.
26. Garfield, E. From the Science of Science to Scientometrics Visualizing the History of Science with HistCite Software. J. Informetr. 2009, 3, 173–179.
27. van Eck, N.J.; Waltman, L. Software Survey: VOSviewer, a Computer Program for Bibliometric Mapping. Scientometrics 2010, 84, 523–538.
28. Chen, C. Searching for Intellectual Turning Points: Progressive Knowledge Domain Visualization. Proc. Natl. Acad. Sci. USA 2004, 101, 5303–5310.
29. Chen, C. Science Mapping: A Systematic Review of the Literature. J. Data Inf. Sci. 2017, 2, 1–40.
30. Li, S.; Wang, Y.; Xue, J.; Zhao, N.; Zhu, T. The Impact of COVID-19 Epidemic Declaration on Psychological Consequences: A Study on Active Weibo Users. Int. J. Environ. Res. Public Health 2020, 17, 2032.
31. Yang, Z.; Zeng, Z.; Wang, K.; Wong, S.-S.; Liang, W.; Zanin, M.; Liu, P.; Cao, X.; Gao, Z.; Mai, Z.; et al. Modified SEIR and AI Prediction of the Epidemics Trend of COVID-19 in China under Public Health Interventions. J. Thorac. Dis. 2020, 12, 165.
32. Yan, L.; Zhang, H.-T.; Goncalves, J.; Xiao, Y.; Wang, M.; Guo, Y.; Sun, C.; Tang, X.; Jing, L.; Zhang, M.; et al. An Interpretable Mortality Prediction Model for COVID-19 Patients. Nat. Mach. Intell. 2020, 2, 283–288.
33. Chimmula, V.K.R.; Zhang, L. Time Series Forecasting of COVID-19 Transmission in Canada Using LSTM Networks. Chaos Solitons Fractals 2020, 135, 109864.
34. Shrock, E.; Fujimura, E.; Kula, T.; Timms, R.T.; Lee, I.-H.; Leng, Y.; Robinson, M.L.; Sie, B.M.; Li, M.Z.; Chen, Y.; et al. Viral Epitope Profiling of COVID-19 Patients Reveals Cross-Reactivity and Correlates of Severity. Science 2020, 370, eabd4250.
35. Wang, X.; Deng, X.; Fu, Q.; Zhou, Q.; Feng, J.; Ma, H.; Liu, W.; Zheng, C. A Weakly-Supervised Framework for COVID-19 Classification and Lesion Localization From Chest CT. IEEE Trans. Med. Imaging 2020, 39, 2615–2625.
36. Chen, J.; Wang, R.; Wang, M.; Wei, G.-W. Mutations Strengthened SARS-CoV-2 Infectivity. J. Mol. Biol. 2020, 432, 5212–5226.
37. Nikolopoulos, K.; Punia, S.; Schäfers, A.; Tsinopoulos, C.; Vasilakis, C. Forecasting and Planning during a Pandemic: COVID-19 Growth Rates, Supply Chain Disruptions, and Governmental Decisions. Eur. J. Oper. Res. 2021, 290, 99–115.
38. Ribeiro, M.H.D.M.; da Silva, R.G.; Mariani, V.C.; Coelho, L.d.S. Short-Term Forecasting COVID-19 Cumulative Confirmed Cases: Perspectives for Brazil. Chaos Solitons Fractals 2020, 135, 109853.
39. Overmyer, K.A.; Shishkova, E.; Miller, I.J.; Balnis, J.; Bernstein, M.N.; Peters-Clarke, T.M.; Meyer, J.G.; Quan, Q.; Muehlbauer, L.K.; Trujillo, E.A.; et al. Large-Scale Multi-Omic Analysis of COVID-19 Severity. Cell Syst. 2021, 12, 23–40.e7.
40. The State of U.S. Science and Engineering 2022 | NSF—National Science Foundation. Available online: https://ncses.nsf.gov/pubs/nsb20221 (accessed on 25 January 2024).
41. Fancourt, D.; Steptoe, A.; Bu, F. Trajectories of Anxiety and Depressive Symptoms during Enforced Isolation Due to COVID-19 in England: A Longitudinal Observational Study. Lancet Psychiatry 2021, 8, 141–149.
42. Maran, P.L.; Klokgieters, S.S.; Giltay, E.J.; van Oppen, P.; Jörg, F.; Eikelenboom, M.; Ottenheim, N.R.; Penninx, B.W.J.H.; Kok, A.A.L. The Impact of COVID-19-Pandemic-Related Adversity on Mental Health: Longitudinal Study in Dutch Populations with and without Mental Health Disorders. BJPsych Open 2023, 9, e181.
43. Hoogendijk, E.O.; Schuster, N.A.; van Tilburg, T.G.; Schaap, L.A.; Suanet, B.; De Breij, S.; Kok, A.A.; Van Schoor, N.M.; Timmermans, E.J.; de Jongh, R.T.; et al. Longitudinal Aging Study Amsterdam COVID-19 Exposure Index: A Cross-Sectional Analysis of the Impact of the Pandemic on Daily Functioning of Older Adults. BMJ Open 2022, 12, e061745.
44. Accarino, G.; Lorenzetti, S.; Aloisio, G. Assessing Correlations between Short-Term Exposure to Atmospheric Pollutants and COVID-19 Spread in All Italian Territorial Areas. Environ. Pollut. 2021, 268, 115714.
45. Sheridan, C.; Klompmaker, J.; Cummins, S.; James, P.; Fecht, D.; Roscoe, C. Associations of Air Pollution with COVID-19 Positivity, Hospitalisations, and Mortality: Observational Evidence from UK Biobank. Environ. Pollut. 2022, 308, 119686.
46. Nobile, F.; Michelozzi, P.; Ancona, C.; Cappai, G.; Cesaroni, G.; Davoli, M.; Di Martino, M.; Nicastri, E.; Girardi, E.; Beccacece, A.; et al. Air Pollution, SARS-CoV-2 Incidence and COVID-19 Mortality in Rome—A Longitudinal Study. Eur. Respir. J. 2022, 60, 2200589.
47. Lavigne, E.; Ryti, N.; Gasparrini, A.; Sera, F.; Weichenthal, S.; Chen, H.; To, T.; Evans, G.J.; Sun, L.; Dheri, A.; et al. Short-Term Exposure to Ambient Air Pollution and Individual Emergency Department Visits for COVID-19: A Case-Crossover Study in Canada. Thorax 2023, 78, 459–466.
48. Malik, H.; Naeem, A.; Naqvi, R.A.; Loh, W.-K. DMFL_Net: A Federated Learning-Based Framework for the Classification of COVID-19 from Multiple Chest Diseases Using X-Rays. Sensors 2023, 23, 743.
49. Ozturk, T.; Talo, M.; Yildirim, E.A.; Baloglu, U.B.; Yildirim, O.; Rajendra Acharya, U. Automated Detection of COVID-19 Cases Using Deep Neural Networks with X-Ray Images. Comput. Biol. Med. 2020, 121, 103792.
50. Wu, X.; Hui, H.; Niu, M.; Li, L.; Wang, L.; He, B.; Yang, X.; Li, L.; Li, H.; Tian, J.; et al. Deep Learning-Based Multi-View Fusion Model for Screening 2019 Novel Coronavirus Pneumonia: A Multicentre Study. Eur. J. Radiol. 2020, 128, 109041.
51. Ayyoubzadeh, S.M.; Ayyoubzadeh, S.M.; Zahedi, H.; Ahmadi, M.; Kalhori, S.R.N. Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study. JMIR Public Health Surveill. 2020, 6, e18828.
52. Tiwari, S.; Kumar, S.; Guleria, K. Outbreak Trends of Coronavirus Disease–2019 in India: A Prediction. Disaster Med. Public Health Prep. 2020, 14, e33–e38.
53. Park, Y.; Casey, D.; Joshi, I.; Zhu, J.; Cheng, F. Emergence of New Disease: How Can Artificial Intelligence Help? Trends Mol. Med. 2020, 26, 627–629.
54. Ke, Y.-Y.; Peng, T.-T.; Yeh, T.-K.; Huang, W.-Z.; Chang, S.-E.; Wu, S.-H.; Hung, H.-C.; Hsu, T.-A.; Lee, S.-J.; Song, J.-S.; et al. Artificial Intelligence Approach Fighting COVID-19 with Repurposing Drugs. Biomed. J. 2020, 43, 355–362.
55. Tammam, S.N.; El Safy, S.; Ramadan, S.; Arjune, S.; Krakor, E.; Mathur, S. Repurpose but Also (Nano)-Reformulate! The Potential Role of Nanomedicine in the Battle against SARS-CoV2. J. Control Release 2021, 337, 258–284.
56. Ahuja, A.S.; Reddy, V.P.; Marques, O. Artificial Intelligence and COVID-19: A Multidisciplinary Approach. Integr. Med. Res. 2020, 9, 100434.
57. Gonzalez-Dias, P.; Lee, E.K.; Sorgi, S.; de Lima, D.S.; Urbanski, A.H.; Silveira, E.L.; Nakaya, H.I. Methods for Predicting Vaccine Immunogenicity and Reactogenicity. Hum. Vaccines Immunother. 2020, 16, 269–276.
58. Starke, G.; Schmidt, B.; De Clercq, E.; Elger, B.S. Explainability as Fig Leaf? An Exploration of Experts’ Ethical Expectations towards Machine Learning in Psychiatry. AI Ethics 2023, 3, 303–314.
59. Roberts, M.; Driggs, D.; Thorpe, M.; Gilbey, J.; Yeung, M.; Ursprung, S.; Aviles-Rivero, A.I.; Etmann, C.; McCague, C.; Beer, L.; et al. Common Pitfalls and Recommendations for Using Machine Learning to Detect and Prognosticate for COVID-19 Using Chest Radiographs and CT Scans. Nat. Mach. Intell. 2021, 3, 199–217.

Figure 1. The flowchart for publication selection in this study.

Figure 2. Basic information on the bibliometric analysis included.

Figure 3. Overall publication trends and citations, 2020–2023.

Figure 4. Author collaboration network analysis using VOSviewer.

Figure 5. Visual maps of international cooperation between the countries.

Figure 6. Visual map of collaborating institutions.

Figure 7. Co-occurrence analysis of keywords based on VOSviewer.

Figure 8. Keyword clustering maps for 2020, 2021, 2022, and 2023 based on CiteSpace.

Table 1. Top 9 authors based on number of publications.

Author Name	Total Documents	Citations
Ashraf, I.	9	35
Chen, H.	8	157
Wu, P.	7	152
Heidari, A.	7	154
Moni, M.	6	208
Clifton, D.	6	76
Chowdhury, M.	6	97
Rahman, T.	6	97
Byeon, H.	6	10

Table 2. Records, LCS, GCS, and H-index for each of the top 5 countries.

Country	Records	LCS	GCS	H-Index
USA	638	356	9205	47
China	424	399	7342	37
India	301	152	3750	30
UK	238	257	4640	32
Saudi Arabia	200	124	2390	26

Table 3. Records, LCS, GCS, and H-index for each of the top 5 institutions.

Institution	Records	LCS	GCS	H-Index
Stanford University	40	13	537	14
Harvard Medical School	37	18	907	11
King Abdulaziz University	32	8	250	9
Huazhong University of Science and Technology	31	180	1636	15
King Saud University	30	28	450	12
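
The H-index column in Tables 2 and 3 follows the usual definition: the largest h such that h publications have each received at least h citations. A minimal sketch of that calculation is given below; the function name `h_index` and the example citation counts are hypothetical and are not data from this study.

```python
def h_index(citations):
    """Largest h such that at least h publications have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h

# Hypothetical citation counts for five publications (not data from this study).
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

For the example list, four papers have at least four citations each, but there are not five papers with at least five, so the value is 4.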

Table 4. Top funding institutions and agencies.

Name	Number of Funded Publications	Percentage of Total Funded Publications
National Natural Science Foundation of China (NSFC)	157	10.6%
National Institutes of Health (NIH), USA	157	10.6%
National Science Foundation (NSF), USA	100	6.8%
European Union (EU)	96	6.5%
National Institute for Health and Care Research (NIHR), UK	73	4.9%

Table 5. The top 10 journals and co-cited journals.

Items	Rank	Name	Counts	Country	IF (2023)	JCR
Journal	1	Scientific Reports	121	England	3.8	Q1
	2	IEEE Access	70	USA	3.4	Q2
	3	Plos One	67	USA	2.9	Q1
	4	Computers in Biology and Medicine	57	USA	7.0	Q1
	5	Journal of Medical Internet Research	43	Canada	5.8	Q1
	6	International Journal of Environmental Research and Public Health	42	Switzerland	-	-
	7	CMC-Computers Materials & Continua	36	USA	2.0	Q3
	8	Applied Sciences-Basel	35	Switzerland	2.5	Q3
	9	Frontiers in Public Health	35	Switzerland	3.0	Q1
	10	Electronics	32	Switzerland	2.6	Q3
Co-cited Journal	1	International Journal of Environmental Research and Public Health	1500	Switzerland	-	-
	2	Scientific Reports	1354	England	3.8	Q1
	3	Chaos Solitons & Fractals	1334	England	5.3	Q1
	4	IEEE Access	1030	USA	3.4	Q2
	5	Nature Machine Intelligence	969	England	18.8	Q1
	6	Journal of Medical Internet Research	855	Canada	5.8	Q1
	7	Computers in Biology and Medicine	785	USA	7.0	Q1
	8	Journal of Thoracic Disease	766	China	2.1	Q3
	9	Plos One	680	USA	2.9	Q1
	10	Science	488	USA	44.7	Q1

Table 6. The top 10 highly cited references.

Title	Authors	Journal	Citations (n)	Summary
The Impact of COVID-19 Epidemic Declaration on Psychological Consequences: A Study on Active Weibo Users	Li et al. (2020) [30]	International Journal of Environmental Research and Public Health	917	This study used an online ecological recognition (OER) method based on multiple machine learning prediction models to analyze social media data, examining the psychological impact of public health emergencies during the pandemic.
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions	Yang et al. (2020) [31]	Journal of Thoracic Disease	758	A modified SEIR epidemiological model, combined with domestic migration data and COVID-19 epidemiological data, was used to predict the progression of the epidemic. Machine learning techniques were employed to validate the model predictions.
An interpretable mortality prediction model for COVID-19 patients	Yan et al. (2020) [32]	Nature Machine Intelligence	526	A machine learning model based on XGBoost was developed to predict the prognosis of COVID-19 patients using three clinical indicators, enabling early intervention and potentially reducing mortality.
Time series forecasting of COVID-19 transmission in Canada using LSTM networks	Chimmula et al. (2020) [33]	Chaos, Solitons & Fractals	466	A deep learning method using long short-term memory (LSTM) networks was applied to build an infectious disease propagation model to forecast future transmission trends of COVID-19 in Canada.
Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity	Shrock et al. (2020) [34]	Science	385	This study developed an XGBoost-based machine learning model using VirScan data to distinguish between COVID-19 positive and negative cases with high sensitivity and specificity. SHAP analysis was used to identify key predictive features.
A Weakly-Supervised Framework for COVID-19 Classification and Lesion Localization From Chest CT	Wang et al. (2020) [35]	IEEE Transactions on Medical Imaging	343	A weakly supervised deep learning model was trained using 3D chest CT images to accurately predict COVID-19 infection probability and identify lesion areas.
Mutations Strengthened SARS-CoV-2 Infectivity	Chen et al. (2020) [36]	Journal of Molecular Biology	321	This study used algebraic topology-based machine learning to quantitatively assess changes in the binding free energy between SARS-CoV-2 spike protein and host ACE2 receptors following viral mutations.
Forecasting and planning during a pandemic: COVID-19 growth rates, supply chain disruptions, and governmental decisions	Nikolopoulos et al. (2021) [37]	European Journal of Operational Research	237	The study evaluated 52 models, including time series, epidemiology, machine learning, and deep learning methods, introducing a hybrid forecasting method to predict COVID-19 growth rates.
Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil	Ribeiro et al. (2020) [38]	Chaos, Solitons & Fractals	258	This paper analyzed various forecasting methods, including ARIMA, CUBIST, RF, RIDGE, SVR, and stacking ensemble learning, for short-term prediction of cumulative COVID-19 cases in Brazil.
Large-Scale Multi-omic Analysis of COVID-19 Severity	Overmyer et al. (2021) [39]	Cell Systems	203	This cohort study used RNA-seq and high-resolution mass spectrometry to generate multi-omics data related to COVID-19 severity, which can be used for machine learning predictions. The data are freely available to the scientific community.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lv, H.; Liu, Y.; Yin, H.; Xi, J.; Wei, P. Machine Learning Applications in Prediction Models for COVID-19: A Bibliometric Analysis. Information 2024, 15, 575. https://doi.org/10.3390/info15090575
