*2.2. Data Analysis*

For each document, we extracted the title, abstract, keywords, citations, and authors' affiliations. Because a document could be authored by scholars from different countries, we credited every country listed in the affiliations as a contributor to that document. We included documents both with and without abstracts in the text analysis, since the title alone can partly reflect a document's topic. We first described the number of publications per country and mapped these counts using the Map function in Microsoft Excel. We then exported the ten most cited publications for a detailed analysis of their content.
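The country-level counting rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the analysis: the record layout and the field name `countries` are hypothetical, and counting a country only once per document (even when several co-authors share it) is an assumption.

```python
from collections import Counter


def count_publications_by_country(documents):
    """Credit one publication to every country listed on a document."""
    counts = Counter()
    for doc in documents:
        # Assumption: a country is counted once per document, even if
        # several co-authors are affiliated with it.
        counts.update(set(doc["countries"]))
    return counts


# Hypothetical records; the field names are illustrative only.
documents = [
    {"title": "Doc A", "countries": ["China", "United States", "China"]},
    {"title": "Doc B", "countries": ["Italy"]},
    {"title": "Doc C", "countries": ["United States", "Italy"]},
]

counts = count_publications_by_country(documents)
for country, n in counts.most_common():
    print(country, n)
```

Under this rule, the country totals can exceed the number of documents, because an internationally co-authored document is counted once for each contributing country.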

We used VOSviewer (version 1.6.15, Centre for Science and Technology Studies, Leiden University, the Netherlands) to visualize the co-occurrence networks of keywords and of the most frequent terms in titles and abstracts [25,26]. We then applied latent Dirichlet allocation (LDA) to discover fifteen latent topics from the titles and abstracts of the documents. This Bayesian model treats each document as a mixture of topics and each topic as a probability distribution over words, estimated from how words co-occur across documents [27]. LDA therefore produces two outputs: (1) the probability distribution of topics within each document, which shows how strongly each topic is represented in a given publication, and (2) the probability distribution of words within each topic, which is used to define the topic [27]. Because each title or abstract may contain a mixture of topics, the LDA outputs may not correspond to a specific research field or discipline. However, previous work suggests that documents focusing on a particular theme are more likely to be assigned to the same topic. To make the topic labels robust, we reviewed at least ten documents per topic to confirm that each label generally fit the content of those documents.
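To make the two LDA outputs concrete, the sketch below fits a toy model with a collapsed Gibbs sampler written in plain Python. It is only an illustration under stated assumptions: the software and estimation method used for the analysis are not specified here, the function name `fit_lda_gibbs`, the hyperparameters `alpha` and `beta`, and the four toy documents are all hypothetical, and two topics are used instead of fifteen to keep the example small.

```python
import random


def fit_lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token, then resample its topic given the rest.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Output (1): topic distribution per document.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # Output (2): word distribution per topic.
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


# Hypothetical tokenized titles; real input would be titles and abstracts.
docs = [
    "vaccine trial vaccine immune".split(),
    "immune vaccine antibody trial".split(),
    "lockdown economy unemployment economy".split(),
    "economy lockdown policy unemployment".split(),
]

theta, phi, vocab = fit_lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    top = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print("topic", k, "top words:", top)
```

Each row of `theta` sums to one and gives the topic mixture of one document, while each row of `phi` sums to one and gives the word distribution of one topic. Inspecting the highest-probability words in `phi`, together with documents that load heavily on a topic in `theta`, mirrors the manual labeling check described above.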

Multivariable linear regression models were fitted to examine the research foci of countries with different income classifications (low, lower-middle, upper-middle, and high income, according to the World Bank classifications) [28] and different COVID-19 transmission classifications (Pending, Sporadic cases, Clusters of cases, and Community transmission, according to the WHO classifications) [29]. The dependent variable was the share of a country's total publications that fell within a specific topic (%), while the main independent variables were the income and transmission classifications. The models were adjusted for the natural logarithm of gross domestic product (GDP) per capita, the number of COVID-19 cases, and the number of COVID-19 deaths in each country. The latest data on GDP per capita and income classifications were collected from the World Bank database, while data on COVID-19 cases and deaths were extracted from WHO reports on 24 April 2020. A p-value of less than 0.05 was considered statistically significant.
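The structure of such a regression can be sketched in plain Python as below. This is a simplified illustration, not the fitted models: all country values are invented, the helper names `design_row`, `solve_linear_system`, and `fit_ols` are hypothetical, and only the income classification and log GDP per capita are included. The transmission classification, case counts, and death counts would enter as further columns in the same way. Standard errors and p-values are omitted; in practice a statistics package would supply them.

```python
import math

# Illustrative labels; "low" serves as the reference category.
INCOME_LEVELS = ["low", "lower-middle", "upper-middle", "high"]


def design_row(income, gdp_per_capita):
    """Intercept, income dummies (reference: low), and log GDP per capita."""
    dummies = [1.0 if income == level else 0.0 for level in INCOME_LEVELS[1:]]
    return [1.0] + dummies + [math.log(gdp_per_capita)]


def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Invented data: (income group, GDP per capita in USD,
#                 publications in the topic, total publications).
countries = [
    ("low", 500.0, 2, 10),
    ("low", 900.0, 3, 20),
    ("lower-middle", 2000.0, 5, 40),
    ("lower-middle", 3500.0, 4, 25),
    ("upper-middle", 8000.0, 12, 60),
    ("upper-middle", 11000.0, 9, 50),
    ("high", 40000.0, 30, 200),
    ("high", 60000.0, 45, 250),
]

X = [design_row(income, gdp) for income, gdp, _, _ in countries]
# Dependent variable: share of publications in the topic (%).
y = [100.0 * topic / total for _, _, topic, total in countries]

coefficients = fit_ols(X, y)
names = ["intercept"] + INCOME_LEVELS[1:] + ["log_gdp_per_capita"]
for name, value in zip(names, coefficients):
    print(name, round(value, 3))
```

Each income coefficient is read as the difference in topic share, in percentage points, between that income group and the reference group, holding log GDP per capita fixed.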
