Review

Systematic Guidelines for Effective Utilization of COVID-19 Databases in Genomic, Epidemiologic, and Clinical Research

1
Department of Medical Informatics, College of Medicine, Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Republic of Korea
2
Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
3
Precision Medicine Research Center, College of Medicine, Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Republic of Korea
4
Cancer Evolution Research Center, College of Medicine, Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Republic of Korea
*
Author to whom correspondence should be addressed.
These authors contributed equally to this study and should be considered co-first authors.
Viruses 2023, 15(3), 692; https://doi.org/10.3390/v15030692
Submission received: 10 January 2023 / Revised: 27 February 2023 / Accepted: 4 March 2023 / Published: 6 March 2023
(This article belongs to the Section SARS-CoV-2 and COVID-19)

Abstract

The pandemic has led to the production and accumulation of various types of data related to coronavirus disease 2019 (COVID-19). To understand the features and characteristics of COVID-19 data, we summarized representative databases and determined the data types, purpose, and utilization details of each database. In addition, we categorized COVID-19 associated databases into epidemiological data, genome and protein data, and drug and target data. We found that the data present in each of these databases have nine separate purposes (clade/variant/lineage, genome browser, protein structure, epidemiological data, visualization, data analysis tool, treatment, literature, and immunity) according to the types of data. Utilizing the databases we investigated, we created four queries as integrative analysis methods that aimed to answer important scientific questions related to COVID-19. Our queries can make effective use of multiple databases to produce valuable results that can reveal novel findings through comprehensive analysis. This allows clinical researchers, epidemiologists, and clinicians to have easy access to COVID-19 data without requiring expert knowledge in computing or data science. We expect that users will be able to reference our examples to construct their own integrative analysis methods, which will act as a basis for further scientific inquiry and data searching.

1. Introduction

In 2019, with the spread of the coronavirus disease 2019 (COVID-19) pandemic, it became crucial for the scientific community to quickly have access to accurate, detailed COVID-19 data [1,2]. Access to epidemiological data proved to be important for tracking the pandemic (rates of infection and the pandemic response of different hotspots/countries) [3,4]. The collection of high-quality genomic data generated a better understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that causes COVID-19 [5,6]. Protein structure and sequence data improved the understanding of SARS-CoV-2 and its components [7,8]. Clinical trials and drug data helped clinicians and researchers develop treatments during the pandemic [9,10].
Our understanding of COVID-19 developed over a short period of time, making early response to the pandemic challenging. The mRNA vaccines used worldwide were utilized without fully going through testing [11]. It has also been challenging to accurately assess the effectiveness of the vaccines and other treatments for COVID-19 [12]. Obtaining epidemiological, genomic, and clinical data on COVID-19 has made it possible to manage the pandemic. However, it is not yet clear how effective the accumulation of COVID-19 data can be, so a comprehensive understanding of the data is required.
The abundance of various types of COVID-19 data has led to an increased number of databases. An increase in genomic data submission has resulted in databases with large amounts of data, a multitude of filters and search options, and numerous downloadable metadata or associated files [13]. This can potentially be challenging for users to access the data they need. The rate at which COVID-19-related publications are accepted and published by far outpaces that of previous outbreaks, such as the Ebola or Zika viruses [14,15]. This has led to an increased need for the organization of the literature. We investigated the amount of genomic data being collected by the National Center for Biotechnology Information (NCBI), as well as virus research and the number of COVID-19 publications submitted on a monthly basis (Figure S1). We also compiled the amount of epidemiological data, genome and protein data, treatment data, and publications that have been submitted to various COVID-19 databases (Table 1).

2. A Survey of COVID-19 Databases

Databases and research organizations that existed before the COVID-19 pandemic played a part in the initial data collection process. The Global Initiative on Sharing All Influenza Data (GISAID), which is responsible for the curation of genomic data on previous influenza outbreaks and virus research, possessed the necessary infrastructure to start collecting data on SARS-CoV-2 sequences [16,17].
Organizations such as the National Institutes of Health (NIH) and the European Molecular Biology Laboratory (EMBL) were poised to be hubs for tools and resources in response to COVID-19 [18]. Global organizations such as the World Health Organization (WHO), which we usually look to for guidance and resources on worldwide health issues, also played a part. During previous epidemics, organizations such as the WHO gathered data on cases and deaths, improving the understanding of the pathogen’s infectivity and lethality. Information on vaccine development and vaccine doses helped governments keep track of vaccination progress within populations [19,20].
There has been a rapid emergence of COVID-19-related databases with the fast accumulation of such data. SARS-CoV-2 genomic sequences are being submitted to GISAID faster than those of other viruses. Raw sequence data on SARS-CoV-2 and its variants have been submitted to GISAID, Nextstrain, the University of California Santa Cruz (UCSC) Genome Browser for COVID-19 Research, and the COVID-19 Data Portal powered by the EMBL-European Bioinformatics Institute (EBI).
Databases, such as LitCovid, developed by the NIH, were created to exclusively organize and help users search for COVID-19 publications [21,22]. SARS-CoV-2 has a wide range of variants with numerous classifications of variants of concern (VOCs), such as Alpha, Beta, Gamma, Delta, and Omicron, and variants of interest (VOIs) [23,24,25]. There are various organizations and databases that have different nomenclature for classifying variants. The WHO, Pangolin, and Nextclade, powered by Nextstrain, are considered to be the main classifying entities [26,27,28]. Epidemiological data have been collected by various governments and institutes and then collected and centralized in databases, such as those of the WHO. The gathering of COVID-19 case data enables a better understanding of the infectivity of SARS-CoV-2 and its variants [29]. COVID-19 death data likewise help keep track of the lethality of the variants. With the numerous types of vaccines and the need for multiple vaccinations, vaccine data on COVID-19 have been instrumental in keeping track of global vaccination progress [30,31].
With COVID-19 databases, there has been a transition from text-based visualization to more dynamic visualizations with charts and graphs. Alongside more user-friendly interfaces, there has been improvement in visualization tools, internal database self-developed tools, and more effective analysis tools. Nextstrain visualizes its genomic data as phylogenetic trees that display clade and variant information. Users viewing the phylogenetic tree can interact with it to change display options, time, and the variants being displayed to visualize what they need.
We decided that detailed information of the contents of each of the 15 databases we investigated would be too extensive to include in the main article. We have included detailed explanations of the COVID-19 database features alongside screenshots of the appropriate web pages (Supplementary Notes).

3. COVID-19 Database Categorization and Main Purposes

We selected 15 databases that are most frequently used out of all of the currently available COVID-19 databases. Based on our investigations of the databases’ main purposes and tools, we categorized the databases on the basis of their data characteristics and then reported their specific functions, internally developed tools, and associated websites.
We investigated the main COVID-19 databases and then divided them into three categories: epidemiological data, genome and protein data, and drug and target data. Some databases fit into only one category, while others displayed more than one characteristic. For each category, we provided explanations on what type of data was included and sorted. First, we categorized cases, deaths, hospitalizations, vaccinations, vaccine dose data, etc., on a country or time basis as epidemiological data. We determined the databases containing these data to be the Johns Hopkins Coronavirus Resource Center, Our World in Data, GISAID, WHO, and the National Center for Biotechnology Information (NCBI) SARS-CoV-2 Resources. Second, we categorized genome and protein data as data on SARS-CoV-2 amino acid sequences, protein domains and regions, and mutations in SARS-CoV-2 lineages and clades. We determined that GISAID, the WHO, the NCBI SARS-CoV-2 Resources, interaction data between coronavirus RNAs and host proteins (CovInter), T-cell COVID-19 Atlas (T-CoV), Immune escape variants in SARS-CoV-2 (ESC), DockCoV2, UCSC Genome Browser, COVID-19 Data Portal, cov-lineages.org, Protein Data Bank (PDB), COVID CG, and Nextstrain included this category of data. Third, we categorized drug and target data to include data on antibodies, drugs, drug trials, drug-protein interactions, target molecules, etc., and CovInter, T-CoV, ESC, DockCoV2, the WHO, and NCBI SARS-CoV-2 Resources included drug and target data (Figure 1a, Table 2 and Tables S1–S15).
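As an illustration of this categorization, the following minimal Python sketch encodes the three categories as sets and derives which databases fall into more than one category. The membership lists are transcribed from the paragraph above; the identifiers (`CATEGORIES`, `categories_of`) are our own illustrative names and do not come from any of the databases.

```python
# Category membership as listed in the text; dictionary keys are illustrative names.
CATEGORIES = {
    "epidemiological": {
        "Johns Hopkins Coronavirus Resource Center", "Our World in Data",
        "GISAID", "WHO", "NCBI SARS-CoV-2 Resources",
    },
    "genome_and_protein": {
        "GISAID", "WHO", "NCBI SARS-CoV-2 Resources", "CovInter", "T-CoV",
        "ESC", "DockCoV2", "UCSC Genome Browser", "COVID-19 Data Portal",
        "cov-lineages.org", "PDB", "COVID CG", "Nextstrain",
    },
    "drug_and_target": {
        "CovInter", "T-CoV", "ESC", "DockCoV2", "WHO",
        "NCBI SARS-CoV-2 Resources",
    },
}


def categories_of(database):
    """Return the sorted list of categories that include `database`."""
    return sorted(name for name, members in CATEGORIES.items() if database in members)


# Every database mentioned in at least one category.
all_databases = set().union(*CATEGORIES.values())
# Databases that display more than one characteristic.
multi_category = sorted(db for db in all_databases if len(categories_of(db)) > 1)
# Databases present in all three categories.
in_all_three = sorted(set.intersection(*CATEGORIES.values()))

print(len(all_databases), "databases in total")
print("In all three categories:", in_all_three)
```

A lookup of this kind can help a user decide where to start: for example, `categories_of("GISAID")` returns the epidemiological and genome/protein categories, whereas a database such as PDB appears in only one.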
COVID-19 databases can be sorted into databases that make use of data provided by GISAID and databases that independently collect their own data. We investigated the purposes of the data present in these databases and classified them into nine distinct groups: (1) clade/variant/lineage, (2) genome browser (sequence), (3) protein structure, (4) epidemiological data (cases, vaccine rates, deaths), (5) visualization, (6) data analysis tool, (7) treatment (clinical trials, drugs), (8) literature, and (9) immunity (Figure 1b,c). In addition to the specific purposes we compiled and categorized for the databases, many of the databases possess software, tools, and functions that aid in data viewing and analysis (Figure S2).
In our categorization of the 15 databases, we noted that there was a heavy focus on genome and protein data. GISAID, while considered to possess one of the most extensive datasets on SARS-CoV-2 genome sequences, is not easily accessible. The data cannot be accessed without creating an account first, and there are limitations with regard to downloading the data. Meanwhile, many of the databases source their data from GISAID or perform analysis using sequence data provided by GISAID (Figure 1b). With epidemiological data, the Johns Hopkins University Coronavirus Resource Center focuses heavily on U.S. datasets. With Our World in Data and the WHO database, there are slight discrepancies between datasets and limitations on data collection. For drug and target data, there is no centralized database that has properly compiled data on COVID-19 preventative measures. Of the databases we investigated, some, such as CovInter, are not easily found through search engines alone.
Many, if not most, of the COVID-19 databases are specialized and built for specific types of data. Data collection and analysis are performed in line with the purpose of the database. The type of data needed and prioritized differs depending on who is accessing the data. Researchers, epidemiologists, and clinicians each need different data [50]. Because of these issues, it is necessary to have a centralized system that can make comprehensive use of the available databases. We believe that it is important to provide guidance on how to navigate between and utilize databases. Specifically, we introduced different databases, how to use them, and where they can be used.
We aimed to introduce the currently available COVID-19 databases and categorize them according to their purposes. By introducing their main functions and tools, users can become more familiar with the wide range of databases that exist. We showed how to utilize the databases through specifically built queries and examples, dependent on the purpose of the user. The queries and examples we showed require no expert knowledge or computing skills from the users to utilize the databases.

4. Advanced Utilization of COVID-19 Databases

We aimed to help users with their database utilization by providing specific examples of utilizing the databases we have introduced. In addition to the examples we provided, it is important for users to learn personalized database utilization. We expect that clinical researchers, epidemiologists, and clinicians will find specific uses for our examples [51,52]. Not only will users be able to understand the specific queries we presented but they will also be able to devise their own queries and fully utilize the databases we have introduced. In addition to the four queries we showed, we built additional queries (Figures S3 and S4).
1.
“Periods of high numbers of patient deaths and severe cases during the COVID-19 pandemic can be used as a reference to investigate lineage/clade variants to identify dangerous variants and related candidate variants.”
During the COVID-19 pandemic, patient deaths and severe cases occurred in several waves [53]. Worldwide, vaccination began in 2021, with vaccines continuing to be administered at present [54,55,56]. Nevertheless, COVID-19 patient numbers are still on the rise. Numerous COVID-19 variants have emerged through genetic evolution, and in that process, lineages have appeared and disappeared. Utilizing the WHO database, current and past variants of concern (VOCs) can be identified [57]. By examining these variants, the lineages circulating during periods of high rates of deaths and severe COVID-19 cases can be determined. Research identifying candidates for variants with high associated COVID-19 risks is necessary [58,59]. Therefore, we carried out the following analysis integrating various databases, which will help optimize the response to future variants and support antibody and drug development [60,61,62].
(1) To identify VOCs, we accessed the WHO Tracking SARS-CoV-2 Variants page to check Alpha (B.1.1.7, 8 December 2020), Beta (B.1.351, 18 December 2020), Gamma (P.1, 11 January 2021), Delta (B.1.617.2, 11 May 2021), and Omicron (B.1.1.529, 26 November 2021) variants and check the lineages and VOC dates;
(2) To identify the mutations in VOCs, we navigated to cov-lineages.org to identify characteristic mutations for the Alpha, Beta, Gamma, Delta, and Omicron lineages;
(3) To obtain data on VOCs, we checked Nextstrain’s latest global analysis table. In the upper left corner, by selecting the dataset (ncov, open or GISAID, global or country), date range (22 December 2019 to 7 November 2022), and color (GISAID, Pango lineage, etc.), we could identify the first reported date and the latest date for VOCs through phylogeny;
(4) We checked epidemiological data by navigating to the Our World in Data database’s COVID-19 Data Explorer Table. We used COVID-19 Data Explorer to check the number of deaths, sorted by international or country, and selected confirmed deaths for metrics. The date range provides data from 28 January 2020 to 11 November 2022;
(5) Upon investigating Pango lineage by deaths per 1 million people from 28 January 2020 to 11 November 2022, we verified that the Delta (21A), Delta (21J), Omicron (21M), Omicron (21K), Omicron (21L), and Omicron (22B) lineages fit in the time period. We could predict our candidate variants by analyzing the lineages that fit in the time period. Upon analyzing the lineages during the periods of high numbers of confirmed deaths, the common mutations in the spike protein region were T478K and D614G. The mutations with different amino acid substitutions at the same position were P681R (Delta) and P681H (Omicron). For the N protein, R203M was found (Figure 2).
① The WHO Tracking SARS-CoV-2 Variants page was utilized to check currently circulating variants of concern (VOCs) and previously circulating VOCs (the currently circulating VOC is Omicron);
② To check the mutations of Omicron’s sublineages (BA.1, BA.2, BA.3, BA.4, and BA.5), cov-lineages.org was used. Upon clicking on cov-lineages.org Pango, the characteristic mutation table data were utilized to search for each lineage of BA.1, BA.2, BA.3, BA.4, and BA.5;
③ To check Omicron’s sublineage data and date information, Nextstrain was used. Upon clicking on Nextstrain’s latest global analysis—GISAID or latest global analysis—open data, the SARS-CoV-2 genomic epidemiology was checked. The data can be filtered by dataset, data range, color by options, etc.;
④ To obtain data on global deaths from the time period when Omicron was dominant, Our World in Data was used. By clicking on the COVID-19 Data Explorer tab, the COVID-19 data explorer is accessed. In options, country name was selected for sorting, confirmed deaths were selected for metrics, and the 7-day rolling average was selected for intervals;
⑤ Based on these results, comprehensive analysis was possible. Through the overlap of lineages’ date of origin and period of time for COVID-19 deaths, common variants were extracted. A route for finding dangerous variants was provided.
This query is expected to be most useful, in descending order, to clinical researchers, epidemiologists, and clinicians.
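The final comparison step of this query can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the mutation sets contain just the spike substitutions named in the text (not the full characteristic mutation tables from cov-lineages.org), and the function names `parse_substitution` and `compare_variants` are our own.

```python
import re
from collections import defaultdict

# Spike substitutions named in the text for each variant; this is only the
# subset mentioned above, not a full characteristic mutation table.
SPIKE_MUTATIONS = {
    "Delta": {"T478K", "D614G", "P681R"},
    "Omicron": {"T478K", "D614G", "P681H"},
}

# Matches simple substitutions such as "P681R" (reference, position, alternate).
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'P681R' into (reference, position, alternate)."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def compare_variants(mutations_by_variant):
    """Return (substitutions shared by all variants, positions where variants differ)."""
    shared = set.intersection(*mutations_by_variant.values())
    by_position = defaultdict(dict)
    for variant, labels in mutations_by_variant.items():
        for label in labels - shared:
            _, pos, _ = parse_substitution(label)
            by_position[pos][variant] = label
    # Keep only positions mutated in more than one variant with different residues.
    divergent = {pos: hits for pos, hits in by_position.items() if len(hits) > 1}
    return shared, divergent


shared, divergent = compare_variants(SPIKE_MUTATIONS)
print("Shared:", sorted(shared))
print("Same position, different substitution:", divergent)
```

With the subset above, the shared substitutions are T478K and D614G, and position 681 is reported as carrying P681R in Delta and P681H in Omicron, which matches the result described in step (5). Deletions or insertions would need a broader parser than the simple pattern assumed here.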
2.
“For representative SARS-CoV-2 Delta and Omicron lineages, we checked virus and host protein interactions, and after investigating the related publications, we explained the relevance to SARS-CoV-2.”
A well-known SARS-CoV-2 infection pathway involves the coronavirus spike protein detecting the angiotensin-converting enzyme 2 (ACE-2) receptor in the host membrane and entering the host cell to cause infection [63,64,65,66,67]. Numerous mutations have occurred in SARS-CoV-2 through evolution since its outbreak [68,69,70]. These mutations affect 3-D protein structures and can cause problems: vaccine and treatment effectiveness can be weakened, leading to an increase in the number of infections [71,72,73]. From this perspective, an understanding of the virus pathogen structure, infection process and pathway, and interaction is needed [74,75,76]. To investigate viral molecular cell biology, we used various databases to check virus and host interactions. We can then extract the candidate protein for utilization in the development of new treatments [77,78].
(1) We utilized cov-lineages.org to identify the Pango lineages of the WHO nomenclature Delta and Omicron variants to be Delta B.1.617.2 and Omicron BA.1, BA.2, BA.3, BA.4, and BA.5;
(2) We utilized CovInter to investigate the interaction between coronavirus and the host protein in the spike protein, where the highest number of SARS-CoV-2 mutations are found. We selected the Delta (B.1.617.2 and AY lineages) strain hCoV-19/Namibia/N17380/2021 and Omicron (B.1.1.529; BA.1; BA.1.1; BA.2; BA.3; BA.4; BA.5) strain hCoV-19/Zimbabwe/CERI-KRISP-K034087/2021;
(3) From the dates of Delta hCoV-19/Namibia/N17380/2021 and Omicron hCoV-19/Zimbabwe/CERI-KRISP-K034087/2021, we can locate and analyze the virus RNA host protein network;
(4) Using the LitCovid database, we searched for the host IMP-1 protein that interacts with both the Delta and Omicron variants, specifically for publications with wet lab experiments;
(5) Through analysis of the Delta and Omicron variants, candidate proteins that interact with both lineages (IMP-1, SND1, EIF3D, EIF3H, FASTKD2, GRWD1, IMP-2, G15, LSM11, and UTP18) were found. By extracting data on these interacting proteins, we can supply data for the development of vaccines and treatments that are not affected by the emergence of variants (Figure 3).
① To check the mutations present in the Omicron variant, cov-lineages.org was used. Upon clicking on Pango, Delta and Omicron were searched for in the lineage list. For each lineage, data from the characteristic mutation table were obtained;
② To examine the interaction between coronavirus RNA and the host protein, CovInter was used. To check the spike protein region, Search Virus RNA was selected, and the S region was clicked on. Among those, Delta and Omicron analysis samples were clicked on to extract the host protein list from each;
③ To check candidate proteins’ connection to SARS-CoV-2, publications were searched for through LitCovid;
④ Consequently, through an interaction between coronavirus RNA and host protein and Delta and Omicron comparisons, unique proteins and shared proteins were found. Shared proteins can be utilized in vaccines or treatments against multiple SARS-CoV-2 variants.
This query is expected to be most useful, in descending order, to clinical researchers, clinicians, and epidemiologists.
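Once the host protein lists have been copied out of CovInter for each strain, separating shared from strain-specific proteins is a simple counting exercise. The Python sketch below is hypothetical: the shared proteins are those reported in the text, while the two `*_ONLY_PLACEHOLDER` entries are invented stand-ins for strain-specific hits, and the function name `strain_coverage` is our own.

```python
from collections import Counter

# Host proteins reported in the text as interacting with both variants.
SHARED_IN_TEXT = {
    "IMP-1", "SND1", "EIF3D", "EIF3H", "FASTKD2",
    "GRWD1", "IMP-2", "G15", "LSM11", "UTP18",
}

# Hypothetical per-strain lists as they might be copied from CovInter.
# The "*_ONLY_PLACEHOLDER" names are invented and are not real proteins.
host_proteins = {
    "Delta": SHARED_IN_TEXT | {"DELTA_ONLY_PLACEHOLDER"},
    "Omicron": SHARED_IN_TEXT | {"OMICRON_ONLY_PLACEHOLDER"},
}


def strain_coverage(lists_by_strain):
    """Count, for each host protein, how many strains list it."""
    counts = Counter()
    for proteins in lists_by_strain.values():
        counts.update(set(proteins))  # set() guards against duplicates in a list
    return counts


coverage = strain_coverage(host_proteins)
# Proteins listed for every strain are candidates for variant-independent targets.
shared = sorted(p for p, n in coverage.items() if n == len(host_proteins))
# Proteins listed for exactly one strain are strain-specific.
strain_specific = sorted(p for p, n in coverage.items() if n == 1)

print("Shared host proteins:", shared)
print("Strain-specific host proteins:", strain_specific)
```

Counting coverage rather than intersecting two sets directly means the same sketch extends to more than two strains, for example when each Omicron sublineage is analyzed separately.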
3.
“For the currently circulating VOC Omicron, taking vaccination rates and reproduction rates into consideration, we can obtain effective antibody and drug data based on clinical trial information and Omicron genetic mutation.”
The current dominant SARS-CoV-2 strain worldwide is BA.5 [79,80,81]. Many countries are carrying out vaccinations as a preventive measure against COVID-19 with vaccine and treatment development also underway [82,83]. Many mutations occur in SARS-CoV-2 through evolution, which makes preexisting vaccines ineffective. To combat the continued evolution of SARS-CoV-2, research that targets the mutations is needed [84]. To select effective antibodies and drugs based on mutation data on reported lineages, we suggest the following process [85,86,87].
(1) Through the WHO’s Tracking SARS-CoV-2 Variants page, we can verify that Omicron is the currently circulating VOC;
(2) To obtain the latest data on the Omicron lineage, we searched for BA.5 information through Nextstrain, specifically for BA.5 date information;
(3) From the BA.5 date of origin, we checked vaccination and reproduction rates in the Our World in Data database. BA.5 is indicated in the figure with the vaccine dose in yellow and the reproduction rate in green. Worldwide, new vaccine dose rates have declined in 2022, while the reproduction rate has been consistent thus far;
(4) Through Nextstrain, we checked not just BA.5 but also BA.2, so we used cov-lineages.org to obtain and analyze data on BA.5 and BA.2 variants;
(5) Upon investigating the mutations of the currently dominant strains BA.5 and BA.2, many shared mutations were found in the spike protein. We utilized the ESC database to search for antibodies and related drugs against T478K among the shared mutations. Entering T478K in the search bar shows related vaccines and antibodies, and for drugs related to the T478K mutation, etesevimab data are provided;
(6) We can use the “Find a study” feature of the NIH’s ClinicalTrials.gov to look for etesevimab and the current progress of COVID-19 clinical trials;
(7) For effective measures against COVID-19 variants that occur through mutation, research on cures and prevention must be carried out. Excluding the shared mutations for BA.5 and BA.2, if we investigate the unique mutations in the spike protein, we can find del69/70, L452R, and F486V for BA.5 and Q493R for BA.2. For unique mutations of BA.5, antibodies and treatments, such as bamlanivimab, cilgavimab, tixagevimab, and casirivimab are found, while for unique mutations of BA.2, candidate antibodies or treatments do not exist. Therefore, to maximize the effectiveness of treatments, we need data on each variant (Figure 4).
① Through the WHO Tracking SARS-CoV-2 Variants, the currently circulating VOC (Omicron) was checked;
② From information on the currently circulating VOC Omicron, Nextstrain was used to check BA.5 and BA.2. Upon clicking on Nextstrain’s latest global analysis—GISAID tab or latest global analysis—open data tab, genomic epidemiology data of BA.5 and BA.2 can be checked;
③ Our World in Data was used to check the connection with the period of time of BA.5 and BA.2 prevalence and vaccine dose and the reproduction rate among the epidemiological data;
④ Mutation data on the BA.5 and BA.2 variants were collected from cov-lineages.org;
⑤ From these mutations, mutations found in common were extracted (T478K). Antibody and drug data on the extracted mutations were collected from ESC (etesevimab);
⑥ To determine whether there were clinical trial data on etesevimab, ClinicalTrials.gov was used;
⑦ Regarding vaccine doses, vaccination rates are decreasing globally, whereas there are no significant changes in reproduction rates. Upon investigating the mutations of dominant strains BA.5 and BA.2, antibodies or drugs that target the unique mutations can be searched for. Based on these results, antibodies and drugs that are effective against the variant type or characteristic can be presented.
This query is expected to be most useful, in descending order, to clinicians, clinical researchers, and epidemiologists.
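The mutation-to-treatment lookup in this query can be sketched as follows. This is an illustrative Python example under stated assumptions: the mutation sets hold only the spike mutations named in the text, the lookup table contains only the T478K entry that the text states explicitly, and the names `shared_and_unique` and `candidate_agents` are our own rather than part of ESC or any other database.

```python
# Spike mutations named in the text (a subset, not the full lineage definitions).
SHARED_IN_TEXT = {"T478K"}
LINEAGE_MUTATIONS = {
    "BA.5": SHARED_IN_TEXT | {"del69/70", "L452R", "F486V"},
    "BA.2": SHARED_IN_TEXT | {"Q493R"},
}

# Mutation-to-agent lookup as one might transcribe it from ESC search results.
# Only the T478K entry is stated in the text; other mutations are left unmapped
# here rather than guessed.
AGENTS_BY_MUTATION = {"T478K": ["etesevimab"]}


def shared_and_unique(lineages):
    """Return (mutations in every lineage, per-lineage mutations outside that set)."""
    shared = set.intersection(*lineages.values())
    unique = {name: muts - shared for name, muts in lineages.items()}
    return shared, unique


def candidate_agents(mutations, lookup):
    """Map each mutation to its listed agents; an empty list means none recorded."""
    return {m: lookup.get(m, []) for m in sorted(mutations)}


shared, unique = shared_and_unique(LINEAGE_MUTATIONS)
print("Shared:", candidate_agents(shared, AGENTS_BY_MUTATION))
for lineage, muts in unique.items():
    print(lineage, "unique:", candidate_agents(muts, AGENTS_BY_MUTATION))
```

Mutations that map to an empty list flag where further searching is needed, which mirrors the observation above that no candidate antibodies or treatments were found for the unique BA.2 mutation.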
4.
“For the VOC Omicron, we obtained data on Pango lineages BA.1, BA.2, BA.3, BA.4, and BA.5, and from current worldwide policy indexes, such as the stringency index and containment and health index and vaccination policy, and we obtained data on countries’ policy responses to COVID-19.”
Following the start of the COVID-19 pandemic, nations worldwide implemented various policies in response to the pandemic [88]. Initial policy responses entailed stricter measures, such as restrictions on gatherings, stay-at-home requirements, and travel restrictions [89,90,91]. Following vaccine development and the increase in worldwide vaccinations, policies have shifted toward managing COVID-19 as an endemic disease [92,93,94]. However, there were indications that COVID-19 might break out again in late 2022, so constant care and attention are needed. For our example, we planned to investigate countries’ current policy responses based on the currently circulating VOC Omicron and its associated Pango lineages [95,96,97]. We examined the G20 nations in Asia (South Korea, China, Japan, India, Indonesia, Turkey, and Saudi Arabia), Europe (France, Germany, Italy, the United Kingdom, and Russia), the Americas (the United States, Canada, Mexico, Argentina, and Brazil), Africa (South Africa), and Oceania (Australia).
(1) To investigate the VOCs, we used the WHO Tracking SARS-CoV-2 Variants page to verify that the currently circulating VOC is Omicron, including sublineages BA.1, BA.2, BA.3, BA.4, and BA.5;
(2) Utilizing Nextstrain, we verified the earliest date and most recently observed date for the currently dominant strains BA.1, BA.2, BA.3, BA.4, and BA.5;
(3) To view the worldwide policy response situation, we navigated to Our World in Data’s Coronavirus Pandemic (COVID-19) page and selected the Policy Responses Table. The Policy Responses tab includes information, such as the stringency index, containment and health index, vaccination policy, and Google mobility trends;
(4) Stringency index scores, calculated by the metrics of school closures, workplace closures, cancellation of public events, restrictions on public gatherings, closures of public transport, stay-at-home requirements, public information campaigns, restrictions on internal movements, and international travel controls, are decreasing for G20 countries regardless of vaccination rates. The containment and health index indicates the strictness of government responses, with the strictest country being China. All nations worldwide, excluding a few countries, such as Germany, Egypt, Senegal, and Sierra Leone, currently have universal availability as their vaccination policy. Childhood vaccination eligibility and vaccination eligibility for each country are visually displayed;
(5) Based on the VOC Omicron and the date information of the currently dominant strains BA.2 and BA.5, we investigated the governmental policy responses at the time;
(6) Prior to the emergence of BA.2 and BA.5, deaths from COVID-19 were on the decline in G20 countries, with the trend continuing after their emergence (excluding countries that do not have data). Following the emergence of BA.2 and BA.5, the stringency index and containment and health index became less strict. For infection management, policy guidelines should be established not merely based on COVID-19 deaths, but on the basis of COVID-19 genome and expression data (Figure 5).
① Through the WHO Tracking SARS-CoV-2 Variants page, the currently circulating VOC Omicron can be checked with its lineages BA.1, BA.2, BA.3, BA.4, and BA.5;
② To check the sublineages and date information on the currently circulating VOC Omicron, Nextstrain was used. Upon clicking on Nextstrain’s latest global analysis—GISAID tab or latest global analysis—open data tab, BA.5 and BA.2 data can be accessed;
③ To check data on policies among epidemiological data, Our World in Data was used;
④ By clicking on Our World in Data’s Policy Responses tab, data on numerous COVID-19 policies can be obtained. The stringency index, containment and health index, and vaccination policy were utilized (G20 countries were selected);
⑤ From the stringency index data, after the emergence of BA.5 and BA.2, all countries have become less strict in their policies and containment and health index measures. Since deaths from BA.5 and BA.2 have remained low, relaxed policies related to vaccination or hygiene can provide direction.
This query is expected to be most useful, in descending order, to epidemiologists, clinicians, and clinical researchers.
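For users who download the policy data rather than reading it from the charts, the before/after comparison in this query can be approximated with the Python standard library. The sketch below is hypothetical: it assumes a CSV export with `location`, `date`, and `stringency_index` columns, the sample rows and the emergence date are placeholders rather than real Our World in Data or Nextstrain values, and the function names are our own.

```python
import csv
import io
from datetime import date
from statistics import mean


def load_stringency(csv_text, country):
    """Return sorted (date, stringency_index) pairs for one country, skipping blanks."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["location"] == country and row["stringency_index"]:
            rows.append((date.fromisoformat(row["date"]), float(row["stringency_index"])))
    return sorted(rows)


def mean_before_after(series, emergence):
    """Average the index strictly before, and on or after, the emergence date."""
    before = [value for day, value in series if day < emergence]
    after = [value for day, value in series if day >= emergence]
    return (mean(before) if before else None, mean(after) if after else None)


# Placeholder rows for illustration only; these are not real values.
SAMPLE = """location,date,stringency_index
Exampleland,2022-01-01,60.0
Exampleland,2022-02-01,50.0
Exampleland,2022-03-01,40.0
Exampleland,2022-04-01,
Exampleland,2022-05-01,20.0
"""

series = load_stringency(SAMPLE, "Exampleland")
# Hypothetical emergence date; in practice it would be read from Nextstrain.
before, after = mean_before_after(series, date(2022, 3, 1))
print(f"Mean stringency before: {before}, after: {after}")
```

A lower mean after the emergence date would correspond to the relaxation of policies described in step (6). Comparing simple means is a deliberately crude choice; a rolling average or a per-country plot would show the trend in more detail.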

5. Conclusions

Currently, COVID-19 data are being continuously accumulated in worldwide databases. We have drawn comparisons between 15 primary databases, organizing them into categories and investigating the data present. The challenge posed by utilizing these databases is the difficulty in locating and accessing the necessary data due to how they are specialized. We designed queries, guidelines, and supplementary materials to address these issues. The integration of multiple COVID-19 databases makes use of the individual strengths of the databases we investigated and categorized to comprehensively present methods to answer important scientific questions. The importance of our integrative analysis of various COVID-19 databases is that clinical researchers, epidemiologists, and clinicians working on COVID-19 have increased accessibility to SARS-CoV-2 and COVID-19 data and the ability to effectively utilize databases.
COVID-19 databases are specialized to present specific datasets, yet there is a need to comprehensively access databases. We present our methodology as a means of easy, effective access to necessary COVID-19 data without expertise in computing or data science. To prevent the recurrence of the pandemic, it is important for researchers, clinicians, epidemiologists, pharmaceutical companies, and public officials to have fast access to necessary data. For a rapidly mutating virus, such as SARS-CoV-2, it is crucial to keep track of emerging variants and mutation data. Furthermore, it becomes possible to find and recommend necessary treatments utilizing COVID-19 data. The methodology investigated and explained in our study is expected to be used by various researchers and be the basis for building infrastructure and resources to effectively deal with any future epidemics that may occur. A new pandemic will lead to a compilation of epidemiologic data and genomic data, necessitating data categorization and interpretation. The queries and guidelines we have presented for navigating and utilizing databases will therefore be helpful for tracking patients and the disease, finding suitable treatments, and data analysis.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15030692/s1, Figure S1: COVID-19 data amount and publications collected from January 2020 to October 2022; Figure S2: Internally developed tools and sublinks of COVID-19 databases; Figure S3: Comprehensive utilization of COVID-19 databases 5; Figure S4: Comprehensive utilization of COVID-19 databases 6; Table S1: GISAID; Table S2: Cov-lineages.org; Table S3: COVID CG; Table S4: COVID-19 Data Portal; Table S5: Nextstrain; Table S6: NIH (NCBI SARS-CoV-2 resources); Table S7: PDB; Table S8: World Health Organization (WHO); Table S9: University of California, Santa Cruz (UCSC) Genome Browser; Table S10: Dock CoV-2; Table S11: Our World in Data; Table S12: Johns Hopkins University coronavirus resource center; Table S13: Immune Escape variants in SARS-CoV-2 (ESC); Table S14: T-cell COVID-19 Atlas (T-CoV); Table S15: CovInter; Supplementary Notes: A detailed guide on navigating COVID-19 databases.

Author Contributions

Study design: D.Y.S., J.P. and D.H.; Data collection: D.Y.S., J.P. and K.Y.; Data analysis: D.Y.S. and J.P.; Writing: D.Y.S., J.P. and D.H.; Supervision: D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by grants from the National Research Foundation of Korea (NRF), funded by the Korea government (Ministry of Science and ICT) (NRF-2021M3H9A2097227 and NRF-2022R1A2C3008162), and by the Catholic Medical Center Research Foundation in the program year of 2020.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge the researchers from the laboratories responsible for obtaining the specimens or submitting the genomic data and sharing them via GISAID. The authors thank the Global Science experimental Data hub Center (GSDC) and the Korea Research Environment Open NETwork (KREONET), which is managed and operated by the Korea Institute of Science and Technology Information (KISTI).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ciotti, M.; Ciccozzi, M.; Terrinoni, A.; Jiang, W.C.; Wang, C.B.; Bernardini, S. The COVID-19 pandemic. Crit. Rev. Clin. Lab. Sci. 2020, 57, 365–388.
  2. Wu, F.; Zhao, S.; Yu, B.; Chen, Y.M.; Wang, W.; Song, Z.G.; Hu, Y.; Tao, Z.W.; Tian, J.H.; Pei, Y.Y.; et al. A new coronavirus associated with human respiratory disease in China. Nature 2020, 579, 265–269.
  3. Sun, K.; Chen, J.; Viboud, C. Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: A population-level observational study. Lancet Digit. Health 2020, 2, e201–e208.
  4. Xu, B.; Gutierrez, B.; Mekaru, S.; Sewalk, K.; Goodwin, L.; Loskill, A.; Cohn, E.L.; Hswen, Y.; Hill, S.C.; Cobo, M.M.; et al. Epidemiological data from the COVID-19 outbreak, real-time case information. Sci. Data 2020, 7, 106.
  5. Hu, B.; Guo, H.; Zhou, P.; Shi, Z.L. Characteristics of SARS-CoV-2 and COVID-19. Nat. Rev. Microbiol. 2021, 19, 141–154.
  6. Knyazev, S.; Chhugani, K.; Sarwal, V.; Ayyala, R.; Singh, H.; Karthikeyan, S.; Deshpande, D.; Baykal, P.I.; Comarova, Z.; Lu, A.; et al. Unlocking capacities of genomics for the COVID-19 response and future pandemics. Nat. Methods 2022, 19, 374–380.
  7. Arya, R.; Kumari, S.; Pandey, B.; Mistry, H.; Bihani, S.C.; Das, A.; Prashar, V.; Gupta, G.D.; Panicker, L.; Kumar, M. Structural insights into SARS-CoV-2 proteins. J. Mol. Biol. 2021, 433, 166725.
  8. Wang, M.Y.; Zhao, R.; Gao, L.J.; Gao, X.F.; Wang, D.P.; Cao, J.M. SARS-CoV-2: Structure, Biology, and Structure-Based Therapeutics Development. Front. Cell. Infect. Microbiol. 2020, 10, 587269.
  9. Doshi, P.; Godlee, F.; Abbasi, K. COVID-19 vaccines and treatments: We must have raw data, now. BMJ 2022, 376, o102.
  10. Grimaldi, A.; Panariello, F.; Annunziata, P.; Giuliano, T.; Daniele, M.; Pierri, B.; Colantuono, C.; Salvi, M.; Bouche, V.; Manfredi, A.; et al. Improved SARS-CoV-2 sequencing surveillance allows the identification of new variants and signatures in infected patients. Genome Med. 2022, 14, 90.
  11. Verbeke, R.; Lentacker, I.; De Smedt, S.C.; Dewitte, H. The dawn of mRNA vaccines: The COVID-19 case. J. Control. Release 2021, 333, 511–520.
  12. Hammerman, A.; Sergienko, R.; Friger, M.; Beckenstein, T.; Peretz, A.; Netzer, D.; Yaron, S.; Arbel, R. Effectiveness of the BNT162b2 Vaccine after Recovery from COVID-19. N. Engl. J. Med. 2022, 386, 1221–1229.
  13. Brito, A.F.; Semenova, E.; Dudas, G.; Hassler, G.W.; Kalinich, C.C.; Kraemer, M.U.G.; Ho, J.; Tegally, H.; Githinji, G.; Agoti, C.N.; et al. Global disparities in SARS-CoV-2 genomic surveillance. Nat. Commun. 2022, 13, 7003.
  14. Gianola, S.; Jesus, T.S.; Bargeri, S.; Castellini, G. Characteristics of academic publications, preprints, and registered clinical trials on the COVID-19 pandemic. PLoS ONE 2020, 15, e0240123.
  15. Palayew, A.; Norgaard, O.; Safreed-Harmon, K.; Andersen, T.H.; Rasmussen, L.N.; Lazarus, J.V. Pandemic publishing poses a new COVID-19 challenge. Nat. Hum. Behav. 2020, 4, 666–669.
  16. Elbe, S.; Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Chall. 2017, 1, 33–46.
  17. Shu, Y.; McCauley, J. GISAID: Global initiative on sharing all influenza data—From vision to reality. Euro Surveill. 2017, 22, 30494.
  18. Madeira, F.; Pearce, M.; Tivey, A.R.N.; Basutkar, P.; Lee, J.; Edbali, O.; Madhusoodanan, N.; Kolesnikov, A.; Lopez, R. Search and sequence analysis tools services from EMBL-EBI in 2022. Nucleic Acids Res. 2022, 50, W276–W279.
  19. Ullah, S.; Al-Sehemi, A.G.; Klemeš, J.J.; Saqib, S.; Gondal, S.M.A.; Saqib, S.; Arshad, A.; Saqib, H.; Mukhtar, A.; Ibrahim, M.; et al. A Review of the Progress of COVID-19 Vaccine Development. Düzce Tıp. Fakültesi Derg. 2021, 23, 1–23.
  20. Zhang, J.; Zeng, H.; Gu, J.; Li, H.; Zheng, L.; Zou, Q. Progress and Prospects on Vaccine Development against SARS-CoV-2. Vaccines 2020, 8, 153.
  21. Chen, Q.; Allot, A.; Leaman, R.; Islamaj, R.; Du, J.; Fang, L.; Wang, K.; Xu, S.; Zhang, Y.; Bagherzadeh, P.; et al. Multi-label classification for biomedical literature: An overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database 2022, 2022, baac069.
  22. Chen, Q.; Allot, A.; Leaman, R.; Wei, C.H.; Aghaarabi, E.; Guerrerio, J.J.; Xu, L.; Lu, Z. LitCovid in 2022: An information resource for the COVID-19 literature. Nucleic Acids Res. 2023, 51, D1512–D1518.
  23. Duong, D. Alpha, Beta, Delta, Gamma: What’s important to know about SARS-CoV-2 variants of concern? CMAJ 2021, 193, E1059–E1060.
  24. Shiehzadegan, S.; Alaghemand, N.; Fox, M.; Venketaraman, V. Analysis of the Delta Variant B.1.617.2 COVID-19. Clin. Pr. 2021, 11, 778–784.
  25. Tian, D.; Sun, Y.; Xu, H.; Ye, Q. The emergence and epidemic characteristics of the highly mutated SARS-CoV-2 Omicron variant. J. Med. Virol. 2022, 94, 2376–2383.
  26. Cella, E.; Benedetti, F.; Fabris, S.; Borsetti, A.; Pezzuto, A.; Ciotti, M.; Pascarella, S.; Ceccarelli, G.; Zella, D.; Ciccozzi, M.; et al. SARS-CoV-2 Lineages and Sub-Lineages Circulating Worldwide: A Dynamic Overview. Chemotherapy 2021, 66, 3–7.
  27. Parums, D.V. Editorial: Revised World Health Organization (WHO) Terminology for Variants of Concern and Variants of Interest of SARS-CoV-2. Med. Sci. Monit. 2021, 27, e933622.
  28. Rambaut, A.; Holmes, E.C.; O’Toole, A.; Hill, V.; McCrone, J.T.; Ruis, C.; du Plessis, L.; Pybus, O.G. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 2020, 5, 1403–1407.
  29. Meehan, M.T.; Rojas, D.P.; Adekunle, A.I.; Adegboye, O.A.; Caldwell, J.M.; Turek, E.; Williams, B.M.; Marais, B.J.; Trauer, J.M.; McBryde, E.S. Modelling insights into the COVID-19 pandemic. Paediatr. Respir. Rev. 2020, 35, 64–69.
  30. Li, M.; Wang, H.; Tian, L.; Pang, Z.; Yang, Q.; Huang, T.; Fan, J.; Song, L.; Tong, Y.; Fan, H. COVID-19 vaccine development: Milestones, lessons and prospects. Signal. Transduct. Target Ther. 2022, 7, 146.
  31. Tregoning, J.S.; Flight, K.E.; Higham, S.L.; Wang, Z.; Pierce, B.F. Progress of the COVID-19 vaccine effort: Viruses, vaccines and variants versus efficacy, effectiveness and escape. Nat. Rev. Immunol. 2021, 21, 626–636.
  32. Khare, S.; Gurry, C.; Freitas, L.; Schultz, M.B.; Bach, G.; Diallo, A.; Akite, N.; Ho, J.; Lee, R.T.; Yeo, W.; et al. GISAID’s Role in Pandemic Response. China CDC Wkly. 2021, 3, 1049–1051.
  33. O’Toole, A.; Scher, E.; Underwood, A.; Jackson, B.; Hill, V.; McCrone, J.T.; Colquhoun, R.; Ruis, C.; Abu-Dahab, K.; Taylor, B.; et al. Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol. 2021, 7, veab064.
  34. Chen, A.T.; Altschuler, K.; Zhan, S.H.; Chan, Y.A.; Deverman, B.E. COVID-19 CG enables SARS-CoV-2 mutation and lineage tracking by locations and dates of interest. Elife 2021, 10, e63409.
  35. Harrison, P.W.; Lopez, R.; Rahman, N.; Allen, S.G.; Aslam, R.; Buso, N.; Cummins, C.; Fathy, Y.; Felix, E.; Glont, M.; et al. The COVID-19 Data Portal: Accelerating SARS-CoV-2 and COVID-19 research through rapid open access data sharing. Nucleic Acids Res. 2021, 49, W619–W623.
  36. Hadfield, J.; Megill, C.; Bell, S.M.; Huddleston, J.; Potter, B.; Callender, C.; Sagulenko, P.; Bedford, T.; Neher, R.A. Nextstrain: Real-time tracking of pathogen evolution. Bioinformatics 2018, 34, 4121–4123.
  37. Chen, Q.; Allot, A.; Lu, Z. LitCovid: An open database of COVID-19 literature. Nucleic Acids Res. 2021, 49, D1534–D1540.
  38. Zuo, X.; Chen, Y.; Ohno-Machado, L.; Xu, H. How do we share data in COVID-19 research? A systematic review of COVID-19 datasets in PubMed Central Articles. Brief Bioinform. 2021, 22, 800–811.
  39. Wlodawer, A.; Dauter, Z.; Shabalin, I.G.; Gilski, M.; Brzezinski, D.; Kowiel, M.; Minor, W.; Rupp, B.; Jaskolski, M. Ligand-centered assessment of SARS-CoV-2 drug target models in the Protein Data Bank. FEBS J. 2020, 287, 3703–3718.
  40. Florez, H.; Singh, S. Online dashboard and data analysis approach for assessing COVID-19 case and death data. F1000Research 2020, 9, 570.
  41. Lee, B.T.; Barber, G.P.; Benet-Pages, A.; Casper, J.; Clawson, H.; Diekhans, M.; Fischer, C.; Gonzalez, J.N.; Hinrichs, A.S.; Lee, C.M.; et al. The UCSC Genome Browser database: 2022 update. Nucleic Acids Res. 2022, 50, D1115–D1122.
  42. Turakhia, Y.; Thornlow, B.; Hinrichs, A.S.; De Maio, N.; Gozashti, L.; Lanfear, R.; Haussler, D.; Corbett-Detig, R. Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat. Genet. 2021, 53, 809–816.
  43. Speir, M.L.; Bhaduri, A.; Markov, N.S.; Moreno, P.; Nowakowski, T.J.; Papatheodorou, I.; Pollen, A.A.; Raney, B.J.; Seninge, L.; Kent, W.J.; et al. UCSC Cell Browser: Visualize Your Single-Cell Data. Bioinformatics 2021, 37, 4578–4580.
  44. Mathieu, E.; Ritchie, H.; Ortiz-Ospina, E.; Roser, M.; Hasell, J.; Appel, C.; Giattino, C.; Rodes-Guirao, L. A global database of COVID-19 vaccinations. Nat. Hum. Behav. 2021, 5, 947–953.
  45. Dong, E.; Ratcliff, J.; Goyea, T.D.; Katz, A.; Lau, R.; Ng, T.K.; Garcia, B.; Bolt, E.; Prata, S.; Zhang, D.; et al. The Johns Hopkins University Center for Systems Science and Engineering COVID-19 Dashboard: Data collection process, challenges faced, and lessons learned. Lancet Infect. Dis. 2022, 22, e370–e376.
  46. Chen, T.F.; Chang, Y.C.; Hsiao, Y.; Lee, K.H.; Hsiao, Y.C.; Lin, Y.H.; Tu, Y.E.; Huang, H.C.; Chen, C.Y.; Juan, H.F. DockCoV2: A drug database against SARS-CoV-2. Nucleic Acids Res. 2021, 49, D1152–D1159.
  47. Nersisyan, S.; Zhiyanov, A.; Shkurnikov, M.; Tonevitsky, A. T-CoV: A comprehensive portal of HLA-peptide interactions affected by SARS-CoV-2 mutations. Nucleic Acids Res. 2022, 50, D883–D887.
  48. Amahong, K.; Zhang, W.; Zhou, Y.; Zhang, S.; Yin, J.; Li, F.; Xu, H.; Yan, T.; Yue, Z.; Liu, Y.; et al. CovInter: Interaction data between coronavirus RNAs and host proteins. Nucleic Acids Res. 2023, 51, D546–D556.
  49. Rophina, M.; Pandhare, K.; Shamnath, A.; Imran, M.; Jolly, B.; Scaria, V. ESC: A comprehensive resource for SARS-CoV-2 immune escape variants. Nucleic Acids Res. 2022, 50, D771–D776.
  50. Vasireddy, D.; Vanaparthy, R.; Mohan, G.; Malayala, S.V.; Atluri, P. Review of COVID-19 Variants and COVID-19 Vaccine Efficacy: What the Clinician Should Know? J. Clin. Med. Res. 2021, 13, 317–325.
  51. Callaghan, S. COVID-19 Is a Data Science Issue. Patterns 2020, 1, 100022.
  52. Cevik, M.; Bamford, C.G.G.; Ho, A. COVID-19 pandemic-a focused review for clinicians. Clin. Microbiol. Infect. 2020, 26, 842–847.
  53. Campi, G.; Perali, A.; Marcelli, A.; Bianconi, A. SARS-CoV-2 world pandemic recurrent waves controlled by variants evolution and vaccination campaign. Sci. Rep. 2022, 12, 18108.
  54. Benvenuto, D.; Giovanetti, M.; Ciccozzi, A.; Spoto, S.; Angeletti, S.; Ciccozzi, M. The 2019-new coronavirus epidemic: Evidence for virus evolution. J. Med. Virol. 2020, 92, 455–459.
  55. Rydland, H.T.; Friedman, J.; Stringhini, S.; Link, B.G.; Eikemo, T.A. The radically unequal distribution of COVID-19 vaccinations: A predictable yet avoidable symptom of the fundamental causes of inequality. Humanit. Soc. Sci. Commun. 2022, 9, 1–6.
  56. Tao, K.; Tzou, P.L.; Nouhin, J.; Gupta, R.K.; de Oliveira, T.; Kosakovsky Pond, S.L.; Fera, D.; Shafer, R.W. The biological and clinical significance of emerging SARS-CoV-2 variants. Nat. Rev. Genet. 2021, 22, 757–773.
  57. Parums, D.V. Editorial: World Health Organization (WHO) Variants of Concern Lineages Under Monitoring (VOC-LUM) in Response to the Global Spread of Lineages and Sublineages of Omicron, or B.1.1.529, SARS-CoV-2. Med. Sci. Monit. 2022, 28, e937676.
  58. Hadj Hassine, I. COVID-19 vaccines and variants of concern: A review. Rev. Med. Virol. 2022, 32, e2313.
  59. Toyoshima, Y.; Nemoto, K.; Matsumoto, S.; Nakamura, Y.; Kiyotani, K. SARS-CoV-2 genomic variations associated with mortality rate of COVID-19. J. Hum. Genet. 2020, 65, 1075–1082.
  60. Chen, J.; Gao, K.; Wang, R.; Nguyen, D.D.; Wei, G.W. Review of COVID-19 Antibody Therapies. Annu. Rev. Biophys. 2021, 50, 1–30.
  61. Ku, Z.; Ye, X.; Salazar, G.T.a.; Zhang, N.; An, Z. Antibody therapies for the treatment of COVID-19. Antib. Ther. 2020, 3, 101–108.
  62. Ning, L.; Abagna, H.B.; Jiang, Q.; Liu, S.; Huang, J. Development and application of therapeutic antibodies against COVID-19. Int. J. Biol. Sci. 2021, 17, 1486–1496.
  63. Hoffmann, M.; Kleine-Weber, H.; Schroeder, S.; Kruger, N.; Herrler, T.; Erichsen, S.; Schiergens, T.S.; Herrler, G.; Wu, N.H.; Nitsche, A.; et al. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell 2020, 181, 271–280.e278.
  64. Jackson, C.B.; Farzan, M.; Chen, B.; Choe, H. Mechanisms of SARS-CoV-2 entry into cells. Nat. Rev. Mol. Cell Biol. 2022, 23, 3–20.
  65. Lukassen, S.; Chua, R.L.; Trefzer, T.; Kahn, N.C.; Schneider, M.A.; Muley, T.; Winter, H.; Meister, M.; Veith, C.; Boots, A.W.; et al. SARS-CoV-2 receptor ACE2 and TMPRSS2 are primarily expressed in bronchial transient secretory cells. EMBO J. 2020, 39, e105114.
  66. Salian, V.S.; Wright, J.A.; Vedell, P.T.; Nair, S.; Li, C.; Kandimalla, M.; Tang, X.; Carmona Porquera, E.M.; Kalari, K.R.; Kandimalla, K.K. COVID-19 Transmission, Current Treatment, and Future Therapeutic Strategies. Mol. Pharm. 2021, 18, 754–771.
  67. Shang, J.; Wan, Y.; Luo, C.; Ye, G.; Geng, Q.; Auerbach, A.; Li, F. Cell entry mechanisms of SARS-CoV-2. Proc. Natl. Acad. Sci. USA 2020, 117, 11727–11734.
  68. Magazine, N.; Zhang, T.; Wu, Y.; McGee, M.C.; Veggiani, G.; Huang, W. Mutations and Evolution of the SARS-CoV-2 Spike Protein. Viruses 2022, 14, 640.
  69. Singh, D.; Yi, S.V. On the origin and evolution of SARS-CoV-2. Exp. Mol. Med. 2021, 53, 537–547.
  70. Wrobel, A.G.; Benton, D.J.; Roustan, C.; Borg, A.; Hussain, S.; Martin, S.R.; Rosenthal, P.B.; Skehel, J.J.; Gamblin, S.J. Evolution of the SARS-CoV-2 spike protein in the human host. Nat. Commun. 2022, 13, 1178.
  71. Guruprasad, L. Human SARS-CoV-2 spike protein mutations. Proteins 2021, 89, 569–576.
  72. Wu, S.; Tian, C.; Liu, P.; Guo, D.; Zheng, W.; Huang, X.; Zhang, Y.; Liu, L. Effects of SARS-CoV-2 mutations on protein structures and intraviral protein-protein interactions. J. Med. Virol. 2021, 93, 2132–2140.
  73. Yang, T.J.; Yu, P.Y.; Chang, Y.C.; Liang, K.H.; Tso, H.C.; Ho, M.R.; Chen, W.Y.; Lin, H.T.; Wu, H.C.; Hsu, S.D. Effect of SARS-CoV-2 B.1.1.7 mutations on spike protein structure and function. Nat. Struct. Mol. Biol. 2021, 28, 731–739.
  74. Jahanafrooz, Z.; Chen, Z.; Bao, J.; Li, H.; Lipworth, L.; Guo, X. An overview of human proteins and genes involved in SARS-CoV-2 infection. Gene 2022, 808, 145963.
  75. Yao, H.; Song, Y.; Chen, Y.; Wu, N.; Xu, J.; Sun, C.; Zhang, J.; Weng, T.; Zhang, Z.; Wu, Z.; et al. Molecular Architecture of the SARS-CoV-2 Virus. Cell 2020, 183, 730–738.e713.
  76. Zhang, J.; Xiao, T.; Cai, Y.; Chen, B. Structure of SARS-CoV-2 spike protein. Curr. Opin. Virol. 2021, 50, 173–182.
  77. Sadegh, S.; Matschinske, J.; Blumenthal, D.B.; Galindez, G.; Kacprowski, T.; List, M.; Nasirigerdeh, R.; Oubounyt, M.; Pichlmair, A.; Rose, T.D.; et al. Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing. Nat. Commun. 2020, 11, 3518.
  78. Yang, S.L.; DeFalco, L.; Anderson, D.E.; Zhang, Y.; Aw, J.G.A.; Lim, S.Y.; Lim, X.N.; Tan, K.Y.; Zhang, T.; Chawla, T.; et al. Comprehensive mapping of SARS-CoV-2 interactions in vivo reveals functional virus-host interactions. Nat. Commun. 2021, 12, 5113.
  79. Cao, Y.; Yisimayi, A.; Jian, F.; Song, W.; Xiao, T.; Wang, L.; Du, S.; Wang, J.; Li, Q.; Chen, X.; et al. BA.2.12.1, BA.4 and BA.5 escape antibodies elicited by Omicron infection. Nature 2022, 608, 593–602.
  80. Mahase, E. COVID-19: What we know about the BA.4 and BA.5 omicron variants. BMJ 2022, 378, o1969.
  81. Tallei, T.E.; Alhumaid, S.; AlMusa, Z.; Fatimawali; Kusumawaty, D.; Alynbiawi, A.; Alshukairi, A.N.; Rabaan, A.A. Update on the omicron sub-variants BA.4 and BA.5. Rev. Med. Virol. 2023, 33, e2391.
  82. Chen, J.; Gao, K.; Wang, R.; Wei, G.W. Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies. Chem. Sci. 2021, 12, 6929–6948.
  83. Han, X.; Ye, Q. The variants of SARS-CoV-2 and the challenges of vaccines. J. Med. Virol. 2022, 94, 1366–1372.
  84. Llanes, A.; Restrepo, C.M.; Caballero, Z.; Rajeev, S.; Kennedy, M.A.; Lleonart, R. Betacoronavirus Genomes: How Genomic Information has been Used to Deal with Past Outbreaks and the COVID-19 Pandemic. Int. J. Mol. Sci. 2020, 21, 4546.
  85. Eskandarzade, N.; Ghorbani, A.; Samarfard, S.; Diaz, J.; Guzzi, P.H.; Fariborzi, N.; Tahmasebi, A.; Izadpanah, K. Network for network concept offers new insights into host-SARS-CoV-2 protein interactions and potential novel targets for developing antiviral drugs. Comput. Biol. Med. 2022, 146, 105575.
  86. Gordon, D.E.; Jang, G.M.; Bouhaddou, M.; Xu, J.; Obernier, K.; White, K.M.; O’Meara, M.J.; Rezelj, V.V.; Guo, J.Z.; Swaney, D.L.; et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature 2020, 583, 459–468.
  87. Yaqinuddin, A.; Shafqat, A.; Kashir, J.; Alkattan, K. Effect of SARS-CoV-2 Mutations on the Efficacy of Antibody Therapy and Response to Vaccines. Vaccines 2021, 9, 914.
  88. McBryde, E.S.; Meehan, M.T.; Adegboye, O.A.; Adekunle, A.I.; Caldwell, J.M.; Pak, A.; Rojas, D.P.; Williams, B.M.; Trauer, J.M. Role of modelling in COVID-19 policy development. Paediatr. Respir. Rev. 2020, 35, 57–60.
  89. Atalan, A. Is the lockdown important to prevent the COVID-19 pandemic? Effects on psychology, environment and economy-perspective. Ann. Med. Surg. 2020, 56, 38–42.
  90. Haug, N.; Geyrhofer, L.; Londei, A.; Dervic, E.; Desvars-Larrive, A.; Loreto, V.; Pinior, B.; Thurner, S.; Klimek, P. Ranking the effectiveness of worldwide COVID-19 government interventions. Nat. Hum. Behav. 2020, 4, 1303–1312.
  91. Hsiang, S.; Allen, D.; Annan-Phan, S.; Bell, K.; Bolliger, I.; Chong, T.; Druckenmiller, H.; Huang, L.Y.; Hultgren, A.; Krasovich, E.; et al. The effect of large-scale anti-contagion policies on the COVID-19 pandemic. Nature 2020, 584, 262–267.
  92. Bali, A.S.; He, A.J.; Ramesh, M. Health policy and COVID-19: Path dependency and trajectory. Policy Soc. 2022, 41, 83–95.
  93. Dhami, S.; Thompson, D.; El Akoum, M.; Bates, D.W.; Bertollini, R.; Sheikh, A. Data-enabled responses to pandemics: Policy lessons from COVID-19. Nat. Med. 2022, 28, 2243–2246.
  94. Mustafa, S.; Zhang, Y.; Zibwowa, Z.; Seifeldin, R.; Ako-Egbe, L.; McDarby, G.; Kelley, E.; Saikat, S. COVID-19 Preparedness and Response Plans from 106 countries: A review from a health systems resilience perspective. Health Policy Plan. 2022, 37, 255–268.
  95. Chemaitelly, H.; Ayoub, H.H.; Coyle, P.; Tang, P.; Yassine, H.M.; Al-Khatib, H.A.; Smatti, M.K.; Hasan, M.R.; Al-Kanaani, Z.; Al-Kuwari, E.; et al. Protection of Omicron sub-lineage infection against reinfection with another Omicron sub-lineage. Nat. Commun. 2022, 13, 4675.
  96. Dhawan, M.; Saied, A.A.; Mitra, S.; Alhumaydhi, F.A.; Emran, T.B.; Wilairatana, P. Omicron variant (B.1.1.529) and its sublineages: What do we know so far amid the emergence of recombinant variants of SARS-CoV-2? Biomed. Pharm. 2022, 154, 113522.
  97. Iketani, S.; Liu, L.; Guo, Y.; Liu, L.; Chan, J.F.; Huang, Y.; Wang, M.; Luo, Y.; Yu, J.; Chu, H.; et al. Antibody evasion properties of SARS-CoV-2 Omicron sublineages. Nature 2022, 604, 553–556.
Figure 1. COVID-19 database categorization and main functions. (a) Upon investigating 15 databases, these were categorized into three categories: genome and protein data, epidemiological data, and drug and target data. (b) There are four databases based on GISAID data. The main functions of the databases are clade/variant/lineage, genome browser (sequence), protein structure, visualization, and data analysis. (c) There are 11 databases that have collected their data independently. The main functions are clade/variant/lineage information, genome browser (sequence), protein structure, epidemiological data (cases, vaccine rates, deaths), visualization, data analysis tool, treatment (clinical trials, drugs), literature, and immunity. These main functions vary depending on the purpose of the database.
Figure 2. Comprehensive utilization of COVID-19 databases 1. Through an analysis of variants and deaths in certain periods of time, the extraction of important variants was carried out in the following process.
Figure 3. Comprehensive utilization of COVID-19 databases 2. To analyze the differences between the two lineages (Delta B.1.617.2 and Omicron BA.1) and find proteins that interact with the variants, the following process was carried out.
Figure 4. Comprehensive utilization of COVID-19 databases 3. To find data on antibodies and drugs that deal with variants, the following was carried out.
Figure 5. Comprehensive utilization of COVID-19 databases 4. Country policies and responses against SARS-CoV-2 variants were evaluated.
Table 1. Data volume categorized into genome and protein data, epidemiological data, and drug and target data.
| Category | Database | Data volume | Date |
|---|---|---|---|
| Epidemiological data (global) | World Health Organization | Cases: 623,893,894; Deaths: 6,553,936; Vaccinations: 12,814,704,622 | 18 October 2022 |
| | Johns Hopkins Coronavirus Resource Center | Cases: 627,632,333; Deaths: 6,578,449; Vaccinations: 12,821,432,441 | 24 October 2022 |
| | Our World in Data | Cases: 627.54 million; Deaths: 6.58 million; Vaccinations: 12.86 billion | 23 October 2022 |
| Genome and protein data | GISAID (Global Initiative on Sharing Avian Influenza Data) | 10 million genome sequences of SARS-CoV-2 submitted to EpiCoV | April 2022 |
| | | Audacity: global phylogenetic tree comprised of 10,703,377 high quality genomes | 26 October 2022 |
| | | CoVizu: high-quality genomes: 7,726,056 | 24 October 2022 |
| | Nextstrain | Latest global SARS-CoV-2 analysis (GISAID data): 2943 genomes, collected December 2019–October 2022 | 27 October 2022 |
| | | Latest global SARS-CoV-2 analysis (open data): 3006 genomes, collected December 2019–October 2022 | 27 October 2022 |
| | | Phylogeny of SARS-like betacoronaviruses including novel coronavirus SARS-CoV-2: 49 genomes | 27 October 2022 |
| | COVID-19 Data Portal | Viral sequences: Data types (14,849,714) | 27 October 2022 |
| | | Host sequences (human and other hosts): Host sequences (30,694) | |
| | | Expression: Data type (226) | |
| | | Proteins (3772) | |
| | | COVID-19 pathways, interactions, complexes, targets and compounds: Data types (7801) | |
| | NIH | 5,808,129 | 27 October 2022 |
| | | Source: DNA (159,494); RNA (5,642,181) | |
| | | Type: Exome: 2; Genome: 65,958 | |
| Treatment (clinical trials, drug) | NIH | ClinicalTrials.gov (8357 studies) | 27 October 2022 |
| | | Status: Completed: 3013 | |
| | | Study phase: Early phase 1: 61; Phase 1: 671; Phase 2: 1494; Phase 3: 929; Phase 4: 258; Not applicable: 1920 | |
| Literature | COVID-19 Data Portal | Literature (805,970) | 21 October 2022 |
| | NIH | PubMed: 307,261 results | 27 October 2022 |
| | | PMC (PubMed Central): 429,092 results | |
| | | LitCOVID: 300,174 publications in PubMed, 8000 journals | |
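The three epidemiological sources in Table 1 report slightly different global totals, partly because the figures were recorded on different dates. The following minimal Python sketch shows one way to compare them: it computes a crude deaths-per-case percentage for each source and the relative spread between the lowest and highest case counts. The case and death counts are copied from Table 1; the names `TABLE1_TOTALS`, `crude_cfr_percent`, and `relative_spread_percent` are hypothetical and introduced only for illustration, and the crude ratio is a simple quotient, not an adjusted fatality estimate used in the study.

```python
# Global totals copied from Table 1 (recorded between 18 and 24 October 2022).
# Our World in Data reports rounded values (627.54 million cases, 6.58 million deaths).
TABLE1_TOTALS = {
    "World Health Organization": {"cases": 623_893_894, "deaths": 6_553_936},
    "Johns Hopkins Coronavirus Resource Center": {"cases": 627_632_333, "deaths": 6_578_449},
    "Our World in Data": {"cases": 627_540_000, "deaths": 6_580_000},
}


def crude_cfr_percent(cases: int, deaths: int) -> float:
    """Deaths per 100 reported cases (a crude ratio, illustrative only)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def relative_spread_percent(values) -> float:
    """Difference between the largest and smallest value, as a percentage of the smallest."""
    values = list(values)
    lowest, highest = min(values), max(values)
    return 100.0 * (highest - lowest) / lowest


if __name__ == "__main__":
    for name, totals in TABLE1_TOTALS.items():
        ratio = crude_cfr_percent(totals["cases"], totals["deaths"])
        print(f"{name}: crude deaths per 100 cases = {ratio:.3f}")

    case_spread = relative_spread_percent(t["cases"] for t in TABLE1_TOTALS.values())
    print(f"Relative spread in reported cases across sources: {case_spread:.2f}%")
```

With these inputs, the crude ratio is close to 1% for every source and the case counts differ by well under 1%, which suggests the three sources are broadly interchangeable for global totals, although the reporting date should be recorded alongside any figure taken from them.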
Table 2. COVID-19 database addresses and associated publications.
| No. | Website | Address (Database Utilization) | Ref |
|---|---|---|---|
| 1 | GISAID | https://gisaid.org/ | [32] |
| 2 | Cov-lineages.org | https://cov-lineages.org/ | [33] |
| | | Pangolin: https://cov-lineages.org/resources/pangolin.html | |
| | | Scorpio: https://github.com/cov-lineages/scorpio | |
| | | Pando: http://pando.tools/ | |
| | | Civet: https://cov-lineages.org/resources/civet.html | |
| | | Polecat: https://github.com/artic-network/polecat | |
| | | pango.network: https://www.pango.network | |
| 3 | COVID CG | https://covidcg.org/ | [34] |
| 4 | COVID-19 Data Portal | https://www.covid19dataportal.org/ | [35] |
| 5 | Nextstrain | https://nextstrain.org/ | [36] |
| 6 | NCBI SARS-CoV-2 Resources | https://www.ncbi.nlm.nih.gov/sars-cov-2/ | |
| | | LitCOVID: https://www.ncbi.nlm.nih.gov/research/coronavirus/ | [37] |
| | | PubMed: https://pubmed.ncbi.nlm.nih.gov/?term=covid-19 | |
| | | PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/about/covid-19/ | [38] |
| | | BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&BLAST_SPEC=Betacoronavirus | |
| | | PubChem: https://pubchemdocs.ncbi.nlm.nih.gov/covid-19 | |
| | | ClinicalTrials.gov: https://clinicaltrials.gov/ct2/home | |
| 7 | PDB | https://www.rcsb.org/ | [39] |
| 8 | WHO | https://covid19.who.int/ | [40] |
| 9 | COVID-19 Research at UCSC | https://genome.ucsc.edu/covid19.html | [41] |
| | | UShER: https://github.com/yatisht/usher | [42] |
| | | UCSC Genome Browser view of SARS-CoV-2 genomic datasets: https://genome.ucsc.edu/cgi-bin/hgTracks?db=wuhCor1&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=NC_045512v2%3A1%2D29902&hgsid=1506921125_d8K9do0hsuR7zvE950cXSU3hQqYV | |
| | | UCSC Cell Browser: https://genome.ucsc.edu/singlecell.html | [43] |
| 10 | Our World in Data | https://ourworldindata.org/ | [44] |
| 11 | Johns Hopkins University Coronavirus Resource Center | https://coronavirus.jhu.edu | [45] |
| 12 | DOCK CoV-2 | https://covirus.cc/drugs/ | [46] |
| 13 | T-CoV | https://t-cov.hse.ru | [47] |
| 14 | CovInter | http://covrpii.idrblab.net/ | [48] |
| 15 | ESC | https://clingen.igib.res.in/esc/ | [49] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Seong, D.Y.; Park, J.; Yi, K.; Hong, D. Systematic Guidelines for Effective Utilization of COVID-19 Databases in Genomic, Epidemiologic, and Clinical Research. Viruses 2023, 15, 692. https://doi.org/10.3390/v15030692
