Supervised Machine Learning Models to Identify Early-Stage Symptoms of SARS-CoV-2

Dritsas, Elias; Trigka, Maria

doi:10.3390/s23010040


by Elias Dritsas * and Maria Trigka

Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece

* Author to whom correspondence should be addressed.

Sensors 2023, 23(1), 40; https://doi.org/10.3390/s23010040

Submission received: 16 November 2022 / Revised: 7 December 2022 / Accepted: 16 December 2022 / Published: 21 December 2022

(This article belongs to the Special Issue Multimodal Data Fusion and Machine-Learning for Promotion of Health/Well-Being)


Abstract: The coronavirus disease (COVID-19) pandemic was caused by the SARS-CoV-2 virus and began in December 2019. The virus was first reported in the Wuhan region of China. It is a new strain of coronavirus that until then had not been isolated in humans. In severe cases, pneumonia, acute respiratory distress syndrome, multiple organ failure or even death may occur. Vaccines, antiviral drugs and appropriate treatment are now available to help confront the disease. In the present research work, we utilized supervised Machine Learning (ML) models to determine early-stage symptoms of SARS-CoV-2 occurrence. For this purpose, we experimented with several ML models, and the results showed that the ensemble model, namely Stacking, outperformed the others, achieving an Accuracy, Precision, Recall and F-Measure equal to 90.9% and an Area Under Curve (AUC) of 96.4%.

Keywords: healthcare; SARS-CoV-2; machine learning; prediction; data analysis

1. Introduction

Coronaviruses are a group of viruses that often cause generally mild respiratory infections in humans and animals. Most people are infected with coronaviruses at least once in their lives, having mild to moderate symptoms of the common cold. Rarely does a coronavirus mutate and spread from animals to humans, as has happened with SARS-CoV-2 today and, in the past, with the SARS (2003) and MERS (2012) viruses. When a new virus infects humans, no one is immune, and everyone can become infected. This wide spread of the virus is also the reason why it has caused global concern, and the World Health Organization (WHO) declared the COVID-19 pandemic in March 2020 [1,2,3].

The new strain of coronavirus commonly spreads from person to person through respiratory droplets and aerosols produced by sneezing and coughing, and through direct or close contact with other people (usually at less than two meters). People can also become infected when they touch their nose, mouth or eyes with contaminated hands. The virus’s survival outside the body depends on the material of the surface. In particular, it can survive for several hours on copper or cardboard and up to a few days on plastic or stainless steel. The average time between exposure to the virus and the onset of symptoms, known as the incubation period, is currently estimated for COVID-19 to be 5 to 6 days, with a general range of 1 to 14 days. According to an analysis made by the WHO, each patient may infect 1.4–2.5 other people (compared to the seasonal flu, where each patient infects an average of 1.3 other people) [4,5,6].

COVID-19 disease varies greatly in severity. There may be a complete absence of symptoms (asymptomatic patients), or symptoms such as fever, cough, sore throat, change or loss of taste and/or smell, general weakness, diarrhea, fatigue, and muscle pain may occur. In severe cases, symptoms may include severe lung infection, generalized infection and inflammatory reaction and require specialized medical care and support [7,8,9].

Moreover, people who manifest severe symptoms affecting the airways may need the support of mechanical ventilation, exposing them to infections other than COVID-19, such as pneumonia. People suffering from COVID-19 are also at higher risk of stroke or heart attack. In addition, some patients may show symptoms related to the nervous system, such as transient changes in personality or alertness levels [10,11,12].

In the general population, there are specific groups of people who are more prone to be infected and develop the disease. The distinguishing factors include age (especially people over 60 years old), pregnancy and underlying diseases such as obesity, hypertension, diabetes, cardiovascular disease, long-term diseases affecting the lungs and airways, and diseases related to a burdened immune system. Symptoms in children tend to be milder than in adults. However, children remain carriers of the virus, and some of them become seriously ill [13,14].

Vaccination is the most effective way to prevent the severe complications of COVID-19 combined with measures such as wearing a mask, maintaining physical distance, good indoor ventilation and regular hand washing (soap or alcohol-based), which help to avoid the transfer of the virus from the hands to the body through the eyes, nose or mouth. Vaccinated people are less likely to manifest severe symptoms of the disease or to be hospitalized. That is why public health officials are urging all eligible people to get fully immunized against COVID-19 [15,16,17].

Drugs are now becoming available to treat COVID-19 that directly target the virus. They are mainly used to prevent severe manifestations of the disease in high-risk groups. The primary treatment for most patients with severe disease remains supportive care, such as the use of oxygen therapy and the management of fluid levels. In addition, the global research community has emphasized the development and mass production of effective drugs and vaccines [18,19,20].

Machine Learning is a field of Artificial Intelligence (AI) that deals with the study and construction of computational algorithms that can automatically improve through experience. Machine Learning algorithms build a model from sample data, also known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Such algorithms are applied in various sectors. Machine Learning now makes a significant contribution to the field of medicine for the prediction of various diseases and the early diagnosis of several chronic conditions such as diabetes (as a classification task [21,22] or as a time-series task for continuous glucose values forecasting [23,24]), high blood pressure (hypertension) [25,26], cholesterol [27,28], chronic obstructive pulmonary disease (COPD) [29], stroke [30], cardiovascular diseases (CVDs) [31], acute liver failure (ALF) [32], acute lymphoblastic leukemia [33], sleep disorders [34,35], hepatitis C [36], lung cancer [37], chronic kidney disease (CKD) [38], etc.

In this scientific article, following a supervised learning procedure, an ML-based framework will be described and analyzed in order to identify early-stage symptoms of SARS-CoV-2 occurrence. The key aspects of the adopted methodology are the following:

Firstly, a data balancing approach is applied to address the non-uniform distribution of the samples in the two classes and, thus, to design effective classifiers. For this purpose, we exploited the Synthetic Minority Oversampling Technique (SMOTE) [39]. This method randomly selects instances of the minority class and creates new synthetic samples based on their K nearest neighbors.
Secondly, a features analysis is made in order to rank their importance using two different methods (Information Gain and Random Forest). In addition, we measure nominal features’ frequency of occurrence in order to identify their relatedness with the SARS-CoV-2 class.
Thirdly, the performance of various ML models is evaluated and compared in terms of Accuracy, Precision, Recall, F-Measure, and AUC. All metrics show that the stacking ensemble method prevailed over the other models; thus it is the main proposition of this analysis.
Finally, a comparison with a published work on the same dataset and features that we relied on is performed, showing the superiority of our ML models in terms of Accuracy and AUC.

The rest of this research work is organized as follows. Section 2 describes the dataset we relied on and analyzes the adopted process. Section 3 presents and discusses the research outcomes. Section 4 presents some works that exploit ML models and techniques in order to identify early-stage symptoms of SARS-CoV-2 and COVID-19 occurrence. Finally, Section 5 summarizes the article and sets future directions.

2. Materials and Methods

2.1. Dataset Description

In order to evaluate the ML models, we relied on a publicly available dataset [40]. The dataset includes 6512 participants, of whom 3367 (51.7%) are men and 3145 (48.3%) are women. The target class is SARS-CoV-2, which indicates whether or not the participant is positive for the SARS-CoV-2 virus. The number of participants who were diagnosed positive for the SARS-CoV-2 virus is 1572 (24.1%). The dataset’s characteristics are detailed in Table 1.

2.2. Data Preprocessing

We applied SMOTE to the dataset [40]. SMOTE is an oversampling technique that balances the dataset by generating synthetic samples [52] for the minority class. It is based on the K-Nearest Neighbors model with K equal to 5 [53]. The instances in the SARS-CoV-2 class are oversampled such that the subjects in the two classes are uniformly distributed. After the implementation of SMOTE (see Algorithm 1), the number of participants is 9880. Now, the dataset is balanced, and the target class SARS-CoV-2 includes 4940 SARS-CoV-2 Positive and 4940 Non-SARS-CoV-2 Positive instances.
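As a quick consistency check on these counts, the arithmetic can be written out using only the figures reported in Sections 2.1 and 2.2. The symbols $n_{neg}$, $n_{syn}$ and $n_{total}$ are introduced here purely for illustration, and the value of N (the oversampling percentage of Algorithm 1) is derived from those counts rather than stated by the authors.

$\begin{matrix} n_{neg} = 6512 - 1572 = 4940 \end{matrix}$

$\begin{matrix} n_{syn} = 4940 - 1572 = 3368 \end{matrix}$

$\begin{matrix} N = 100 \cdot \frac{3368}{1572} \approx 214.2\% \end{matrix}$

$\begin{matrix} n_{total} = 4940 + (1572 + 3368) = 9880 \end{matrix}$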

Algorithm 1: SMOTE

Input: M (number of samples in the minority class), N (% ratio of synthetic minority samples for class balancing), K (number of nearest neighbors), $s_{syn}$ (synthetic instance);

Choose randomly a subset $S$ of the minority class data of size $|S| = \frac{N}{100} M$ (synthetic samples in the minority class) such that the class labels are uniformly distributed;

for all $s_{i} \in S$ do

(1) Find the K nearest neighbors;

(2) Randomly select one of the K nearest neighbors, called $\hat{s}_{i}$;

(3) Calculate the distance $d_{i,k} = \hat{s}_{i} - s_{i}$ between the randomly selected nearest neighbor $\hat{s}_{i}$ and the instance $s_{i}$;

(4) The new synthetic instance is generated as $s_{syn} = s_{i} + \delta d_{i,k}$ (where $\delta = rand(0, 1)$ is a random number between 0 and 1);

end for

Repeat steps number 2–4 until the desired proportion of minority class is met.
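The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Weka implementation used in the experiments: the function name `smote`, the Euclidean distance, sampling with replacement (so that N may exceed 100%) and purely numeric feature vectors are all choices made here for the sake of the example.

```python
import math
import random


def smote(minority, n_percent, k=5, seed=0):
    """Generate synthetic minority samples following the steps of Algorithm 1.

    minority  : list of numeric feature vectors (the M minority samples)
    n_percent : N, the amount of oversampling as a percentage of M
    k         : K, the number of nearest neighbours to consider
    """
    rng = random.Random(seed)
    n_synthetic = round(n_percent / 100 * len(minority))
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a minority instance s_i at random (with replacement: an assumption).
        i = rng.randrange(len(minority))
        s_i = minority[i]
        # (1) Find the K nearest minority neighbours of s_i (Euclidean distance).
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(s, s_i))[:k]
        # (2) Randomly select one of them, s_hat.
        s_hat = rng.choice(neighbours)
        # (3)-(4) Move from s_i towards s_hat by a random fraction delta in [0, 1).
        delta = rng.random()
        synthetic.append([a + delta * (b - a) for a, b in zip(s_i, s_hat)])
    return synthetic


# Toy usage: four minority points, N = 200% gives eight synthetic points.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote(toy_minority, n_percent=200, k=2, seed=1)
```

Because each synthetic point lies on the segment between a minority instance and one of its neighbours, the new samples stay inside the region already occupied by the minority class.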

2.3. Features Analysis

The features analysis proceeds along two axes: participants’ prevalence per feature in terms of the target class, and importance evaluation.

Table 2 shows the percentage frequency of occurrence of participants for each feature in terms of the two states of the class label. In addition, the mean age of participants in the balanced data is 44.39 years, and the standard deviation is 14.46 years. Their prevalence in each age group per class label (No, Yes) is shown in Figure 1. As for gender, men suffer from SARS-CoV-2 10% more than women. In addition, fever, cough and lung infection are the most frequently occurring symptoms in the class ‘Yes’, although these are not solely related to SARS-CoV-2.

For their importance ranking, we employed two methods, Information Gain and Random Forest. The respective outcomes are captured in Table 3.

The InfoGain [54] estimates the worth of an attribute V by measuring the information gain with respect to the class variable C as $InfoGain(C, V) = H(C) - H(C | V)$. The first term defines the entropy of the class variable C, which can be determined as $H(C) = -\sum_{c \in C} p(c) \log_{2}(p(c))$, where $p(c)$ is the probability of $c \in C = \{0, 1\}$. In addition, the second term $H(C | V) = -\sum_{v \in V} p(v) \sum_{c \in C} p(c | v) \log_{2}(p(c | v))$ is the conditional entropy of the class variable C given an attribute V, where $p(v)$ is the probability of value v and $p(c | v)$ is the conditional probability of class value c given v.
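The two entropy terms translate directly into code. The following sketch estimates the probabilities by relative frequencies; the function names and the toy vectors are hypothetical and serve only to show the extreme cases of a fully informative and a fully uninformative attribute.

```python
import math
from collections import Counter


def entropy(labels):
    """H(C): entropy of the class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """InfoGain(C, V) = H(C) - H(C | V) for one nominal attribute V."""
    total = len(labels)
    conditional = 0.0
    for v, n_v in Counter(values).items():
        # Class labels of the instances whose attribute takes the value v.
        subset = [c for x, c in zip(values, labels) if x == v]
        conditional += (n_v / total) * entropy(subset)  # p(v) * H(C | V = v)
    return entropy(labels) - conditional


# Toy data (not taken from the dataset): one attribute that determines the
# class exactly and one that carries no information about it.
toy_class = [1, 1, 0, 0]
gain_informative = info_gain([1, 1, 0, 0], toy_class)
gain_uninformative = info_gain([1, 0, 1, 0], toy_class)
```

For a balanced binary class the gain ranges from 0 bits (attribute independent of the class) to 1 bit (attribute determines the class).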

In the Random Forest method, feature importance is captured by the purity of the leaves: the more a feature increases leaf purity, the higher its importance. The importance is averaged over all the trees and normalized so that the importance scores sum to 1 [55].
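To make the idea of purity-based importance concrete, the sketch below scores binary features by the impurity decrease of a single split and then normalizes the scores to sum to 1. It is a deliberate simplification: Gini impurity is assumed as the purity measure, one split stands in for a whole forest of trees, and the feature values are toy data that merely borrow feature names from the text.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels (0 means perfectly pure)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def impurity_decrease(values, labels):
    """Weighted Gini decrease obtained by splitting on one binary feature."""
    total = len(labels)
    left = [c for v, c in zip(values, labels) if v == 0]
    right = [c for v, c in zip(values, labels) if v == 1]
    weighted = sum(len(part) / total * gini(part) for part in (left, right) if part)
    return gini(labels) - weighted


def normalise(scores):
    """Rescale the raw scores so that they sum to 1."""
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


# Toy data: the values below are illustrative and not taken from the dataset.
toy_labels = [1, 1, 0, 0]
toy_features = {
    "cough": [1, 1, 0, 0],     # separates the classes perfectly
    "fever": [1, 1, 1, 0],     # partially informative
    "diarrhea": [1, 0, 1, 0],  # uninformative
}
importance = normalise(
    {name: impurity_decrease(v, toy_labels) for name, v in toy_features.items()}
)
```

In a real forest the decrease would be accumulated over every node that splits on the feature and averaged over the trees before normalization.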

Observing the ranking scores in Table 3, we see that both methods identify cough as the feature with the highest importance, while fever and lung_infection swap ranking positions between the two methods. Additionally, pneumonia, runny_nose and travel_history are the next most important features, although their order differs between InfoGain and Random Forest. In addition, diarrhea and muscle_soreness are ranked last, with scores close to zero. For the models’ training and testing, all features were considered.

2.4. Machine Learning Models

In this submission, we experimented with various ML models to uncover which one outperforms the rest by evaluating their prediction performance. Specifically, we focused on Naive Bayes (NB) [56] and Logistic Regression (LR) [57] models, which are probabilistic classifiers. Furthermore, we used the well-known kernel-based (linear, non-linear) classifier Support Vector Machine (SVM) [58] and the Sequential Minimal Optimization (SMO) [59] that quickly solves the SVM Quadratic Programming problem. In addition, Stochastic Gradient Descent (SGD) [60] learning of a linear classifier under SVM convex loss function was applied.

Moreover, we used Decision-Tree-based models such as Hoeffding Tree (VFDT) [61], J48 [62], Random Tree (RT) [63], XGBoost [64], and Gradient Boosting Machine (GBM) [65]. From Ensemble ML algorithms [66], Bagging [67], Random Forest (RF) [68], Rotation Forest (RotF) [69], AdaBoostM1 [70], Voting [71], and Stacking [72] were exploited. Finally, a simple Artificial Neural Network (ANN) [73], the Multi-Layer Perceptron (MLP) [74] and K-Nearest Neighbors (kNN) [53], a distance-based classifier, were evaluated.

2.5. Evaluation Metrics

To evaluate the performance of the ML models, we utilized the most common metrics including Accuracy, Recall, Precision, F-Measure, and AUC [75]. The definition of these metrics is based on the Confusion Matrix, which consists of the elements True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). More specifically, they are determined as follows:

Accuracy: It is the percentage of correct predictions for the test data.

$\begin{matrix} Accuracy = \frac{TN + TP}{TN + TP + FN + FP} \end{matrix}$ (1)

Recall: It corresponds to the proportion of participants diagnosed with SARS-CoV-2 who are correctly classified as positive, relative to all positive participants.

$\begin{matrix} Recall = \frac{TP}{TP + FN} \end{matrix}$ (2)

Precision: It indicates how many of the participants predicted as positive for SARS-CoV-2 actually belong to this class.

$\begin{matrix} Precision = \frac{TP}{TP + FP} \end{matrix}$ (3)

F-Measure: It is the harmonic mean of the Precision and Recall and sums up the predictive performance of a model.

$\begin{matrix} F\text{-}Measure = 2 \frac{Precision \cdot Recall}{Precision + Recall} \end{matrix}$ (4)

The Area Under the Curve (AUC) is the measure of the ability of a classifier to distinguish between classes. It is a metric that varies in [0, 1].
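Equations (1)–(4) can be computed directly from the four confusion-matrix counts, as in the sketch below. The AUC is computed here through its rank interpretation (the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half), which is equivalent to the area under the ROC curve. Function names and the toy labels and scores are illustrative assumptions, and no guard against empty classes or zero denominators is included.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, Recall, Precision and F-Measure as in Equations (1)-(4)."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs that are ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (values are illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.1, 0.3, 0.6, 0.2, 0.7]
toy_metrics = metrics(*confusion_counts(y_true, y_pred))
toy_auc = auc(y_true, y_score)
```

In this toy case there are three true positives, three true negatives, one false positive and one false negative, so all four threshold-based metrics equal 0.75, while 15 of the 16 positive–negative pairs are ranked correctly.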

3. Results

3.1. Experiments Setup

To evaluate the classification models, we used Weka (Waikato Environment for Knowledge Analysis) [76]. Weka is open-source software that comprises a set of ML algorithms for various data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.

The computer system we experimented with has the following specifications: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 GHz, RAM 16 GB, Windows 11 Home, 64-bit OS and x64 processor. For the experimental results, 10-fold cross-validation was applied to measure the effectiveness of the models on the balanced dataset of 9880 cases after SMOTE. Table 4 lists the parameter settings of the ML models, tuned so that the Recall and AUC metrics are as high as possible. The outcomes presented in the following were obtained with these settings. In addition, the SMO, GBM, AdaBoostM1 and RotF models used RF as the base classifier. Concerning the ensemble methods, the Stacking model used the RF and J48 models as base classifiers and LR as the meta classifier. The Bagging model used RF as the base classifier, and finally, the Voting model used the RF and J48 models as base classifiers.
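The Stacking configuration described above (base classifiers whose outputs feed a Logistic Regression meta classifier) can be illustrated with a small pure-Python sketch. It is not the Weka setup used in the experiments: single-feature probability estimators (`fit_stump`, a hypothetical helper) stand in for RF and J48, the meta classifier is a plain gradient-descent logistic regression, and the number of internal folds, learning rate and epochs are arbitrary choices made here.

```python
import math
import random


def fit_stump(X, y, feature):
    """Stand-in base learner: estimates P(class 1) from one binary feature."""
    rate = {}
    for v in (0, 1):
        labels = [c for x, c in zip(X, y) if x[feature] == v]
        # Laplace-smoothed positive rate among training rows with this value.
        rate[v] = (sum(labels) + 1) / (len(labels) + 2)
    return lambda x: rate[x[feature]]


def fit_logistic(Z, y, epochs=200, lr=0.5):
    """Meta classifier: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, c in zip(Z, y):
            p = 1 / (1 + math.exp(-(b + sum(wi * zi for wi, zi in zip(w, z)))))
            for j, zj in enumerate(z):
                grad_w[j] += (p - c) * zj
            grad_b += p - c
        w = [wi - lr * g / len(y) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(y)
    return lambda z: 1 / (1 + math.exp(-(b + sum(wi * zi for wi, zi in zip(w, z)))))


def fit_stacking(X, y, base_fitters, k=5, seed=0):
    """Train base learners and a meta classifier on their out-of-fold outputs."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    # Level-1 data: each row holds the out-of-fold prediction of every base learner.
    Z = [[0.0] * len(base_fitters) for _ in y]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fitter in enumerate(base_fitters):
            model = fitter(X_tr, y_tr)
            for i in fold:
                Z[i][j] = model(X[i])
    meta = fit_logistic(Z, y)
    bases = [fitter(X, y) for fitter in base_fitters]  # refit on all the data
    return lambda x: int(meta([m(x) for m in bases]) >= 0.5)


# Toy usage: the label equals feature 0, while feature 1 is uninformative.
toy_X = [[i % 2, (i // 2) % 2] for i in range(40)]
toy_y = [x[0] for x in toy_X]
predict = fit_stacking(toy_X, toy_y, [lambda X, y: fit_stump(X, y, 0),
                                      lambda X, y: fit_stump(X, y, 1)])
```

Training the meta classifier on out-of-fold predictions, rather than on predictions for data the base learners have already seen, is what keeps the meta level from simply trusting whichever base learner overfits most.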

3.2. Performance Evaluation

The performance assessment of the classifiers we experimented with was performed after SMOTE and 10-fold cross-validation. Plenty of well-known ML models including NB, SVM, LR, ANN, KNN, SGD, SMO, VFDT, J48, RF, RT, XGBoost, GBM, RotF, AdaBoostM1, Stacking, Bagging, and Voting were exploited and compared to identify early-stage symptoms of SARS-CoV-2 occurrence. The ML models’ assessment was accomplished with the aid of Accuracy, Precision, Recall, F-Measure and AUC metrics.
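The 10-fold cross-validation protocol can be sketched as follows. The helper names are hypothetical, stratifying the folds by class is an assumption made here, and `fit` stands for any training routine that returns a prediction function; the sketch only shows how every instance is used for testing exactly once.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, c in enumerate(labels) if c == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validate(X, y, fit, k=10, seed=0):
    """Return the Accuracy pooled over the k held-out folds."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Train on the other k - 1 folds, then test on the held-out fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(model(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


# Toy usage with a fixed threshold rule standing in for a trained classifier.
toy_X = [[0.0]] * 50 + [[1.0]] * 50
toy_y = [0] * 50 + [1] * 50
toy_folds = stratified_folds(toy_y, k=10)
toy_accuracy = cross_validate(
    toy_X, toy_y, fit=lambda X_tr, y_tr: (lambda x: 1 if x[0] > 0.5 else 0)
)
```

With a balanced dataset such as the 9880 instances obtained after SMOTE, stratification would give each fold 494 positive and 494 negative instances.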

In Table 5, we provide a performance evaluation of the ML models after SMOTE with 10-fold cross-validation. Among the models we relied on, SVM and SMO show the lowest performance, with an Accuracy, Recall, F-Measure and AUC equal to 88.3% and a Precision of 88.5%. The KNN, RF, RT, XGBoost, GBM, RotF, AdaBoostM1, Stacking, Bagging, and Voting models show similar Accuracy, all greater than 90.4%. Finally, the Stacking model outperforms the others, achieving an Accuracy, Precision, Recall and F-Measure equal to 90.9% and an AUC of 96.4%, and it constitutes our main proposition for this submission.

In terms of the AUC metric, our proposed models achieved percentages greater than 91.1%, except for SVM and SMO, which obtained a percentage of 88.3%. As we can see from Figure 2, the performance of the RotF, Stacking, Bagging, and Voting models is almost identical, which is also confirmed numerically.

Next, we single out study [77], whose authors experimented with the same dataset [40] and features that we relied on. In Table 6, we present the performance of its ML models in terms of Accuracy and AUC, which shows that our proposed models outperformed them, to varying degrees, in both metrics. As a final note, we see that the proposed models are characterized by high separation ability (according to AUC) and high prediction accuracy for an unclassified instance.

4. Discussion

In this section, we briefly discuss works that apply AI/ML techniques and models to identify early-stage symptoms of SARS-CoV-2 infected patients and to predict the occurrence of the disease. AI tools are a solution for patient screening where PCR-based diagnostic tools [78] are limited.

First, in [79], the authors used the blood profile of the potential patient to train Logistic Regression, Glint, Random Forest and Artificial Neural Network models that test for SARS-CoV-2 positivity among patients in regular wards and among people not admitted to hospital (community testing). Sensitivity, Specificity, Accuracy and AUC were utilized to validate the models’ performance. Second, in [80], demographic and several routine laboratory measurements were used to develop promising ML models (such as Logistic Regression, Decision Trees, Random Forest and Gradient Boosting Decision Tree) that can provide an accurate prediction of SARS-CoV-2 infection status.

In addition, in [81], the authors analyze from various perspectives (such as prevention of viral spread, care management, vaccines, etc.) the contribution of machine learning approaches to SARS-CoV-2. Diagnosis and patient screening are the most common applications of AI/ML to SARS-CoV-2; they rely on medical images such as computed tomography and X-ray, and on other clinical measurements, and apply Convolutional Neural Networks, Support Vector Machine, Random Forest, and MultiLayer Perceptron. In [82], the XGBoost model was applied with an accuracy higher than 90% in order to investigate the probability of mortality of an individual SARS-CoV-2 patient.

The researchers in [83] suggested the combination of antibody responses to multiple antigens to identify individuals with previous SARS-CoV-2 infection using ML classifiers trained with multiplex data. The Random Forests algorithm was selected due to its superiority over other classifiers such as Logistic Regression.

Furthermore, the research study in [84] employed ML for a different purpose against COVID-19. In particular, they combined in silico methods such as virtual drug screening, molecular docking and supervised machine learning algorithms to identify novel drug candidates.

Moreover, the authors in [85] developed a web platform to estimate anti-SARS-CoV-2 activities and identify active molecules for COVID-19 treatment with the support of machine learning. Several prediction models were developed for eleven models of viral entry, replication, live and in vitro infectivity, and human cell toxicity, employing three categories of features (chemical fingerprints, physicochemical and topological pharmacophore descriptors) and 22 distinct ML classifiers.

In [86], ML models were trained and validated using basic blood test measurements which were compared to reference RT-PCR testing to predict COVID-19 infection status. In addition, the authors explored the improvement provided in the basic clinical model by the use of chest radiographs. The evaluation was made via the employment of AUC, Sensitivity, and Specificity. The categorical gradient boosting model was trained to classify whether the patient has COVID-19 and other viral pneumonia, bacterial pneumonia or non-pneumonia.

Contrary to the abovementioned studies, in this work, we considered non-biochemical data acquired by a non-invasive process. The dataset captures the most relevant symptoms of SARS-CoV-2. From this point of view, clinical features are considered to train and test the ML models. The authors in [77], exploiting the same dataset [40] that this study considered, applied XGBoost, Gradient Boosting Machine, Support Vector Machine, Random Forest and Decision Tree in order to identify early-stage symptoms of SARS-CoV-2 infected patients. Their experimental results showed that the XGBoost algorithm outperforms the other ones in terms of Accuracy and AUC. However, in our study, more efficient classifiers were selected, with an emphasis on ensemble models, which the previous work did not apply. We applied 10-fold cross-validation on a balanced dataset rather than percentage data splitting (as the compared study did) and provided a graphical illustration of the AUC curves. Finally, comparing the performance of these models with the ones applied here, our trained and tested classifiers prevailed in both metrics.

5. Conclusions

The coronavirus disease (COVID-19) pandemic was caused by the SARS-CoV-2 virus. The virus was first reported in the Wuhan region of China at the end of December 2019. In severe cases, pneumonia, acute respiratory distress syndrome, multiple organ failure or even death may occur from the virus. Machine Learning now has a significant contribution in the field of medicine for the prediction of various diseases and the early diagnosis of several chronic conditions.

In the present article, a supervised ML-based framework is presented and analyzed in order to identify early-stage symptoms of SARS-CoV-2 occurrence. For this purpose, a variety of well-known ML models and techniques were exploited to confirm the most effective model in terms of Accuracy, Precision, Recall, F-Measure and AUC. The experimental results showed that the Stacking model outperformed the others, achieving an Accuracy, Precision, Recall and F-Measure equal to 90.9% and an AUC of 96.4% after SMOTE with 10-fold cross-validation. Our proposed models prevailed in relation to the compared models of research work [77].

At this point, we have to mention some limitations of this submission. Firstly, we relied on a publicly available dataset from Kaggle [40] to evaluate the performance of the models from an ML technical point of view. In a different dataset from Electronic Health Records (EHR), richer features could help to draw more useful medical conclusions. However, it should be noted that medical data is sensitive, and access is restricted or difficult to obtain. In addition, since the dataset was not very large, using much larger datasets in future studies could further improve the predictive accuracy.

As a future direction of the current study, we aim to design and evaluate each model by dividing the initial data into subsets based on the age groups or the gender of the subjects. Such an approach will help to analyze the relationship between COVID-19 and patient age and gender. Finally, we plan to expand the ML framework by applying Deep Learning models such as Convolutional Neural Networks and Long Short-Term Memory Networks and comparing their predictive ability in terms of the aforementioned metrics.

Author Contributions

E.D. and M.T. conceived of the idea, designed and performed the experiments, analyzed the results, drafted the initial manuscript and revised the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, X.; Xu, S.; Yu, M.; Wang, K.; Tao, Y.; Zhou, Y.; Shi, J.; Zhou, M.; Wu, B.; Yang, Z.; et al. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J. Allergy Clin. Immunol. 2020, 146, 110–118.
2. Guarner, J. Three emerging coronaviruses in two decades: The story of SARS, MERS, and now COVID-19. Am. J. Clin. Pathol. 2020, 153, 420–421.
3. WHO Covid. Available online: https://covid19.who.int/ (accessed on 4 November 2022).
4. Shereen, M.A.; Khan, S.; Kazmi, A.; Bashir, N.; Siddique, R. COVID-19 infection: Emergence, transmission, and characteristics of human coronaviruses. J. Adv. Res. 2020, 24, 91–98.
5. Meyerowitz-Katz, G.; Merone, L. A systematic review and meta-analysis of published research data on COVID-19 infection fatality rates. Int. J. Infect. Dis. 2020, 101, 138–148.
6. Jones, N. How COVID-19 is changing the cold and flu season. Nature 2020, 588, 388–390.
7. Nasserie, T.; Hittle, M.; Goodman, S.N. Assessment of the frequency and variety of persistent symptoms among patients with COVID-19: A systematic review. JAMA Netw. Open 2021, 4, e2111417.
8. Struyf, T.; Deeks, J.J.; Dinnes, J.; Takwoingi, Y.; Davenport, C.; Leeflang, M.M.; Spijker, R.; Hooft, L.; Emperador, D.; Domen, J.; et al. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst. Rev. 2022, 5, CD013665.
9. Nehme, M.; Braillard, O.; Alcoba, G.; Aebischer Perone, S.; Courvoisier, D.; Chappuis, F.; Guessous, I.; COVICARE TEAM. COVID-19 symptoms: Longitudinal evolution and persistence in outpatient settings. Ann. Intern. Med. 2021, 174, 723–725.
10. Chang, R.; Elhusseiny, K.M.; Yeh, Y.C.; Sun, W.Z. COVID-19 ICU and mechanical ventilation patient characteristics and outcomes—A systematic review and meta-analysis. PLoS ONE 2021, 16, e0246318.
11. Ñamendys-Silva, S.A. Respiratory support for patients with COVID-19 infection. Lancet Respir. Med. 2020, 8, e18.
12. Machado-Curbelo, C. Severe COVID-19 cases: Is respiratory distress partially explained by central nervous system involvement? MEDICC Rev. 2022, 22, 38–39.
13. CDC COVID-19 Response Team. Severe outcomes among patients with coronavirus disease 2019 (COVID-19)—United States, February 12–March 16, 2020. Morb. Mortal. Wkly. Rep. 2020, 69, 343.
14. Emami, A.; Javanmardi, F.; Pirbonyeh, N.; Akbari, A. Prevalence of underlying diseases in hospitalized patients with COVID-19: A systematic review and meta-analysis. Arch. Acad. Emerg. Med. 2020, 8, e35.
15. DeRoo, S.S.; Pudalov, N.J.; Fu, L.Y. Planning for a COVID-19 vaccination program. JAMA 2020, 323, 2458–2459.
16. Lotfi, M.; Hamblin, M.R.; Rezaei, N. COVID-19: Transmission, prevention, and potential therapeutic opportunities. Clin. Chim. Acta 2020, 508, 254–266.
17. Cirrincione, L.; Plescia, F.; Ledda, C.; Rapisarda, V.; Martorana, D.; Moldovan, R.E.; Theodoridou, K.; Cannizzaro, E. COVID-19 pandemic: Prevention and protection measures to be adopted at the workplace. Sustainability 2020, 12, 3603.
18. Agarwal, A.; Rochwerg, B.; Lamontagne, F.; Siemieniuk, R.A.; Agoritsas, T.; Askie, L.; Lytvyn, L.; Leo, Y.S.; Macdonald, H.; Zeng, L.; et al. A living WHO guideline on drugs for COVID-19. BMJ 2020, 370, m3379.
19. Stasi, C.; Fallani, S.; Voller, F.; Silvestri, C. Treatment for COVID-19: An overview. Eur. J. Pharmacol. 2020, 889, 173644.
20. De, P.; Chakraborty, I.; Karna, B.; Mazumder, N. Brief review on repurposed drugs and vaccines for possible treatment of COVID-19. Eur. J. Pharmacol. 2021, 898, 173977.
21. Fazakis, N.; Kocsis, O.; Dritsas, E.; Alexiou, S.; Fakotakis, N.; Moustakas, K. Machine learning tools for long-term type 2 diabetes risk prediction. IEEE Access 2021, 9, 103737–103757.
22. Dritsas, E.; Trigka, M. Data-Driven Machine-Learning Methods for Diabetes Risk Prediction. Sensors 2022, 22, 5304.
23. Alexiou, S.; Dritsas, E.; Kocsis, O.; Moustakas, K.; Fakotakis, N. An approach for Personalized Continuous Glucose Prediction with Regression Trees. In Proceedings of the 2021 6th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), Preveza, Greece, 24–26 September 2021; pp. 1–6.
24. Dritsas, E.; Alexiou, S.; Konstantoulas, I.; Moustakas, K. Short-term Glucose Prediction based on Oral Glucose Tolerance Test Values. In Proceedings of the HEALTHINF, Online Streaming, 9–11 February 2022; pp. 249–255.
25. Dritsas, E.; Fazakis, N.; Kocsis, O.; Fakotakis, N.; Moustakas, K. Long-Term Hypertension Risk Prediction with ML Techniques in ELSA Database. In Proceedings of the International Conference on Learning and Intelligent Optimization, Online, 20–25 June 2021; pp. 113–120.
26. Dritsas, E.; Alexiou, S.; Moustakas, K. Efficient Data-driven Machine Learning Models for Hypertension Risk Prediction. In Proceedings of the 2022 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Biarritz, France, 8–12 August 2022; pp. 1–6.
27. Fazakis, N.; Dritsas, E.; Kocsis, O.; Fakotakis, N.; Moustakas, K. Long-term Cholesterol Risk Prediction using Machine Learning Techniques in ELSA Database. In Proceedings of the IJCCI, Online Streaming, 25–27 October 2021; pp. 445–450.
28. Dritsas, E.; Trigka, M. Machine learning methods for hypercholesterolemia long-term risk prediction. Sensors 2022, 22, 5365.
29. Dritsas, E.; Alexiou, S.; Moustakas, K. COPD Severity Prediction in Elderly with ML Techniques. In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, Corfu, Greece, 29 June–1 July 2022; pp. 185–189.
30. Dritsas, E.; Trigka, M. Stroke risk prediction with machine learning techniques. Sensors 2022, 22, 4670.
31. Dritsas, E.; Alexiou, S.; Moustakas, K. Cardiovascular Disease Risk Prediction with Supervised Machine Learning Techniques. In Proceedings of the ICT4AWE, Online Streaming, 23–25 April 2022; pp. 315–321.
32. Musunuri, B.; Shetty, S.; Shetty, D.K.; Vanahalli, M.K.; Pradhan, A.; Naik, N.; Paul, R. Acute-on-chronic liver failure mortality prediction using an artificial neural network. Eng. Sci. 2021, 15, 187–196.
33. Das, P.K.; Pradhan, A.; Meher, S. Detection of acute lymphoblastic leukemia using machine learning techniques. In Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication; Springer: Berlin/Heidelberg, Germany, 2021; pp. 425–437.
34. Konstantoulas, I.; Kocsis, O.; Dritsas, E.; Fakotakis, N.; Moustakas, K. Sleep Quality Monitoring with Human Assisted Corrections. In Proceedings of the IJCCI, Online Streaming, 25–27 October 2021; pp. 435–444.
35. Konstantoulas, I.; Dritsas, E.; Moustakas, K. Sleep Quality Evaluation in Rich Information Data. In Proceedings of the 2022 13th International Conference on Information, Intelligence, Systems & Applications (IISA), Corfu, Greece, 18–20 July 2022; pp. 1–4.
36. Kashif, A.A.; Bakhtawar, B.; Akhtar, A.; Akhtar, S.; Aziz, N.; Javeid, M.S. Treatment response prediction in hepatitis C patients using machine learning techniques. Int. J. Technol. Innov. Manag. 2021, 1, 79–89.
37. Dritsas, E.; Trigka, M. Lung Cancer Risk Prediction with Machine Learning Models. Big Data Cogn. Comput. 2022, 6, 139.
38. Dritsas, E.; Trigka, M. Machine learning techniques for chronic kidney disease risk prediction. Big Data Cogn. Comput. 2022, 6, 98.
39. Ishaq, A.; Sadiq, S.; Umer, M.; Ullah, S.; Mirjalili, S.; Rupapara, V.; Nappi, M. Improving the prediction of heart failure patients’ survival using SMOTE and effective data mining techniques. IEEE Access 2021, 9, 39707–39716.
SARS-CoV-2 Prediction Dataset. Available online: https://www.kaggle.com/datasets/martuza/early-stage-symptoms-of-covid19-patients (accessed on 4 November 2022).
Mukherjee, S.; Pahan, K. Is COVID-19 gender-sensitive? J. Neuroimmune Pharmacol. 2021, 16, 38–47.
Penna, C.; Mercurio, V.; Tocchetti, C.G.; Pagliaro, P. Sex-related differences in COVID-19 lethality. Br. J. Pharmacol. 2020, 177, 4375–4385.
Gul, M.H.; Htun, Z.M.; Inayat, A. Role of fever and ambient temperature in COVID-19. Expert Rev. Respir. Med. 2021, 15, 171–173.
Topol, E.J. Is my cough COVID-19? Lancet 2020, 396, 1874.
Iacobucci, G. COVID-19: Runny nose, headache, and fatigue are commonest symptoms of omicron, early data show. BMJ 2021, 375, n3103.
Sun, P.; Qie, S.; Liu, Z.; Ren, J.; Li, K.; Xi, J. Clinical characteristics of hospitalized patients with SARS-CoV-2 infection: A single arm meta-analysis. J. Med. Virol. 2020, 92, 612–617.
Gattinoni, L.; Gattarello, S.; Steinberg, I.; Busana, M.; Palermo, P.; Lazzari, S.; Romitti, F.; Quintel, M.; Meissner, K.; Marini, J.J.; et al. COVID-19 pneumonia: Pathophysiology and management. Eur. Respir. Rev. 2021, 30, 210138.
Shang, H.; Bai, T.; Chen, Y.; Huang, C.; Zhang, S.; Yang, P.; Zhang, L.; Hou, X. Outcomes and implications of diarrhea in patients with SARS-CoV-2 infection. Scand. J. Gastroenterol. 2020, 55, 1049–1056.
Fan, D.P.; Zhou, T.; Ji, G.P.; Zhou, Y.; Chen, G.; Fu, H.; Shen, J.; Shao, L. Inf-net: Automatic covid-19 lung infection segmentation from ct images. IEEE Trans. Med. Imaging 2020, 39, 2626–2637.
Lemey, P.; Hong, S.L.; Hill, V.; Baele, G.; Poletto, C.; Colizza, V.; O’Toole, Á.; McCrone, J.T.; Andersen, K.G.; Worobey, M.; et al. Accommodating individual travel history and unsampled diversity in Bayesian phylogeographic inference of SARS-CoV-2. Nat. Commun. 2020, 11, 5110.
Wu, S.; Wang, Y.; Jin, X.; Tian, J.; Liu, J.; Mao, Y. Environmental contamination by SARS-CoV-2 in a designated hospital for coronavirus disease 2019. Am. J. Infect. Control 2020, 48, 910–914.
Dritsas, E.; Fazakis, N.; Kocsis, O.; Moustakas, K.; Fakotakis, N. Optimal Team Pairing of Elder Office Employees with Machine Learning on Synthetic Data. In Proceedings of the 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA), Chania Crete, Greece, 12–14 July 2021; pp. 1–4.
Wazery, Y.M.; Saber, E.; Houssein, E.H.; Ali, A.A.; Amer, E. An efficient slime mould algorithm combined with k-nearest neighbor for medical classification tasks. IEEE Access 2021, 9, 113666–113682.
Dong, R.H.; Yan, H.H.; Zhang, Q.Y. An Intrusion Detection Model for Wireless Sensor Network Based on Information Gain Ratio and Bagging Algorithm. Int. J. Netw. Secur. 2020, 22, 218–230.
Li, X.; Chen, W.; Zhang, Q.; Wu, L. Building auto-encoder intrusion detection system based on random forest feature selection. Comput. Secur. 2020, 95, 101851.
Ampomah, E.K.; Nyame, G.; Qin, Z.; Addo, P.C.; Gyamfi, E.O.; Gyan, M. Stock market prediction with gaussian naïve bayes machine learning algorithm. Informatica 2021, 45, 243–256.
Sievering, A.W.; Wohlmuth, P.; Geßler, N.; Gunawardene, M.A.; Herrlinger, K.; Bein, B.; Arnold, D.; Bergmann, M.; Nowak, L.; Gloeckner, C.; et al. Comparison of machine learning methods with logistic regression analysis in creating predictive models for risk of critical in-hospital events in COVID-19 patients on hospital admission. BMC Med. Inform. Decis. Mak. 2022, 22, 309.
Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215.
Nakanishi, K.M.; Fujii, K.; Todo, S. Sequential minimal optimization for quantum-classical hybrid algorithms. Phys. Rev. Res. 2020, 2, 043158.
Amiri, M.M.; Gündüz, D. Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air. IEEE Trans. Signal Process. 2020, 68, 2155–2169.
Ducange, P.; Marcelloni, F.; Pecori, R. Fuzzy Hoeffding Decision Tree for Data Stream Classification. Int. J. Comput. Intell. Syst. 2021, 14, 946–964.
Posonia, A.M.; Vigneshwari, S.; Rani, D.J. Machine Learning based Diabetes Prediction using Decision Tree J48. In Proceedings of the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5 December 2020; pp. 498–502.
Lestari, F.P.; Haekal, M.; Edison, R.E.; Fauzy, F.R.; Khotimah, S.N.; Haryanto, F. Epileptic seizure detection in EEGs by using random tree forest, naïve Bayes and KNN classification. J. Phys. Conf. Ser. 2020, 1505, 012055.
Palša, J.; Ádám, N.; Hurtuk, J.; Chovancová, E.; Madoš, B.; Chovanec, M.; Kocan, S. MLMD—A Malware-Detecting Antivirus Tool Based on the XGBoost Machine Learning Algorithm. Appl. Sci. 2022, 12, 6672.
Hew, K.F.; Hu, X.; Qiao, C.; Tang, Y. What predicts student satisfaction with MOOCs: A gradient boosting trees supervised machine learning and sentiment analysis approach. Comput. Educ. 2020, 145, 103724.
Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258.
Lin, E.; Lin, C.H.; Lane, H.Y. Applying a bagging ensemble machine learning approach to predict functional outcome of schizophrenia with clinical symptoms and cognitive functions. Sci. Rep. 2021, 11, 6922.
Abdulkareem, N.M.; Abdulazeez, A.M. Machine learning classification based on Radom Forest Algorithm: A review. Int. J. Sci. Bus. 2021, 5, 128–142.
Rodríguez, J.J.; Juez-Gil, M.; López-Nozal, C.; Arnaiz-González, Á. Rotation Forest for multi-target regression. Int. J. Mach. Learn. Cybern. 2022, 13, 523–548.
Yuan, W.; Yang, R.; Yu, J.; Zeng, Q.; Yao, Z. Control method of spray curing system for cement concrete members based on the AdaBoost.M1 algorithm. Constr. Innov. 2021. ahead-of-print.
Bharati, S.; Podder, P.; Thanh, D.N.H.; Prasath, V. Dementia classification using MR imaging and clinical data with voting based machine learning models. Multimed. Tools Appl. 2022, 81, 25971–25992.
Satapathy, S.K.; Bhoi, A.K.; Loganathan, D.; Khandelwal, B.; Barsocchi, P. Machine learning with ensemble stacking model for automated sleep staging using dual-channel EEG signal. Biomed. Signal Process. Control 2021, 69, 102898.
Mangini, S.; Tacchino, F.; Gerace, D.; Bajoni, D.; Macchiavello, C. Quantum computing models for artificial neural networks. Europhys. Lett. 2021, 134, 10002.
Rosay, A.; Riou, K.; Carlier, F.; Leroux, P. Multi-layer perceptron for network intrusion detection. Ann. Telecommun. 2022, 77, 371–394.
Miao, J.; Zhu, W. Precision–recall curve (PRC) classification trees. Evol. Intell. 2022, 15, 1545–1569.
Weka Tool. Available online: https://www.weka.io/ (accessed on 4 November 2022).
Ahamad, M.M.; Aktar, S.; Rashed-Al-Mahfuz, M.; Uddin, S.; Liò, P.; Xu, H.; Summers, M.A.; Quinn, J.M.; Moni, M.A. A machine learning model to identify early stage symptoms of SARS-CoV-2 infected patients. Expert Syst. Appl. 2020, 160, 113661.
Hansen, C.H.; Michlmayr, D.; Gubbels, S.M.; Mølbak, K.; Ethelberg, S. Assessment of protection against reinfection with SARS-CoV-2 among 4 million PCR-tested individuals in Denmark in 2020: A population-level observational study. Lancet 2021, 397, 1204–1212.
Banerjee, A.; Ray, S.; Vorselaars, B.; Kitson, J.; Mamalakis, M.; Weeks, S.; Baker, M.; Mackenzie, L.S. Use of machine learning and artificial intelligence to predict SARS-CoV-2 infection from full blood counts in a population. Int. Immunopharmacol. 2020, 86, 106705.
Yang, H.S.; Hou, Y.; Vasovic, L.V.; Steel, P.A.; Chadburn, A.; Racine-Brzostek, S.E.; Velu, P.; Cushing, M.M.; Loda, M.; Kaushal, R.; et al. Routine laboratory blood tests predict SARS-CoV-2 infection using machine learning. Clin. Chem. 2020, 66, 1396–1404.
Mottaqi, M.S.; Mohammadipanah, F.; Sajedi, H. Contribution of machine learning approaches in response to SARS-CoV-2 infection. Informatics Med. Unlocked 2021, 23, 100526.
Yan, L.; Zhang, H.T.; Goncalves, J.; Xiao, Y.; Wang, M.; Guo, Y.; Sun, C.; Tang, X.; Jing, L.; Zhang, M.; et al. An interpretable mortality prediction model for COVID-19 patients. Nat. Mach. Intell. 2020, 2, 283–288.
Rosado, J.; Pelleau, S.; Cockram, C.; Merkling, S.H.; Nekkab, N.; Demeret, C.; Meola, A.; Kerneis, S.; Terrier, B.; Fafi-Kremer, S.; et al. Multiplex assays for the identification of serological signatures of SARS-CoV-2 infection: An antibody-based diagnostic and machine learning study. Lancet Microbe 2021, 2, e60–e69.
Kadioglu, O.; Saeed, M.; Greten, H.J.; Efferth, T. Identification of novel compounds against three targets of SARS CoV-2 coronavirus by combined virtual screening and supervised machine learning. Comput. Biol. Med. 2021, 133, 104359.
Kc, G.B.; Bocci, G.; Verma, S.; Hassan, M.M.; Holmes, J.; Yang, J.J.; Sirimulla, S.; Oprea, T.I. A machine learning platform to estimate anti-SARS-CoV-2 activities. Nat. Mach. Intell. 2021, 3, 527–535.
Du, R.; Tsougenis, E.D.; Ho, J.W.; Chan, J.K.; Chiu, K.W.; Fang, B.X.; Ng, M.Y.; Leung, S.T.; Lo, C.S.; Wong, H.Y.F.; et al. Machine learning application for the prediction of SARS-CoV-2 infection using blood tests and chest radiograph. Sci. Rep. 2021, 11, 14250.

Figure 1. Percentage distribution of participants per age group.

Figure 2. AUC ROC Curves.

Table 1. Dataset Description.

Feature	Type	Description
gender [41]	nominal	The participant’s gender.
age (years) [42]	numeric	The participant’s age; values range from 0 to 96 years.
fever [43]	nominal	Whether the participant has a body temperature above 38 °C.
cough [44]	nominal	Whether the participant has a severe cough.
runny_nose [45]	nominal	Whether the participant has a runny nose.
muscle_soreness [46]	nominal	Whether the participant has muscle soreness.
pneumonia [47]	nominal	Whether the participant has pneumonia.
diarrhea [48]	nominal	Whether the participant has diarrhea.
lung_infection [49]	nominal	Whether the participant has a lung infection.
travel_history [50]	nominal	Whether the participant has a travel history.
isolation_treatment [51]	nominal	Whether the participant received isolation treatment in a designated hospital.
SARS-CoV-2	nominal	Class label: whether the participant is positive for SARS-CoV-2.

Table 2. Distribution of participants per feature value and class label in the balanced dataset after SMOTE.

		SARS-CoV-2 Class Label
Feature	Value	No	Yes
gender	female	24.05%	20.09%
gender	male	25.95%	29.91%
fever	No	35.50%	9.93%
fever	Yes	14.50%	40.07%
cough	No	42.03%	11.12%
cough	Yes	7.97%	38.88%
runny_nose	No	49.52%	34.30%
runny_nose	Yes	0.48%	15.70%
muscle_soreness	No	49.81%	49.93%
muscle_soreness	Yes	0.19%	0.07%
pneumonia	No	50.00%	34.53%
pneumonia	Yes	0.00%	15.47%
diarrhea	No	49.69%	49.94%
diarrhea	Yes	0.31%	0.06%
lung_infection	No	49.30%	25.47%
lung_infection	Yes	0.70%	24.53%
travel_history	No	12.98%	31.81%
travel_history	Yes	37.02%	18.19%
isolation_treatment	No	41.88%	34.63%
isolation_treatment	Yes	8.12%	15.37%

Table 3. Feature importance evaluation using Information Gain and Random Forest.

InfoGain		Random Forest
Feature	Score	Feature	Score
cough	0.298465	cough	0.30911
lung_infection	0.262035	fever	0.25567
fever	0.199997	lung_infection	0.23836
pneumonia	0.175181	travel_history	0.18836
runny_nose	0.15083	pneumonia	0.15466
travel_history	0.106258	runny_nose	0.15223
isolation_treatment	0.021412	age_year	0.0874
gender	0.004587	isolation_treatment	0.07257
age_year	0.004074	gender	0.03957
diarrhea	0.001355	diarrhea	0.00253
muscle_soreness	0.000421	muscle_soreness	0.00121
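
The InfoGain column can be cross-checked against the joint percentages in Table 2. The sketch below assumes that InfoGain is the usual entropy reduction H(class) − H(class | feature), measured in bits; the function names and the dictionary layout are illustrative choices, not taken from the paper. For cough and fever the result should land close to the Table 3 entries, up to rounding of the published percentages.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def information_gain(joint):
    """joint[(feature_value, class_label)] -> joint probability or percentage."""
    total = sum(joint.values())
    joint = {k: v / total for k, v in joint.items()}  # normalise to sum to 1

    class_marginal, feature_marginal = {}, {}
    for (x, y), p in joint.items():
        class_marginal[y] = class_marginal.get(y, 0.0) + p
        feature_marginal[x] = feature_marginal.get(x, 0.0) + p

    h_class = entropy(class_marginal.values())
    h_conditional = 0.0
    for x, px in feature_marginal.items():
        if px == 0:
            continue
        conditional = [p / px for (xv, _), p in joint.items() if xv == x]
        h_conditional += px * entropy(conditional)
    return h_class - h_conditional


# Joint percentages (feature value, SARS-CoV-2 label) copied from Table 2.
cough = {("No", "No"): 42.03, ("No", "Yes"): 11.12,
         ("Yes", "No"): 7.97, ("Yes", "Yes"): 38.88}
fever = {("No", "No"): 35.50, ("No", "Yes"): 9.93,
         ("Yes", "No"): 14.50, ("Yes", "Yes"): 40.07}

print(f"cough: {information_gain(cough):.4f}")
print(f"fever: {information_gain(fever):.4f}")
```

Because the classes are balanced after SMOTE, H(class) is 1 bit, so each score is simply one minus the conditional entropy of the label given the feature.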

Table 4. Machine Learning Models’ Parameter Settings.

Model	Parameters
NB	useKernelEstimator: False, useSupervisedDiscretization: True
SVM	eps = 0.001, gamma = 0.0, kernel type: linear, loss = 0.1
LR	ridge = $10^{-8}$, useConjugateGradientDescent: True
ANN	hidden layers: ‘a’, learning rate = 0.1, momentum = 0.2, training time = 200
SGD	epochs: 500, loss function: Log loss (logistic regression)
SMO	calibrator: Random Forest, kernel: PolyKernel
KNN	K = 1, Search Algorithm: LinearNNSearch with Euclidean distance, cross-validate = True
VFDT	leaf Prediction Strategy: Naive Bayes, split criterion: Gini split
J48	reducedErrorPruning: False, saveInstanceData: True, useMDLCorrection: True, subtreeRaising: True, binarySplits = True, collapseTree = True
RF	breakTiesRandomly: True, numIterations = 100, numFeatures = 0, StoreOutOfBagPredictions: True
RT	maxDepth = 0, minNum = 1.0, minVarianceProp = 0.001
XGBoost	batchSize: 100, numDecimalPlaces: 2
GBM	classifier: Random Forest, useEstimatePrior: True, useResampling: True
AdaBoostM1	classifier: Random Forest, resume: True, useResampling: True
RotF	classifier: Random Forest, NumberOfGroups: True, ProjectionFilter: PrincipalComponents
Stacking	classifiers: Random Forest and J48, metaClassifier: Logistic Regression, numFolds = 10
Bagging	classifier: Random Forest, PrintClassifiers: True, StoreOutOfBagPredictions: True
Voting	classifiers: Random Forest and J48, CombinationRule: AverageOfProbabilities

Table 5. Performance evaluation of ML models after SMOTE with 10-fold cross-validation.

	Accuracy	Precision	Recall	F-Measure	AUC
NB	0.871	0.872	0.871	0.871	0.911
SVM	0.883	0.885	0.883	0.883	0.883
LR	0.885	0.885	0.885	0.885	0.933
ANN	0.898	0.898	0.898	0.898	0.947
SGD	0.886	0.887	0.886	0.886	0.934
SMO	0.883	0.885	0.883	0.883	0.883
KNN	0.904	0.904	0.904	0.904	0.951
VFDT	0.885	0.886	0.885	0.885	0.939
J48	0.898	0.898	0.898	0.898	0.953
RF	0.906	0.907	0.906	0.906	0.960
RT	0.904	0.904	0.904	0.904	0.941
XGBoost	0.905	0.905	0.905	0.905	0.941
GBM	0.904	0.904	0.904	0.904	0.958
AdaBoostM1	0.905	0.905	0.905	0.905	0.956
RotF	0.907	0.907	0.907	0.907	0.962
Stacking	0.909	0.909	0.909	0.909	0.964
Bagging	0.905	0.906	0.905	0.905	0.963
Voting	0.907	0.908	0.907	0.907	0.963
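
In every row of Table 5 the Recall equals the Accuracy, which is what one would expect if the reported Precision, Recall and F-Measure are class-weighted averages (an assumption here; the table itself does not say so). The sketch below computes these averages from a confusion matrix and shows why the weighted recall always coincides with accuracy. The counts in `example_cm` are hypothetical and are not results from the paper.

```python
def weighted_metrics(cm):
    """cm[i][j] = number of instances of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precision = recall = f_measure = 0.0
    for k in range(n):
        support = sum(cm[k])                         # true instances of class k
        predicted = sum(cm[i][k] for i in range(n))  # instances predicted as k
        p = cm[k][k] / predicted if predicted else 0.0
        r = cm[k][k] / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                     # class share of the data
        precision += weight * p
        recall += weight * r
        f_measure += weight * f

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical balanced two-class confusion matrix (rows: true No / true Yes).
example_cm = [[450, 50],
              [40, 460]]
metrics = weighted_metrics(example_cm)
print({name: round(value, 3) for name, value in metrics.items()})
```

The weighted recall sums (support_k / total) · (TP_k / support_k) over the classes, which reduces to the number of correct predictions divided by the total, i.e., the accuracy.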

Table 6. Comparison of our ML models with [77] in terms of Accuracy and AUC.

	Accuracy		AUC
	Our Models	[77]	Our Models	[77]
XGBoost	0.905	0.880	0.941	0.850
GBM	0.904	0.860	0.958	0.880
SVM	0.883	0.860	0.883	0.880
RF	0.906	0.860	0.960	0.830
DT	0.885	0.860	0.939	0.820
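
The comparison in Table 6 can be summarised as per-model differences between the two sets of figures. The following minimal sketch copies the values from the table; the names `TABLE_6` and `gains` are illustrative.

```python
# model: (our accuracy, [77] accuracy, our AUC, [77] AUC), copied from Table 6
TABLE_6 = {
    "XGBoost": (0.905, 0.880, 0.941, 0.850),
    "GBM":     (0.904, 0.860, 0.958, 0.880),
    "SVM":     (0.883, 0.860, 0.883, 0.880),
    "RF":      (0.906, 0.860, 0.960, 0.830),
    "DT":      (0.885, 0.860, 0.939, 0.820),
}


def gains(table):
    """Return {model: (accuracy difference, AUC difference)}, ours minus [77]."""
    return {
        model: (round(acc - ref_acc, 3), round(auc - ref_auc, 3))
        for model, (acc, ref_acc, auc, ref_auc) in table.items()
    }


for model, (d_acc, d_auc) in gains(TABLE_6).items():
    print(f"{model:8s} accuracy {d_acc:+.3f}   AUC {d_auc:+.3f}")
```

Every difference is positive; the smallest is the AUC difference for SVM (0.003).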

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

