Article

Radiomics for Machine Learning—A Multi-Class System for the Automatic Detection of COVID-19 and Community-Acquired Pneumonia from Computed Tomography Images

by Vasileia Paschaloudi *, Dimitris Fotopoulos and Ioanna Chouvarda
Lab of Computing, Medical Informatics and Biomedical-Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
* Author to whom correspondence should be addressed.
BioMedInformatics 2025, 5(2), 21; https://doi.org/10.3390/biomedinformatics5020021
Submission received: 12 February 2025 / Revised: 4 April 2025 / Accepted: 24 April 2025 / Published: 26 April 2025

Abstract: Background: Radiomic features have been extensively used with machine learning and other Artificial Intelligence methods in medical imaging problems. Coronavirus Disease 2019 (COVID-19), which has been spreading worldwide since 2020, has motivated scientists to develop automatic COVID-19 recognition systems to relieve the clinical routine in overcrowded hospitals. Purpose: To develop an automated system for recognizing COVID-19 and Community-Acquired Pneumonia (CAP) using radiomic features extracted from whole lung chest Computed Tomography (CT) images. Extracting radiomic features from the whole lung avoids the need to segment a malignancy-specific region of interest (ROI). Methods: In this work, we used radiomic features extracted from CT images representing whole lungs to train various machine learning models capable of identifying COVID-19 images, CAP images and healthy cases. The CT images were derived from an open access data set, called COVID-CT-MD, containing 76 Normal cases, 169 COVID-19 cases and 60 CAP cases. Four two-class models and one three-class model were developed: Normal–COVID, COVID–CAP, Normal–CAP, Normal–Disease and Normal–COVID–CAP. Different algorithms and data augmentation were used to train each model 20 times on different data set splits, and, finally, the model with the best average performance was selected for each case. The performance metrics of Accuracy, Sensitivity and Specificity were used to assess the different systems. Since COVID-19 and CAP share similar characteristics, it is challenging to develop a model that can distinguish these diseases. Results: The results were promising for the models finally selected for each case.
The accuracy for the independent test set was 83.11% in the Normal–COVID case, 88.77% in the COVID–CAP case, 93.97% in the Normal–CAP case and 94.13% in the Normal–Disease case, when referring to two-class cases, while, in the three-class case, the accuracy was 78.55%. Conclusion: The results obtained suggest that radiomic features extracted from whole lung CT images can be successfully used to distinguish COVID-19 from other pneumonias and normal lung cases.

1. Introduction

Radiomics is a relatively new research field that deals with quantified information, extracted from medical images [1]. Radiomic features have been extensively used with machine learning in medical imaging problems. They can identify shape, statistical or textural features of a region of interest, such as a lesion or an organ, and, alone or in combination with demographic, histological, genomic data, they can be used in medical decision making [2,3]. The importance of radiomics lies in the fact that it is a non-invasive method for the medical diagnosis and prognosis of disease progression and can be used to objectively characterize patients’ CT images to support medical decision making.
The potential of radiomics has been explored and exploited in many areas of medical research, cancer research being one of them. In [4], van Griethuysen et al. present a computational radiomic system to decode the radiographic phenotype. A review of research works applying radiomics to lung cancer precision medicine is presented in [5]. In [6], Bevilacqua A. et al. present a [68Ga] Ga-DOTANOC PET/CT radiomic model for the non-invasive prediction of tumor grade in pancreatic neuroendocrine tumors; their results, although preliminary, are quite promising. In the study of [7], the diagnostic accuracy of textural features extracted from dual-energy contrast-enhanced mammography (CEM) images is estimated, using univariate and multivariate statistical analyses that include Artificial Intelligence approaches.
Literature review on COVID-19 classification from lung CT images: Since the end of 2019, when COVID-19 spread around the world, much research has been conducted on COVID-19 detection using Artificial Intelligence (AI) techniques. Chest CT examinations have been used as input for AI models in several studies. Many models used deep learning architectures for COVID-19 diagnosis and disease progression, and some of them exploited radiomic capabilities in this research area.
Lin Li et al. in the study of [8] developed a deep learning model that extracts visual features from chest CTs for COVID-19 detection. Their model accurately distinguishes COVID-19 from community-acquired pneumonia and other non-pneumonic lung diseases. In the study of [9], Rajpoot et al. propose an ensemble convolutional neural network (CNN) model that incorporates several CNN models (DenseNet169, ResNet50, and VGG16) with explainable AI techniques for COVID-19 recognition. In their approach, they evaluate CNN models on the publicly available X-ray data set, COVIDx CXR-3, and the CT scan data set for SARS-CoV-2. Their proposed model shows strong performance in recognizing COVID-19 from CT scans and X-rays (accuracy 96.18% and 99%, respectively). In the study of [10], Zhao et al. propose a convolutional neural network for COVID-19 testing. The authors investigated how transfer learning can improve the performance of convolutional neural networks on COVID-19 testing using CT images and found that pre-trained models trained on larger out-of-domain data sets perform better in COVID-19 detection. The accuracy of their best model is 99.2%.
In the study of [11], Xu X. et al. developed a deep learning system to screen COVID-19, influenza and healthy cases. They segmented candidate infection regions from the lung CT image set using the 3D deep learning model and then assigned the segmented images to the three groups along with the corresponding confidence scores. The overall accuracy of the system was 86.7%.
Literature review on COVID-19 classification and radiomics: In the study of [12], Shiri I. et al. conducted a CT-based radiomic analysis to assess the ability of their model to predict the overall survival of patients with COVID-19, using a large multi-institutional data set.
Bai et al. in the study of [13] aimed to develop an AI system to differentiate COVID-19 from other pneumonia on chest CT and to evaluate radiologist performance with and without AI. The system was based on the deep neural network architecture EfficientNet version B4. When tested in different hospitals, it showed an accuracy of 87% in the independent test. AI augmentation also improved radiologists’ performance in distinguishing COVID-19 from the pneumonia of other etiologies.
In the study of [14], Shi F. et al. developed a machine learning framework to distinguish COVID-19 from CAP in CT images. CT images were first segmented into infections and lung fields using a deep learning model, and, then, a set of hand-crafted location-specific features were designed, and prediction models were generated. An infection size-aware Random Forest (iSARF) method was proposed to discriminate COVID-19 from CAP. The results show high performance with 89.4% accuracy, 90.7% sensitivity, and 87.2% specificity.
Chen H.J. et al. in the study of [15] developed a machine learning-based CT radiomic model to differentiate COVID-19 from other pneumonias. They used a semi-automatic segmentation procedure to delineate the volume of interest (VOI) and extracted radiomic features. They formed four groups of features: radiomic, radiological, quantifying and clinical features, and they built models for each group separately. They also combined the 4 feature groups and built an integrated Support Vector Machine (SVM) model. The performance of the integrated model outperformed all other models, giving an accuracy of 84.3% on the test set.
Wu Q. et al. in the study of [16] evaluated CT-derived radiomic features and clinical risk factors in predicting poor outcome in patients with COVID-19. They used automatically segmented whole lung CT images to extract radiomic features. They divided the patients into early- and late-phase groups and tested the prognostic ability of clinical risk factors, radiomic features and the combination of both. In the early-phase group, the combination of clinical and radiomic features showed the best performance in predicting the probability of a 28-day poor outcome (Area Under the receiver operating characteristic Curve: AUC = 0.862). In the late-phase group, radiomics alone performed similarly to the combination of clinical and radiomic features in predicting the 28-day poor outcome (AUC = 0.976).
Objectives of this study and research questions: In this study, we address the diagnostic capacity of radiomic features derived from the whole lung organ, rather than from lesions, which are the focus of the majority of previous works. A rigorous analysis is presented, taking into account different model design choices.
Focusing on lungs, we used an open CT scan data set, referred to as COVID-CT-MD, consisting not only of COVID-19 cases but also of Normal and CAP-infected participants [17]. Since COVID-19 and CAP share similar characteristics [18], it is challenging to develop a system that could distinguish the two cases. We extracted radiomic features from whole lung images. We used these radiomic features to develop systems that can distinguish with high accuracy (a) Normal cases versus COVID-19 cases, (b) COVID-19 versus CAP cases, (c) Normal cases versus CAP cases, (d) Normal cases versus diseased cases, and, also, (e) the three classes among each other (multi-class system). Different classifiers are considered, as well as the impact of data augmentation. Limitations and research gaps: Many studies in the field separate the infected lung regions to use as input for deep learning architectures. Such methods increase the complexity of the system. In addition, such methods require immense amounts of data for deep learning training. The proposed method offers simplicity as the whole lung is automatically segmented as a ROI. In addition, the data represent 305 patient CT scans, which is a relatively small amount of data.
Most methods focus on two-class classification and try to distinguish COVID-19 from other pneumonia. In our proposed method, we build several models, both two-class and three-class models. While two-class models show reliable performance, our three-class system lacks accuracy compared to other three-class systems. Some thoughts on how to improve its performance are discussed in the Discussion Section. Another limitation of our study is that the model is tested on a specific data set, which limits us in testing whether it generalizes well. The experimental results show that radiomics can be used for distinguishing Normal versus COVID-19 and CAP-infected patients with good accuracy.
This approach has a twofold importance: (a) scientific, as it shows how disease is characterized in the organ rather than in a lesion area; (b) practical, as detailed lesion annotation may not be necessary, at least in a first screening phase, thus avoiding clinical routine bottlenecks.

2. Materials and Methods

The data set: The data set used is an open access data set, called COVID-CT-MD [17], consisting of a total of 305 cases at the diagnosis phase. Among these cases, 76 cases are normal (24.9%), 169 cases are COVID-19 (55.4%), and 60 cases are CAP (19.7%). There are 183 male patients, while the number of female patients is 122. The average patient age is 50 years with a standard deviation of 16 years. The age range of the patients is 12 to 94 years.
Each of these CT image series was automatically segmented with the open-access tool 3D Slicer [19] to provide the left and right lung volumes; the respective lung masks were extracted from the two volumes and used for feature extraction.
Figure 1 shows example images from the three classes.
Data preprocessing: Before the calculation of radiomic features, all images were converted to Hounsfield Units (HU) and resampled via B-splines to achieve a common voxel spacing of [1, 1, 1]. Both steps were implemented as options in the PyRadiomics pipeline.
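These preprocessing options can be expressed in a PyRadiomics parameter file. The fragment below is a sketch; the exact settings used in this study (beyond the interpolator and spacing stated above) are assumptions:

```yaml
# PyRadiomics parameter file (sketch): resample to isotropic [1, 1, 1]
# spacing with B-spline interpolation before feature extraction.
setting:
  interpolator: 'sitkBSpline'
  resampledPixelSpacing: [1, 1, 1]
imageType:
  Original: {}        # features on the original image
  Wavelet: {}         # features on wavelet-filtered images
featureClass:         # empty values enable all features in each class
  firstorder:
  shape:
  glcm:
  glrlm:
  glszm:
  gldm:
```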
Feature extraction: Radiomic features were calculated separately from the two 3D volumes (left and right lungs), and 1221 features were extracted for each lung.
Extracted features were of the following types: First Order Statistics, Shape-based (3D)/Shape-based (2D), Gray Level Cooccurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), Gray Level Dependence Matrix (GLDM), on the original and Wavelet-filtered images. An extended tutorial about these features can be found in the study of [20].
Keeping in mind that lesions may be patchy and focused, or spread in the organ, two approaches were used for generating feature sets, employing bilateral lung information: For each one of the radiomic features initially extracted, the left and right lung lobe values were combined into one per patient, either by calculating
  • The Relative Difference in the feature values between the two volumes (RL features): Frl = 2 ∗ abs(Fl − Fr)/(Fl + Fr);
  • The average of the feature values between the two volumes (AV features): Fav = (Fl + Fr)/2.
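The two meta-features can be computed directly from the formulas above; a minimal Python sketch (the original pipeline was not published in code form):

```python
def combine_bilateral(f_left, f_right):
    """Combine one radiomic feature measured on the left and right lungs
    into the two per-patient meta-features described above."""
    rl = 2 * abs(f_left - f_right) / (f_left + f_right)  # relative difference (RL)
    av = (f_left + f_right) / 2                          # average (AV)
    return rl, av
```

For example, `combine_bilateral(3.0, 1.0)` yields an RL value of 1.0 and an AV value of 2.0.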
Feature value outliers were detected and removed. The features were normalized within the range [−1, 1].
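A normalization to [−1, 1] can be implemented as a per-feature min-max rescaling; this is a sketch, as the paper does not specify the exact scheme:

```python
def normalize_minus1_1(values):
    """Rescale one feature column to the range [-1, 1] via min-max
    normalization (an assumption; the study only states the target range)."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: map to 0
        return [0.0 for _ in values]
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]
```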
In addition, as the classes were imbalanced, the use of SMOTE technique (Synthetic Minority Oversampling Technique) [21] as data augmentation method was examined. Feature Selection: Feature selection took place on the whole data set.
Initially, t-test was used to find the most significant features of each pair of the following case sets: Normal–COVID, COVID–CAP, Normal–CAP and Normal–Disease pairs. For each pair of case sets, a feature set was created, containing only the statistically significant features of the case set. These feature sets were then used, alone or in combination of two, either for further feature reduction or as input to training models.
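The t-test screening step can be sketched as ranking each feature by its two-sample t-statistic between a pair of classes. The plain-Python version below ranks by absolute Welch t-statistic; the study selected features by statistical significance, for which a p-value threshold would be applied instead (an assumption for illustration):

```python
import math

def welch_t(a, b):
    """Welch's two-sample t-statistic for two lists of feature values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def rank_features(class_a, class_b):
    """Rank feature columns by |t|, most discriminative first.
    class_a / class_b: lists of per-patient feature vectors."""
    n_feat = len(class_a[0])
    scores = []
    for j in range(n_feat):
        t = welch_t([row[j] for row in class_a], [row[j] for row in class_b])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)]
```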
For further feature selection based on wrapper methods, Recursive Feature Elimination (RFE) and Boruta methods were applied to the feature sets derived from t-test. While statistically significant differences may be presented in numerous features, these features are also to a large extent correlated, and, therefore, the wrapper methods eventually select a much smaller number of features as adequate. RFE recursively removes features that are not useful to a model. Boruta is based on RF and finds the most important features, i.e., the ones that are consistently more important than artificially generated useless features. Both AV and RL features are selected. These sets of features were used in both two-class and multi-class cases.
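The backward-elimination idea behind RFE can be illustrated as follows. This is a generic sketch: |correlation with the label| stands in for model-based importance, whereas the study's RFE refits an actual model at each step:

```python
def correlation(xs, ys):
    """Pearson correlation between a feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def rfe(features, labels, keep):
    """Recursive feature elimination sketch: repeatedly drop the feature
    with the weakest importance until `keep` features remain."""
    remaining = list(range(len(features[0])))
    while len(remaining) > keep:
        scores = {j: abs(correlation([row[j] for row in features], labels))
                  for j in remaining}
        remaining.remove(min(remaining, key=lambda j: scores[j]))
        # a model-based RFE would refit the model here after each drop
    return remaining
```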
In multi-class cases, feature sets generated for the two-class cases, or combinations of them, were used as data sets. In addition, RFE and Boruta methods were applied to the sum of the original radiomic features (1221 RL and 1221 AV) to find significant features and create additional feature sets.
Data set split: After feature selection, the data set was split into training and test sets for each experiment.
In all experiments, the train set was 75% of the whole data set, and the test set was 25%.
Performance metrics: In each experiment, the train–test split was repeated twenty times, each time changing the split by using a different set.seed(x) value. The set.seed(x) R command makes a split reproducible; by changing the seed x, we obtained a different training/testing partition each time. The performance statistics were then averaged over the twenty replicates of each experiment, giving a more robust result. The metrics of interest are Average Accuracy (Avg Acc), Average Sensitivity (Avg Sens), Average Specificity (Avg Spec) and Average Balanced Accuracy (Avg Bal Acc) over the 20 replicate trials. Average Sensitivity denotes the percentage of positives recognized correctly, which is very important for infectious diseases such as COVID-19. In two-class experiments, each metric is a single value per run, and the calculation of the average is straightforward. In three-class experiments, each run produces a per-class value for Sensitivity, Specificity and Balanced Accuracy; each metric is first averaged over the 20 runs for each class and then over the three classes, and this final value is the one presented in the three-class Experimental Results Table.
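The seed-per-replicate evaluation loop can be sketched as below. The original work used R's set.seed with its own splitting routine; here Python's random module plays the same role, and `train_and_score` is a hypothetical callback that trains a model on the given indices and returns its test accuracy:

```python
import random

def repeated_holdout(data, labels, train_and_score, n_runs=20, train_frac=0.75):
    """Repeat a 75/25 split n_runs times, one seed per replicate
    (mirroring the set.seed(x) loop described above), and return the
    accuracy averaged over the runs."""
    accs = []
    for seed in range(n_runs):
        rng = random.Random(seed)          # a different, reproducible split each run
        idx = list(range(len(data)))
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        accs.append(train_and_score(train, test))
    return sum(accs) / len(accs)           # Average Accuracy over the runs
```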
Data imbalance: Our data set is considered unbalanced, as it consists of quite different numbers of normal, COVID, and CAP cases. Unbalanced sets may cause problems in system accuracy. Several approaches are proposed to deal with an unbalanced data set [22,23]. To address the unbalanced data set in our work, we used two methods.
(A) Model weights: This method was used in both two-class and multi-class models. Model weights can define the extent to which each class affects the learning process. Model weights in our system follow the formula below [24]:
Wj = nr_of_training_samples/(nr_of_classes ∗ nr_samples_j),
where Wj is the weight for class j.
  • nr_of_training_samples represents the number of total training samples.
  • nr_of_classes represents the number of classes.
  • nr_samples_j represents the number of samples in class j.
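The weight formula above translates directly into code:

```python
def class_weights(labels):
    """Inverse-frequency class weights, W_j = N / (K * n_j), where N is
    the number of training samples, K the number of classes and n_j the
    number of samples in class j."""
    n = len(labels)
    classes = set(labels)
    k = len(classes)
    return {c: n / (k * labels.count(c)) for c in classes}
```

For a perfectly balanced training set every weight is 1; minority classes receive weights above 1.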
(B) SMOTE technique: This method was used in both two-class and multi-class models. SMOTE is a method that artificially generates new examples of the minority class using the nearest neighbors of those cases. Furthermore, the majority-class examples are also under-sampled, resulting in a more balanced data set. The SMOTE technique should be applied only to the training set after data splitting into training/testing sets, to avoid the case where synthetic data (based on the original training data) could end up in the testing set. In such a case, misleading results could be produced.
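A minimal SMOTE sketch follows; in practice a library implementation (e.g. imbalanced-learn's SMOTE) would be used, and the under-sampling of the majority class mentioned above is omitted here:

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples: each one lies on the
    segment between a minority sample and one of its k nearest minority
    neighbours.  Apply to the TRAINING set only, as noted above."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((s for s in minority if s is not base),
                            key=lambda s: dist(base, s))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                 # random position along the segment
        synthetic.append([x + gap * (y - x) for x, y in zip(base, nb)])
    return synthetic
```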
Model training: The classifiers used to train the two-class models were Random Forest (RF) and Lasso and Elastic-Net Regularized Generalized Linear Model (Glmnet), while Random Forest and eXtreme Gradient Boosting (XGBoost) were used to train the multi-class models.
The Random Forest classifier is widely used in medical classification problems because of its robust predictive ability. It merges multiple tree classifiers, each constructed from an independent random sample of the input data, allowing an input vector to be classified with increased accuracy. It has been successfully used in COVID-19 recognition from CT images [25].
Lasso and Elastic-Net Regularized Generalized Linear Model (Glmnet) has also been used for COVID-19 recognition, using gene expression [26].
eXtreme Gradient Boosting (XGBoost) is a decision tree-based machine learning algorithm that uses a gradient boosting structure. XGBoost has been widely used in machine learning classification problems, including COVID-19 recognition [27].
All models were trained using cross-validation (10-fold, repeated 25 times).
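The repeated cross-validation scheme can be sketched as an index generator (in the original work this would be handled by the R training framework):

```python
import random

def repeated_kfold(n, k=10, repeats=25, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation
    repeated `repeats` times, reshuffling the data before each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            val = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, val
```

With 10 folds repeated 25 times, each model is fitted 250 times during tuning.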
The system flowchart, consisting of data collection, automatic ROI segmentation, radiomic feature extraction, radiomic feature transformation, model building, and evaluation, is shown in Figure 2.

3. Results

Two different types of experiments were performed: Two-class experiments and three-class experiments.
The type and number of features in each feature set generated by the different feature selection methods are shown in Table 1. These feature sets are used, alone or in combination of two, in model training.

3.1. Two-Class Experiments

3.1.1. Normal–COVID

Initial experiments used the 562 features derived from the t-test. However, good machine learning practice in radiomics favors training models on fewer features. For further reduction, the RFE method was applied to the 562 t-test features to select the most important ones, resulting in only 6 AV features. These six features are shown in Figure 3. The selected bilateral average texture features (mostly wavelet-based) indicate a generally higher lung texture in COVID-19 than in a normal CT scan, rather than differences between the two lungs.
The Boruta method was alternatively applied on the 562 features obtained from the t-test for feature reduction. In that case, 32 features were selected.
Both data sets, derived from either the RFE or Boruta methods, were fed to the Random Forest and Glmnet algorithms for training. Experiments testing the impact of SMOTE on these data sets showed that, in all Normal–COVID cases, SMOTE produced better Average Accuracy for both the 6 and the 32 feature data sets.
The best results in terms of accuracy were derived when we used Glmnet to train a model using the 32 feature data set obtained from the Boruta method, with the training set augmented with SMOTE. The Average Accuracy over the 20 runs in this case reached 83.11%. The Average Specificity is 83.21%.
Each run generates a model. Out of the 20 models of the best Normal–COVID case, we selected the last created model to present its 10 most important features (Figure 4).
All results, except for the initial 562 feature data set test results, are presented in Table A1.

3.1.2. COVID–CAP

Following the Normal–COVID case method, experiments were initially performed on the 654 features selected by t-test. When RFE was applied to the data set containing the 654 features selected by t-test, 20 features were finally selected, of which only 2 were RL features (2 RL + 18 AV). The differences between the COVID–CAP cases, regarding the 20 features, selected by RFE, are presented in Figure 5.
The Boruta method, also applied to the 654 features selected by t-test, resulted in 42 important features, which eventually formed a new 42 feature data set.
The SMOTE method was tested on both data sets of 20 and 42 features. SMOTE was tested with different parameters, generating either 90 COVID and 90 CAP cases or 135 COVID and 135 CAP cases. All data sets were fed to Random Forest and Glmnet models for training.
The augmented sets, containing either 90 or 135 examples of each type of patient, proved to produce better results in the Random Forest model than in the Glmnet model.
When SMOTE was used, the Average Specificity was over 82% and the Average Sensitivity was over 80%. Although we also had good results for these metrics in experiments where SMOTE was not used, the results with SMOTE outperformed the results without it.
In conclusion, SMOTE did not help in all tested models. It did, however, help the Random Forest model trained on the 20 RFE feature data set, producing the best results for the COVID–CAP case: Average Accuracy 88.77%, Average Sensitivity 90.95% and Average Specificity 82.67%.
Figure 6 shows the 10 most important features for a model selected out of the 20 models created for each run of the best COVID–CAP case.
All COVID–CAP results are presented in Table A2.

3.1.3. Normal–CAP

For the Normal–CAP case, the t-test selected 1053 features. Experiments were not performed with such a high number of features. Instead, RFE and Boruta methods were applied on the 1053 feature data set for feature reduction.
By applying RFE to the data set of 1053 features, a data set of 35 features was derived. Of these features, 11 were RL, and 24 were AV. Boruta, on the other hand, selected 73 features, among which 32 were RL and 41 were AV.
SMOTE was used to augment the training set of either the 35 RFE features or the 73 Boruta features.
From the experiments performed, we concluded that SMOTE helped in increasing system performance on the 73 feature data set derived from the Boruta method, for both training algorithms, i.e., Random Forest and Glmnet. The best results for the 73 feature data set were obtained when SMOTE was used with RF. The Average Accuracy on 20 runs in that case was 93.82%.
However, the SMOTE method did not improve system accuracy in the 35 feature data set experiments. The best results from the latter experiments came from a model that used the 35 RFE features and Random Forest for training, without data augmentation. The Average Accuracy over 20 runs in this case was 93.97%. The Average Sensitivity and Specificity of this model were 96.8% and 90.3%, respectively. Comparing the two models, (a) the 73 feature data set from Boruta augmented with SMOTE and trained with Random Forest and (b) the 35 feature data set from RFE trained with Random Forest without SMOTE augmentation, we conclude that both models give very close results. However, since the number of features used in case (b) is less than half of the number used in case (a), we keep the case (b) model as our best choice.
To find the importance of the features for the best-case models, we selected 1 model out of the 20 created, one for each run, and plotted the 10 most important features. The variable importance is plotted in Figure 7.
Normal–CAP experimental results are shown in Table A3.

3.1.4. Normal–Disease

As a last two-class case, we checked the Normal and Disease groups, where the latter group comprised both COVID and CAP data.
For the Normal–Disease case, the t-test selected 736 features. The selected features were used as input to RFE and Boruta methods for further feature reduction.
RFE resulted in 40 features. Among these 5 were RL features, and 35 were AV.
Boruta resulted in 37 features, among which 8 were RL, and 29 were AV.
The impact of SMOTE was tested on both training sets, of either the 40 RFE features or the 37 Boruta features.
Model weights were also tested, without producing any improvement in system performance.
From the experiments performed, we concluded that SMOTE helped to slightly increase system performance on both 40 feature and 37 feature data sets derived from the RFE and Boruta methods, respectively, only when RF was used as the training algorithm. On the other hand, system performance was significantly reduced when SMOTE was used to train the above data sets with the Glmnet algorithm. The best results of all Normal–Disease experiments were derived from the 37 Boruta features data set, trained with Glmnet, without using SMOTE. The Average Accuracy of the 20 runs for this case was 94.63%.
From the 20 models created for each run of the best case, we selected a representative model and calculated the variable importance. Figure 8 shows the 10 most important features for this model.
Table A4 shows all experimental results.

3.1.5. Summary for Two-Class Cases

The best performance for the Normal–COVID and COVID–CAP models was achieved when SMOTE was used for data augmentation. This was not the case for the Normal–CAP models, where SMOTE did not improve the average accuracy. In the Normal–CAP case, however, the Normal and CAP groups are quite similar in size, so the data set cannot be considered unbalanced, as it is in the Normal–COVID and COVID–CAP pairs, and there was little reason to expect SMOTE to help. In the Normal–Disease case, SMOTE did not improve the system's Average Accuracy, which, however, was already high (94.63%).
Conclusively, we used three methods of feature selection:
  • t-test;
  • RFE;
  • Boruta.
The t-test was used for the initial feature selection. The resulting feature subset was then used for further feature selection using either RFE or Boruta methods.
We used SMOTE to augment the training examples. SMOTE was applied on sets consisting of RFE or Boruta extracted features.
On all possible models and training sets we tested two training methods:
  • Random Forest;
  • Glmnet.

3.2. Multi-Class Experiments

For the multi-class case, we performed several experiments, using different feature sets. We used feature sets that had been proven effective in terms of system performance in two-class experiments or combinations of these sets. We also created two new data sets, as results of applying RFE or Boruta to the set containing the sum of the initial RL and AV features (2442 in total). Thus, we performed experiments with the following data sets:
  • A 26 feature data set: the 6 features selected by RFE in the Normal–COVID case combined with the 20 features selected by RFE in the COVID–CAP case;
  • A 52 feature data set: A total of 20 features selected from RFE in the COVID–CAP case combined with the 35 features selected from RFE in the Normal–CAP case, excluding 3 common AV features;
  • A 35 feature data set: A total of 35 features selected by RFE in the Normal–CAP case;
  • An 80 feature data set: A total of 80 features selected by RFE, when RFE was applied to a data set containing the sum of the initial RL and AV features, i.e., 2442 features;
  • A 61 feature data set: A total of 61 features selected by Boruta, when Boruta was applied to a data set containing the sum of the initial RL and AV features, i.e., 2442 features.
From the experiments performed, we concluded that the developed models performed better when the training data combined features from two of the two-class cases (data sets 1 and 2). Features selected by applying RFE or Boruta to the full set of original features (data sets 4 and 5) did not improve system performance.
The impact of SMOTE was tested for each model, as the data sets were imbalanced.
Of all the above three-class cases of experiments, the best Average Accuracy over 20 runs was achieved when we used the 26 feature data set, with the XGBoost algorithm for training and without using the SMOTE technique. The Average Accuracy over the 20 runs was 78.55%, and this is our best three-class case. In this case, the Average Balanced Accuracy for the three classes, Normal, COVID and CAP, is 83.16%, 79.24% and 84.69%, respectively, which means that the CAP class has a better performance than the COVID class.
From the 20 models created for each run of the best case, we selected a representative model and calculated the variable importance. Figure 9 shows the 10 most important features for this model.
Figure 10 shows the eight most important features for the best Normal–COVID–CAP case. The most important features are plotted comparatively for the three classes and show the differences in lung texture in Normal, COVID and CAP CTs. It is worth noting that, for at least one feature, COVID and CAP cases have similar distributions, which is a drawback for the system in achieving high performance.
Multi-class results are shown in Table A5.
A summary of the best models for both two- and three-class cases is presented in Table 2. The tuning parameters for the best models are also included in the summary table.

4. Discussion

CT imaging is a non-invasive technique that helps healthcare professionals detect various diseases. AI has been widely used in segmenting Regions of Interest (ROIs) and training disease recognition models. When automated whole lung segmentation is performed on CT pulmonary images, it can save time and human resources. On the other hand, radiomic features are characterized by objectivity as they are quantified image information, extracted from the whole image.
Since CT images of COVID cases share many features with other pneumonias, it is important to distinguish between these types of pneumonia. In the related research we found, interest is mostly directed at distinguishing COVID from CAP or other pneumonias, as in Bai et al. [13], rather than the three-class COVID–CAP–Normal problem.
The aim of this research work was to develop machine learning models capable of automatically detecting COVID-19, CAP and Normal cases from whole lung CT patient images. The data used come from an open access data set of patient CT images. Whole lung segmentations were generated automatically. For each patient image, a plethora of radiomic features was extracted from both lungs separately. Each pair of radiomic features, extracted from the left and right lungs, was transformed into one feature representing both lungs. Feature elimination methods were used not only to reduce the complexity of the models but mainly to discover the important features that distinguish the different cases. The two-class and three-class ML models based on lung lobe radiomics and the proposed new meta-features combining lung lobe radiomics are the main innovations of our system.
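The lung-pair transformation described above can be sketched as follows; note that the exact AV (average) and RL (relative difference) formulas below are illustrative assumptions rather than the paper's definitions:

```python
def combine_lung_features(left, right, eps=1e-12):
    """Collapse per-lung radiomic features (dicts: name -> value) into
    whole-lung meta-features. Assumed formulas: AV as the mean of the two
    lungs, RL as their absolute difference relative to that mean."""
    combined = {}
    for name, l_val in left.items():
        r_val = right[name]
        mean = (l_val + r_val) / 2.0
        combined[f"AV_{name}"] = mean
        combined[f"RL_{name}"] = abs(l_val - r_val) / (abs(mean) + eps)
    return combined
```

An average-type meta-feature captures overall lung texture, while a relative-difference-type meta-feature captures left/right asymmetry, which halves the feature count compared to keeping both lungs separately.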
In the two-class experiments, four class pairs were tested: Normal–COVID, COVID–CAP, Normal–CAP and Normal–Disease, and respective models were built. In all these experiments, the paired sets could be distinguished well from each other. The performance of the best model for each case, in terms of Average Accuracy over 20 runs, ranged from 81.97% in the Normal–COVID case to 94.63% in the Normal–Disease case.
In the three-class experiments, we used sets of features that had been shown to discriminate well between classes in the two-class experiments. For a more objective result, each experiment was repeated 20 times with a different partitioning of the data set, and the average performance metrics (Avg Acc, Avg Sens, Avg Spec and Avg Bal Acc) were calculated over the 20 runs.
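The repeated-split evaluation protocol can be sketched as below; the `evaluate` callable is a hypothetical stand-in for fitting a model on the training indices and scoring it on the test indices (stratified splitting, not shown, would preserve class proportions in each partition):

```python
import random
import statistics

def repeated_split_eval(n_samples, evaluate, n_runs=20, test_frac=0.25, seed=0):
    """Average a model's test accuracy over repeated random 75:25 splits.

    `evaluate` is a hypothetical callable: given train and test index lists,
    it should fit a model and return the test-set accuracy.
    """
    accuracies = []
    for run in range(n_runs):
        rng = random.Random(seed + run)  # a different partition each run
        indices = list(range(n_samples))
        rng.shuffle(indices)
        n_test = max(1, int(n_samples * test_frac))
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        accuracies.append(evaluate(train_idx, test_idx))
    return statistics.mean(accuracies)
```

Averaging over many random partitions reduces the chance that a single lucky or unlucky split drives the reported performance.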
The three-class experiments also yielded quite good results. The best Average Accuracy of 78.55% was achieved using 26 features selected by RFE on Normal–COVID (6 features) and RFE on COVID–CAP (20 features), with XGBoost as the training algorithm.
Most COVID-19-related research focuses either on differentiating COVID-19 from other types of pneumonia (two-class case) or on differentiating COVID-19, other types of pneumonia and Normal cases (three-class case). Regarding our two-class COVID–CAP model, compared to models based on whole lung segmentation, ours gives better Average Accuracy and Sensitivity but lower Average Specificity than Bai et al.'s model (88.87% vs. 87%, 90.95% vs. 89% and 82.67% vs. 86%, respectively). Since Bai's model uses a deep neural network (NN), a large number of input images is needed for training, a disadvantage for research teams that do not have large data sets at their disposal. In contrast, our two-class model achieved its performance using far fewer images, which can be considered an advantage. Also, Bai et al. used an 80:10:10 training/validation/testing split, with only 10% of the data used for testing.
Compared to Chen et al.'s model, which uses semi-automatically segmented lesions for radiomic feature extraction, our two-class model achieves better Accuracy and Specificity but worse Sensitivity (Acc 88.87% vs. 84.3%, Sens 90.95% vs. 92.3% and Spec 82.67% vs. 81.6%).
On the other hand, our three-class model is less accurate than the three-class model of Xu X. et al.: ours achieves 78.55%, while theirs achieves 86.7%. One point to keep in mind is the training/test set ratio. In both the two-class and three-class cases where our proposed models perform below the compared research, the compared works' test sets correspond to only 10% and 14.4% of the data, respectively, whereas ours corresponds to 25% of the available data. Increasing the percentage of data in the training set could therefore help improve the system's performance.
Another idea that could improve the performance of our three-class model stems from the relatively high performance of our Normal–Disease model (Avg Acc 94.63%, Avg Sens 88.15% and Avg Spec 95.83%) and the quite good performance of our COVID–CAP model (Avg Acc 88.87%, Avg Sens 90.95% and Avg Spec 82.67%). The idea is to set up a two-stage model: the first stage separates the healthy from the unhealthy cases, and the second distinguishes the COVID from the CAP cases. Given the performance of the two models above, we expect such a two-stage three-class model to outperform the proposed three-class model.
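The proposed two-stage scheme can be sketched as a simple cascade; both classifier arguments are hypothetical callables standing in for the trained Normal–Disease and COVID–CAP models:

```python
def two_stage_predict(x, disease_detector, covid_cap_classifier):
    """Cascade sketch of the proposed two-stage three-class scheme.

    Stage 1: Normal vs. Disease. Stage 2: COVID vs. CAP, applied only to
    cases flagged as diseased. Both arguments are hypothetical callables
    returning a class label for a feature vector x.
    """
    if disease_detector(x) == "Normal":
        return "Normal"
    return covid_cap_classifier(x)  # "COVID" or "CAP"
```

Under this design, the cascade's error on Normal cases is bounded by the Normal–Disease model alone, while COVID/CAP confusion is delegated to the specialized second stage.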
In Table 3, a summary of research works related to our study is presented.
Other steps that we could follow in our research include testing additional feature sets by combining the feature selection methods already used or exploring new feature selection methods applied to the current data set [28].
Data availability has been the main limitation of this study. Since a single data set was employed, model generalization needs to be further tested with data from other centers. Collecting more data and creating a more balanced data set is also one of our goals. In addition to performance testing, the model will be tested for biases on a more extensive data set.

5. Conclusions

In this study, we investigated the ability of radiomics extracted from whole lung CTs of patients to discriminate between Normal, COVID and CAP cases. The results are quite promising and support the idea that this approach could be applied to other lung diseases. Further exploration of the radiomic features could improve the overall performance of the models.

Author Contributions

Conceptualization, I.C. and V.P.; methodology, V.P. and I.C.; software, V.P., D.F. and I.C.; validation, V.P.; writing—original draft preparation, V.P.; writing—review and editing, V.P. and I.C.; visualization, V.P.; supervision, I.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study will be made available upon publication of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
COVID-19: Coronavirus Disease 2019
CAP: Community-Acquired Pneumonia
CT: Computed Tomography
ROI: Region of Interest
AI: Artificial Intelligence
VOI: Volume of Interest
SVM: Support Vector Machine
AUC: Area Under the receiver operating characteristic Curve
GLCM: Gray Level Cooccurrence Matrix
GLRLM: Gray Level Run Length Matrix
GLSZM: Gray Level Size Zone Matrix
GLDM: Gray Level Dependence Matrix
RL: Relative Difference
AV: Average
SMOTE: Synthetic Minority Oversampling Technique
RFE: Recursive Feature Elimination
RF: Random Forest
Glmnet: Lasso and Elastic-Net Regularized Generalized Linear Model
XGBoost: eXtreme Gradient Boosting
NN: Neural Network

Appendix A

In this appendix, all the result tables are gathered.
Table A1. Normal–COVID experimental results.

| Case | Method | Nr of Features | % Avg Acc | % Avg Sens | % Avg Spec | % Avg Balanced Acc |
|---|---|---|---|---|---|---|
| 1 | RFE (on 562 features from t-test) + RF | 6 AV (no RL) | 72.07 | 58.68 | 86.19 | 72.43 |
| 2 | RFE (on 562 features from t-test) + RF + SMOTE | 6 AV features from RFE | 79.10 | 82.89 | 77.38 | 80.13 |
| 3 | RFE (on 562 features from t-test) + Glmnet | 6 AV (no RL) | 76.48 | 47.11 | 89.76 | 68.43 |
| 4 | RFE (on 562 features from t-test) + Glmnet + SMOTE | 6 AV features from RFE | 78.61 | 86.32 | 75.12 | 80.72 |
| 5 | Boruta (on 563 features from t-test) + RF | 32 (26 AV + 6 RL) | 76.63 | 58.94 | 84.64 | 71.79 |
| 6 | Boruta (on 563 features from t-test) + RF + SMOTE | 32 (26 AV + 6 RL) | 81.39 | 85.00 | 79.76 | 82.38 |
| 7 | Boruta (on 563 features from t-test) + Glmnet | 32 (26 AV + 6 RL) | 82.62 | 75.53 | 85.83 | 80.67 |
| 8 | Boruta (on 563 features from t-test) + Glmnet + SMOTE | 32 (26 AV + 6 RL) | 83.11 | 82.89 | 83.21 | 83.05 |
Table A2. COVID–CAP experimental results.

| Case | Method | Nr of Features | % Avg Acc | % Avg Sens | % Avg Spec | % Avg Balanced Acc |
|---|---|---|---|---|---|---|
| 1 | RFE (on 654 features from t-test) + RF | 20 (2 RL, 18 AV) | 77.62 | 83.22 | 96.78 | 69.66 |
| 2 | RFE (on 654 features from t-test) + RF + SMOTE | 20 (2 RL, 18 AV) | 88.77 | 90.95 | 82.67 | 86.80 |
| 3 | RFE (on 654 features from t-test) + Glmnet | 20 (2 RL, 18 AV) | 86.67 | 93.92 | 66.33 | 80.13 |
| 4 | RFE (on 654 features from t-test) + Glmnet + SMOTE | 20 (2 RL, 18 AV) | 85.55 | 83.42 | 86.48 | 84.95 |
| 5 | Boruta (on 654 features from t-test) + RF | 42 (6 RL, 36 AV) | 87.72 | 96.30 | 63.66 | 79.98 |
| 6 | Boruta (on 654 features from t-test) + RF + SMOTE (135 cov, 135 cap) | 42 (6 RL, 36 AV) | 87.36 | 85.86 | 89.05 | 82.66 |
| 7 | Boruta (on 654 features from t-test) + Glmnet | 42 (6 RL, 36 AV) | 87.72 | 94.88 | 67.66 | 81.27 |
| 8 | Boruta (on 654 features from t-test) + Glmnet + SMOTE | 42 (6 RL, 36 AV) (135 cov, 135 cap) | 85.54 | 83.41 | 86.48 | 84.95 |
| 9 | Boruta (on 654 features from t-test) + Glmnet + SMOTE | 42 (6 RL, 36 AV) (90 cov, 90 cap) | 85.75 | 84.10 | 86.47 | 85.29 |
Table A3. Normal–CAP experimental results.

| Case | Method | Nr of Features | % Avg Acc | % Avg Sens | % Avg Spec | % Avg Balanced Acc |
|---|---|---|---|---|---|---|
| 1 | RFE (on 1053 features from t-test) + RF | 35 (11 RL + 24 AV) | 93.97 | 96.84 | 90.33 | 93.58 |
| 2 | RFE (on 1053 features from t-test) + RF + SMOTE | 35 (11 RL + 24 AV) | 93.38 | 92.63 | 94.33 | 93.48 |
| 3 | RFE (on 1053 features from t-test) + Glmnet | 35 (11 RL + 24 AV) | 91.47 | 100.00 | 80.66 | 90.33 |
| 4 | RFE (on 1053 features from t-test) + Glmnet + SMOTE | 35 (11 RL + 24 AV) | 91.47 | 95.53 | 86.33 | 90.93 |
| 5 | Boruta (on 1053 features from t-test) + RF | 73 (32 RL + 41 AV) | 93.24 | 96.32 | 89.33 | 92.82 |
| 6 | Boruta (on 1053 features from t-test) + RF + SMOTE | 73 (32 RL + 41 AV) (SMOTE 90-90) | 93.82 | 94.73 | 92.67 | 93.70 |
| 7 | Boruta (on 1053 features from t-test) + Glmnet | 73 (32 RL + 41 AV) | 89.85 | 98.68 | 78.67 | 88.68 |
| 8 | Boruta (on 1053 features from t-test) + Glmnet + SMOTE | 73 (32 RL + 41 AV) (SMOTE 90-90) | 90.44 | 93.42 | 86.67 | 90.04 |
Table A4. Normal–Disease experimental results.

| Case | Method | Nr of Features | % Avg Acc | % Avg Sens | % Avg Spec | % Avg Balanced Acc |
|---|---|---|---|---|---|---|
| 1 | RFE (on 736 features from t-test) + RF | 40 (5 RL + 35 AV) | 81.58 | 63.95 | 87.46 | 75.70 |
| 2 | RFE (on 736 features from t-test) + RF + model weights | 40 (5 RL + 35 AV) | 81.58 | 63.95 | 87.46 | 75.70 |
| 3 | RFE (on 736 features from t-test) + RF + SMOTE 114 n–114 d | 40 (5 RL + 35 AV) | 81.91 | 86.58 | 80.35 | 83.46 |
| 4 | RFE (on 736 features from t-test) + Glmnet | 40 (5 RL + 35 AV) | 89.47 | 73.68 | 94.74 | 84.21 |
| 5 | RFE (on 736 features from t-test) + Glmnet + SMOTE | 40 (5 RL + 35 AV) | 85.52 | 86.58 | 85.17 | 85.88 |
| 6 | Boruta (on 736 features from t-test) + RF | 37 (8 RL + 29 AV) | 81.45 | 65.00 | 86.93 | 75.96 |
| 7 | Boruta (on 736 features from t-test) + RF + SMOTE | 37 (8 RL + 29 AV) (SMOTE 114-114) | 82.30 | 87.63 | 80.53 | 84.08 |
| 8 | Boruta (on 736 features from t-test) + Glmnet | 37 (8 RL + 29 AV) | 94.63 | 88.15 | 95.83 | 91.99 |
| 9 | Boruta (on 736 features from t-test) + Glmnet + SMOTE | 37 (8 RL + 29 AV) (SMOTE 114-114) | 83.29 | 86.32 | 82.28 | 84.29 |
| 10 | Boruta (on 736 features from t-test) + Glmnet + SMOTE | 37 (8 RL + 29 AV) (SMOTE 171-171) | 82.36 | 85.00 | 81.49 | 83.25 |
Table A5. Normal–COVID–CAP experimental results.

| Set | Method | Nr of Features | % Avg Acc | % Avg Sens | % Avg Spec | % Avg Balanced Acc |
|---|---|---|---|---|---|---|
| 1 | RFE on Normal–COVID + RFE on COVID–CAP + RF | 6 AV + 20 (18 AV + 2 RL) | 76.05 | 73.31 | 85.76 | 79.54 |
| 2 | RFE on Normal–COVID + RFE on COVID–CAP + RF + SMOTE | 6 AV + 20 (18 AV + 2 RL) | 75.33 | 76.25 | 86.49 | 81.37 |
| 3 | RFE on Normal–COVID + RFE on COVID–CAP + XGBoost | 6 AV + 20 (18 AV + 2 RL) | 78.55 | 77.20 | 87.53 | 82.36 |
| 4 | RFE on Normal–COVID + RFE on COVID–CAP + XGBoost + SMOTE | 6 AV + 20 (18 AV + 2 RL) | 77.17 | 82.76 | 78.31 | 87.08 |
| 5 | RFE on Normal–COVID + RFE on COVID–CAP (only 60 random cases from each class) + RF | 6 AV + 20 (18 AV + 2 RL) | 76.11 | 84.33 | 73.91 | 88.00 |
| 6 | RFE on COVID–CAP + RFE on Normal–CAP + RF | 20 (2 RL + 18 AV) + 32 (11 RL + 21 AV: 24 − 3 common AV) | 74.74 | 72.00 | 84.76 | 78.38 |
| 7 | RFE on COVID–CAP + RFE on Normal–CAP + RF + SMOTE | 20 (2 RL + 18 AV) + 32 (11 RL + 21 AV: 24 − 3 common AV) | 74.61 | 75.17 | 85.88 | 80.52 |
| 8 | RFE on COVID–CAP + RFE on Normal–CAP + XGBoost | 20 (2 RL + 18 AV) + 32 (11 RL + 21 AV: 24 − 3 common AV) | 75.33 | 73.72 | 85.67 | 79.69 |
| 9 | RFE on COVID–CAP + RFE on Normal–CAP + XGBoost + SMOTE | 20 (2 RL + 18 AV) + 32 (11 RL + 21 AV: 24 − 3 common AV) | 75.79 | 81.80 | 76.88 | 85.98 |
| 10 | RFE on Normal–CAP + RF | 35 (RL + AV) | 71.64 | 78.16 | 71.52 | 78.41 |
| 11 | RFE on Normal–CAP + RF + SMOTE | 35 (RL + AV) | 69.27 | 79.386 | 70.55 | 80.22 |
| 12 | RFE on Normal–CAP + XGBoost + SMOTE | 35 (RL + AV) | 69.41 | 79.61 | 70.97 | 80.91 |
| 13 | RFE on Normal–COVID–CAP set of 2442 (RL + AV) features + RF | 80 (16 RL + 64 AV) | 73.49 | 70.60 | 84.03 | 77.31 |
| 14 | RFE on Normal–COVID–CAP set of 2442 (RL + AV) features + RF + SMOTE | 80 (16 RL + 64 AV) | 73.75 | 73.31 | 85.65 | 79.48 |
| 15 | RFE on Normal–COVID–CAP set of 2442 (RL + AV) features + XGBoost | 80 (16 RL + 64 AV) | 75.72 | 73.94 | 85.78 | 79.86 |
| 16 | RFE on Normal–COVID–CAP set of 2442 (RL + AV) features + XGBoost + SMOTE | 80 (16 RL + 64 AV) | 73.81 | 74.38 | 85.57 | 79.98 |
| 17 | Boruta on Normal–COVID–CAP set of 2442 (RL + AV) features + RF | 61 (18 RL, 43 AV) | 74.61 | 71.94 | 84.69 | 78.31 |
| 18 | Boruta on Normal–COVID–CAP set of 2442 (RL + AV) features + RF + SMOTE | 61 (18 RL, 43 AV) | 74.61 | 75.45 | 86.11 | 80.78 |
| 19 | Boruta on Normal–COVID–CAP set of 2442 (RL + AV) features + XGBoost | 61 (18 RL, 43 AV) | 75.59 | 73.76 | 85.67 | 79.71 |
| 20 | Boruta on Normal–COVID–CAP set of 2442 (RL + AV) features + XGBoost + SMOTE | 61 (18 RL, 43 AV) | 75.20 | 75.95 | 86.34 | 81.15 |

References

  1. Mayerhoefer, M.E.; Materka, A.; Langs, G.; Häggström, I.; Szczypiński, P.; Gibbs, P.; Cook, G. Introduction to Radiomics. J. Nucl. Med. 2020, 61, 488–495. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  2. Lambin, P.; Rios-Velazquez, E.; Leijenaar, R.; Carvalho, S.; van Stiphout, R.G.; Granton, P.; Zegers, C.M.; Gillies, R.; Boellard, R.; Dekker, A.; et al. Radiomics: Extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 2012, 48, 441–446. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  3. Kumar, V.; Gu, Y.; Basu, S.; Berglund, A.; Eschrich, S.A.; Schabath, M.B.; Forster, K.; Aerts, H.J.; Dekker, A.; Fenstermacher, D.; et al. Radiomics: The process and the challenges. Magn. Reason. Imaging 2012, 30, 1234–1248. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  4. Van Griethuysen, J.J.M.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.H.; Fillon-Robin, J.C.; Pieper, S.; Aerts, H.J.W.L. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017, 77, 104–107. [Google Scholar] [CrossRef]
  5. Tunali, I.; Gillies, R.J.; Schabath, M.B. Application of Radiomics and Artificial Intelligence for Lung Cancer Precision Medicine. Cold Spring Harb. Perspect. Med. 2021, 11, a039537. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  6. Bevilacqua, A.; Calabrò, D.; Malavasi, S.; Ricci, C.; Casadei, R.; Campana, D.; Baiocco, S.; Fanti, S.; Ambrosini, V. A [68Ga] Ga-DOTANOC PET/CT Radiomic Model for Non-Invasive Prediction of Tumour Grade in Pancreatic Neuroendocrine Tumours. Diagnostics 2021, 11, 870. [Google Scholar] [CrossRef] [PubMed]
  7. Fusco, R.; Piccirillo, A.; Sansone, M.; Granata, V.; Rubulotta, M.R.; Petrosino, T.; Barretta, M.L.; Vallone, P.; Di Giacomo, R.; Esposito, E.; et al. Radiomics and Artificial Intelligence Analysis with Textural Metrics Extracted by Contrast-Enhanced Mammography in the Breast Lesions Classification. Diagnostics 2021, 11, 815. [Google Scholar] [CrossRef]
  8. Li, L.; Qin, L.; Xu, Z.; Yin, Y.; Wang, X.; Kong, B.; Bai, J.; Lu, Y.; Fang, Z.; Song, Q.; et al. Using Artificial Intelligence to Detect COVID-19 and Community-acquired Pneumonia Based on Pulmonary CT: Evaluation of the Diagnostic Accuracy. Radiology 2020, 296, E65–E71. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  9. Rajpoot, R.; Gour, M.; Jain, S.; Semwal, V.B. Integrated ensemble CNN and explainable AI for COVID-19 diagnosis from CT scan and X-ray images. Sci. Rep. 2024, 14, 24985. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  10. Zhao, W.; Jiang, W.; Qiu, X. Deep learning for COVID-19 detection based on CT images. Sci. Rep. 2021, 11, 14353. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  11. Xu, X.; Jiang, X.; Ma, C.; Du, P.; Li, X.; Lv, S.; Yu, L.; Ni, Q.; Chen, Y.; Su, J.; et al. A Deep Learning System to Screen Novel Coronavirus Disease 2019 Pneumonia. Engineering 2020, 6, 1122–1129. [Google Scholar] [CrossRef] [PubMed]
  12. Shiri, I.; Salimi, Y.; Pakbin, M.; Hajianfar, G.; Avval, A.H.; Sanaat, A.; Mostafaei, S.; Akhavanallaf, A.; Saberi, A.; Mansouri, Z.; et al. COVID-19 prognostic modeling using CT radiomic features and machine learning algorithms: Analysis of a multi-institutional dataset of 14,339 patients. Comput. Biol. Med. 2022, 145, 105467. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  13. Bai, H.X.; Wang, R.; Xiong, Z.; Hsieh, B.; Chang, K.; Halsey, K.; Tran, T.M.L.; Choi, J.W.; Wang, D.C.; Shi, L.B.; et al. Artificial Intelligence Augmentation of Radiologist Performance in Distinguishing COVID-19 from Pneumonia of Other Origin at Chest CT. Radiology 2020, 296, E156–E165. [Google Scholar] [CrossRef] [PubMed]
  14. Shi, F.; Xia, L.; Shan, F.; Song, B.; Wu, D.; Wei, Y.; Yuan, H.; Jiang, H.; He, Y.; Gao, Y.; et al. Large-scale screening to distinguish between COVID-19 and community-acquired pneumonia using infection size-aware classification. Phys. Med. Biol. 2021, 66, 065031. [Google Scholar] [CrossRef] [PubMed]
  15. Chen, H.J.; Mao, L.; Chen, Y.; Yuan, L.; Wang, F.; Li, X.; Cai, Q.; Qiu, J.; Chen, F. Machine learning-based CT radiomics model distinguishes COVID-19 from non-COVID-19 pneumonia. BMC Infect Dis. 2021, 21. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  16. Wu, Q.; Wang, S.; Li, L.; Wu, Q.; Qian, W.; Hu, Y.; Li, L.; Zhou, X.; Ma, H.; Li, H.; et al. Radiomics Analysis of Computed Tomography helps predict poor prognostic outcome in COVID-19. Theranostics 2020, 5, 7231–7244. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  17. Afshar, P.; Heidarian, S.; Enshaei, N.; Naderkhani, F.; Rafiee, M.J.; Oikonomou, A.; Fard, F.B.; Samimi, K.; Plataniotis, K.N.; Mohammadi, A. COVID-CT-MD, COVID-19 computed tomography scan dataset applicable in machine learning and deep learning. Sci. Data 2021, 8, 121. [Google Scholar] [CrossRef]
  18. Carlicchi, E.; Gemma, P.; Poerio, A.; Caminati, A.; Vanzulli, A.; Zompatori, M. Chest-CT mimics of COVID-19 pneumonia—A review article. Emerg. Radiol. 2021, 28, 507. [Google Scholar] [CrossRef]
  19. 3D Slicer Image Computing Platform. Available online: https://www.slicer.org/ (accessed on 28 January 2025).
  20. PyRadiomics Documentation: Radiomic Features. Available online: https://pyradiomics.readthedocs.io/en/latest/features.html (accessed on 3 April 2025).
  21. SMOTE Algorithm for Unbalanced Classification Problems. Available online: https://rdrr.io/cran/performanceEstimation/man/smote.html (accessed on 28 January 2025).
  22. Tanha, J.; Abdi, Y.; Samadi, N.; Razzaghi, N.; Asadpour, M. Boosting methods for multi-class imbalanced data classification: An experimental review. J. Big Data 2020, 7. [Google Scholar] [CrossRef]
  23. What Is Imbalanced Data and How to Handle It? Available online: https://turintech.medium.com/what-is-imbalanced-data-and-how-to-handle-it-369a70be16fc (accessed on 3 April 2025).
  24. How to Improve Class Imbalance using Class Weights in Machine Learning? Available online: https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/ (accessed on 28 January 2025).
  25. Wang, L.; Kelly, B.; Lee, E.H.; Wang, H.; Zheng, J.; Zhang, W.; Halabi, S.; Liu, J.; Tian, Y.; Han, B.; et al. Multi-classifier-based identification of COVID-19 from chest computed tomography using generalizable and interpretable radiomics features. Eur. J. Radiol. 2021, 136, 109552. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  26. Ghosh, A.; Jaenada, M.; Pardo, L. Classification of COVID19 Patients Using Robust Logistic Regression. J. Stat. Theory Pract. 2022, 16, 67. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  27. Carvalho, E.D.; Carvalho, E.D.; de Carvalho Filho, A.O.; de Araújo, F.H.D.; Rabêlo, R.D.A.L. Diagnosis of COVID-19 in CT image using CNN and XGBoost. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 8–10 July 2020; pp. 1–6. [Google Scholar] [CrossRef]
  28. Mohtasham, F.; Pourhoseingholi, M.; Hashemi Nazari, S.S.; Kavousi, K.; Zali, M.R. Comparative analysis of feature selection techniques for COVID-19 dataset. Sci. Rep. 2024, 14, 18627. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Representative examples of classes: (a) COVID-19 case; (b) CAP case; (c) Normal case.
Figure 2. System flowchart.
Figure 3. Six selected RFE features for Normal–COVID case.
Figure 4. Top 10 important features for Glmnet model trained over 32 Boruta features for the Normal–COVID case.
Figure 5. A total of 20 RFE selected features for the COVID–CAP case.
Figure 6. Top 10 important features for RF model trained over 20 RFE features for the COVID–CAP case.
Figure 7. Top 10 important features in RF model trained over 35 RFE features for the Normal–CAP case.
Figure 8. Top 10 important features in Glmnet model trained over 37 Boruta features for the Normal–Disease case.
Figure 9. Top 10 important features in XGBoost model trained over 26 RFE features for the Normal–COVID–CAP case.
Figure 10. Top 8 important features for the Normal–COVID–CAP best model.
Table 1. Selected feature sets based on different methods.

| | t-Test | RFE | Boruta |
|---|---|---|---|
| Normal–COVID | 562 (127 RL, 435 AV) | 6 AV | 32 (6 RL, 26 AV) |
| COVID–CAP | 654 (297 RL, 357 AV) | 20 (2 RL, 18 AV) | 42 (6 RL, 36 AV) |
| Normal–CAP | 1053 (440 RL, 613 AV) | 35 (11 RL, 24 AV) | 73 (32 RL, 41 AV) |
| Normal–Disease | 736 (229 RL, 507 AV) | 40 (5 RL, 35 AV) | 37 (8 RL, 29 AV) 1 |
| Normal–COVID–CAP | - | 80 (16 RL, 64 AV) | 61 (18 RL, 43 AV) |

1 Feature sets extracted with different methods were used for various training models in both two-class and multi-class cases.
Table 2. Summary of the best models for all cases.

| Best Case | Feature Selection Method | Number of Features | Algorithm | SMOTE Used | Avg Acc % | Avg Sens % | Avg Spec % | Tuning Parameters |
|---|---|---|---|---|---|---|---|---|
| Normal–COVID | Boruta | 32 | Glmnet | YES | 83.11 | 82.89 | 83.21 | alpha = 1, lambda = 1 × 10⁻⁴ |
| COVID–CAP | RFE | 20 | RF | YES | 88.77 | 90.95 | 82.67 | mtry = 2 |
| Normal–CAP | RFE | 35 | RF | NO | 93.97 | 96.84 | 90.33 | mtry = 27 |
| Normal–Disease | Boruta | 37 | Glmnet | NO | 94.63 | 88.15 | 95.83 | alpha = 1, lambda = 1 × 10⁻⁴ |
| Normal–COVID–CAP | RFE | 26 | XGBoost | NO | 78.55 | 77.20 | 87.53 | nrounds = 100, max_depth = 7, eta = 0.05, gamma = 0.01, colsample_bytree = 0.75, min_child_weight = 0, subsample = 0.5 |
Table 3. Summary of research works related to our study.

| Research Work | Disease | Segmentation Method | Type of Segments | Type of Model (Algorithm) | Type of Features | Training/Testing Ratio | AUC | Acc % | Sens % | Spec % |
|---|---|---|---|---|---|---|---|---|---|---|
| Bai HX, AI augmentation to distinguish COVID from other pneumonia | COVID–other pneumonia | Auto-segmentation + partially by radiologists | Whole lung | Deep learning neural network | | 70:20:10 (training/validation/testing) | 0.90 | 87 | 89 | 86 |
| Shi F, Large-scale screening with infection-size-aware classification | COVID–CAP | Automated segmentation + handcrafted features | Infected lung regions and lung fields | | Handcrafted location-specific features + radiomics | 80:20 | | 89.4 | 90.7 | 87.2 |
| Chen HJ, ML-based CT radiomics | COVID–CAP | Semi-automated segmentation (auto lesion segmentation + radiologists' refinement) | Lesions | SVM | Radiomics | 85:15 | | 84.3 | 92.3 | 81.6 |
| Wu Q, Radiomics analysis for COVID-19 poor outcome prediction | | Automated | Whole lung | | Radiomics + clinical risk factors | | 0.862 | | | |
| Proposed 2-class COVID–CAP model | COVID–CAP | Automated | Whole lung | | Radiomics | 75:25 | | 88.77 | 90.95 | 82.67 |
| Xu X, Deep learning to screen COVID-19 | Normal–COVID–CAP | Automated | Infected regions | Deep learning neural network | | 85.6:14.4 | | 86.7 | | |
| Proposed 3-class model | Normal–COVID–CAP | Automated | Whole lung | XGBoost | Radiomics | 75:25 | | 78.55 | 77.20 | 87.53 |