1. Introduction
Strokes, a major global health issue, rank as the second leading cause of death and a top cause of disability worldwide [1]. They occur when the blood supply to the brain is interrupted either by a blockage in a blood vessel (ischemic stroke) or a rupture (hemorrhagic stroke), leading to damage in brain cells and severe outcomes such as paralysis, speech difficulties, and cognitive impairment [2]. The likelihood of a stroke is influenced by several risk factors, including high blood pressure, smoking, obesity, diabetes, and high cholesterol, which are modifiable through lifestyle changes and medical interventions. Other risk factors, such as age, gender, and family history, are non-modifiable and require careful monitoring [3,4]. Prevention, early recognition of symptoms, and prompt medical intervention are essential to mitigating a stroke's severe impact [5].
Carotid artery diseases, most notably stenosis, aneurysms, and dissections, are primary contributors to strokes. Stenosis, characterized by narrowing of the arteries due to plaque buildup, is a leading cause of ischemic strokes, especially when the blockage exceeds 70% [6]. Aneurysms, which involve bulging of the artery wall, pose a risk of hemorrhagic stroke if they rupture. Dissections, where tears in the artery walls lead to blood-filled channels, also increase the risk of stroke [7]. The timely diagnosis and treatment of these conditions are vital to reducing stroke incidence and preventing complications.
The structural morphology of the carotid arteries plays a significant role in the development of these diseases. The carotid artery consists of three layers: the intima, media, and adventitia. Damage to the innermost layer, the intima, can lead to the formation of atherosclerotic plaques, narrowing the artery and impeding blood flow. The common carotid artery (CCA) branches into the internal carotid artery (ICA) and external carotid artery (ECA), and their branching patterns influence the development of arterial plaques [8,9]. Understanding these structural dynamics is crucial in diagnosing and predicting carotid artery diseases.
Several well-established laws help quantify and analyze the characteristics of blood vessels and their flow dynamics. Murray's law, for example, describes how blood vessels optimize their structure to minimize the energy required for blood flow [8,9]. Hooke's law further describes the relationship between the flow rate and vessel diameter, providing insights into vascular resistance and blood pressure. These principles, along with the minimum cost function, help researchers understand the structural and functional characteristics of blood vessels [10,11,12,13].
Given the complexity of carotid artery diseases, detailed analysis of blood flow dynamics is essential. Properties like branch angles, artery diameters, blood viscosity, and flow velocity are critical factors in the onset of a stroke. Cranio-cervical computed tomography angiography (CTA) allows for precise measurement of these properties, aiding in the diagnosis of arterial abnormalities such as stenosis and aneurysms [14]. Studies have shown that geometrical features, such as the angle of the ICA, can increase the risk of stroke, particularly when the angle exceeds 25 degrees [15,16]. Similarly, differences in artery diameter, influenced by factors like sex and age, have been correlated with stroke risk [17]. Ozdemir et al. identified that patients with aneurysms show larger bifurcation angles and artery diameters than healthy individuals, highlighting the role of structural deformation in disease progression [18]. Conversely, stenosis and dissection appear to have little effect on the arterial structure, although dissection is triggered by the artery diameter [19].
Artificial intelligence (AI), particularly machine learning (ML), has become increasingly influential in the diagnosis of complex diseases, including carotid artery disease (CAD). ML systems excel at identifying patterns from vast datasets without the need for explicit programming, significantly enhancing diagnostic accuracy in areas where traditional observational methods may fall short. In the context of carotid artery diseases, ML and deep learning (DL) techniques applied to carotid CT angiography (CTA) have shown remarkable potential. For instance, radiomics-based analysis of CTA images has demonstrated a superior ability to distinguish symptomatic from asymptomatic patients, achieving area under the curve (AUC) scores as high as 0.96 and outperforming conventional calcium scoring methods [20,21,22].
The success of ML models in CAD diagnosis relies heavily on well-structured and relevant feature sets, which are crucial for both the accuracy and interpretability of these models [23,24]. By extracting detailed patterns from CTA images, ML systems provide insights beyond what human experts can readily observe, aiding in better stroke risk stratification and the identification of high-risk plaques. Consequently, the integration of ML with CTA offers a promising avenue for advancing CAD diagnostics, especially in craniocervical assessments [23].
In [20], a study analyzing 132 carotid arteries from both symptomatic (ischemic stroke and transient ischemic attack) and asymptomatic patients utilized carotid CT angiography (CTA) images to detect carotid artery disease (CAD). The study revealed a small but meaningful difference in the performance of advanced imaging techniques compared with traditional methods. Radiomics achieved a mean AUC of 0.96 for distinguishing symptomatic from asymptomatic arteries, while deep learning (DL) followed closely with a mean AUC of 0.86. The calcium score, determined using the Agatston method, obtained a mean AUC of 0.79. Although the performance gap between radiomics and DL was relatively minor at ±0.02, radiomics demonstrated a slightly higher accuracy. Furthermore, for multi-class classification, radiomics showed a stronger performance with a mean AUC of 0.95, compared with 0.79 for DL. With 132 carotid arteries assessed (41 culprit, 41 non-culprit, and 50 asymptomatic), the results suggest that while DL techniques hold promise, radiomics offers a slight advantage in precision. This highlights the importance of examining even small methodological variations, as they can lead to significant improvements in diagnostic accuracy.
The main aim of this study is to develop a super learner model which integrates the Adaptive Boosting, gradient boosting, and random forest algorithms to enhance the diagnostic accuracy for carotid artery diseases. By identifying and selecting relevant features from cranio-cervical CTA images, our approach offers an effective tool for diagnosing these diseases and improving health outcomes. Moreover, we provide a publicly accessible dataset to facilitate further research and reproducibility within the field. The proposed model demonstrates superior accuracy compared with existing methods, underscoring the potential of ML in medical diagnostics.
This manuscript is organized as follows. Section 2 outlines the materials and methods, including the data collection process and the machine learning algorithms utilized. In Section 3, we present techniques such as k-fold cross-validation, bootstrapping, data augmentation, and the SMOTE to enhance model robustness and generalization, particularly for datasets with limited instances; there, the super learner ensemble model, which integrates XGBoost, random forests, and AdaBoost, demonstrates superior performance. Section 4 details the results, including the performance metrics of the super learner model and its comparison with state-of-the-art approaches. Section 5 addresses the application of the SMOTE to balance the dataset by generating 880 additional synthetic samples, increasing the dataset size from 120 to 1000 instances. The SMOTE-augmented dataset retained medical plausibility, improved model generalization, and significantly enhanced performance for minority classes, such as aneurysm and dissection cases. In Section 6, we compare the model accuracy between the original and SMOTE-expanded datasets. Section 7 further compares the performance between the original dataset and the SMOTE-augmented dataset, which was optimized using Optuna. Finally, Section 8 discusses the findings in relation to the existing literature and highlights potential future research directions which could open new perspectives in the development of robust logical frameworks in the context of machine learning applications.
2. Materials and Methods
In this section, we employ several statistical and machine learning techniques to analyze and interpret our data effectively. The Pearson correlation coefficient (PCC) was used to measure the linear correlation between two variables, providing a quantifiable indication of the strength and direction of their relationship. This was crucial for identifying the highly correlated features in our dataset. The chi-squared test was implemented to assess the association between categorical variables, enabling us to determine if the observed frequencies deviated significantly from the expected frequencies under the null hypothesis of independence.
To ensure our data were on a comparable scale, we applied standard deviation normalization, also known as z-score normalization. This technique transformed the data to have a mean of zero and a standard deviation of one, thereby standardizing the contribution of each feature to the analysis. Lasso regularization was employed for feature selection and regularization, adding a penalty to the absolute size of the regression coefficients to improve prediction accuracy and model interpretability.
Recursive feature elimination (RFE) was utilized as a robust feature selection method, iteratively fitting a model and removing the least important features to enhance the model’s performance. This method, along with embedded methods like random forests and gradient-boosted trees, helped in identifying the most significant predictors by ranking features based on their importance.
Various machine learning algorithms, including Extreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), random forests (RF), bootstrap aggregation (Bagging), Adaptive Boosting (AdaBoost), and Extremely Randomized Trees (ExtraTrees), were employed to build and optimize the model for detecting carotid artery diseases. Each algorithm was evaluated using performance metrics such as the accuracy, precision, recall, and F score. The super learner model, an ensemble of these optimized algorithms, was developed to improve the predictive accuracy by leveraging the strengths of individual models. Data collection involved obtaining CTA images from 122 patients, which were then analyzed to measure the physical properties of the carotid arteries. Feature selection mechanisms combined multiple methods to identify relevant predictors, and standard deviation normalization ensured consistent scaling of the data.
2.1. Pearson Correlation Coefficient
The Pearson correlation coefficient (PCC) is a measure of the linear correlation between two variables X and Y [25]. It quantifies the degree to which a linear relationship exists between them. The PCC is calculated using the formula

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \quad (1) $$

In this equation, $x_i$ and $y_i$ are the individual data points, while $\bar{x}$ and $\bar{y}$ represent the mean values of X and Y, respectively. The PCC value ranges between −1 and 1, where values closer to 1 indicate a strong positive correlation, values closer to −1 indicate a strong negative correlation, and values near 0 suggest no linear correlation.
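For illustration, the coefficient can be computed directly with SciPy; the short sketch below uses invented paired measurements (the variable names and values are hypothetical):

from scipy.stats import pearsonr
import numpy as np

# Hypothetical paired measurements, e.g., an artery diameter and an angle
x = np.array([5.1, 6.2, 7.0, 5.8, 6.5])
y = np.array([40.2, 46.1, 52.3, 44.0, 48.7])

r, p_value = pearsonr(x, y)  # r lies in [-1, 1]
print(f"PCC = {r:.3f}, p-value = {p_value:.3f}")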
2.2. Chi-Squared Test
The chi-squared test is a statistical method used to assess whether there is a significant association between two categorical variables [26]. The test statistic is calculated as follows:

$$ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \quad (2) $$

Here, $O_i$ represents the observed frequency, and $E_i$ represents the expected frequency under the null hypothesis of independence. This test is particularly useful in categorical data analysis, where the goal is to determine if the distribution of the sample categorical data matches an expected distribution.
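As a minimal illustration, the test can be run on a contingency table with SciPy; the counts below are invented for demonstration only:

from scipy.stats import chi2_contingency
import numpy as np

# Hypothetical 2x2 contingency table: sex (rows) versus disease status (columns)
observed = np.array([[18, 12],
                     [10, 20]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2:.3f}, p-value = {p:.3f}, dof = {dof}")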
2.3. Standard Deviation Normalization (Z-Score)
Standard deviation normalization, or z-score normalization, is a method used to normalize data by transforming the features to have a mean of 0 and a standard deviation of 1 [27]. The z-score for a given data point x is computed as follows:

$$ z = \frac{x - \mu}{\sigma} \quad (3) $$

In this formula, $x$ is the value to be normalized, $\mu$ is the mean of the dataset, and $\sigma$ is the standard deviation of the dataset. This transformation ensures that each feature contributes equally to the analysis.
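In practice, this transformation corresponds to scikit-learn's StandardScaler; a minimal sketch with an invented feature matrix:

from sklearn.preprocessing import StandardScaler
import numpy as np

# Hypothetical feature matrix (e.g., a diameter and an angle per row)
X = np.array([[4.4, 20.8], [8.0, 76.0], [5.7, 46.6]])

scaler = StandardScaler()           # subtracts the mean and divides by the std
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and std 1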
2.4. Feature Importance via Lasso Regularization
Lasso regularization is a type of linear regression which includes a penalty term for the absolute size of the regression coefficients [28]. It performs both variable selection and regularization to enhance the prediction accuracy and interpretability of the statistical model it produces. The lasso objective function is

$$ \min_{\beta} \left\{ \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \quad (4) $$

Here, $y_i$ is the dependent variable, $x_{ij}$ represents the independent variables, $\beta_j$ represents the model coefficients, $n$ is the number of observations, $p$ is the number of predictors, and $\lambda$ is the regularization parameter.
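A minimal sketch of lasso-based feature selection with scikit-learn, assuming a training matrix X_train and target y_train; the cross-validation setting is illustrative, not the study's exact configuration. Features whose coefficients shrink to zero are discarded:

from sklearn.linear_model import LassoCV
from sklearn.feature_selection import SelectFromModel

# LassoCV chooses the regularization strength (lambda) by cross-validation
lasso = LassoCV(cv=5).fit(X_train, y_train)

# Retain only the features with non-zero coefficients
selector = SelectFromModel(lasso, prefit=True)
X_selected = selector.transform(X_train)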
2.5. Recursive Feature Elimination (RFE)
Recursive feature elimination (RFE) is a feature selection method which fits a model and removes the weakest feature (or features) recursively until the specified number of features is reached [29]. The algorithm proceeds as follows:
1. Fit the model to the data.
2. Rank the features based on their importance.
3. Remove the least important feature(s).
4. Repeat the process until the desired number of features is obtained.
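A minimal sketch with scikit-learn's RFE; the base estimator and the number of retained features are illustrative assumptions:

from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier

estimator = RandomForestClassifier(n_estimators=100, random_state=0)
rfe = RFE(estimator=estimator, n_features_to_select=8, step=1)  # drop one feature per iteration
rfe.fit(X_train, y_train)

print(rfe.support_)  # boolean mask of the retained features
print(rfe.ranking_)  # rank 1 marks a selected feature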
2.6. Machine Learning Models
Several machine learning algorithms were used to build and optimize the model for detecting carotid artery diseases. These algorithms include the following:
Extreme Gradient Boosting (XGBoost): This algorithm uses a gradient boosting framework which improves the performance of decision trees by combining multiple weak models.
Light Gradient Boosting Machine (LightGBM): This is an efficient and effective gradient boosting framework which uses decision tree algorithms. It is designed for quick and accurate model training.
Random forests (RF): This is an ensemble learning method which operates by constructing multiple decision trees during training and outputting the mode of the classes.
Bootstrap aggregation (bagging): This method reduces variance by training multiple models on different subsets of the data and averaging their predictions.
Adaptive Boosting (AdaBoost): This is a boosting technique which combines multiple weak classifiers to create a strong classifier.
Extremely Randomized Trees (ExtraTrees): This method randomizes the choice of the split point and features to reduce variance in high-dimensional data.
The performance of each model was evaluated using metrics such as the accuracy, precision, recall, and F score. The models were optimized using Optuna optimization software to find the best hyperparameters for each algorithm.
A super learner model was developed by combining the optimized versions of the individual machine learning algorithms. This ensemble model was designed to improve the predictive accuracy by leveraging the strengths of each algorithm. The final super learner model was constructed by combining the XGBoost, random forest, and bagging algorithms, achieving an overall accuracy of 0.90.
2.7. Data Collection
In this study, CTA images from 122 patients obtained from Ege University Hospital (protocol number/code: 19-9.1T/6) were selected to measure physical properties of the vessels. These CTA images belong to diseases classified as stenosis (30 persons), aneurysm (30 persons), and dissection (31 persons). In addition, 31 persons without disease were evaluated as normal. From these data, CTA was used to obtain the diameters and angles of the CCA, ICA, and ECA, where CCA, ICA, and ECA denote the common, internal, and external carotid arteries, respectively. Illustrations of the CCA, ICA, and ECA branches and the bifurcation angles between them are shown in Figure 1. The CCA, ICA, and ECA diameters and the bifurcation angle values of all persons (normal and patients) were measured from the CTA images with great accuracy. The angles and the diameter values for all patients (normal, stenosis, aneurysm, and dissection) were obtained by taking the bisectors of the ICA, ECA, and CCA and from the bulbar segments, respectively. The values obtained from every patient were measured more than once, and their averages were used for the calculations. As shown in Figure 2 and Figure 3, the images obtained from CTA were analyzed with the help of the Sectra (Sectra Workstation IDS7 for Windows Version, Sectra AB, Sweden) and AW Server 2 (AW SERVER 2.0 EXT. 7.1 SOFTWARE AND DOCS DVD by GE Healthcare, Chicago, IL, USA) programs. The parameters used in CTA were the kVp, mA, rotation time, section thickness, pitch value, coverage, kernel filter, medium, matrix, and FOV, and their values were 120, dose modulated, 0.3 s, 0.6 mm, …, 76.8 mm, 326 f, smooth, … × …, and 230 mm, respectively [18,30].
2.8. Feature Extraction and Preprocessing
To identify the most significant features from the dataset, we employed a comprehensive feature selection (FS) mechanism which integrates three types of feature selection algorithms: filter-based, embedded, and hybrid. Each approach has its advantages and drawbacks, and by combining them, we aimed to enhance the accuracy and reduce the complexity of the machine learning (ML) model [31].
To further refine the model, Table 1, Table 2, Table 3 and Table 4 provide detailed demographic and clinical measurements of four different groups of cases: normal, stenosis, aneurysm, and dissection, respectively. The demographic characteristics in each table include key attributes such as the age, measured angle data, and diameters of various arterial regions. These tables include both male and female participants, with minimum, maximum, mean, and standard deviation (STD) values for each attribute.
Normal cases: The demographic structure of the study’s normal sample included a total of 31 participants, with 16 females and 15 males. The ages of the participants ranged from 30 to 79 years, with a mean age of 55.45 years and a standard deviation of 14.16, indicating a moderately diverse age range. The difference between the measured and the predicted values (based on Murray’s law) varied between −0.456 and 0.165, with a mean of −0.102 and a standard deviation of 0.153, suggesting a slight tendency for the measured values to be lower than predicted.
Regarding anatomical measurements, the right internal carotid artery diameter (ICAdiaR) varied from 4.4 mm to 8.0 mm, with a mean of 5.703 mm and a standard deviation of 0.812, reflecting relatively low variability. The left external carotid artery angle (Car.Angle.L) spanned from 12° to 58°, with an average of 27.2° and a standard deviation of 12.639, indicating notable variation. The total right carotid angles (Car.Angle.R) ranged from 20.8° to 76.0°, with a mean of 46.65° and a standard deviation of 14.934, while the total left carotid angles (Car.Angle.L) varied more significantly, ranging from 26.0° to 105.4°, with a mean of 50.97° and a standard deviation of 19.198. The right carotid angle (Car.Angle.R) had values between 6.3° and 54.1°, with a mean of 23.05° and a standard deviation of 11.427, while the left carotid angle (Car.Angle.L) ranged from 4.7° to 49.0°, with a mean of 23.77° and a standard deviation of 12.48. The combined right and left carotid angles (Car.Angle.(R+L)) spanned from 6.1° to 47.65°, with an average of 23.41° and a standard deviation of 10.836. Lastly, the right common carotid artery diameter (CCAdiaR) ranged from 5.1 mm to 10.7 mm, with a mean of 7.161 mm and a standard deviation of 1.049, showing relatively consistent measurements across the participants.
Stenosis cases: The demographic structure of stenosis cases consists of 30 participants, with 10 females and 20 males. The ages of the participants ranged from 52 to 70 years, with a mean age of 63.73 years and a standard deviation of 4.948, indicating a somewhat narrow age range. The difference between the measured and the predicted values (based on Murray’s law) varied between −0.458 and 0.093, with a mean of −0.177 and a standard deviation of 0.140, suggesting that the measured values tended to be lower than predicted.
In terms of anatomical measurements, the right internal carotid artery diameter (ICAdiaR) ranged from 3.5 mm to 10.5 mm, with a mean of 5.707 mm and a standard deviation of 1.484, indicating notable variability. The left external carotid artery angle (Car.Angle.L) spanned from −67.3° to 45.0°, with an average of 16.77° and a standard deviation of 20.079, suggesting significant variation. The total right carotid angles (Car.Angle.R) ranged from 22.6° to 66.4°, with a mean of 41.51° and a standard deviation of 10.079, while the total left carotid angles (Car.Angle.L) ranged from 10.7° to 83.9°, with a mean of 42.24° and a standard deviation of 16.088, indicating more variability on the left side.
The right carotid angle (Car.Angle.R) ranged from 3.0° to 36.6°, with a mean of 17.44° and a standard deviation of 7.68, while the left carotid angle (Car.Angle.L) ranged from 4.0° to 78.0°, with a mean of 25.48° and a standard deviation of 15.115, reflecting wider variability on the left side. The total right and left carotid angles (Car.Angle.(R+L)) ranged from 9.35° to 53.7°, with a mean of 23.46° and a standard deviation of 9.511. Lastly, the right common carotid artery diameter (CCAdiaR) spanned from 5.9 mm to 11.1 mm, with a mean of 7.748 mm and a standard deviation of 1.898, showing relatively consistent measurements.
Aneurysm cases: The demographic structure of aneurysm cases consists of 30 participants, with 13 females and 17 males. The ages of the participants ranged from 33 to 74 years, with a mean age of 53.17 years and a standard deviation of 11.68, indicating moderate age variation. The difference between the measured and the predicted values (based on Murray’s law) varied between −0.391 and 0.764, with a mean of 0.101 and a standard deviation of 0.258, showing greater deviation compared with the other cases.
In terms of anatomical measurements, the right internal carotid artery diameter (ICAdiaR) ranged from 3.6 mm to 9.4 mm, with a mean of 6.693 mm and a standard deviation of 1.371, indicating a wider variation in the artery diameter. The left external carotid artery angle (Car.Angle.L) ranged from 0.0° to 46.9°, with an average of 21.08° and a standard deviation of 11.115, reflecting significant variability. The total right carotid angles (Car.Angle.R) spanned from 29.2° to 125.0°, with a mean of 53.997° and a standard deviation of 19.547, while the total left carotid angles (Car.Angle.L) ranged from 24.1° to 102.6°, with a mean of 52.317° and a standard deviation of 17.026, indicating similar variability on both sides.
The right carotid angle (Car.Angle.R) ranged from 4.3° to 67.1°, with a mean of 28.76° and a standard deviation of 13.267, while the left carotid angle (Car.Angle.L) ranged from 8.1° to 63.8°, with a mean of 31.23° and a standard deviation of 14.587, reflecting higher variation. The total right and left carotid angles (Car.Angle.(R+L)) spanned from 8.05° to 61.4°, with a mean of 29.998° and a standard deviation of 12.894. Lastly, the right common carotid artery diameter (CCAdiaR) ranged from 4.9 mm to 10.8 mm, with a mean of 7.393 mm and a standard deviation of 1.353, showing relatively consistent measurements across the participants.
Dissection cases: The demographic structure of the dissection cases included 31 participants, with 18 females and 13 males. The ages of the participants ranged from 27 to 76 years, with a mean age of 48.71 years and a standard deviation of 10.558, reflecting moderate age variation. The difference between the measured and the predicted values (based on Murray’s law) ranged from −0.429 to 0.256, with a mean of −0.817 and a standard deviation of 0.177, indicating a tendency for the measured values to be lower than predicted.
Regarding anatomical measurements, the right internal carotid artery diameter (ICAdiaR) spanned from 3.3 mm to 6.7 mm, with a mean of 4.877 mm and a standard deviation of 0.978, showing moderate variability. The left external carotid artery angle (Car.Angle.L) ranged from 1.0° to 61.2°, with a mean of 22.094° and a standard deviation of 12.735, reflecting significant variation. The total right carotid angles (Car.Angle.R) varied between 23.8° and 99.9°, with a mean of 49.629° and a standard deviation of 16.831, while the total left carotid angles (Car.Angle.L) spanned from 25.2° to 93.8°, with a mean of 51.545° and a standard deviation of 16.909, indicating similar variability on both sides.
The right carotid angle (Car.Angle.R) ranged from 7.2° to 40.8°, with a mean of 23.516° and a standard deviation of 8.299, while the left carotid angle (Car.Angle.L) varied from 7.4° to 55.9°, with a mean of 29.452° and a standard deviation of 12.408, showing greater variation on the left side. The total right and left carotid angles (Car.Angle.(R+L)) ranged from 12.7° to 46.85°, with a mean of 26.484° and a standard deviation of 9.458. Lastly, the right common carotid artery diameter (CCAdiaR) ranged from 4.9 mm to 8.9 mm, with a mean of 6.261 mm and a standard deviation of 0.888, indicating relatively consistent measurements across the participants.
Finally, Table 5 is a summary table which combines the data from all four previous tables. This table provides an overall demographic overview of the entire dataset. The general demographic structure included a total of 122 participants, with 57 females and 65 males. The participants' ages ranged from 27 to 79 years, with a mean age of approximately 54.84 years and a standard deviation of around 12.34, reflecting a relatively broad age distribution. The difference between the measured and the predicted values varied from −0.458 to 0.764 across the cases, with an overall mean of approximately −0.248 and a standard deviation of 0.189, indicating a tendency for the measured values to be lower than the predicted values. The measured values spanned from 0.479 to 0.994, with a mean of 0.887 and a standard deviation of 0.090. The right internal carotid artery diameter (ICAdiaR) ranged from 3.3 mm to 10.5 mm, with an overall mean of 5.745 mm and a standard deviation of 1.161, indicating moderate variation.
The left external carotid artery angle (Car.Angle.L) varied significantly from −67.3° to 61.2°, with a mean of 21.03° and a standard deviation of 14.067. The total right carotid angles (Car.Angle.R) spanned from 20.8° to 125.0°, with a mean of 47.47° and a standard deviation of 15.848, while the total left carotid angles (Car.Angle.L) ranged from 10.7° to 105.4°, with a mean of 49.52° and a standard deviation of 17.305. The right carotid angle (Car.Angle.R) ranged from 3.0° to 67.1°, with a mean of 23.74° and a standard deviation of 10.668, while the left carotid angle (Car.Angle.L) spanned from 4.7° to 78.0°, with a mean of 27.23° and a standard deviation of 13.148. The combined right and left carotid angles (Car.Angle.(R+L)) ranged from 6.1° to 61.4°, with a mean of 25.34° and a standard deviation of 10.74. Lastly, the right common carotid artery diameter (CCAdiaR) ranged from 4.9 mm to 11.1 mm, with an overall mean of 6.641 mm and a standard deviation of 1.214, reflecting consistent measurements across the participants.
This general overview offers a comprehensive summary of the variability and commonalities observed across the cases.
The primary goal of our feature selection approach was to identify common significant features selected by all the employed methods. By accomplishing this, we ensured that the training dataset included only the most relevant features, thereby improving the model's interpretability and performance. At this point, it should also be noted that the dataset was split into training and test sets prior to feature selection to ensure the integrity of the model evaluation and avoid data leakage. The relevant predictors for diagnosing carotid artery diseases are summarized in Table 5.
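A hedged sketch of this intersection step is given below; the selector settings (k, number of features) are illustrative, and X_train is assumed to be a pandas DataFrame of non-negative features with labels y_train:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, SelectFromModel, chi2
from sklearn.linear_model import LassoCV

# Filter method: chi-squared scores (requires non-negative feature values)
filter_sel = SelectKBest(chi2, k=10).fit(X_train, y_train)
filter_feats = set(X_train.columns[filter_sel.get_support()])

# Embedded method: lasso coefficients
lasso_sel = SelectFromModel(LassoCV(cv=5)).fit(X_train, y_train)
lasso_feats = set(X_train.columns[lasso_sel.get_support()])

# Hybrid/wrapper method: RFE with a random forest ranking the features
rfe_sel = RFE(RandomForestClassifier(random_state=0),
              n_features_to_select=10).fit(X_train, y_train)
rfe_feats = set(X_train.columns[rfe_sel.support_])

# Train only on the predictors that all three methods agree on
common_feats = sorted(filter_feats & lasso_feats & rfe_feats)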
Since the variables for carotid artery diseases are sparse, and the presence of outliers may negatively affect the predictive performance of ML models, scaling the data is essential. We accomplished this by applying standard deviation normalization (z-score normalization), which centers each feature at a mean of zero and scales it by its standard deviation. This transformation ensures that all features contribute equally to the analysis. The z-score was calculated as shown in Equation (3), where $x$ is the data point, $\mu$ is the mean, and $\sigma$ is the standard deviation. This normalization process is discussed in more detail in Section 2.3.
3. Enhanced Model Robustness Techniques
To mitigate the challenges posed by the relatively limited dataset, we implemented several advanced techniques to improve the model robustness and accuracy. These included k-fold cross-validation, bootstrapping, data augmentation, and synthetic data generation using the SMOTE.
3.1. Cross-Validation
We employed k-fold cross-validation to optimize the model’s validation process. Specifically, we used five-fold cross-validation, where the dataset was divided into five subsets. In each iteration, the model was trained on four subsets and validated on the remaining subset. This process was repeated five times, ensuring that each subset was used as a validation set once. The following Python code was used for the implementation:
from sklearn.model_selection import KFold

kf = KFold(n_splits=5)
for train_index, test_index in kf.split(X):
    # Split the data into the current training and validation folds
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
3.2. Bootstrapping
We used bootstrapping to create multiple resampled versions of the training data, thereby increasing the number of training examples and allowing for better model generalization. This is particularly helpful in a small dataset scenario, where variance reduction is crucial. Bootstrapping was implemented as shown below:
from sklearn.utils import resample

# Draw a bootstrap sample (with replacement) the same size as the training set
X_bootstrap, y_bootstrap = resample(X_train, y_train,
                                    replace=True,
                                    n_samples=len(X_train))
model.fit(X_bootstrap, y_bootstrap)
3.3. Data Augmentation
Although data augmentation is typically applied in image classification tasks, we adapted this technique to our dataset by introducing random perturbations to key features, such as artery diameters and bifurcation angles. Gaussian noise was added to simulate small variations, which increased the training data’s diversity:
import numpy as np

# Add small Gaussian perturbations to simulate measurement variability
noise = np.random.normal(0, 0.01, X_train.shape)
X_augmented = X_train + noise
model.fit(X_augmented, y_train)
3.4. Synthetic Data Generation (SMOTE)
We applied the synthetic minority oversampling technique (SMOTE) to generate synthetic examples for the minority classes in our dataset, such as aneurysms and dissections. This technique helped preserve the balance of the dataset by creating new samples from the existing data:
from imblearn.over_sampling import SMOTE

# Oversample minority classes by interpolating between existing samples
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
model.fit(X_resampled, y_resampled)
3.5. Ensemble Learning
We enhanced the robustness of the model by utilizing an ensemble learning approach. The super learner model combines optimized versions of XGBoost, random forests, and AdaBoost, among other algorithms. This ensemble method leverages the strengths of each algorithm to achieve superior performance:
from sklearn.ensemble import (StackingClassifier, RandomForestClassifier,
                              AdaBoostClassifier)
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

estimators = [
    ('rf', RandomForestClassifier(n_estimators=100)),
    ('xgb', XGBClassifier()),
    ('ada', AdaBoostClassifier())
]
# A logistic regression meta-learner combines the base models' predictions
stacking_model = StackingClassifier(estimators=estimators,
                                    final_estimator=LogisticRegression())
stacking_model.fit(X_train, y_train)
After implementing these robustness-enhancing techniques, the final super learner model demonstrated a significant improvement in accuracy and robustness, reaching a final accuracy of 0.91 (see Section 7). The integration of cross-validation, bootstrapping, data augmentation, and the SMOTE contributed to this result by ensuring better generalization across various datasets and reducing overfitting.
4. Construction of Carotid Artery Disease Detection Model
The difficulties encountered by systems which rely on hard-coded knowledge indicate that artificial intelligence (AI) systems require the capability to learn independently by extracting patterns from raw data [23]. This capability is referred to as machine learning (ML). ML is a type of applied statistics which uses computers to statistically estimate complex functions, enabling them to solve real-world problems and make decisions that appear subjective. The main aim of this study is to investigate the applicability of different ML techniques in detecting carotid artery diseases.
We began by analyzing various types of ML algorithms and tuning their hyperparameters using the optimization software Optuna (https://github.com/pfnet/optuna/, accessed on 26 September 2024) [34]. After optimizing the candidate algorithms, we built a super learner model by creating a weighted combination of these candidates. The following ML algorithms were considered candidate learners:
Extreme Gradient Boosting (XGBoost): XGBoost is a supervised ML algorithm based on the tree boosting method [35]. It is an ensemble learning algorithm which creates a final model from a collection of individual models, typically decision trees. XGBoost uses gradient descent to optimize weights and minimize the loss function, considering second-order gradients to improve model performance.
Light Gradient-Boosting Machine (LightGBM): LightGBM is a variant of gradient boosting which achieves superior performance, especially with high-dimensional data and large datasets [36]. It employs two novel techniques, gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), which enhance the training speed and efficiency. Like XGBoost, LightGBM is based on decision tree algorithms.
Random forests (RF): RF is a robust ensemble learning method based on a collection of decision trees [37]. Each tree is constructed using a random vector sampled independently but with the same distribution across all trees. Nodes in the decision trees are split based on measures like entropy or the Gini index.
Bootstrap aggregation (bagging): Bagging is an ensemble learning technique which reduces variance in noisy datasets and is considered an extension of the random forests algorithm [38]. It involves selecting random samples of data with replacement, training multiple models independently, and averaging their predictions for improved accuracy.
Adaptive Boosting (AdaBoost): AdaBoost is a boosting approach which generates a robust classifier from a set of weak classifiers. It maintains weights over the training data and adjusts them adaptively after each learning cycle, increasing the weights for incorrectly classified samples and decreasing the weights for correctly classified ones.
Extremely Randomized Trees (ExtraTrees): ExtraTrees is an ensemble learning technique based on decision trees. Unlike RF, where the tree splits are deterministic, ExtraTrees uses randomized splits, providing a robust approach for high-dimensional data by balancing bias and variance.
In Figure 4, the workflow chart involves a systematic process to predict patient outcomes using machine learning techniques. Initially, a dataset comprising patient data is collected and preprocessed, and feature selection is conducted to identify the most relevant variables, reducing the dataset's dimensionality. These selected features are then scaled to ensure uniformity across all variables. The dataset is subsequently divided into a training set and a test set. A variety of basic machine learning models, including XGBoost, LGBM, random forests, bagging, AdaBoost, and ExtraTrees, are trained on the training set. Each model's performance is evaluated, and optimization techniques are applied to enhance their accuracy. Following optimization, a super learner model is constructed by combining the top-performing models, specifically AdaBoost, LGBM, and random forests. The final super learner model achieved the highest performance, illustrating the value of model optimization and ensemble techniques in predictive analytics.
To establish the hyperparameter search for the ML algorithms, we employed the Optuna optimization software, an efficient hyperparameter optimization framework which tunes machine learning models by intelligently exploring the hyperparameter search space. It uses a tree-structured Parzen estimator (TPE) as a probabilistic model to guide the search, balancing exploration and exploitation. Optuna evaluates different sets of hyperparameters by running trials and selecting those which improve the model's performance based on a specified objective, such as minimizing validation loss or maximizing accuracy. The framework also employs pruning strategies which stop unpromising trials early, reducing computational cost by focusing the search on promising hyperparameter configurations. The primary advantage of Optuna is its efficient hyperparameter search, facilitated by its define-by-run principle.
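A minimal sketch of how such a search might be set up; the candidate model, search ranges, and trial budget below are illustrative assumptions rather than the study's exact configuration:

import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def objective(trial):
    # Each trial samples one hyperparameter configuration via the TPE sampler
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    model = XGBClassifier(**params)
    # Objective: mean five-fold cross-validated accuracy (to be maximized)
    return cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=100)
print(study.best_params)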
Table 6 summarizes each algorithm's accuracy before ensembling. According to this, AdaBoost demonstrated strength in handling weak learners but was sensitive to noisy data, which affected its overall performance. Random forests performed well in terms of avoiding overfitting and generalizing across different data subsets, although it had difficulty distinguishing fine-grained class distinctions. XGBoost was particularly effective at handling complex patterns and sparse data but required careful tuning to maximize performance. LightGBM had the highest accuracy and offered faster training times, though it was sensitive to imbalanced datasets. Bagging, with similar accuracy to LightGBM, provided reliable generalization by reducing variance but lacked the fine-tuned control of boosting algorithms. Finally, ExtraTrees, though proficient at managing noisy data due to its randomized split point selection, had reduced precision and lower overall performance. These findings highlight how combining these algorithms in an ensemble capitalizes on their complementary strengths. Bagging and random forests contributed to variance reduction and robustness, XGBoost and LightGBM excelled in managing complex datasets, AdaBoost focused on improving weak learners, and ExtraTrees managed noisy data. This ensemble approach resulted in a more generalized and accurate model than any of the individual algorithms could achieve on their own.
To achieve higher predictive accuracy, we implemented stacked generalization, an ensemble method which combines the distinct candidate learners [39,40]. In this approach, lower-level predictive algorithms (candidate learners) are combined into a higher-level model called a super learner (SL) [41]. After developing the SL, we used Optuna again to find the best combination of candidate learners. The best SL model, which combined the XGBoost, RF, and bagging algorithms, achieved an overall accuracy of 0.90. Table 7 presents the performance metrics, namely the precision, recall, and F score, for the corresponding classes in the test data.
5. Dataset Expansion Using SMOTE
In the recent literature, one study [42] utilized the Nasarian Coronary Artery Disease (CAD) dataset, which incorporates both workplace and environmental factors alongside clinical features. The results demonstrated that the proposed feature selection method achieved a high classification accuracy by employing the SMOTE technique and the XGBoost classifier. The synthetic minority oversampling technique (SMOTE) plays a crucial role in addressing class imbalance within a dataset, improving model performance by generating synthetic examples of the minority class and thereby ensuring a more balanced training process. This technique, which is widely used in contemporary research, significantly enhances the robustness and accuracy of machine learning models when applied to imbalanced datasets like the CAD dataset in this study.
To increase the amount of data for this work, we applied the synthetic minority oversampling technique (SMOTE). The original dataset consisted of 120 instances, with 80 normal cases and 40 aneurysm and dissection cases, leading to class imbalance. Through the SMOTE, we increased the number of minority class instances, expanding the dataset to approximately 1000 samples.
5.1. SMOTE Process
The SMOTE augmentation involved oversampling the minority classes by generating 880 additional synthetic samples, increasing the minority class size from 40 to 920 instances while keeping the 80 majority class samples unchanged. This resulted in a total of 1000 instances, yielding a balanced dataset, as shown in Table 8.
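A hedged sketch of how such targeted oversampling could be expressed with imbalanced-learn; the class label encoding and the even split of the 920 minority instances between the two minority classes are assumptions for illustration:

from imblearn.over_sampling import SMOTE

# Hypothetical label encoding: 0 = majority class, 1 = aneurysm, 2 = dissection
# Request explicit per-class target counts (460 + 460 = 920 minority instances,
# alongside the 80 untouched majority samples, for 1000 in total)
smote = SMOTE(sampling_strategy={1: 460, 2: 460}, random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)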
5.2. Outcome of SMOTE Application
The SMOTE application yielded the following benefits:
Expanding the dataset to 1000 samples.
Addressing class imbalance by increasing the minority classes to 920 instances.
Enhancing model accuracy, particularly for the minority classes.
The SMOTE-augmented dataset provided a more balanced and robust training set, improving model generalization across both the majority and minority classes.
Moreover, we employed several measures to ensure the quality and reliability of the SMOTE-augmented dataset. The generated features, such as age, carotid angles, and artery diameters, were kept within realistic ranges observed in the original data, ensuring that the augmented data maintained medical plausibility. The SMOTE generated synthetic samples by interpolating between existing minority class data points, which avoided duplication and reduced the risk of overfitting.
We validated the augmented data by comparing statistical properties such as the mean, standard deviation, minimum, and maximum values with the original dataset, ensuring consistency. Additionally, related attributes like the artery diameters and age were cross-verified to preserve the expected medical relationships. The sex and age distributions were balanced to reflect equal representation of male and female subjects, maintaining the overall demographic structure.
Finally, consistency checks were implemented throughout the augmentation process to ensure data integrity, avoiding outliers and unrealistic values. These steps resulted in a robust and reliable dataset for training and evaluating machine learning models in the diagnosis of carotid artery diseases.
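These checks can be sketched as follows, assuming pandas DataFrames df_original and df_augmented with matching columns (the frame names are hypothetical):

import pandas as pd

# Compare per-feature summary statistics before and after augmentation
stats_original = df_original.describe().loc[["mean", "std", "min", "max"]]
stats_augmented = df_augmented.describe().loc[["mean", "std", "min", "max"]]
print(stats_original, stats_augmented, sep="\n\n")

# Flag any feature whose augmented values drift outside the original range;
# this should be empty, since SMOTE interpolates between existing samples
out_of_range = ((df_augmented.min() < df_original.min()) |
                (df_augmented.max() > df_original.max()))
print(out_of_range[out_of_range])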
7. Performance Comparison: Original versus SMOTE with Optuna
We also compared the models’ performance on both datasets after applying Optuna optimization. The SMOTE balanced the dataset, while Optuna optimized the hyperparameters of the models, leading to significant improvements.
7.1. Optuna Optimization on Original Data and on SMOTE-Expanded Data
Table 6 presents the accuracy scores after Optuna optimization on the original dataset, while Table 12 shows the accuracy scores for the models trained on the SMOTE-augmented dataset after Optuna optimization. Balancing the dataset with the SMOTE, coupled with efficient hyperparameter tuning via Optuna, further enhanced model performance.
7.2. Comparison of Performance
A performance comparison between the original dataset and the SMOTE-balanced dataset after Optuna optimization is presented in Table 13. Across all models, balancing the dataset with the SMOTE led to higher accuracies, with the most significant gains observed in models like XGBoost, LightGBM, and AdaBoost. As a result, the comparison between the original and SMOTE-augmented datasets (Table 13) highlights how the SMOTE and Optuna together improved model accuracy. Noteworthy observations include the following:
XGBoost: Accuracy increased from 0.81 to 0.89 after SMOTE and Optuna optimization.
LightGBM: Accuracy improved from 0.86 to 0.91, making it the top performer.
Bagging and random forests: Bagging’s accuracy increased from 0.86 to 0.90, while that of random forests improved from 0.81 to 0.88.
AdaBoost and ExtraTrees: AdaBoost’s accuracy improved from 0.81 to 0.87, while that of ExtraTrees increased from 0.71 to 0.79.
Overall, the models benefited significantly from the balanced dataset provided by the SMOTE, with each model demonstrating improved performance after hyperparameter optimization with Optuna. The comparison shows that using the SMOTE, followed by Optuna optimization, enhanced model accuracy across all tested algorithms.
To further improve the predictive accuracy on the SMOTE-augmented dataset, we implemented stacked generalization of various optimized candidate learners through an ensemble technique [39,40]. This method combines lower-level models (candidate learners) into a higher-level ensemble model known as the super learner (SL) model [41]. After constructing the SL model using the SMOTE-augmented data, we employed Optuna once again to identify the optimal combination of candidate learners for the best performance.
The best SL model, integrating the XGBoost, random forests (RF), and bagging algorithms, achieved an overall accuracy of 0.91 on the SMOTE-augmented dataset. This model showed improved performance, especially in handling the minority classes due to the balanced nature of the data.
Table 14 summarizes the classification report, detailing the precision, recall, and F scores for the individual classes within the test data. The results highlight the improved classification performance for all classes, particularly the minority categories such as aneurysm and dissection cases, which previously suffered from scarcity of representation.
The SL model trained on the SMOTE-augmented dataset outperformed the model trained on the original data. The overall accuracy reached 0.91, with significant improvements in the precision and recall for the minority classes. This performance highlights the effectiveness of combining the SMOTE for data balancing and Optuna for optimizing the ensemble of models.
7.3. Performance Metrics
The performance of the proposed ML model was evaluated using four metrics, namely the accuracy, precision, recall, and F score, which are commonly used in classification problems. These metrics were computed as follows:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

Here, TN, TP, FN, and FP represent true negative, true positive, false negative, and false positive, respectively. Higher values of these metrics indicate a model with accurate predictive capabilities:
True negative (TN): A TN is an outcome where the model correctly identifies non-carotid artery diseases.
True positive (TP): A TP is an outcome where the model correctly identifies carotid artery diseases.
False negative (FN): An FN is an outcome where the model incorrectly classifies a diseased case as free of carotid artery disease.
False positive (FP): An FP is an outcome where the model incorrectly classifies a disease-free case as having carotid artery disease.
Accordingly, it should be emphasized that for the above performance metrics, values closer to one indicate a model with high predictive accuracy.
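For reference, these metrics can be computed directly with scikit-learn; the sketch below assumes test labels y_test and model outputs predictions, and uses macro averaging across the four classes as one reasonable choice:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, classification_report)

print("Accuracy :", accuracy_score(y_test, predictions))
# Macro averaging weights the four classes (normal, stenosis, aneurysm,
# dissection) equally, regardless of their sample counts
print("Precision:", precision_score(y_test, predictions, average="macro"))
print("Recall   :", recall_score(y_test, predictions, average="macro"))
print("F score  :", f1_score(y_test, predictions, average="macro"))
print(classification_report(y_test, predictions))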
8. Conclusions
Carotid artery diseases are associated with high mortality rates, making early diagnosis and prevention critical. The use of AI techniques, as demonstrated in this study, holds significant potential in assisting with diagnosis and reducing mortality. Despite their success in healthcare, AI systems face certain limitations, largely influenced by the quality and quantity of data available. The accuracy of these models depends on the relevance of the data used, and irrelevant or insufficient data can lead to suboptimal performance.
In this study, we explored the relationship between structural deterioration in vascular branches and carotid artery diseases such as stenosis, aneurysm, and dissection. Specifically, the diameters of common, internal, and external carotid arteries were selected as key features for data preparation in constructing the machine learning model to diagnose ischemic events. The rationale behind this was the direct influence of blood flow patterns on vessel walls, which impact endothelial cells and contribute to disease development. Therefore, the angles and branching geometry of arteries play a vital role in the creation of predictive AI models.
We utilized a combination of feature selection methods to eliminate non-informative predictors, making the developed model more interpretable and computationally efficient. The feature selection process revealed a strong connection between the structural properties of vessels and carotid artery diseases. Additionally, we developed a super learner (SL) model based on ensemble methods, including random forests (RF), AdaBoost, XGBoost, and LightGBM, to diagnose carotid artery diseases. Optuna optimization was employed to fine-tune the hyperparameters, ensuring minimal generalization error. The final SL model, combining AdaBoost, LightGBM, and RF, exhibited the best performance in diagnosing these diseases, as detailed in Table 6. The findings indicate that (1) the structural properties of carotid arteries are closely linked to disease, and (2) the developed feature selection and model architecture have strong potential for accurate disease diagnosis and prediction.
This study also presented a comprehensive analysis of machine learning techniques applied to medical datasets, emphasizing model robustness and generalization. Methods such as k-fold cross-validation, bootstrapping, data augmentation, and the SMOTE were particularly effective in improving model performance on small datasets. The super learner ensemble model achieved an overall accuracy of 0.91, with the SMOTE-augmented data significantly improving the prediction accuracy for minority classes like aneurysms and dissections. Optuna's optimization further confirmed the advantages of data balancing and ensemble learning techniques for enhancing diagnostic capabilities.
Future Directions
This study is structured in three stages. The first stage, presented in this manuscript, focused on creating an initial machine learning model using a dataset of 122 instances, which was expanded to 1000 instances using the SMOTE without compromising data balance. This significantly improved the model’s ability to classify minority cases.
The second stage will involve automating data extraction using deep learning techniques as more data become available. These processes will be integrated into the existing model to improve its accuracy and scalability.
The final stage will focus on integrating the machine learning model with deep learning-based data extraction, creating a comprehensive system which can process large-scale medical data and provide accurate diagnostic insights automatically. This integration will aim to refine the overall system’s efficiency and accuracy, offering valuable contributions to medical AI applications.