1. Introduction
Acute Aortic Syndrome (AAS) encompasses a spectrum of life-threatening conditions, including Acute Aortic Dissection (AAD), Intramural Hematoma (IMH), and Penetrating Aortic Ulcer (PAU) [
1]. AAD involves a tear in the inner layer of the aortic wall; IMH entails bleeding within the aortic wall, without a clear tear in the inner layer; and PAU refers to a defect or ulceration in the aortic wall penetrating deeper layers, as indicated in
Figure 1 below. Various risk factors, including hypertension and atherosclerosis, are associated with these subtypes [
2,
3].
Characterized by defects in the aortic wall, AAS poses the imminent risk of vessel obstruction or rupture, often manifesting through symptoms such as chest pain [
3]. High blood pressure and genetic factors are commonly associated with AAS [
1]. Symptoms include intense pain, often resembling a heart attack [
1,
4]. Prompt diagnosis, typically through imaging techniques, is crucial. A compromised aortic wall can lead to severe complications, underscoring the urgency of addressing AAS [
3]. The incidence of AAS in the general population ranges from 3.5 to 6.0 cases per 100,000 patient-years, with higher rates observed among individuals aged 64 to 74 years and those aged 75 years and older [
5,
6].
Diagnosing AAS presents a multifaceted challenge due to its diverse clinical presentations and the absence of standardized diagnostic criteria [
3,
7]. This diagnostic uncertainty contributes to delays in treatment, emphasizing the critical need for improved diagnostic approaches [
8]. Machine learning, a branch of artificial intelligence, has emerged as a promising tool in healthcare, offering the potential to enhance diagnostic accuracy and patient outcomes [
9,
10].
Recognizing the gap in diagnostic guidance, this paper leverages a comprehensive dataset integrating clinical records from 68 emergency departments across the USA, encompassing medical histories of nearly 150,000 patients from 2021 to 2022. By exploring the utility of machine learning techniques, the study aims to develop predictive models for identifying patients at risk of AAS. Through systematic review and analysis of diagnostic accuracy methodologies, this research seeks to elucidate critical factors influencing AAS diagnosis and facilitate the development of clinical decision support systems. Ultimately, the objective is to expedite AAS diagnosis and improve patient care in emergency settings.
1.1. Machine Learning and Deep Learning in Acute Aortic Syndrome
Machine learning is transforming the detection and classification of Acute Aortic Syndrome (AAS) through medical imaging interpretation and clinical data analysis. Its impact falls into the following areas:
1.1.1. Medical Imaging Interpretation
Computed Tomography (CT) Scans: Algorithms analyze CT scans to detect subtle patterns of AAS, learning from extensive datasets to improve accuracy [
11].
Magnetic Resonance Imaging (MRI): Similar to CT scans, algorithms analyze MRI images to recognize AAS features, enhancing subtype identification [
12].
1.1.2. Clinical Variable and Results Analysis
Risk Prediction Models: Models integrate clinical variables and imaging results to predict AAS risk, leveraging data patterns for accurate risk estimation [
13].
Integration of Multimodal Data: Machine learning integrates information from diverse sources for comprehensive AAS detection and classification [
13].
2. Literature Review of AAS
In studies related to AAS and its subsets, such as Acute Aortic Dissection (AAD), the key measure of success is primarily assessed through the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC), along with sensitivity and specificity. Analyzing these parameters allows for a comprehensive statistical evaluation of the results in the research studies reviewed below and facilitates meaningful comparisons between them.
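Since AUC, sensitivity, and specificity recur throughout the studies reviewed below, a brief sketch of how they are computed may be useful. The labels and scores here are hypothetical, scikit-learn is assumed, and the 0.5 threshold is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical ground-truth labels and model scores (illustrative only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.3, 0.2, 0.4, 0.8, 0.7, 0.9, 0.6, 0.4, 0.2])

# AUC is threshold-free: it summarizes ranking quality across all cutoffs
auc = roc_auc_score(y_true, y_score)

# Dichotomize at a 0.5 threshold to derive sensitivity and specificity
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
```

Unlike sensitivity and specificity, which depend on the chosen threshold, the AUC compares models independently of any single operating point, which is why the reviewed studies lean on it for head-to-head comparisons.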
In some research, the choice of methodology for predicting AAS is influenced by the type of dataset sources available. Some studies utilize Computed Tomography (CT) Scans and Magnetic Resonance Imaging (MRI), while others combine clinical variables and CT images. As a result, researchers adopt a hybrid approach, incorporating both clinical and imaging data. Additionally, some studies concentrate specifically on clinical measurements and demographic data, employing machine learning techniques. Various researchers utilize a diverse array of machine learning strategies to analyze these datasets.
2.1. Prediction of AAS by Machine Learning Techniques
In prior investigations focusing on AAS, researchers adopted varied methodologies to collect data. Some studies relied on immediate medical experiment results and demographic data from patients arriving at emergency departments, while others concentrated on patients admitted to hospitals, incorporating more detailed parameters and epidemiological experiment results. Certain studies utilized standard data entry forms, capturing extensive information on patient demographics, history, clinical presentations, physical findings, imaging study results, and patient outcomes, including mortality.
In October 2020, Duceau et al. [
13] applied machine learning methods to prehospital research involving 976 patients. Their study focused on prehospital triage and utilized 27 variables and patient characteristics. Two prediction models for AAS were formulated, employing logistic regression and an ensemble machine learning approach named Super Learner (SL). Two key parameters were studied: under-triage, the percentage of AAS patients not transported to the specialized aortic center, and over-triage, the percentage of patients with alternative diagnoses who were transported to the specialized aortic center. For internal validation, Duceau et al. employed five-fold cross-validation. The SL algorithm demonstrated superior performance in predicting AAS, attributed to its ability to capture intricate relationships and patterns within the data: its AUC of 0.73 in the validation cohort exceeded the logistic regression model's 0.67.
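As an illustration of this kind of comparison, the sketch below contrasts logistic regression with a stacked ensemble under five-fold cross-validated AUC. Super Learner itself is not part of scikit-learn; StackingClassifier is used here as a related stacked-generalization stand-in, and the data are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 27 triage variables (illustrative only)
X, y = make_classification(n_samples=976, n_features=27, weights=[0.7, 0.3],
                           random_state=0)

log_reg = LogisticRegression(max_iter=1000)

# StackingClassifier is scikit-learn's stacked-generalization estimator,
# related in spirit to the Super Learner ensemble used in the study
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Five-fold cross-validated AUC, mirroring the internal validation scheme
auc_lr = cross_val_score(log_reg, X, y, cv=5, scoring="roc_auc").mean()
auc_sl = cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean()
```

Like the Super Learner, the stacking model feeds the base learners' out-of-fold predictions to a meta-learner, which is how it can capture patterns that a single model misses.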
In Huo et al.’s [
14] study, initially considering 526 patients, 34 cases were excluded from the analysis, resulting in a final population size of 492. The study utilized Correlation-Based Feature Selection (CFS) to choose a subset of attributes most relevant for classification, considering each feature’s utility in predicting the class and addressing intercorrelations among them. Huo et al. applied 10-fold cross-validation during hyperparameter tuning for advanced machine learning models in the training data. In their study, the Bayesian Network outperformed five other methods, achieving a superior AUC value of 0.857. This underscores the effectiveness of the Bayesian Network in their research.
Wu et al. [
15] achieved the highest AUC for predicting in-hospital rupture of type A aortic dissection using the random forest technique. The research retrospectively evaluated 1133 consecutive patients diagnosed with Thoracic Aortic Aneurysm and Dissection (TAAD). The random forest classification model reached an AUC of 0.994 on the training dataset but 0.752 on the testing dataset, with a sensitivity of 51.4% and a specificity of 94.5%. The near-perfect training AUC indicates strong discrimination on the training data, but the gap to the testing AUC suggests a degree of overfitting; the testing metrics therefore provide the more realistic estimate of the model's ability to distinguish and classify cases within the context of TAAD.
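The gap between training and testing AUC can be reproduced in miniature. The sketch below, on synthetic data with injected label noise, shows how an unpruned random forest scores far higher on the data it was fit to than on a held-out split (all names and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic cohort standing in for the TAAD data; flip_y injects label noise
X, y = make_classification(n_samples=1133, n_features=20, flip_y=0.2,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42, stratify=y)

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

# An unpruned forest nearly memorizes the training set, so the training AUC
# overstates real discrimination; the held-out AUC is the honest estimate
auc_train = roc_auc_score(y_tr, rf.predict_proba(X_tr)[:, 1])
auc_test = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
```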
Guo et al. [
16] conducted a thorough research study involving 1344 patients diagnosed with acute aortic dissection, comprising 1071 survivors and 273 non-survivors. The study utilized five machine learning models, including logistic regression, decision tree, K-Nearest Neighbor, Gaussian Naive Bayes, and Extreme Gradient Boosting (XGBoost), to predict in-hospital mortality risk. Among these models, the XGBoost model demonstrated remarkable performance, with the highest mean AUC of 0.927 across 10 iterations. The study revealed that the XGBoost model was an effective approach for generating accurate and early predictions of in-hospital mortality in patients with AAD. Evaluation metrics further reinforced XGBoost’s superiority, showing high accuracy at 0.918 and robust values for sensitivity, specificity, positive predictive value, and negative predictive value.
In Wu et al.’s [
17] study, continuous variables were presented as mean ± standard deviation, and categorical variables as frequencies and percentages. The data were randomly divided into training (80%) and testing (20%) sets, with missing values imputed using the median of recorded measurements; this pragmatic strategy supported effective model training and testing while offering insight into performance on unseen data. A decision tree built on demographic and clinical data from the training set revealed distinct risk profiles for the various aortic dissection types, and a nomogram based on logistic regression analysis established a scoring system using 23 selected variables. Among the machine learning techniques explored, the XGBoost model demonstrated the highest accuracy and robustness. SHapley Additive exPlanations (SHAP) analysis highlighted factors impacting in-hospital deaths before surgery, including Stanford type A dissection, maximum aortic diameter > 5.5 cm, high variability in heart rate, high variability in diastolic blood pressure, and involvement of the aortic arch.
Lin et al.’s study [
18] showcased the impressive performance of a Convolutional Neural Network (CNN) model in predicting acute type A aortic dissection rupture. The CNN achieved a remarkable AUC of 0.99, along with notable sensitivity (93%) and specificity (90%). The investigation identified age, gender, specific biomarkers, and aortic morphological parameters as independent predictors for acute type A aortic dissection rupture, emphasizing the CNN’s efficacy in accurately predicting and identifying potential risk factors. Notably, both random forests and CNN outperformed logistic regression (LR) in predicting acute type A aortic dissection risk, while the Support Vector Machine (SVM) demonstrated inferior performance compared to LR.
2.2. Literature Review Summary
In summary, the reviewed studies collectively advance the refinement of diagnostic and predictive models for AAS, demonstrating progress across diverse methodologies. The emphasis on the AUC in comparing diagnostic performance underscores the effectiveness of notable models like XGBoost and random forest, indicating their accuracy in predicting and classifying cases. These findings contribute to a comprehensive understanding of factors influencing outcomes and risk estimation in Acute Aortic Syndrome and its subsets.
The survey's most striking result was the AUC of 0.99 achieved by the CNN in predicting the risk of Acute Aortic Dissection rupture [18]. In that study, the dataset was divided into 70% for training and 30% for testing, enhancing the robustness of the model evaluation. For predicting AAS specifically, the highest AUC recorded was 0.974, accompanied by a notable 91.8% sensitivity, utilizing multivariate logistic regression [
19].
In the domain of AAS research, the study with the smallest patient cohort, as indicated in
Table 1, is associated with Morello et al. [
20], conducted in 2023. This investigation focused on the identification and analysis of 128 cases of AAS evaluated in two emergency departments. On the contrary, the study with the largest patient cohort encountered in this research on AAS is McLatchie et al.’s [
21], conducted in 2023. Initially centered on the identification and analysis of cases in three UK emergency departments, this study expanded to include 5548 patients and considered 44 variables, providing a comprehensive exploration of the subject.
One limitation observed in the above-mentioned studies is the absence of a sample size calculation. This study recognizes the importance of having at least 10 events for each predictor parameter when developing prediction models, as small sample sizes can compromise statistical power and the reliability of predictive models. The present study addresses this limitation by incorporating a substantial total sample size of 148,707 cases with 42 clinical variables, described further in the next section on the data. This marks a significant transition into the realm of big data compared to earlier, smaller studies in the field of Acute Aortic Syndrome (AAS). Notably, this extensive dataset also holds potential applicability for predicting various heart diseases beyond AAS.
3. Roadmap
In this section,
Figure 2 is designed to provide a visual roadmap outlining the steps of the project, including cleaning, preprocessing, and implementation. This roadmap serves to enhance understanding and facilitate the follow-through of the project’s implementation process.
4. Clinical Dataset
Clinical datasets in Acute Aortic Syndrome (AAS) are indispensable reservoirs of patient information that significantly contribute to advancing our comprehension of this intricate cardiovascular ailment. These datasets comprise a wide array of clinical histories, imaging results, laboratory tests, and treatment outcomes [
14], offering a comprehensive overview of AAS presentations. Leveraging the richness of these datasets allows for discerning patterns, identifying risk factors, and tailoring diagnostic and therapeutic approaches to individual patients [
1,
15]. Meticulous data integration, cleaning, and analysis enable clinicians to derive valuable insights into various facets of AAS, facilitating early detection, risk prediction, and personalized patient care [
15,
16].
The dynamic utilization of clinical datasets in AAS research involves continuous refinement and analysis to uncover hidden patterns and enhance diagnostic and treatment strategies. By employing advanced computational techniques, including machine learning, to extract meaningful information from these datasets, clinicians are empowered to make evidence-based decisions and formulate effective treatment plans [
1,
15]. This iterative process of data analysis and refinement not only contributes to the scientific understanding of AAS but also drives ongoing progress in heart medicine, enabling clinicians to better predict, diagnose, and manage this life-threatening cardiovascular condition in emergency department settings.
4.1. Structure of Dataset
This study utilizes a diverse array of clinical datasets from emergency departments (EDs) in hospitals, focusing on patients with and without Acute Aortic Syndrome (AAS). These datasets, gathered from 150 EDs in Canada and the USA between 2021 and 2022, provide a detailed overview of patients’ physiological conditions, diagnostic patterns, and disposition outcomes.
The dataset was organized into separate files, with each patient assigned a unique visit UID. In
Table 2, the numbers of files, columns, and rows, and unique visit UIDs in each of the four categories of files are visible.
4.1.1. Demographic Dataset
This dataset includes demographic information, such as age and gender, for both non-AAS and AAS patient groups, as depicted in
Table 3. Specifically, a gender imbalance is evident within the AAS group, with males constituting 65.1% compared to females, at 34.9%, suggesting gender-specific trends in AAS. The presence of missing data underscores the importance of ensuring analysis reliability. Furthermore,
Figure 3 and
Table 3 highlight that 46.5% of AAS patients fall within the 60–80-year-old age range, emphasizing the prevalence of this condition among older demographics. Meanwhile, the disposition outcomes column offers insights into patients’ final healthcare status, encompassing discharge, admission, transfer, and mortality, reflecting the diverse scenarios encountered within the ED setting for both non-AAS and AAS groups. Additionally, the chief complaint column details the primary reasons for seeking medical attention, notably focusing on chest pain symptoms. Further analysis elucidates the prevalence and severity of chest pain within the AAS group, emphasizing the importance of comprehensive symptom assessment in addressing healthcare needs. These combined analyses provide valuable insights into demographic characteristics, disposition outcomes, diagnosis patterns, and symptom profiles within both patient populations.
The comparison of gender distributions between patients without and with AAS reveals noteworthy findings. In the absence of Acute Aortic Syndrome (AAS) (
Figure 4), the gender distribution is relatively balanced, with women at 54.13% and men at 45.87%. However, in the presence of AAS (
Figure 5), there is a significant shift, indicating a higher proportion of male patients at 65.12%, with females comprising 34.88%. This gender disparity suggests that AAS may exhibit gender-specific patterns or predispositions. Further investigation into the underlying factors contributing to this gender-related difference could provide valuable insights into the characteristics and risk factors associated with Acute Aortic Syndrome in different demographic groups.
4.1.2. Lab Results Dataset
This dataset encompasses measurements of biomarkers and biochemical parameters obtained from blood samples, including D-dimer, hemoglobin, and troponin levels. It offers valuable insights into patients’ physiological profiles and provides markers for potential clotting abnormalities and cardiac injury associated with AAS.
Table 4 provides data on troponin levels in the lab results dataset for both the non-AAS (148,578 individuals) and AAS (129 individuals) groups. Troponin is a protein associated with heart muscle contraction, and its levels in the blood can indicate cardiac injury. The unit for troponin measurement is nanograms per liter (ng/L).
In the non-AAS group, the average troponin level is 2 ng/L, while in the AAS group, it slightly rises to 4 ng/L. The examination of troponin level distribution across various ranges reveals differences between the two groups. For example, in the 0–1 ng/L range, a higher percentage of individuals is observed in the AAS group (41.86%) compared to the non-AAS group (37.13%). Additionally, the proportion of individuals with troponin levels in the 15–25 ng/L range is higher in the AAS group (1.55%) than in the non-AAS group (0.42%).
The table also underscores the presence of missing data, with 55% of individuals in the non-AAS group and 45% in the AAS group lacking recorded troponin values.
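The range-based summaries in Table 4 can be reproduced with pandas binning. The values below are hypothetical and serve only to illustrate the computation of missingness percentages and range distributions:

```python
import numpy as np
import pandas as pd

# Hypothetical troponin values (ng/L) with missing entries (illustrative only)
troponin = pd.Series([0.5, 2.0, np.nan, 18.0, 0.8, 4.0, np.nan, 1.2, 30.0, 0.3])

# Share of missing values, as reported for each group in Table 4
missing_pct = troponin.isna().mean() * 100

# Distribution across clinical ranges, mirroring the table's binning
bins = [0, 1, 5, 15, 25, np.inf]
labels = ["0-1", "1-5", "5-15", "15-25", ">25"]
dist = (pd.cut(troponin.dropna(), bins=bins, labels=labels)
          .value_counts(normalize=True, sort=False) * 100)
```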
4.1.3. Vital Signs Dataset
This part of the dataset includes measurements of essential health indicators, such as blood pressure, body temperature, and pulse rate. It provides insights into patients’ overall health status and helps monitor physiological parameters relevant to AAS diagnosis and management.
Table 5 offers a summary of diastolic blood pressure (Diastolic-BP) measurements in both the non-AAS (148,578 individuals) and AAS (129 individuals) groups within the vitals dataset. The unit for diastolic blood pressure is millimeters of mercury (mmHg). Diastolic blood pressure signifies the pressure in the arteries when the heart is at rest between beats.
In the non-AAS group, the average diastolic blood pressure is 85 mmHg, while in the AAS group, it is slightly lower at 83 mmHg. The table illustrates the distribution of diastolic blood pressure across different ranges for both groups. For instance, in the 60–90 mmHg range, a higher percentage of individuals is observed in the non-AAS group (20.13%) compared to the AAS group (10.85%). Conversely, in the 90–120 mmHg range, the AAS group has a higher percentage (19.38%) than the non-AAS group (9.40%).
Moreover, the table highlights the presence of missing data, with 62% of individuals in the non-AAS group and 56.6% in the AAS group lacking recorded diastolic blood pressure values. Addressing missing data is crucial for a comprehensive analysis of blood pressure patterns, especially in the context of aortic dissection.
Table 6 presents a summary of systolic blood pressure (Systolic-BP) measurements in both the non-AAS (148,578 individuals) and AAS (129 individuals) groups within the vitals dataset. The unit for systolic blood pressure is millimeters of mercury (mmHg). Systolic blood pressure denotes the maximum pressure in the arteries during a heartbeat.
In the non-AAS group, the average systolic blood pressure is 121 mmHg, and in the AAS group, it is slightly lower, at 118 mmHg. The table illustrates the distribution of systolic blood pressure across different ranges for both groups. For instance, in the 90–120 mmHg range, the AAS group has a higher percentage (24.81%) compared to the non-AAS group (14.93%). On the other hand, in the 120–150 mmHg range, the non-AAS group has a higher percentage (15.22%) than the AAS group (7.75%).
Additionally, the table indicates the presence of missing data, with 63% of individuals in the non-AAS group and 56.6% in the AAS group lacking recorded systolic blood pressure values. Addressing missing data is crucial for a comprehensive analysis of blood pressure patterns, particularly in the context of aortic dissection.
4.1.4. Procedure Dataset
In this segment of the dataset, there are 15 individual files, each with three columns—Visit-UID, procedure code, and procedure description—collecting information about various medical procedures carried out on patients. In preparation for analysis, rows with similar medical details were grouped by assigning them new titles or brief abbreviations. This arrangement enhances the structure of the dataset, making the analysis of medical procedures more efficient. This organized structure aids in obtaining a clearer understanding of patient categories with AAS and assists in identifying which procedures were conducted or overlooked during their hospital stay.
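A minimal sketch of this grouping step is shown below; the procedure codes, descriptions, and abbreviations are hypothetical stand-ins for the actual file contents:

```python
import pandas as pd

# Hypothetical procedure file with the three columns described (illustrative)
proc = pd.DataFrame({
    "Visit-UID": [101, 102, 103, 104],
    "procedure_code": ["P01", "P02", "P03", "P01"],
    "procedure_description": [
        "CT angiography chest with contrast",
        "CT thorax without contrast",
        "Echocardiogram transthoracic complete",
        "CT angiography chest with contrast",
    ],
})

# Group clinically similar rows under short abbreviations, as in Section 4.1.4
abbrev = {
    "CT angiography chest with contrast": "CTA",
    "CT thorax without contrast": "CT",
    "Echocardiogram transthoracic complete": "ECHO",
}
proc["procedure_group"] = proc["procedure_description"].map(abbrev)

# Count how often each grouped procedure was performed
counts = proc["procedure_group"].value_counts()
```

Once grouped, a per-patient pivot over `procedure_group` makes it straightforward to see which procedures were conducted or overlooked during an AAS patient's stay.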
4.2. Data Preprocessing and Standardization
Data preprocessing and standardization are integral steps aimed at enhancing the reliability and interpretability of analyses. They involve standardizing units and scales, ensuring consistency in labeling, handling missing values, addressing outliers, and resolving inconsistencies in data formats. Performing these steps before integrating the separate datasets ensures that the data are in a suitable and uniform format for seamless integration, enabling meaningful comparisons and correlations across different hospitals and patient populations and laying the foundation for a more accurate and coherent analysis upon integration.
4.3. Missing Data
Missing data are a common challenge in medical research, especially in cardiovascular health studies, where participant withdrawal can reduce sample sizes. Missingness refers to absent information, which can introduce bias and reduce analysis efficiency. Understanding the reasons behind missing data, known as missingness mechanisms, is crucial for accurate analyses. There are three main mechanisms: MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random) [
31,
32].
To address missing data, various methods are available, each with its advantages and limitations. These include Complete Case Analysis, Mean or Median Imputation, Regression Imputation, Multiple Imputation (MI), and others. MI is favored for its ability to account for uncertainty by creating multiple datasets with varied imputed values.
In this research, Multiple Imputation by Chained Equations (MICE) was chosen due to its robustness and ability to provide reliable results in the presence of missing data. The imputation principle involves estimating or replacing missing values based on statistical techniques to minimize bias and preserve the dataset’s statistical properties. Multiple Imputation by Chained Equations (MICE) is an iterative method where missing values are imputed by modeling each incomplete variable conditional on the observed values of other variables. This approach generates multiple imputed datasets, which are analyzed separately and then combined to account for imputation uncertainty [
33]. The choice of imputation method depends on factors like data distribution and relationships. For categorical variables with no discernible relationships, MICE is suitable. It imputes missing values by modeling each variable with missing data as a function of other variables, iteratively updating imputed values until convergence is reached.
4.3.1. Missing-Data Mechanisms
Various factors can result in missing data, and it is crucial to acknowledge them. The choice of methods to address missing data in statistical analysis relies on assumptions about the underlying mechanisms, emphasizing the importance of understanding the reasons behind missing data [
31,
32].
Table 7 categorizes missing-data mechanisms for predictors into three types: MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random). It outlines the scenarios associated with each mechanism, providing a foundational understanding for selecting suitable methods to address missing data in statistical analyses.
The choice between imputation methods is determined by the characteristics of, and relationships among, the dataset's values. When variables are not continuous and exhibit no discernible relationships, selecting an appropriate imputation technique becomes crucial. In this setting, Multiple Imputation by Chained Equations (MICE) emerges as a suitable option: it is particularly well-suited to handling missing values in categorical variables, allowing the estimation of probabilities associated with the possible categories.
4.3.2. Multiple Imputation by Chained Equations (MICE)
This method involves imputing missing values in a dataset by modeling each incomplete variable conditional on the observed values of the other variables. The imputation equations are created for each variable with missing data. The general idea is to perform imputation in a series of steps, with each step updating one variable at a time. This process is iterated until convergence. The imputation models can vary depending on the nature of the variables (continuous, categorical, etc.) and the assumptions you make about the data [
33].
Multiple Imputation by Chained Equations (MICE) is an iterative imputation method that imputes missing values in a dataset by modeling each variable with missing data as a function of other variables. The method is based on the idea of imputing missing values multiple times, creating multiple datasets with plausible imputed values. These datasets are then analyzed separately, and the results are combined to account for the uncertainty introduced by the imputation process [
34].
The general equation for imputing a missing value in MICE can be described as follows:

y_i^(m) = ŷ_i + ε_i,

where y_i^(m) represents the imputed value for the missing entry of a variable in the m-th imputed dataset; ŷ_i is the predicted value for the missing entry from the conditional model; and ε_i is a random error term sampled from the distribution of the residuals of the predictive model [35].
The MICE algorithm proceeds through multiple cycles, where each cycle involves updating the imputed values for each variable based on the imputed values of other variables. This process is repeated until convergence, and the final imputed values are obtained by combining the results from all imputed datasets [
36].
In this study, we used IterativeImputer from the scikit-learn library, which implements a MICE-style iterative imputation. This method employs a Bayesian Ridge Regression model to predict missing values iteratively: in each iteration, it imputes missing values for each feature in the dataset, taking into account the correlations between features. By default, it runs up to 10 iterations (max_iter = 10), though this can be adjusted. The imputed data are then combined, ensuring that the variability introduced by the imputation process is incorporated into the final analysis.
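A minimal sketch of this setup is shown below. Note that IterativeImputer is still flagged as experimental in scikit-learn, so the enabling import is required; the clinical values are hypothetical:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Hypothetical matrix of clinical variables with missing entries (illustrative)
X = np.array([[120.0, 80.0, 2.0],
              [118.0, np.nan, 4.0],
              [np.nan, 83.0, 3.0],
              [121.0, 85.0, np.nan]])

# MICE-style chained imputation; BayesianRidge is the library's default
# estimator, and max_iter=10 matches the default number of iterations
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10,
                           random_state=0)
X_imputed = imputer.fit_transform(X)
```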
4.4. Balancing Data with SMOTE
To address the imbalance challenge in the dataset, with only 129 positive cases among 148,578 negative cases, the Synthetic Minority Oversampling Technique (SMOTE) was employed.
4.4.1. What Is SMOTE?
SMOTE stands for Synthetic Minority Oversampling Technique. It is a popular method used to address class imbalance in machine learning datasets, particularly in classification tasks. Class imbalance occurs when one class (the minority class) is significantly underrepresented compared to the others (the majority classes).
The main goal of SMOTE is to balance the class distribution by generating synthetic samples for the minority class. In this study, SMOTE was applied to address the imbalance between positive and negative cases of Acute Aortic Syndrome (AAS). With only 129 positive cases among 148,578 negative cases, the dataset faced an imbalance challenge. By generating synthetic samples of the minority class, SMOTE ensured that the predictive model is not biased toward the majority class. This improved the model’s ability to accurately identify both positive and negative cases of AAS, thereby enhancing its predictive performance.
4.4.2. Application of SMOTE
In this study, as shown in
Figure 6, the dataset initially exhibited a significant class imbalance, with only 129 positive AAS cases compared to 148,578 negative cases. This imbalance posed a challenge for model training, as it could lead to biased predictions favoring the majority class. In the graph, the minority class is depicted as a small cluster of orange data points, while the majority class appears as a dense cluster of blue points.
To address this, we applied the Synthetic Minority Oversampling Technique (SMOTE) using k = 5 nearest neighbors with the imbalanced-learn Python library. SMOTE generates synthetic samples for the minority class, helping to balance the class distribution. As illustrated in
Figure 7, the minority class becomes better represented after SMOTE is applied, reducing bias in the model and improving overall prediction performance. This balanced dataset contributes to the model’s ability to more accurately identify AAS cases and generalize effectively.
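The core interpolation idea behind SMOTE can be sketched in a few lines of NumPy. The study itself used imbalanced-learn's SMOTE with k = 5; the toy minority class below uses k = 2 only because the sample is tiny, and all values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hypothetical minority class (illustrative only)
minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2]])
k = 2  # neighbors per sample (k = 5 in the study; reduced for this sketch)

synthetic = []
for x in minority:
    # k nearest minority neighbors of x (index 0 is x itself, so skip it)
    dists = np.linalg.norm(minority - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]
    # Interpolate between x and a randomly chosen neighbor
    nb = minority[rng.choice(neighbors)]
    lam = rng.random()
    synthetic.append(x + lam * (nb - x))

synthetic = np.array(synthetic)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbors, which is how SMOTE densifies the minority region rather than merely duplicating existing cases.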
4.5. Data Loading and Exploration
The process of loading and exploring clinical datasets in Python involves key libraries such as pandas, NumPy, scikit-learn, and statsmodels. These libraries enable efficient data manipulation, numerical operations, visualization, machine learning tasks, and statistical modeling. By leveraging demographic information, researchers can gain insights into the distribution of the population across different age groups and genders, forming the basis for exploring patterns and trends within the dataset. This analysis facilitates a comprehensive understanding of the dataset's diversity and composition, ultimately contributing to valuable insights for medical research and healthcare decision-making. After cleaning and preprocessing the data, and based on medical knowledge and a review of the relevant literature, 26 of the most relevant variables were selected from the initial 139, as shown in
Table 8. These selected variables are illustrated in
Table 3 below and serve as inputs for the classifiers and machine learning methods used in this study.
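A minimal sketch of this exploration step is shown below; the demographic extract, column names, and age bands are hypothetical stand-ins for the real files:

```python
import pandas as pd

# Hypothetical demographic extract (illustrative only); the real files are
# loaded per category with pd.read_csv and keyed by visit UID
demo = pd.DataFrame({
    "visit_uid": [1, 2, 3, 4, 5, 6],
    "age": [45, 72, 66, 81, 59, 38],
    "gender": ["M", "F", "M", "M", "F", "F"],
})

# Distribution across age bands and genders, as in the exploratory analysis
demo["age_band"] = pd.cut(demo["age"], bins=[0, 40, 60, 80, 120],
                          labels=["<40", "40-60", "60-80", "80+"])
summary = demo.groupby(["age_band", "gender"], observed=False).size()
```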
The selected features reflect a comprehensive combination of demographic, physiological, and biochemical indicators that are clinically relevant to the early identification and prediction of Acute Aortic Syndrome (AAS). Here is a brief overview of their significance:
Age (years): Age is a major risk factor for AAS, as vascular elasticity decreases and aortic wall fragility increases with age.
Gender (M and F): AAS tends to be more common in males, but outcomes may vary by gender, making it a relevant predictive variable.
D-dimer (ng/mL): Elevated D-dimer levels indicate active clot formation and breakdown, often present in AAS due to aortic wall disruption.
HBA1C (%): Reflects long-term glucose control; poorly managed diabetes is associated with vascular damage, which may contribute to aortic pathology.
Hemoglobin (g/dL): Low levels can indicate bleeding or anemia, while high levels may be associated with increased blood viscosity and cardiovascular strain.
Absolute lymphocyte count/lymphocyte percentage: These can reflect immune response or systemic inflammation, which may be indirectly linked to vascular injury.
Absolute neutrophil count: High neutrophil counts often reflect inflammation or an acute stress response, commonly seen in serious cardiovascular events.
Platelet count (K/μL): Platelets play a role in clotting, and abnormal levels could signal systemic inflammation or coagulopathy relevant to AAS.
Troponins (ng/L): Elevated levels, especially high-sensitivity troponin, indicate myocardial injury, which may occur with or mimic AAS.
Blood pressure (mmHg): Both systolic and diastolic pressures are critical, as hypertension is a key risk factor and exacerbating condition in AAS.
Glasgow Coma Score: While less directly related, altered consciousness could indicate severe hemodynamic compromise.
Heart rate, pulse rate (bpm), respiratory rate (breaths/min), and temperature (°C): Vital signs reflect the patient’s immediate physiological state, with abnormal values often seen in AAS due to pain, shock, or cardiovascular strain.
Height, weight, and body mass index (BMI): Obesity and body composition affect cardiovascular health and aortic structure.
Pain score: Chest or back pain is a hallmark symptom of AAS, often described as sudden and severe.
Pulse oximetry (oxygen saturation, %): May decrease in cases where AAS compromises circulation or causes secondary cardiac issues.
These features were chosen not only for their statistical contribution but also for their pathophysiological relevance to AAS. The combination of these indicators provides a comprehensive profile that supports robust model performance and clinical applicability.
5. Method
As depicted in
Figure 2 within the project roadmap, the initial step involves addressing missing data. Following this, the interquartile range (IQR) method is employed to assess data spread and identify outliers, refining the dataset for subsequent analysis. Subsequently, the focus shifts to feature extraction, aiming to simplify and transform the dataset by selecting essential variables. Various feature extraction methods, such as PCA, Relief feature selection, and CFS, are utilized to manage data complexity and streamline information, ultimately facilitating more efficient analysis and interpretation.
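The IQR step described above can be illustrated as follows; the systolic blood pressure readings are invented for illustration, with one implausible value included:

```python
import pandas as pd

# Hypothetical systolic blood pressure readings with one implausible value.
sbp = pd.Series([118, 121, 125, 128, 132, 135, 140, 300])

# Interquartile range (IQR) rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = sbp.quantile(0.25), sbp.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sbp[(sbp < lower) | (sbp > upper)]
print(outliers.tolist())  # the 300 mmHg reading is flagged
```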
5.1. Feature Extraction
To handle the complexity and high dimensionality of our clinical dataset—which includes demographic information, vital signs, lab results, and cardiovascular-related procedures—we applied three complementary feature extraction techniques: Principal Component Analysis (PCA), Relief with SelectKBest, and Correlation-Based Feature Selection (CFS).
PCA helped reduce dimensionality by transforming correlated features into a smaller set of uncorrelated components. This was particularly helpful for compressing lab and vital-sign data without significant loss of information, improving model efficiency and reducing overfitting.
Relief with SelectKBest was used to identify and retain features most relevant for distinguishing between AAS and non-AAS cases. It works well for clinical data by emphasizing variables that strongly separate the target classes, such as key biomarkers and physiological indicators.
CFS was applied to select a subset of features that are highly correlated with the target variable (AAS) while minimizing redundancy between features. This ensures that the model benefits from meaningful inputs without duplicating information.
By combining these methods, we aimed to extract the most informative and non-redundant features, allowing our models to focus on the variables that matter most. This process not only streamlined the dataset for training and evaluation but also supported a deeper understanding of the clinical variables that play a significant role in predicting Acute Aortic Syndrome.
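The three approaches above can be sketched with scikit-learn on synthetic data. Note that `f_classif` is used here as a stand-in univariate scorer inside `SelectKBest` (ReliefF itself requires a dedicated implementation such as the skrebate package), and the CFS-style filter below is a simplified illustration of the correlation-based idea, not the exact algorithm used in the study:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: 26 features, matching the number of selected variables.
X, y = make_classification(n_samples=500, n_features=26, n_informative=8,
                           random_state=0)

# PCA: keep enough uncorrelated components to explain 95% of the variance.
pca = PCA(n_components=0.95, random_state=0)
X_pca = pca.fit_transform(X)
print("PCA components kept:", X_pca.shape[1])

# Univariate filter via SelectKBest (f_classif as a stand-in scorer).
kbest = SelectKBest(score_func=f_classif, k=10)
X_kbest = kbest.fit_transform(X, y)
print("SelectKBest kept:", X_kbest.shape[1])

# CFS-style filter: rank features by correlation with the target, then skip
# any feature highly correlated with one already selected (redundancy control).
corr_with_y = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
order = np.argsort(corr_with_y)[::-1]
selected = []
for j in order[:15]:
    if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < 0.8 for s in selected):
        selected.append(j)
print("CFS-style subset size:", len(selected))
```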
5.2. Classifiers
In this study, as illustrated in
Figure 2, a variety of classification algorithms were employed to address the diagnostic challenges associated with AAS. These classifiers include random forest (RF), logistic regression (LR), Gradient Boosting Classifier (G-Boost), Extreme Gradient Boosting (XGB), K-Nearest Neighbors (KNN), decision trees, Gaussian Naive Bayes (Gaussian-NB), and AdaBoost. Each algorithm offers unique capabilities for handling the complexity of clinical data and extracting valuable insights for accurate AAS diagnosis. Applying these classifiers aims to establish a comprehensive framework for categorizing patients and assisting clinicians in making informed decisions regarding AAS diagnosis and management.
One widely employed technique is random forest (RF), recognized for its ensemble learning approach, which constructs multiple decision trees on random subsets of features and data points. By mitigating overfitting and enhancing generalization, random forest offers robustness and insightful feature importance analysis.
Logistic regression (LR) stands out as a straightforward, yet powerful algorithm suited for binary classification tasks. It estimates probabilities based on predictor variables, providing interpretability and effectiveness.
Gradient Boosting Classifier (G-Boost) sequentially combines weak learners, refining predictions through gradient descent optimization. Its iterative approach allows it to capture intricate data patterns, making it valuable for tasks like diagnosing AAS. Similarly, Extreme Gradient Boosting (XGB) merges decision trees with regularization techniques, offering high performance and interpretability, especially in handling numerical clinical features for AAS diagnosis.
K-Nearest Neighbor (KNN) classifies instances based on the majority class of their nearest neighbors, proving valuable for assessing complex clinical manifestations and determining the likelihood of AAS based on similar cases. Decision trees provide hierarchical decision-making, capturing non-linear relationships within data, essential for understanding AAS diagnostic logic and interpreting predictive factors.
Gaussian Naive Bayes leverages Bayes’ theorem and assumes feature independence, making it efficient for handling numerical clinical variables and aiding in probabilistic classification for AAS diagnosis.
Lastly, AdaBoost sequentially trains weak learners, emphasizing challenging instances to improve accuracy iteratively, contributing to more accurate classification outcomes in AAS diagnosis. Overall, these classification methods offer diverse approaches to handling AAS diagnostic challenges, providing valuable insights and accurate predictions based on clinical data.
In addition, we selected widely used machine learning algorithms, such as random forest, XGBoost, Gradient Boosting, and logistic regression, based on their strong performance in similar medical prediction tasks. To improve model performance, we used grid search with 10-fold cross-validation for hyperparameter tuning. For example, in random forest, we tested different numbers of trees (from 50 to 500) and tree depths (5 to 20). For XGBoost, we adjusted learning rates (0.01 to 0.2) and tree depths (3 to 10). These tuning steps helped us find the best model settings for our dataset.
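The grid search described above can be sketched with scikit-learn's `GridSearchCV`; the grid here is a reduced version of the ranges quoted in the text (50-500 trees, depths 5-20) so the example runs quickly, and the data are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Reduced grid for illustration; the study swept wider ranges.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, 10]}
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=10,                 # 10-fold cross-validation, as in the study
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```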
We conducted a basic statistical analysis to compare the performance of the different classification models used in our study, calculating the mean accuracy and 95% confidence intervals (CI) for each classifier using 10-fold cross-validation.
Here are the results of our statistical comparison:
Random forest (RF): Mean accuracy = 99.2%, CI = [99.1%, 99.3%].
Decision trees (DT): Mean accuracy = 98.9%, CI = [98.7%, 99.1%].
K-Nearest Neighbors (KNN): Mean accuracy = 98.2%, CI = [98.1%, 98.4%].
Extreme Gradient Boosting (XGB): Mean accuracy = 93.9%, CI = [88.7%, 99.0%].
Gradient Boosting (G-Boost): Mean accuracy = 91.5%, CI = [85.7%, 97.2%].
AdaBoost: Mean accuracy = 83.4%, CI = [78.9%, 87.8%].
Logistic regression (LR): Mean accuracy = 71.6%, CI = [70.6%, 72.6%].
Gaussian Naive Bayes (Gaussian-NB): Mean accuracy = 70.8%, CI = [69.4%, 72.2%].
The results indicate that random forest, decision trees, and K-Nearest Neighbors achieved the highest performance, with random forest slightly outperforming the others. In contrast, logistic regression and Gaussian Naive Bayes had the lowest accuracy, with relatively narrow confidence intervals, suggesting more consistent but lower predictive power.
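The mean-accuracy-plus-95%-CI computation used above can be sketched as follows, applying a normal approximation to the 10 fold-level accuracies of one classifier (synthetic data for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 10-fold cross-validated accuracies for one classifier.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)

# Normal-approximation 95% confidence interval for the mean accuracy.
mean = scores.mean()
half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
print(f"mean = {mean:.3f}, 95% CI = [{mean - half_width:.3f}, {mean + half_width:.3f}]")
```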
5.3. Evaluation Metrics
In this study, the performance of classification models for diagnosing AAS is evaluated using a set of metrics. The evaluation metrics employed include accuracy, specificity, sensitivity (recall), Area Under the Curve (AUC), precision, and the F1-score. These metrics provide comprehensive insights into the performance of the classification models, allowing for a thorough assessment of their effectiveness in accurately identifying AAS cases.
Accuracy measures overall model performance, specificity assesses the model’s ability to correctly identify true negatives, and sensitivity evaluates the model’s capability to capture true positives. The AUC reflects the model’s discriminative ability across various thresholds, while precision quantifies the accuracy of positive predictions. The F1-score provides a balanced measure of precision and recall. These metrics collectively provide insights into the reliability and effectiveness of classification models in diagnosing AAS, aiding clinicians in decision-making.
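The metrics above can be computed from a confusion matrix and predicted probabilities as sketched below; the labels and probabilities are invented for illustration:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Illustrative labels and predicted probabilities for a binary AAS classifier.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.2, 0.35, 0.4, 0.8, 0.7, 0.3, 0.15, 0.9, 0.6])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))
print("sensitivity:", recall_score(y_true, y_pred))   # TP / (TP + FN)
print("specificity:", tn / (tn + fp))                 # TN / (TN + FP)
print("precision  :", precision_score(y_true, y_pred))
print("F1-score   :", f1_score(y_true, y_pred))
print("AUC        :", roc_auc_score(y_true, y_prob))  # threshold-independent
```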
6. Results
In this study, following the roadmap outlined in
Figure 2, the dataset was partitioned into three sets: training, validation, and test. Two splitting formats were applied: an 80-10-10 split and a 70-20-10 split, allocating proportions of the dataset for training, validation, and testing to facilitate robust model development and evaluation. In the 80-10-10 split, 80% of the data constituted the training set, while the remaining 20% were divided equally into 10% for validation and 10% for testing. In the 70-20-10 split, 70% of the data were allocated to training, 20% to validation, and 10% to testing.
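A two-stage `train_test_split` produces the 80-10-10 partition: hold out 20% first, then split that hold-out half-and-half, stratifying at each stage to preserve class proportions (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Stage 1: 80% train, 20% held out.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Stage 2: split the hold-out evenly into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```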
Additionally, Stratified K-Fold Cross-Validation was employed in this study, specifically utilizing a 10-fold Stratified Cross-Validation approach. This methodology involved partitioning the dataset into ten distinct subsets or “folds”. The model underwent training ten times, with each iteration using nine folds for training and reserving the remaining fold for testing. Stratification ensured that each fold accurately represented the complete dataset, maintaining consistent proportions of class labels. This approach provided a reliable assessment of the model’s generalization ability while preserving a balanced distribution of classes across each fold.
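The 10-fold stratified procedure above maps directly onto scikit-learn's `StratifiedKFold`, which keeps each fold's class proportions consistent with the full dataset (synthetic imbalanced data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Imbalanced synthetic data (~10% positives).
X, y = make_classification(n_samples=200, n_features=8, weights=[0.9, 0.1],
                           random_state=0)

# 10-fold stratified CV: each held-out fold preserves the class ratio.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for i, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    pos_rate = y[test_idx].mean()
    print(f"fold {i}: test size={len(test_idx)}, positive rate={pos_rate:.2f}")
```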
In the dataset, there were 129 positive cases of AAS and 148,578 non-AAS cases. After handling missing and imbalanced data, and applying machine learning methods, the Relief feature method combined with random forest on an 80:10:10 splitting strategy consistently outperformed other combinations, demonstrating superior accuracy, sensitivity, and specificity (99.3%, 99.5%, and 99.3%, respectively). The identified top ten important features through Relief feature extraction, including age, troponin levels, and temperature, offer crucial insights into AAS prediction, enhancing diagnostic accuracy and improving patient outcomes in emergency departments.
The following top results in
Table 9 were observed in analyzing various classifiers and feature selection methods, considering accuracy, sensitivity, and specificity as key criteria. The table below presents the best results among all the methods applied.
Based on the analysis, both the CFS and Relief feature methods demonstrate slightly better performance compared to PCA, as indicated by their highest values for accuracy, sensitivity, and specificity across various classifiers and scenarios. However, the Relief feature method combined with random forest on the 80:10:10 splitting strategy emerges as the most consistently high-performing combination, with an accuracy of 0.993, sensitivity of 0.995, specificity of 0.993, and AUC of 1.000.
6.1. Feature Importance Analysis with Relief Feature Selection
In
Table 10 and
Figure 8, the values correspond to the feature importance obtained using the Relief feature selection method. Each feature is represented, along with its importance value. The importance values range from highest to lowest, with age having the highest importance value of 0.1516 and gender having the lowest importance value of 0.0033. The bar graph provides a visual representation of these importance values, with the color gradient ranging from black for the most significant feature to yellow for the least significant feature, aiding in the interpretation of feature importance.
The top ten important features identified through the Relief feature extraction method, as depicted in
Figure 8, offer valuable insights into potential risk factors and indicators associated with AAS. These features include age, troponin levels, temperature, weight, height, respiratory rate, BMI, pulse oximetry readings, pain score assessments, and gender. Leveraging these key features in predictive models enhances the diagnosis and prognosis processes in emergency departments, ultimately improving patient outcomes and resource allocation strategies. By incorporating these significant features into predictive models, healthcare professionals can make more informed decisions, leading to better patient care and management of AAS cases.
6.2. Discussion
In comparison to the prior literature on the prediction of Acute Aortic Syndromes (AAS) using machine learning and deep learning techniques, our study presents notable advancements in methodology, data handling, and scale. Duceau et al. [
13] employed logistic regression and the Super Learner (SL) ensemble method, with SL achieving an AUC of 0.73 using 5-fold cross-validation on a dataset comprising 976 patients, including 609 positive cases.
Huo et al. [
14] implemented Correlation-Based Feature Selection (CFS) and identified the Bayesian Network as the most effective model, achieving an AUC of 0.857 with a sample size of 492. Similarly, Wu et al. [
15] explored the predictive value of D-dimer and radiographic features using both multivariate logistic regression and random forest models. Their study, conducted on 558 patients with 93 positive cases, reported an AUC of 0.994 for the random forest model in predicting in-hospital rupture of type A aortic dissection.
Guo et al. [
16] demonstrated the predictive capability of the XGBoost algorithm in assessing in-hospital mortality risk among AAS patients, achieving an AUC of 0.927. They further validated XGBoost’s performance using 10-fold cross-validation on a dataset of 1344 patients, where it outperformed other models in terms of accuracy and robustness.
Lin et al. [
18] applied a Convolutional Neural Network (CNN) to predict rupture risk in acute type A aortic dissection, achieving an AUC of 0.99. However, the model was trained and validated on a relatively small dataset of 200 patients, which may limit its generalizability.
In contrast, our study integrates a broad range of preprocessing techniques, including multiple imputation (MICE) for handling missing data; Synthetic Minority Oversampling Technique (SMOTE) for class imbalance; and various feature selection methods, such as Relief. Multiple machine learning classifiers were evaluated, and the optimal combination—Relief feature selection with random forest—achieved superior performance with an AUC of 1.0 and accuracy of 99.3%. This comprehensive pipeline allowed us to effectively model a real-world, high-dimensional, and imbalanced clinical dataset.
The scale of this research dataset represents a substantial contribution to the field of AAS prediction. As shown in
Figure 9, which compares sample sizes across related studies, the largest previously reported dataset included 5548 patients in the study by McLatchie [
21]. In contrast, our integrated dataset comprises 148,707 patient records collected from 68 emergency departments, including 129 confirmed AAS cases and 148,578 non-AAS cases. This significant increase in sample size not only strengthens the statistical power of our findings but also enhances the generalizability and clinical relevance of our machine learning models. Moreover, this extensive dataset provides a valuable resource for future research on AAS detection and broader cardiovascular risk prediction.
7. Conclusions
One of the main aims of this research was to gather and clean a large clinical dataset focused on cardiovascular health, with the intention of making it useful for studies like ours. During the data preparation phase, we encountered missing values, which we handled using the Multiple Imputation by Chained Equations (MICE) method. Due to class distribution imbalance in the dataset, the SMOTE technique was used to address underrepresented classes.
Three feature selection methods were employed: Principal Component Analysis (PCA), Correlation-Based Feature Selection (CFS), and Relief, each offering its own advantages. Each method was then evaluated, using different approaches and the resulting outcomes, to determine the optimal one.
The selected features were tested against eight classification algorithms, which include Gradient Boosting, Extreme Gradient Boosting (XGB), random forest, logistic regression, K-Nearest Neighbors, decision tree, Gaussian Naive Bayes, and AdaBoost. The models were evaluated based on accuracy, AUC, precision, F1-score, sensitivity, and specificity, which reflect the model performance.
One particularly relevant study is that by B. Duceau et al. [
13], who focused on prehospital triage of AAS using machine learning. Their approach employed 5-fold cross-validation and a Super Learner model, achieving an area under the ROC curve of 0.87 and a sensitivity of 0.99 on a dataset of 976 hospital admissions, of which 609 were confirmed AAS cases.
In this study, we used a data-splitting strategy that allocated 80% for training, 10% for testing, and 10% for validation. Combining this with the Relief feature selection method and a random forest classifier, we achieved 99.3% accuracy, 0.995 sensitivity, and 0.993 specificity. This setup outperformed all the other models we tested and showed remarkable predictive ability compared to previous research. The differences in performance across feature selection methods underscore how crucial it is to choose a combination of feature selection and classification techniques suited to the dataset and the predictive task at hand. For context, one of the other standout results in the existing literature comes from Y. Lin et al. [
18], who reported an accuracy of 90%, with sensitivity and specificity values of 0.93 and 0.90, respectively, using a Convolutional Neural Network (CNN). Their research pinpointed age, sex, certain biomarkers, and the aorta’s morphological parameters as key independent predictors for rupture in Acute Type A Aortic Dissection (ATAAD). In our analysis, we evaluated 25 features and identified the most significant predictors through a thorough feature ranking process. The top-ranked variables included age, troponin levels, temperature, weight, height, and respiratory rate. Interestingly, gender also played a notable role, coming in tenth in terms of importance. These results highlight the clinical significance of regularly collected vital signs and biochemical markers in aiding the early detection of AAS in emergency-department scenarios.
In summary, this study compiled a large clinical dataset of 148,707 patient records related to various cardiovascular diseases and demonstrated the potential of machine learning for early AAS prediction in the emergency department. However, further research is needed to refine these techniques and validate the findings using different classifiers or methods. By offering such a large dataset, this study sets the stage for future exploration in this area, aiming to develop effective tools for the early detection and management of AAS.
8. Future Work
For future work, the methodologies and findings of this study could be extended to leverage this integrated dataset for the prediction of other cardiovascular diseases and diverse datasets. The techniques employed in this research, specifically in the realms of data preprocessing, feature extraction, and machine learning model application, offer opportunities for further refinement and optimization.
Additionally, the potential for handling imbalanced data could be explored using the ADASYN (Adaptive Synthetic Sampling) technique as an alternative to SMOTE, aiming to address dataset imbalances more effectively.
Moreover, future research endeavors could explore the incorporation of advanced machine learning models, such as deep learning, to obtain more comprehensive results and potentially enhance predictive capabilities. This expansion into cutting-edge methodologies could contribute to a deeper understanding of the dataset and improve the overall effectiveness of predictive models.