1. Introduction
Stroke can be divided into two main types based on its cause: ischemic and hemorrhagic. An ischemic stroke occurs when a blood vessel in the brain becomes blocked, cutting off blood flow to part of the brain. This can happen because of a blood clot forming in place (thrombosis), debris traveling from the heart (cardio-embolism), plaque buildup in the arteries (atherosclerosis), or platelet plugs blocking many small arteries. In contrast, a hemorrhagic stroke occurs when a blood vessel in the brain bursts, causing bleeding in the brain [1,2]. Over the last 20 years, stroke has become a major cause of death and long-term disability worldwide. According to the World Health Organization, stroke accounts for about one in every ten cases of illness globally and is the second most common cause of death worldwide [3,4,5]. Stroke is also a major health issue in Turkey: out of every 100,000 people, about 93–109 experience ischemic strokes and 32–40 have intracerebral hemorrhages, giving an overall incidence of roughly 142–158 strokes per 100,000. This high incidence underscores the need for effective prevention and treatment strategies to reduce stroke-related illness and death [6,7]. In cases of stroke, the initial 6 h are critical for effective treatment. During this acute phase, prompt decision-making and timely interventions by healthcare professionals are crucial [8,9,10]. Advanced brain imaging methods such as CT and MRI have become essential for rapidly identifying, characterizing, and predicting the outcome of acute strokes [11]. Deep learning techniques have made significant progress in the accurate detection of acute ischemic stroke, enabling the early detection and intervention needed to mitigate its devastating effects.
When exploring how artificial intelligence (AI) is utilized to predict strokes through data analysis, a Scopus search yielding 93 research papers provides valuable insights into trends, notable authors, impactful studies, and emerging topics in this field. In Figure 1, countries such as India, Canada, the United States, Nigeria, Iran, Australia, China, South Africa, Japan, Austria, and Korea stand out as leaders in documenting advances in AI-driven stroke prediction, demonstrating the widespread global focus on improving stroke diagnosis and prevention through AI technologies.
Figure 2 shows a co-occurrence network of author keywords, illustrating the connections between keywords used across studies in this field. Each keyword is shown as a node, with lines connecting nodes to indicate how often they appear together in the same studies; the network comprises a total of 80 nodes related to artificial intelligence. By using such tools and datasets, researchers can gain an understanding of the current state and future directions of AI in stroke prediction, ultimately advancing medical science and improving patient outcomes. These findings can inspire new ideas, helping to develop more accurate predictive models and personalized treatment plans that can significantly reduce the burden of strokes on patients and healthcare systems.
2. Deep Learning Models
Different types of deep learning models are used to predict stroke risk from various data types. Convolutional neural networks (CNNs) are particularly good at analyzing images such as MRI [12] or CT scans [13], where they can identify subtle patterns that may indicate a higher risk of stroke. Recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) are powerful tools for analyzing sequential data such as patient monitoring readings, and they can be used to predict events like strokes. Combining them with CNNs into CNN-LSTM models allows multimodal data sources to be used, leading to more accurate predictions. Different deep learning models thus offer potential benefits for predicting strokes [14,15]. However, each model has its own strengths and weaknesses: some need labeled data for training, others require significant computing power, and some are easier to interpret and explain than others.
Deep learning, particularly in medical image analysis, incorporates several prominent CNN architectures [16]. AlexNet, a trailblazer introduced in 2012, set the stage for CNN development, featuring convolutional layers, max-pooling layers, and fully connected layers, and exhibiting remarkable performance on the ImageNet dataset. VGG networks, renowned for their simplicity, adopt a consistent architecture with multiple convolutional layers employing small 3 × 3 filters. ResNet, notable for its depth, introduced skip connections that bypass intermediate layers, mitigating the problem of gradually weakening gradient signals and enabling much deeper networks, which has enhanced performance in stroke prediction tasks. Inception (GoogLeNet) applies filter banks of various sizes in parallel within a single layer, capturing diverse features effectively while remaining computationally efficient. MobileNet, tailored for mobile devices, prioritizes efficiency through depthwise separable convolutions that cut down on parameters and computations while still performing well, making it ideal for resource-constrained environments. YOLO (You Only Look Once), initially designed for object detection, can also be utilized in medical image analysis due to its fast, single-pass architecture. These designs were built for different purposes and have become essential frameworks for deep learning applications such as predicting strokes from medical images [17,18,19].
In this study, the feature blocks of EfficientNet V2 Large, ResNet101, MobileNet V3 Large, VGG19BN, and ConvNeXt Base were used as feature extractors for the given image data. These models (Table 1) were compared based on various classification metrics within the proposed architecture, which integrates the feature extractor block from each pretrained model with classification blocks for both image and tabular data. Performance metrics were evaluated to determine the optimal combination of feature extractor and classification blocks for the given task.
Through this comparison, valuable insights were gained into the strengths and limitations of each model, guiding the selection of the optimal architecture to meet the specific needs of the task. The findings from this evaluation are expected to inform future work by highlighting the most effective strategies for combining feature extraction and classification in similar applications.
3. Materials and Methods
3.1. Dataset Acquisition and Preprocessing
Before training the model, several steps were taken to prepare the images. The pixel intensity values were normalized to a consistent standard, all images were resized to a uniform dimension, and various augmentation techniques, such as rotation, flipping, and contrast adjustments, were applied. These steps increased the diversity and robustness of the dataset. This thorough preprocessing was crucial for improving the model’s performance and ensuring that it could effectively handle a range of different MRI inputs.
The dataset (Table 2) utilized in this study consists of ischemic stroke MRI images sourced from Kaggle [20], a well-known platform for data science competitions and datasets. The dataset includes MRI images of the brain with and without ischemic stroke. The images were divided into training, validation, and testing sets: 1173 images for training, 252 for validation, and 252 for testing, corresponding to a 70%, 15%, and 15% split, respectively.
To address the limited dataset size, data augmentation techniques were applied, increasing sample variety by introducing transformations like rotation, scaling, and flipping. This approach helped the model learn more adaptable features, enhancing its ability to generalize. The ConvNeXt Base model was implemented with transfer learning, utilizing pre-trained weights from a larger, diverse dataset to boost performance despite data limitations.
3.2. Data Augmentation and Transformation
To ensure uniformity and enhance model generalization, the dataset underwent transformation and augmentation using PyTorch’s torchvision library. These processes included normalization, resizing, and data augmentation, which collectively increased the dataset’s diversity while preserving its integrity.
Firstly, all images were resized to a fixed dimension of 224 × 224 pixels, standardizing image dimensions across the dataset and facilitating consistent model input. Resizing or downsampling MRI images primarily reduces computational complexity and memory usage, which is crucial for efficient processing and model training. Although resizing reduces spatial resolution and can entail some information loss that may affect stroke classification accuracy, the essential features for classification are largely preserved, as the model’s convolutional layers are designed to capture key patterns within the images. To further mitigate this loss, the resizing parameters were chosen to balance image quality against computational efficiency, and techniques such as data augmentation and advanced image processing methods were employed to preserve essential features and maintain classification accuracy.
Subsequently, a random autocontrast operation was applied to 70% of the images, adjusting their contrast to enhance visual quality. Additionally, to ensure compatibility with pre-trained models, the pixel values of the images were normalized. This normalization process involved subtracting the mean values of [0.485, 0.456, 0.406] and dividing by the standard deviations of [0.229, 0.224, 0.225], aligning the image data distribution with the pre-trained model’s input requirements. By introducing variability in contrast levels and standardizing pixel values, these transformations collectively contribute to a more resilient training process and enhance the model’s adaptability to real-world scenarios.
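A minimal sketch of this preprocessing pipeline using torchvision, with the parameter values stated above (the exact composition order is an assumption):

```python
from torchvision import transforms

# Preprocessing as described: resize to 224 x 224, random autocontrast
# with probability 0.7, then ImageNet-style normalization.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                   # uniform input size
    transforms.RandomAutocontrast(p=0.7),            # contrast augmentation
    transforms.ToTensor(),                           # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])  # ImageNet statistics
])
```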
3.3. Proposed Framework
The model is composed of several critical components, each contributing uniquely to its overall performance: deep feature extraction using ConvNeXt Base’s feature block, median filtering, Sobel edge detection, feature concatenation, feature selection, and classification via a support vector machine (SVM). The deep feature extraction component utilizes the feature block of ConvNeXt Base, a state-of-the-art convolutional neural network (CNN), to extract robust, high-level features from the images. ConvNeXt Base is renowned for its efficiency and powerful feature extraction capabilities, capturing intricate patterns and features that are not easily discernible through traditional methods. It begins with a patchify stem (a strided convolution) followed by layer normalization for stabilization, and its stages are built from inverted-bottleneck blocks that combine large-kernel depthwise convolutions with pointwise (1 × 1) convolutions and GELU activations for non-linearity, efficiently capturing spatial hierarchies and semantic information within the image data.
The deep learning model, ConvNeXt Base, extracts 1024 features from its final feature block. Sobel edge detection produces 2 features per pixel (the horizontal and vertical gradients). For an input image with dimensions H × W, this results in a total of 2 × H × W Sobel features, so after feature concatenation the combined feature set comprises 1024 + (2 × H × W) features. For the 224 × 224 inputs used in this study, this amounts to 1024 + 2 × 224 × 224 = 101,376 features before feature selection.
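As a hedged sketch, extracting the 1024-dimensional deep feature vector with torchvision’s pretrained ConvNeXt Base might look as follows (the global average pooling step is an assumption about how the feature block’s output is reduced to a single vector):

```python
import torch
from torchvision.models import convnext_base, ConvNeXt_Base_Weights

# Load ConvNeXt Base pretrained on ImageNet and keep only its feature block.
backbone = convnext_base(weights=ConvNeXt_Base_Weights.IMAGENET1K_V1).features
backbone.eval()

@torch.no_grad()
def deep_features(batch: torch.Tensor) -> torch.Tensor:
    """Map a (N, 3, 224, 224) batch to (N, 1024) deep feature vectors."""
    fmap = backbone(batch)          # (N, 1024, 7, 7) feature maps for 224 x 224 input
    return fmap.mean(dim=(2, 3))    # global average pooling (assumed) -> (N, 1024)
```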
Following deep feature extraction, median filtering is applied to the images to reduce noise while preserving important edge information. This preprocessing step is crucial because it ensures that the subsequent edge detection process operates on high-quality, denoised images, and it is vital for maintaining the quality of the edge features that complement those extracted by the ConvNeXt Base network.
After median filtering, Sobel edge detection is employed to identify and highlight the edges within the image. Sobel edge detection is critical as it captures the essential structural information of objects within the image. The Sobel operator calculates the gradient of image intensity, detecting edges by measuring changes in pixel values, which enhances the model’s capability to recognize the outlines and contours of objects, providing a solid foundation for subsequent feature integration.
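A minimal sketch of the median-filtering and Sobel steps, assuming a single-channel image as a NumPy array (the 3 × 3 filter window is an illustrative assumption; the text does not specify it):

```python
import numpy as np
from scipy import ndimage

def sobel_edge_features(image: np.ndarray) -> np.ndarray:
    """Median-filter an image, then compute Sobel gradients.

    Returns the horizontal and vertical gradient maps, i.e. the
    2 features per pixel described above; output shape is (2, H, W).
    """
    denoised = ndimage.median_filter(image.astype(float), size=3)  # 3x3 window (assumed)
    gx = ndimage.sobel(denoised, axis=1)  # horizontal intensity gradient
    gy = ndimage.sobel(denoised, axis=0)  # vertical intensity gradient
    return np.stack([gx, gy])
```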
The refined features from ConvNeXt Base, combined with the edge-based structural features from Sobel edge detection, undergo a feature concatenation process. This step integrates the high-level abstract features from the CNN with the detailed structural information captured by edge detection. The combined feature set benefits from both the abstract pattern recognition capabilities of deep learning and the precise structural information from traditional edge detection, resulting in a more comprehensive and discriminative feature set.
Subsequently, the model employs feature selection techniques, such as SelectKBest, to refine the combined feature set by selecting the most relevant features based on their statistical significance in relation to the target variable. This step reduces dimensionality, removes redundant features, and enhances the model’s efficiency by focusing on the most informative features for classification.
The final component of the proposed model is the classification step, performed using a support vector machine (SVM). SVMs are chosen for their robustness and effectiveness in handling high-dimensional data. The SVM classifier is trained on the selected features, enabling it to learn the optimal decision boundaries for classifying the images. By using the strengths of SVM in conjunction with the rich feature set provided by the previous steps, the model achieves superior classification performance.
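The selection and classification stages map directly onto scikit-learn; below is a minimal sketch (k = 50 and the RBF kernel follow the settings reported in Section 3.5; the input arrays are random stand-ins for the concatenated features):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

# Placeholder arrays standing in for the concatenated deep + Sobel features;
# in the real pipeline these come from the feature construction stage.
# (Dimensionality reduced here for brevity.)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4096))
y_train = rng.integers(0, 2, size=200)  # 0 = no-stroke, 1 = stroke (assumed coding)

# Keep the k most informative features, ranked by ANOVA F-value.
selector = SelectKBest(score_func=f_classif, k=50)
X_train_sel = selector.fit_transform(X_train, y_train)

# RBF-kernel SVM trained on the selected features.
clf = SVC(kernel="rbf")
clf.fit(X_train_sel, y_train)
```

At evaluation time, the same fitted selector is applied (`selector.transform`) before calling `clf.predict`, so that the validation and test sets pass through an identical feature pipeline.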
3.4. Proposed Deep Learning Model Architecture
Incorporating ConvNeXt Base’s feature extraction mechanism, the proposed model harnesses the efficiency and sophistication of modern CNN architectures to enhance image classification accuracy. ConvNeXt Base’s feature extractor is crucial in distilling intricate patterns from images, making it particularly effective for diverse and complex image datasets. As depicted in Figure 3, ConvNeXt Base initiates the process with a patchify stem (a strided convolutional layer) followed by layer normalization for stabilization, with GELU activations providing non-linearity in the subsequent blocks. This initial sequence is crucial for capturing diverse features across various image types.
ConvNeXt Base’s architecture integrates inverted-bottleneck blocks, each combining a large-kernel depthwise convolution with pointwise (1 × 1) convolutions. These blocks are designed to efficiently capture spatial hierarchies and semantic information within the image data: the depthwise convolutions reduce computational complexity while preserving the richness of the feature representations, and the pointwise convolutions integrate these features across channels. This structure allows the model to balance efficiency and accuracy, making it suitable for a wide range of image classification tasks. After the deep features are extracted by ConvNeXt Base, the model proceeds with additional processing steps. Initially, the input images are subjected to median filtering, which is essential for noise reduction. Median filtering preserves important edge information while removing noise, ensuring that the subsequent edge detection step operates on high-quality, denoised images and maintaining the quality of the structural features that complement those extracted by the ConvNeXt Base network.
Following median filtering, Sobel edge detection is applied to the filtered images. The Sobel operator calculates the gradient of image intensity, effectively highlighting the edges and contours within the image. This edge detection process is critical for capturing the structural information of objects within the image, which complements the high-level abstract features extracted by ConvNeXt Base. By integrating the edge-based structural features with the deep learning features, the model enhances its capability to recognize and classify complex patterns. The features extracted from ConvNeXt Base, combined with the structural edge features obtained through Sobel edge detection, are merged through a feature concatenation process. This step integrates the high-level abstract features from the CNN with the detailed structural information captured by edge detection. The combined feature set benefits from both the abstract pattern recognition capabilities of deep learning and the precise structural information from traditional edge detection, resulting in a more comprehensive and discriminative feature set.
Subsequently, the model employs feature selection techniques, such as SelectKBest, to refine the combined feature set. This step involves selecting the most relevant features based on their statistical significance in relation to the target variable. By focusing on the most informative features, the model reduces dimensionality, removes redundant features, and enhances efficiency, which is critical for improving classification performance and preventing overfitting. The final component of the proposed model is the classification step, performed using an SVM. The SVM is trained on the selected features, learning the optimal decision boundaries for classifying the images. SVMs are chosen for their robustness and effectiveness in handling high-dimensional data. They excel in finding the hyperplane that maximizes the margin between different classes, leading to high accuracy in classification tasks.
To boost the SVM’s performance, techniques like kernel functions are used to handle non-linear relationships between features. The choice of kernel function and the careful tuning of hyperparameters are key to fine-tuning the model’s accuracy and ensuring it generalizes well. By integrating these advanced techniques, the final model not only excels with the training data but also proves effective across different datasets and in practical situations. This approach helps ensure that the model remains reliable and accurate, even as it encounters new and varied data.
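The text does not specify the tuning procedure; one common approach, shown here purely as an assumption and continuing the earlier sketch, is a cross-validated grid search over the kernel hyperparameters:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative search space; the grid actually used in the study is not reported.
param_grid = {
    "kernel": ["rbf"],
    "C": [0.1, 1, 10, 100],           # regularization strength
    "gamma": ["scale", 0.01, 0.001],  # RBF kernel width
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1")
search.fit(X_train_sel, y_train)      # selected features from the previous sketch
print(search.best_params_, search.best_score_)
```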
The novelty of this study is reflected in the use of the ConvNeXt Base architecture, which is recognized for enhancing feature extraction and learning efficiency beyond that of traditional CNNs, making it particularly effective for medical imaging tasks. A set of data augmentation techniques tailored specifically to MRI images was implemented, expanding the training dataset and making the model more resilient to the variations commonly seen in clinical imaging. Unlike previous studies, this approach focuses on the early and accurate identification of ischemic strokes, which are both prevalent and time-sensitive to treat effectively.
3.5. Experimental Setup
For conducting the experiments, a computational setup consisting of a 2.6 GHz 6-core Intel Core i7 processor was utilized, providing adequate processing power for the complex computations involved in deep learning tasks. The experiments were executed within the Visual Studio Code (VS Code) Jupyter Notebook interface, which provided an interactive programming environment conducive to iterative model development and evaluation. PyTorch supports GPU acceleration through CUDA, enabling many tensor operations to be processed in parallel and significantly speeding up training. Moreover, PyTorch offers dynamic computational graphs, which allow complex models to be built and adjusted on the fly. This flexibility is valuable for experimenting with different neural network architectures and fine-tuning hyperparameters, allowing researchers to adapt their models as needed throughout the process. The architecture of the proposed methodology is shown in Figure 4.
In the experimental setup, datasets were organized into three separate directories: training, validation, and testing, with each directory containing images categorized into different classes. To enhance the robustness and performance of the model, several data preprocessing and augmentation techniques were employed. Each image was resized to a standard size of 224 × 224. Additionally, random autocontrast was applied with a probability of 0.7 to adjust the contrast of the images randomly, providing variability and aiding better generalization. The images were then normalized using the mean and standard deviation values specific to the ImageNet dataset (mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]). This preprocessing pipeline was implemented using the torchvision.transforms module.

The transformed images were loaded using the ImageFolder class from the torchvision.datasets module, which facilitated the organization and labeling of the images. The data were then fed into data loaders created with the DataLoader class from torch.utils.data, with a batch size of 64 and the number of workers set to the CPU count, ensuring efficient data loading. The training set was shuffled, while the validation and testing sets were not, to maintain consistency. A sketch of this loading stage follows below.

For feature extraction, a pretrained model’s feature extraction block was utilized. The process began with extracting deep features from the images using the pretrained model. Additional preprocessing steps were then applied to enhance these features further: each image was passed through a median filtering function to reduce noise, followed by a Sobel edge detection function to highlight edges. The median filter replaced each pixel’s value with the median of the neighboring pixels, smoothing the image while preserving edges. For edge detection, the Sobel operator detected edges in both the x and y directions, and these were combined to form an edge map.
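A brief sketch of the data organization and loading stage described above, reusing the `preprocess` transform defined in Section 3.2 (the directory names are assumptions; batch size and worker settings follow the text):

```python
import os
from torch.utils.data import DataLoader
from torchvision import datasets

# Assumed directory layout: data/{train,val,test}/<class_name>/<image files>.
train_ds = datasets.ImageFolder("data/train", transform=preprocess)
val_ds = datasets.ImageFolder("data/val", transform=preprocess)
test_ds = datasets.ImageFolder("data/test", transform=preprocess)

workers = os.cpu_count()  # number of workers set to the CPU count, per the text
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True, num_workers=workers)
val_loader = DataLoader(val_ds, batch_size=64, shuffle=False, num_workers=workers)
test_loader = DataLoader(test_ds, batch_size=64, shuffle=False, num_workers=workers)
```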
These filtering and edge detection steps were implemented within a function that processed the images from the data loaders. The function first extracted deep features from each image using the pretrained model, then applied median filtering, and subsequently performed Sobel edge detection on the filtered image. The resulting deep features and edge-detected features were flattened and concatenated to form a single comprehensive feature vector for each image. To further refine the features, the SelectKBest method from the sklearn.feature_selection module was used; this method selected the top 50 features based on the ANOVA F-value, which measures the statistical significance of each feature in relation to the target variable. The training, validation, and test feature sets were reshaped accordingly to match the expected input format for the feature selection process. An SVM classifier with a radial basis function (RBF) kernel was employed for classification and trained on the selected features from the training set. The performance of the model was evaluated on the validation set, and the results, including the classification report and confusion matrix, were generated to assess the model’s accuracy, precision, recall, and F1-score. The same evaluation metrics were applied to the test set to validate the model’s performance on unseen data. This comprehensive experimental setup ensured a robust evaluation of the model’s capabilities across different stages of the data pipeline.
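Combining the earlier sketches, the per-image feature construction described above might look like this (the function name and the grayscale channel collapse are illustrative assumptions):

```python
import numpy as np
import torch

@torch.no_grad()
def build_feature_vector(img: torch.Tensor) -> np.ndarray:
    """img: (3, 224, 224) preprocessed tensor -> concatenated feature vector."""
    deep = deep_features(img.unsqueeze(0)).squeeze(0).numpy()  # (1024,) deep features
    gray = img.mean(dim=0).numpy()                # collapse channels to one plane (assumed)
    edges = sobel_edge_features(gray)             # (2, 224, 224) gradient maps
    return np.concatenate([deep, edges.ravel()])  # (101,376,) combined vector
```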
3.6. Metrics for Evaluating Stroke Classification Models
To improve the effectiveness of stroke classification models, a variety of performance metrics are used. These metrics [21,22] help assess how accurate and reliable predictions are, making sure the models can correctly identify different types of strokes or differentiate stroke cases from non-stroke cases. By examining these metrics, researchers can understand where a model excels and where it needs improvement. This process plays a crucial role in refining the model to better support accurate diagnoses and treatment decisions, which can make a real difference in patient care.
Accuracy points out how many predictions the model got right overall, giving a general idea of its performance. However, it can be misleading when dealing with imbalanced datasets, where one class significantly outnumbers the others.
Precision measures how many of the cases the model predicted as positive were correct, helping to ensure the reliability of identifying stroke-positive cases. By focusing on precision, the model becomes better at accurately identifying strokes while minimizing false positives.
Recall (Sensitivity) measures how well the model can identify all actual stroke cases, which is especially important in healthcare, where missing a stroke diagnosis (false negative) can have serious consequences.
The F1-Score combines precision and recall into one balanced measure, making it especially helpful for dealing with imbalanced datasets where one class may be underrepresented.
The confusion matrix provides a detailed breakdown of a model’s predictions into four categories: true positives (TPs), which are stroke cases correctly identified; true negatives (TNs), which are non-stroke cases correctly identified; false negatives (FNs), which are stroke cases incorrectly classified as non-stroke; and false positives (FPs), which are non-stroke cases incorrectly identified as stroke. These categories offer valuable insights into areas where the model can be refined. Evaluating stroke classification models is crucial for ensuring they can accurately distinguish between stroke and non-stroke cases.
The ROC-AUC metric plays a key role here, showing how well the model can differentiate between the two classes. A higher AUC means the model is better at correctly identifying stroke cases, which is especially important in medical settings where early and accurate diagnosis is essential. A higher AUC reduces the chances of misclassification, giving healthcare providers more confidence in making informed decisions.
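These metrics map directly onto scikit-learn utilities; a brief sketch follows, assuming `y_true`, `y_pred`, `X_test_sel`, and the fitted `clf` from the earlier sketches (the class-name ordering is an assumption about the label coding):

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["no-stroke", "stroke"]))
print(confusion_matrix(y_true, y_pred))  # rows = actual class, columns = predicted class

# ROC-AUC requires continuous scores rather than hard labels;
# the SVM's decision_function provides them.
scores = clf.decision_function(X_test_sel)
print("ROC-AUC:", roc_auc_score(y_true, scores))
```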
4. Results and Discussion
4.1. Results
The results of this study highlight ConvNeXt Base as the optimal pre-trained model for stroke classification, offering superior accuracy and robustness. These findings underscore the potential of deep learning models in enhancing diagnostic accuracy and guiding clinical decision-making in stroke detection. The results obtained from the evaluation on the validation and test sets are summarized in Table 3 and Table 4, respectively, while Table 5 and Table 6 provide the corresponding confusion matrices.
Achieving an accuracy of 86% represents an important step forward compared with traditional, manual diagnostic approaches, especially in the critical area of stroke diagnosis, where even small improvements in accuracy can lead to meaningful benefits in patient care. This model is designed to complement, not replace, clinical expertise by identifying cases that may warrant additional review. Its high sensitivity is crucial, as it helps ensure that true ischemic stroke cases are detected, supporting timely intervention. Integrated as an assistive tool, the model allows radiologists to efficiently prioritize flagged cases for closer review, enhancing both workflow and diagnostic accuracy.
This study evaluated the performance of several pre-trained models as feature extractors for stroke classification, including EfficientNet_v2_l, ConvNeXt Base, ResNet101, MobileNet_v3_large, and VGG19_BN. These models were assessed based on their accuracy, precision, recall, and F1-score for both stroke (S) and no-stroke (N) categories, with results presented for both validation and test sets.
On the validation set, ConvNeXt Base achieved the highest overall accuracy at 84%, demonstrating superior performance compared to the other models. It also excelled in precision, with values of 85% for stroke and 83% for no-stroke, indicating a high level of exactness in its positive predictions. In terms of recall, ConvNeXt Base achieved 80% for stroke and 87% for no-stroke, highlighting its effectiveness in capturing actual positive cases. Its F1-scores were 82% for stroke and 85% for no-stroke, reflecting a balanced performance between precision and recall.
MobileNet_v3_large also performed well, with an accuracy of 83% on the validation set. Its precision was 82% for stroke and 83% for no-stroke, and its recall was 81% for stroke and 84% for no-stroke. The F1-scores for MobileNet_v3_large were 82% for stroke and 83% for no-stroke, slightly lower than those of ConvNeXt Base but still indicating robust performance.
ResNet101 and VGG19_BN showed moderate performance on the validation set. ResNet101 achieved an accuracy of 79%, with precision values of 78% for stroke and 80% for no-stroke, recall values of 78% for stroke and 80% for no-stroke, and F1-scores of 78% for stroke and 80% for no-stroke. VGG19_BN had an accuracy of 77%, with a precision of 74% for stroke and 82% for no-stroke, recall of 82% for stroke and 73% for no-stroke, and F1-scores of 77% for both stroke and no-stroke.
EfficientNet_v2_l had the lowest performance on the validation set, with an accuracy of 60%. It showed precision values of 57% for stroke and 62% for no-stroke, recall values of 61% for stroke and 58% for no-stroke, and F1-scores of 59% for stroke and 60% for no-stroke, indicating limited capability in distinguishing between stroke and no-stroke cases.
On the test set, ConvNeXt Base maintained its superior performance, achieving an accuracy of 86%. Its precision was 88% for stroke and 85% for no-stroke, recall was 82% for stroke and 89% for no-stroke, and F1-scores were 85% for stroke and 87% for no-stroke. MobileNet_v3_large achieved an accuracy of 81%, with precision values of 81% for both stroke and no-stroke, recall values of 78% for stroke and 83% for no-stroke, and F1-scores of 80% for stroke and 82% for no-stroke.
ResNet101 also had an accuracy of 81% on the test set, with precision values of 79% for stroke and 82% for no-stroke, recall values of 81% for stroke and 80% for no-stroke, and F1-scores of 80% for stroke and 81% for no-stroke. VGG19_BN achieved a test set accuracy of 79%, with precision values of 76% for stroke and 81% for no-stroke, recall values of 81% for stroke and 77% for no-stroke, and F1-scores of 78% for stroke and 79% for no-stroke. EfficientNet_v2_l continued to underperform with an accuracy of 60%, precision values of 59% for stroke and 61% for no-stroke, recall values of 53% for stroke and 67% for no-stroke, and F1-scores of 56% for stroke and 64% for no-stroke.
The confusion matrices further illustrate the performance differences among the models. ConvNeXt Base consistently achieved the highest true positive (TP) and true negative (TN) counts while maintaining low false positive (FP) and false negative (FN) counts. Specifically, on the validation set, ConvNeXt Base had 96 TPs and 115 TNs, with only 24 FNs and 17 FPs. On the test set, it had 99 TPs and 118 TNs, with 21 FNs and 14 FPs.
Given its superior performance across all metrics, including the highest accuracy, precision, recall, and F1-scores, as well as the most favorable confusion matrix results, ConvNeXt Base was selected as the optimal model for stroke classification in this study. This choice emphasizes the model’s reliability and effectiveness, making it a promising candidate for use in clinical settings to improve early detection and treatment of ischemic strokes. This positions ConvNeXt Base as a valuable tool for clinicians seeking to make timely and informed decisions in stroke diagnosis and treatment.
4.2. Discussion
In this experimental study, several pre-trained models for stroke detection were examined, each bringing unique features and advantages. EfficientNet_v2_l, with approximately 118 million parameters, is known for its effective balance between accuracy and efficiency, using advanced training techniques and specific normalization to process images optimally. ConvNeXt Base, with about 89 million parameters, is characterized by its modernized convolutional design inspired by vision transformers, though it carries relatively high computational demands. ResNet101, featuring around 44 million parameters, benefits from residual connections that facilitate the training of deep networks, although it is less resource-efficient. MobileNet_v3_large, with roughly 5 million parameters, is designed for speed and efficiency, making it well-suited for mobile devices, but it may not achieve the accuracy of larger models. Finally, VGG19_BN, with around 143 million parameters, can achieve strong accuracy through its deep architecture, though it requires significant computational resources. Each model offers distinct advantages, and the choice of model depends on the specific requirements of the task.
In conclusion, the results of this study clearly show that ConvNeXt Base is the most effective model for stroke classification, achieving the highest accuracy of 86% on the test set. It outperformed other models, such as ResNet101 and MobileNet_v3_large, which both achieved 81% accuracy. While VGG19_BN and EfficientNet_v2_l showed lower performance, ConvNeXt Base stood out for its overall reliability, precision, and recall. These results suggest that ConvNeXt Base is a promising tool for clinical applications, offering the potential for accurate and early stroke detection.
5. Conclusions
This study evaluated the performance of several pre-trained deep learning models for stroke classification, including EfficientNet_v2_l, ConvNeXt Base, ResNet101, MobileNet_v3_large, and VGG19_BN. The results indicated that ConvNeXt Base outperformed the other models, achieving the highest accuracy, precision, recall, and F1-scores on both the validation and test sets. Specifically, ConvNeXt Base achieved an accuracy of 84% on the validation set and 86% on the test set, with consistently superior precision, recall, and F1-scores compared to the other models evaluated. The confusion matrices further confirmed the superior performance of ConvNeXt Base, displaying the highest true positive and true negative counts and the lowest false positive and false negative counts. These findings underscore the potential of ConvNeXt Base in enhancing diagnostic accuracy and guiding clinical decision-making in stroke detection. These tools can be valuable in helping prioritize cases and providing a supportive framework for clinicians, especially in busy healthcare settings where time and resources are limited.
In the experiments conducted, ConvNeXt Base was found to be highly reliable, excelling not only in accuracy but also in precision, recall, and F1-score. This indicates that it consistently provides dependable results for both stroke and no-stroke cases, establishing it as a strong candidate for diagnostic use.
The observed differences in performance among the models highlight the importance of selecting the appropriate model. While ConvNeXt Base performed exceptionally well, models like EfficientNet_v2_l did not meet expectations, demonstrating that not all models are suitable for every task. This emphasizes the need for careful evaluation when choosing a model for medical classification.
With its impressive performance, ConvNeXt Base shows significant potential for clinical applications, particularly in the early detection of strokes. Its high accuracy and precision could assist doctors in making faster, more accurate decisions, ultimately improving patient outcomes by facilitating quicker diagnosis and treatment.
Implementing a deep learning model for MRI-based stroke classification in clinical practice faces several key challenges. Variability in MRI scans, stemming from differences in image quality, acquisition protocols, and patient backgrounds, complicates ensuring consistent model performance across diverse imaging systems. Moreover, gathering a large, well-labeled dataset that represents a wide range of patient demographics remains essential yet challenging for generalizing model predictions. Integrating the model within clinical workflows requires a thoughtful approach to avoid disrupting routines, while gaining clinicians’ trust and acceptance through clear demonstrations of the model’s utility and reliability. Regulatory compliance presents another hurdle, as rigorous approvals must be secured to meet safety and data protection standards, such as HIPAA and GDPR. To build trust, the model’s decision-making processes should be transparent so that clinicians can interpret its predictions confidently and understand its limitations. Furthermore, the model must be tested across diverse patient groups and continuously monitored post-deployment to detect any drop in performance. Financial and technical resources are also critical, as the implementation of AI requires both significant investment and infrastructure. Finally, maintaining model accuracy and relevance calls for ongoing retraining and updates as more data become available, requiring sustained institutional support.
This study encountered several limitations, primarily due to a dataset of only 1173 MRI images, which may not fully capture the variety of ischemic stroke cases encountered in clinical settings. This constraint could affect the model’s generalizability across diverse patient populations. Additionally, variations in MRI acquisition techniques, equipment, and patient demographics could influence model performance, potentially leading to inconsistencies when implemented in real-world environments. The model currently focuses exclusively on ischemic strokes, excluding hemorrhagic strokes, which limits its scope in stroke diagnosis. The “black box” nature of deep learning models can present challenges for clinician trust, as interpretability remains an ongoing concern in clinical applications. Moreover, regulatory approval, which is essential for clinical deployment, involves a rigorous process that has not been addressed in this study. Although validation results were promising, sustained effectiveness would require continuous assessment in real-world settings.
Future research efforts should aim to refine these models, incorporating the latest imaging technologies and designing AI tools that clinicians can easily use. To confirm how well these models work in real-world settings, comprehensive validation studies and clinical trials will be essential. Exploring ways to personalize medicine and addressing healthcare access issues globally will also be key to improving stroke diagnosis and treatment. By utilizing advanced deep learning models like ConvNeXt Base, the accuracy and reliability of stroke detection can be significantly enhanced, ultimately helping to improve patient outcomes and reduce mortality rates. This research highlights AI’s immense potential to transform medical diagnostics and elevate global health. Future work could address the existing limitations by expanding the dataset, adding images from various healthcare institutions, and broadening the model’s use to cover other stroke types. Making these models easier to interpret would also help build clinician trust and support more informed decision-making.