Article

Enhanced Ischemic Stroke Lesion Segmentation in MRI Using Attention U-Net with Generalized Dice Focal Loss

by Beatriz P. Garcia-Salgado 1, Jose A. Almaraz-Damian 2, Oscar Cervantes-Chavarria 1, Volodymyr Ponomaryov 1,*, Rogelio Reyes-Reyes 1, Clara Cruz-Ramos 1 and Sergiy Sadovnychiy 3
1 Instituto Politécnico Nacional, ESIME Culhuacán, Santa Ana 1000, Mexico City 04440, Mexico
2 Centro de Investigación Científica y de Educación Superior de Ensenada, Unidad de Transferencia Tecnológica Tepic, Tepic 63173, Mexico
3 Instituto Mexicano del Petróleo, Eje Central Lázaro Cárdenas Norte 152, Mexico City 7730, Mexico
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8183; https://doi.org/10.3390/app14188183
Submission received: 19 August 2024 / Revised: 7 September 2024 / Accepted: 9 September 2024 / Published: 11 September 2024
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation, 2nd Edition)

Abstract
Ischemic stroke lesion segmentation in MRI images presents significant challenges, particularly due to the class imbalance between foreground and background pixels. Several approaches have been developed to achieve higher F1-Scores in stroke lesion segmentation under this challenge. These strategies include convolutional neural networks (CNNs) and models with a large number of parameters, which can only be trained on specialized computational architectures explicitly oriented to data processing. This paper proposes a lightweight model based on the U-Net architecture that employs an attention module and the Generalized Dice Focal loss function to enhance segmentation accuracy in the class-imbalanced environment characteristic of stroke lesions in MRI images. This study also analyzes segmentation performance according to the pixel size of stroke lesions, giving insights into the loss function behavior using the public ISLES 2015 and ISLES 2022 MRI datasets. The proposed model can effectively segment small stroke lesions with F1-Scores over 0.7, particularly in FLAIR, DWI, and T2 sequences. Furthermore, the model shows reasonable convergence with its 7.9 million parameters at 200 epochs, making it suitable for practical implementation on mid- and high-end general-purpose graphics processing units.

1. Introduction

The mortality and disability rates associated with stroke have increased worldwide, making it one of the leading causes of death and disability. Millions of people die annually due to this condition, while many survivors are left with permanent disabilities that significantly affect their quality of life [1].
The brain plays a fundamental role in the coordination and communication of the human body, and strokes are among the most critical conditions that directly impair brain function. According to the World Health Organization (WHO), about 5 million deaths and another 5 million cases of permanent disability are estimated globally each year, with strokes being more common in adults over 40 years of age [2].
This work focuses on ischemic strokes, which are classified into two subtypes: transient ischemic attack (TIA), which resolves in less than 24 h, and cerebral infarction, which is caused by an obstruction in blood flow, such as a clot or, less commonly, fat in a blood vessel, leading to the death of cerebral tissue due to lack of oxygen [3]. A prompt response to stroke with appropriate treatment is necessary to reduce the mortality rate and improve the well-being of affected patients [4]. Thus, early identification allows the implementation of effective interventions, reducing the risk of permanent brain damage and increasing the chances of recovery [5]. Consequently, progress in research and innovative technologies is essential to aid in the detection and treatment of stroke and brain injury, primarily to mitigate their devastating effects on people of all ages.
However, accurate detection and segmentation of strokes represent a challenge for radiologists because of the variability in the size and location of lesions in the brain [6]. Effective treatment mainly depends on factors such as the type of stroke, its location, and its extent. These factors are evaluated using diagnostic imaging modalities, such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and angiography, where MRI stands out for its sensitivity and versatility [4].
The Fluid-Attenuated Inversion Recovery (FLAIR) sequence in MRI allows distinguishing between edema and cerebrospinal fluid. At the same time, Diffusion-Weighted Imaging (DWI) aids in the early detection of acute ischemic stroke by identifying areas of restricted water diffusion in the brain tissue [7]. The Apparent Diffusion Coefficient (ADC) map, derived from DWI, helps differentiate between acute and chronic ischemic lesions by quantifying the degree of water diffusion in tissues. Additionally, T1-weighted and T2-weighted MRI sequences provide detailed anatomical information, with T1-weighted images offering excellent contrast for normal brain anatomy and T2-weighted images being highly sensitive to pathological changes [8].
The effectiveness of stroke treatment is limited by the time it takes to obtain and interpret patient images. Radiologists face significant challenges in the manual analysis of MRI due to the complexity and variability of ischemic cerebrovascular lesions, leading to time-consuming and labor-intensive processes and the potential for human error [9]. Therefore, automatic segmentation techniques for detecting ischemic cerebrovascular lesions from MRI can significantly accelerate specialists’ work, improve the diagnostic process, reduce the risk of diagnostic errors, and ensure more accurate and precise results.
These techniques are integrated into Computer-Aided Diagnosis (CADx) systems, which usually employ machine-learning approaches to improve the segmentation and classification of cerebrovascular lesions [10] and evaluate the results using various well-known metrics, such as accuracy, precision, sensitivity, specificity, F1-Score, IoU, and Hausdorff Distance (HD), ensuring a performance that effectively aids medical personnel.
This study focuses on segmenting MRI modalities using a deep learning approach centered on the U-Net architecture [11] with an attention mechanism suitable for segmenting lesions of different sizes caused by ischemic strokes and capable of rapid training.
The remainder of this document is structured as follows: Section 2 reviews the related works, providing context and background on previous research. Section 3 outlines the materials and methods employed in this study, starting with an overview of the proposed approach, including details on the network architecture, loss function, and optimizer. The experimental setup is detailed in Section 3.2, covering the datasets, data augmentation procedure, hyperparameter optimization, and metrics used to evaluate the results. Then, results are presented in Section 4, with subsections dedicated to ablation testing, segmentation results in different MRI modalities, comparisons with state-of-the-art methods, evaluations in the coronal and sagittal planes and performance of the proposed approach on another dataset. Finally, Section 5 discusses the findings, followed by the conclusions in Section 6.

2. Related Works

Segmentation in MRI implies discerning which pixels in an image correspond to a determined class given some features, which usually include pixel intensities, textures, and edges. The segmentation performance is generally evaluated using metrics to assess the similarity between a model’s prediction and a reference map, such as the Dice coefficient (DC) and the Hausdorff Distance (HD). For example, Mahmood and Basit [12] presented an approach where multimodal MRI images are processed using various filters to obtain the features that feed a Random Forest (RF) model. The features include the smooth intensities given by a Gaussian filter, median intensities, edge information, and local entropy. Their approach achieved an average DC and HD of 0.54 and 82.78 for the Ischemic Stroke Lesion Segmentation (ISLES) dataset in its 2015 edition [13]. Although these pixel features led to an acceptable segmentation performance, they are usually processed locally without regard for their context in the whole image.
The Deep Learning approach has recently become a remarkable medical imaging advancement [10]. Deep Learning models capture global and local information in the images and integrate them to perform a prediction [14]. This integration allows the evaluation of detailed features, such as edges and textures, in a global context, enhancing the performance in medical image segmentation. However, MRI segmentation presents a high degree of class imbalance when using the overall context of the image, which refers to the disproportion between the number of pixels belonging to the foreground, or segmentation target, and the image’s background. Therefore, several studies have addressed this issue, from modifying a network architecture to selecting an adequate loss function.
This section presents a comprehensive literature review of deep learning approaches for ischemic stroke segmentation, focusing on the architecture and the loss function employed. A point worth mentioning is that the reviewed methods used the same dataset as Mahmood and Basit [12] for evaluation, which simplifies their comparison.
The U-Net architecture is widely used for segmentation purposes. Its layer connections compose a “U” shape structure, which allows for preserving spatial information and fine details when the data passes through the encoder and decoder [11], giving precise pixel segmentations. This architecture shape is commonly used as a baseline for different methods.
Liu et al. [9] proposed a deep residual attention convolutional neural network (DRANet) model to accurately segment and quantify ischemic stroke lesions and white matter hyperintensities (WMH) in MRI. Their method uses a “U”-shape architecture with an attention module to improve predictive performance. This module consists of a main branch and a dilated soft mask branch that work together to enhance the quality of features for accurate segmentation. In addition, the approach employs the Dice loss function, which is based on the DC, to address the class imbalance in the MRI segmentation task. The method was tested on the ISLES 2015 dataset in different MRI modalities, obtaining 0.7178 and 3.36, respectively, for DC and HD in the FLAIR modality. A point worth noting is that the attention modules help the network focus on relevant regions, adding more contextual information. These modules can be added to a U-Net, taking advantage of the intermediate connections, such as in the work of Oktay et al. [15] for abdominal CT segmentation.
In other work, Liu et al. [16] designed a deep convolution neural network for multimodal MRI segmentation, incorporating dense blocks in a “U”-shape-based network. The architecture was trained using a combined loss function based on the Cross-Entropy and Dice loss functions. Their approach achieved an average DC of 0.57 and HD of 43.02 for the multimode input composed of DWI and FLAIR sequences.
Another U-Net variant is given by Karthik et al. [17]. Their technique uses five convolutional blocks in the encoder and a Leaky Rectified Linear Unit (ReLU) as the activation function in the final layers. Their study compares the model’s performance using the Adaptive Moment Estimation (ADAM), AdaGrad, and AdaDelta optimizers and the Cross-Entropy, Dice, and Tversky loss functions. The model was trained and tested on multimodal MRI datasets, obtaining the best results with the ADAM optimizer and Dice loss, reaching an average DC of 0.7008.
Similarly, Abdmouleh et al. [18] proposed a modified U-Net architecture that includes inception blocks, which provide multiple connection paths to give flexibility in the complexity level of the encoder and decoder. Their study evaluated their model using FLAIR, DWI, T1, and T2 modalities, with FLAIR having the highest DC at 0.7466.
Moreover, the U-Net has also been utilized with a transfer learning approach. Aboudi et al. [19] employed a pre-trained ResNet50 architecture combined with a U-Net. The ResNet50 backbone was included in the U-Net encoder, and a Convolutional Block Attention Module (CBAM) was added after the convolution blocks. The model was fine-tuned using Stochastic Gradient Descent (SGD) and Categorical Cross-Entropy loss. The model was evaluated on DWI and T2 images, and an average DC of 0.796 was obtained.
Other techniques involve using Convolutional Neural Networks (CNN) to segment patches taken from the image. For example, Shah et al. [20] proposed a 2D Convolutional Neural Network (2D-CNN) for ischemic stroke segmentation in multimodal MRI images. Their model utilizes a Leaky ReLU activation function. The images were split into patches of size 33 × 33 to select portions of the lesion area and healthy brain tissue in a balanced manner to avoid class imbalance. This method achieved a mean DC of 0.7156.
Likewise, Kamnitsas et al. [21] presented a 3D Convolutional Neural Network with 11 layers for segmenting brain lesions in multimodal MRI scans, which was further developed in a subsequent work [22]. The network can process multimodal 3D patches at multiple scales. The system was trained using SGD and achieved an average Dice coefficient of 0.64 before applying preprocessing methods.
Another 3D approach is proposed by Zhang et al. [23]. They introduced an automatic method to segment acute ischemic stroke from DWI images using a deep 3D fully convolutional and densely connected convolution neural network (3D FC-DenseNet). This method can efficiently utilize 3D contextual information from the MRI slices. The authors tested the performance of the Dice loss and Cross-Entropy loss in the training phase of the model due to the natural class imbalance in this segmentation task. They concluded that the Dice loss led to higher results. Although their model could not segment some brain lesions, the overall DC obtained in the ISLES 2015 dataset was 0.58.
The 3D approach helps contextualize the brain lesions through the MRI slices. However, other studies use different views of the MRI sequences to obtain that context. Zhang et al. [24] proposed a method whose first stage consists of a Detection and Segmentation Network (DSN) trained with the Focal loss function that segments the lesions from DWI sequences’ axial, coronal and sagittal planes. Then, a Multiplane Fusion Network (MPFN) trained with the Dice loss aims to make the prediction more accurate considering the data provided by the DSN. The MPFN reached a DC of 0.622 and a sensitivity of 0.7322.
Another dataset employed by state-of-the-art models to test their performance is the ISLES 2022 challenge edition, which presents high-variance scenarios. Several works have proposed architectures similar to those previously mentioned. Wu et al. [25] employed U-Net-based architectures along with a Boundary Deformation Module (BDM) to enhance the segmentation of the lesion edges. Furthermore, the refinement of the edge segmentation is performed using a multitask learning loss function that combines the Dice and Boundary loss functions. This approach has a high computational demand, achieving a DC of 0.8560 using 119 million parameters.
Similarly, Werdiger et al. [26] used a DenseNet architecture with 22.3 million parameters to segment the ISLES 2022 dataset, which obtained a mean DC of 0.694. Another state-of-the-art approach was presented by Jeong et al. [27]. They utilized the nnU-Net framework, which performs an automatic configuration and optimization of hyperparameters of a U-Net architecture [28]. The loss function consisted of combining the Dice and the Cross-Entropy loss functions. The model achieved a DC of 0.7641 when trained only with the ISLES 2022 DWI images and 0.7869 when transfer learning from different MRI modalities and databases was employed.
Some studies reviewed in this section show improvements in segmentation when light changes are applied to the U-Net architecture. However, the principal challenge resides in class imbalance, especially when small brain lesions have to be segmented. As can be observed, some methods address this challenge by using the Dice loss function; nonetheless, its behavior has been reported as sometimes unstable [23]. Alternatively, some procedures combine loss functions to deal with the class imbalance and enhance the segmentation performance on the edges of the ground truth mask, such as in the work of Umirzakova et al. [29]. Other approaches consist of combining different MRI modalities and planes or proposing heavier models at a higher complexity cost. These last strategies represent a drawback in model training because they require specialized computer architectures explicitly oriented to data processing, making it challenging to apply the models in non-high-tech environments.
In this work, the proposed approach involves a modified U-Net network, based on the work of Oktay et al. [15], and a loss function that combines the advantages offered by the Focal loss and Dice loss in addressing class imbalance. The principal contributions of this study are listed below:
  • A new system based on U-Net architecture using attention mechanisms that enhance the segmentation in MRI images of small brain lesions caused by ischemic stroke by incorporating the Generalized Dice Focal Loss (GDFL) composite function.
  • Efficient segmentation in different MRI modalities and planes.
  • Improved segmentation performance compared to state-of-the-art systems based on the evaluation of the system using the ISLES 2015 dataset and competitive performance with state-of-the-art models in evaluating the ISLES 2022 dataset.
  • The model moderately converges at 200 epochs with 7.9 million parameters, making it suitable for training in general-purpose mid- and high-end graphic processing units (GPU).

3. Materials and Methods

This section aims to comprehensively describe all the components and processes involved in implementing and evaluating the proposed model. Consequently, the proposed method is first presented, highlighting the particularities of its architectural design and the elements needed for its training. Then, the experimental setup to evaluate the performance of the proposed approach is described, including the datasets and quality metrics employed in the evaluation, along with particularities in implementing the model’s hyperparameters optimization.

3.1. Overview of the Proposed Approach

This section concerns the proposed network architecture. Moreover, the loss function and optimizer utilized to train the model are described, detailing the challenges faced in segmenting MRI images and how the proposed method addresses them.

3.1.1. Network Architecture

The proposed approach is based on the U-Net architecture introduced by Ronneberger et al. [11], which is drawn schematically as a “U”-shaped network. This shape results from intermediate connections between the encoder and decoder, which help preserve spatial information and fine details as the data pass through the network. This feature makes the U-Net capable of performing precise pixel segmentation. Moreover, these connections allow versatility in integrating attention mechanisms, which focus on specific regions of the image, adding contextual information to the model.
The proposed network architecture uses the intermediate connections to feed an attention mechanism proposed by Oktay et al. [15], as illustrated in Figure 1.
The proposed architecture comprises an encoder formed by five convolutional layers, which transform the input into 32 channels and end with 512 channels. Then, the output of each encoder layer is given to the attention module. The decoder layers are fed with the output of the attention module and an upsampling of the previous decoder layer. Finally, the segmentation map is delivered with two channels: foreground and background probabilities.
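The layout described above can be approximated with MONAI's AttentionUnet class, as in the following sketch. This is an illustration under the stated channel, dropout, and output settings, not the authors' exact implementation.

```python
# Minimal sketch of an attention U-Net matching the described layout
# (five encoder levels, 32 to 512 channels, two-channel output, dropout 0.3).
import torch
from monai.networks.nets import AttentionUnet

model = AttentionUnet(
    spatial_dims=2,                     # 2D slices extracted from the NIfTI volumes
    in_channels=1,                      # single grayscale MRI slice
    out_channels=2,                     # foreground / background probability maps
    channels=(32, 64, 128, 256, 512),   # encoder widths described in the text
    strides=(2, 2, 2, 2),               # downsampling between encoder levels
    dropout=0.3,                        # dropout reported in Section 3.2.3
)

# Forward pass on a dummy 256 x 256 slice (the input size used after resizing).
x = torch.randn(1, 1, 256, 256)
logits = model(x)                       # shape: (1, 2, 256, 256)
```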

3.1.2. Loss Function and Optimizer

The loss function and optimizer represent key factors in model optimization and convergence. The loss function measures the model’s performance through the training epochs while the optimizer adjusts the model parameters to minimize or maximize the loss function.
The segmentation task in MRI images presents a class imbalance problem since the target objects to be segmented in the images, also known as the foreground, usually occupy fewer pixels than the non-target objects or background. This class imbalance directly impacts the performance of both the loss function and the optimizer of a model.
Regarding the optimizer, the Stochastic Gradient Descent (SGD) method has been widely employed for training neural network models. However, this optimizer has been demonstrated to lead to poor results when the tasks involve data imbalance [30]. Therefore, the optimizer employed for training the model was the Adaptive Moment Estimation with Weight Decay (AdamW) [31]. The AdamW optimizer computes the first- and second-order moments of the gradients to adapt the parameters’ learning rate, and a decoupled weight decay ($L_2$-style regularization) is applied to avoid overfitting. These features allow faster training compared with other commonly used optimizers, such as SGD.
Concerning the loss function, the proposed model was trained using the Generalized Dice Focal Loss (GDFL) function, which uses Generalized Dice Loss (GDL) and Focal Loss (FL), as described below.
The GDL function is an extension of the Dice Loss (DL), which uses the Dice Coefficient to evaluate the similarity of two sets. In binary labeling, DL considers the True Positives (TP), False Positives (FP), and False Negatives (FN) computed from the confusion matrix built with the label predictions of the model and the reference label map known as the ground truth. This loss function is defined as follows [32]:

$$\mathcal{L}_{DL} = 1 - \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}. \qquad (1)$$
In segmentation tasks where the area sizes of the background and foreground classes are considerably different, the underrepresented class has less influence on the computation of $\mathcal{L}_{DL}$. Therefore, the GDL includes weights based on the frequency of the classes to mitigate the class imbalance [33]:

$$\mathcal{L}_{GDL} = 1 - 2\,\frac{\sum_{c=1}^{2} \omega_c \sum_{n=1}^{N} p_{n,c}\, g_{n,c}}{\sum_{c=1}^{2} \omega_c \sum_{n=1}^{N} \left(p_{n,c} + g_{n,c}\right)}, \qquad (2)$$
where $p_{n,c}$ represents the probability of pixel $n$ belonging to class $c$, $g_{n,c}$ is the binary value of pixel $n$ in the ground truth map for class $c$, and $\omega_c$ is a weight given to class $c$, calculated as:

$$\omega_c = \frac{1}{\left(\sum_{n=1}^{N} g_{n,c}\right)^{2}}. \qquad (3)$$
Another function that addresses class imbalance is the FL. The FL function is based on the Balanced Cross-Entropy (BCE), which uses the class frequency. The class imbalance generated by easily classified background pixels in the segmentation task is addressed by a modulating factor that depends on a focusing parameter $\gamma$. This parameter reduces the contribution to the loss function of samples that are easy to classify. The FL function is defined as [34]:
$$\mathcal{L}_{FL} = -\alpha \left(1 - q\right)^{\gamma} \log q, \qquad (4)$$

$$q = \begin{cases} \hat{p} & \text{if } g = 1, \\ 1 - \hat{p} & \text{if } g = 0, \end{cases} \qquad (5)$$

where $\hat{p}$ is the estimated probability for the foreground class resulting from the model output and $\alpha$ is a weight factor to balance the class contribution.
The GDFL function addresses the class imbalance using GDL and focuses more on samples that are challenging to classify by utilizing FL. Given $\lambda_{GDL}$ and $\lambda_{FL}$ as weights for GDL and FL, the loss function combines $\mathcal{L}_{GDL}$ and $\mathcal{L}_{FL}$ as a weighted sum:

$$\mathcal{L}_{GDFL} = \lambda_{GDL}\,\mathcal{L}_{GDL} + \lambda_{FL}\,\mathcal{L}_{FL}. \qquad (6)$$
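A minimal sketch of how this composite loss could be instantiated is given below, assuming MONAI's GeneralizedDiceFocalLoss class and its lambda_gdl/lambda_focal arguments implement the weighted sum of Equation (6); the weights shown are those later selected for ISLES 2015 in Section 3.2.3.

```python
# Composite Generalized Dice Focal loss (sketch, assuming the MONAI API).
from monai.losses import GeneralizedDiceFocalLoss

loss_fn = GeneralizedDiceFocalLoss(
    to_onehot_y=True,   # convert the integer ground-truth mask to one-hot
    softmax=True,       # apply softmax to the two-channel network output
    gamma=2.0,          # focusing parameter of the Focal term, Equation (4)
    lambda_gdl=0.3,     # weight of the Generalized Dice term, Equation (6)
    lambda_focal=0.7,   # weight of the Focal term, Equation (6)
)

# loss = loss_fn(logits, mask)  # logits: (B, 2, H, W), mask: (B, 1, H, W)
```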

3.2. Experimental Setup

This section addresses the datasets and their processing, as well as the hyperparameter optimization and the metrics employed to evaluate the proposed model.
The experiments were conducted on a computer with 64 GB of RAM, an AMD Ryzen 9 5950X 16-core processor, and an NVIDIA GeForce RTX 3090 GPU. The programming environment was set up in a 64-bit Linux operating system using Python 3.11.9 and the MONAI library [35].

3.2.1. Datasets

This study evaluated the experiments using the Ischemic Stroke Lesion Segmentation (ISLES) datasets of the challenges in 2015 and 2022, respectively denoted as ISLES 2015 [13] and ISLES 2022 [36]. These challenges were carried out to promote the scientific progress of medical image processing, making the datasets available under an Open Database License. The data are provided on request in the uncompressed Neuroimaging Informatics Technology Initiative (NIfTI) format, which allows the visualization of the brain in the axial, coronal, and sagittal planes.
The ISLES 2015 challenge contains a subtask for segmenting subacute ischemic stroke lesions (SISS). The dataset for this subtask includes MRI scans of 28 patients in the FLAIR, DWI, T1, and T2 modalities and the segmentation masks of the lesions. A medical center supplied the cases with complete data anonymization, and experienced raters prepared the segmentation masks.
The ISLES 2022 challenge also includes a subtask of segmenting stroke lesions in multimodal MRI images. This subtask presents the images in NIfTI format and segmentation masks for the data of 250 subjects. The main feature of this dataset is that the images were obtained in multiple imaging centers using three different MRI scan devices, which, in contrast to ISLES 2015, provides high variability in the images’ and lesions’ size as well as lesions’ locations.
Healthcare professionals obtained the images of both datasets during clinical imaging routines for stroke patients, and expert raters provided the annotations over the images to form the segmentation masks. Furthermore, an inter-rater analysis was performed on both datasets before their release to evaluate the annotations and maximize the overall segmentation accuracy. Consequently, these datasets help to evaluate the proposed model’s performance under actual clinical conditions.
The datasets were processed in the axial, coronal, and sagittal planes as follows to carry out the experiments.
For each subject, each slice with at least one non-zero value in the NIfTI file of the masks was taken and transformed into an image in the Portable Network Graphics (PNG) format. Subsequently, the slices of each MRI modality corresponding to these mask images were normalized, converted to the uint16 data type, and saved in PNG format. Since this procedure was applied to each plane, three subsets are obtained for each dataset.
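The slice-extraction step can be sketched as follows, assuming nibabel for reading the NIfTI volumes and imageio for writing 16-bit PNG files; the file names are hypothetical.

```python
# Sketch of the NIfTI-to-PNG slice extraction described above (axial plane).
import nibabel as nib
import numpy as np
import imageio

mask_vol = nib.load("sub-0001_msk.nii").get_fdata()    # hypothetical file names
img_vol = nib.load("sub-0001_flair.nii").get_fdata()

for z in range(mask_vol.shape[2]):                      # iterate over axial slices
    mask_slice = mask_vol[:, :, z]
    if not np.any(mask_slice):                          # keep only slices with lesion pixels
        continue
    img_slice = img_vol[:, :, z]
    # normalize to the uint16 range before saving as PNG
    norm = (img_slice - img_slice.min()) / (np.ptp(img_slice) + 1e-8)
    imageio.imwrite(f"flair_axial_{z:03d}.png", (norm * 65535).astype(np.uint16))
    imageio.imwrite(f"mask_axial_{z:03d}.png", (mask_slice > 0).astype(np.uint16) * 65535)
```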
Both datasets present high variance in lesion sizes, which involves class imbalance, making them suitable for model evaluation. Figure 2 presents violin plots of the sizes of segmentation masks in each plane to illustrate their distribution. Moreover, the first quartile (Q1), median (Q2), and third quartile (Q3) for each plane and dataset are annotated in the figure.
In order to feed the model in the training phase with a balanced number of samples of each mask size, each subset of the datasets was divided into four categories according to the quartiles given in Figure 2: Small ($s_p < Q_1$), Medium Down ($Q_1 \le s_p < Q_2$), Medium Up ($Q_2 \le s_p < Q_3$), and Large ($Q_3 \le s_p$), with $s_p$ being the size in pixels of the mask.
These categories were considered when composing the training, validation, and testing sets. Each subset was split into training, validation, and testing sets, using a proportion of 30% for testing and 70% for training and validation. Then, this last set was divided into 80% for training and 20% for validation. Consequently, the overall proportions for the training, validation, and testing sets are 56%, 14%, and 30%, respectively. These sets were built by randomly selecting and merging samples from each mask size category using the mentioned proportions. Furthermore, the experiments were evaluated using these ratios with repeated k-fold validation of ten iterations to compute the average and standard deviation of the results. Table 1 and Table 2 present the number of samples employed to compose these sets.
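The category assignment and the 56/14/30 proportions can be summarized with the following sketch; it assumes a NumPy array mask_sizes holding the lesion size in pixels of every slice and uses scikit-learn's train_test_split as an illustrative splitting utility, since the authors' exact tooling for this step is not specified.

```python
# Sketch of the quartile-based stratified split described above.
import numpy as np
from sklearn.model_selection import train_test_split

mask_sizes = np.load("mask_sizes.npy")               # hypothetical: lesion size per slice
q1, q2, q3 = np.percentile(mask_sizes, [25, 50, 75])
# 0: Small, 1: Medium Down, 2: Medium Up, 3: Large
categories = np.digitize(mask_sizes, [q1, q2, q3])

# 70/30 split, then 80/20 inside the first part -> 56% train, 14% val, 30% test
idx = np.arange(len(mask_sizes))
trainval_idx, test_idx = train_test_split(idx, test_size=0.30, stratify=categories)
train_idx, val_idx = train_test_split(trainval_idx, test_size=0.20,
                                       stratify=categories[trainval_idx])
```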

3.2.2. Data Augmentation

Data augmentation is a technique for increasing the diversity of training data and reducing the risk of overfitting. It involves applying transformations to existing data to help improve a model’s generalization.
The transformations employed in this work include vertical and horizontal flip, rotation at 90° in a random direction, gamma correction with a random gamma parameter, Contrast Limited Adaptive Histogram Equalization (CLAHE), image transposition, and the addition of Gaussian noise with random variance. Each training image is subjected to each one of these transformations with a probability of 50% of occurrence in each epoch. Moreover, the images were resized to 256 × 256 pixels.
The data augmentation was implemented in Python using the Albumentations library [37].
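A sketch of this augmentation pipeline is shown below, assuming the Albumentations API; apart from the 50% probability and the 256 × 256 resize, the parameter values are library defaults rather than values reported here.

```python
# Augmentation pipeline sketch (Albumentations), applied jointly to image and mask.
import albumentations as A

augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),   # 90-degree rotation in a random direction
    A.RandomGamma(p=0.5),      # gamma correction with a random gamma parameter
    A.CLAHE(p=0.5),            # Contrast Limited Adaptive Histogram Equalization
    A.Transpose(p=0.5),        # image transposition
    A.GaussNoise(p=0.5),       # Gaussian noise with random variance
    A.Resize(256, 256),        # final resize applied to every image
])

# augmented = augment(image=image, mask=mask)
```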

3.2.3. Hyperparameter Optimization

Hyperparameters are variables that control the training process and the structure of the model. A model’s performance can vary significantly depending on the selected hyperparameters, and they must be configured before the training phase.
The hyperparameters related to the network architecture were set as follows: the dropout of the network given in Figure 1 was set to 0.3, and the learning rate for the AdamW optimizer to $1 \times 10^{-4}$; the training phase was carried out over 200 epochs using a batch size of 16. When 30 consecutive epochs occur without improvement in the loss function, the optimizer’s scheduler decreases the learning rate to 1% of the original rate.
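A sketch of this training configuration follows, assuming the standard PyTorch optimizer and scheduler APIs rather than the authors' exact training loop; the model line is a stand-in for the Attention U-Net of Section 3.1.1.

```python
# Optimizer and learning-rate scheduler sketch for the settings listed above.
import torch

model = torch.nn.Conv2d(1, 2, kernel_size=3)   # stand-in for the Attention U-Net
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",     # monitor the (decreasing) loss
    factor=0.01,    # reduce the learning rate to 1% of its value
    patience=30,    # after 30 consecutive epochs without improvement
)

# for epoch in range(200):           # 200 training epochs, batch size 16
#     ...train and validate...
#     scheduler.step(validation_loss)
```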
The key hyperparameters of the proposed model are the weights assigned to the GDL and FL functions in Equation (6): $\lambda_{GDL}$ and $\lambda_{FL}$, respectively. Their selection was based on the F1-Scores resulting from experiments varying their values in a complementary manner with $\gamma = 2$ in Equation (4). Since the features of the ISLES 2015 and ISLES 2022 datasets differ, this process was conducted separately for each dataset. Figure 3 illustrates the results of these experiments. It can be observed that the maximum scores are obtained using $\lambda_{GDL} = 0.3$ and $\lambda_{FL} = 0.7$ for the ISLES 2015 dataset, and $\lambda_{GDL} = 0.1$ with $\lambda_{FL} = 0.9$ for the ISLES 2022 images.

3.2.4. Evaluation Metrics

The experiments were evaluated using the following metrics, which are commonly used in segmentation tasks [9,18,38]; a computation sketch is provided after the list. These metrics use the True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) obtained from the confusion matrix computed from the label predictions of the model and the ground truth map.
  • Intersection over Union (IoU) evaluates the overlap between two areas, such as the predicted and ground truth masks. It is calculated as the intersection divided by the union of two sets:
    $$\mathrm{IoU} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}.$$
  • F1-Score represents the harmonic mean between precision and sensitivity. It measures the model’s performance in balancing false positives and false negatives:
    $$\mathrm{F1\text{-}Score} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}.$$
  • The Hausdorff Distance (HD) measures the maximum distance between the nearest points in two sets. It evaluates the similarity between shapes or contours:
    $$HD(Y, G) = \max\left\{ \max_{y \in Y} \min_{g \in G} d(y, g),\ \max_{g \in G} \min_{y \in Y} d(g, y) \right\},$$
    where $Y$ is the set of labeled pixels in the foreground class predicted by the model, $G$ corresponds to the set of real foreground pixels given in the ground truth map, and $d(\cdot,\cdot)$ is the Euclidean distance between two pixels.
  • Accuracy assesses the overall performance of a classification model, describing the percentage of correct predictions out of the total predictions:
    $$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FP} + \mathrm{TN} + \mathrm{FN}}.$$
  • Precision gives the accuracy of the positive predictions:
    $$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}.$$
  • Sensitivity, also known as Recall, evaluates the model’s ability to identify positive cases correctly:
    $$\mathrm{Sensitivity} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}.$$
  • Specificity measures the model’s ability to identify the negative cases correctly:
    $$\mathrm{Specificity} = \frac{\mathrm{TN}}{\mathrm{FP} + \mathrm{TN}}.$$
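As referenced above, the listed metrics can be computed from a binary prediction and its ground-truth mask as in the following sketch; the Hausdorff Distance uses SciPy's directed_hausdorff as an illustrative choice, not necessarily the implementation employed in this study.

```python
# Sketch: per-image segmentation metrics from the confusion-matrix counts.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def segmentation_metrics(pred, gt):
    """pred, gt: boolean arrays of the same shape with non-empty foregrounds."""
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)

    y = np.argwhere(pred)   # predicted foreground pixel coordinates
    g = np.argwhere(gt)     # ground-truth foreground pixel coordinates
    hd = max(directed_hausdorff(y, g)[0], directed_hausdorff(g, y)[0])

    return {
        "IoU": tp / (tp + fp + fn),
        "F1-Score": 2 * tp / (2 * tp + fp + fn),
        "HD": hd,
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
        "Precision": tp / (tp + fp),
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (fp + tn),
    }
```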

4. Results

The results of different experiments conducted to evaluate the model are presented in this section. These experiments were executed using the ISLES 2015 dataset. They include ablation testing, evaluation in different MRI modalities, a comparison with other state-of-the-art methods, and the model’s behavior in the coronal and sagittal plane. Furthermore, the method’s performance in another dataset, such as ISLES 2022, is documented, and the time to train the model is reported.

4.1. Ablation Testing

In order to assess the impact of various components of the proposed model on its overall performance, an ablation test was conducted using the same training, validation, and testing sets described in Section 3.2.1 for all the model’s versions on the axial FLAIR modality. The details of each model version are described in Table 3.
The average results of the IoU, F1-Score, and HD in the ten k-fold iterations are presented in Table 4. Moreover, the average number of Non-Segmented images (N. S.) was computed. Additionally, the details of N. S. in the Small and Medium Down (M. D.) mask size categories are reported in the table. A point worth mentioning is that no N. S. in the Medium Up and Large categories occurred in the evaluation.
Figure 4 compares the learning curves. The k-fold iteration of each model’s version with the median values in IoU was selected for this display. It is worth noting that the proposed model follows an adequate relation between the train and validation losses without presenting signs of overfitting or underfitting.
The CBAM component involves calculating attention maps across the channel and spatial dimensions and is usually employed for general-purpose segmentation. On the other hand, the attention module given in Figure 1 applies attention through learned gating. The ablation study on the attention modules shows that the CBAM model and the one without attention modules (W/O A. M.) presented faster convergence than the proposed model. However, the W/O A. M. model obtained the lowest IoU results after the SGD version, highlighting the contribution of the attention modules to the proposed system. The CBAM model achieved marginally lower IoU and F1 scores than the proposed approach but fewer N. S. images in the Small category; however, it increased the N. S. in the Medium Down category.
Regarding the ablation study on the loss functions, the version with Focal Loss achieved a faster convergence and a better HD score than Dice Loss and the proposed model. Although a lower HD value is desirable in the segmentation task, the Focal Loss version produced more N. S. than the proposed method. Moreover, the proposed approach performed best in IoU and F1-Score metrics.
A visual comparison between the model’s versions is given in Figure 5 to complement this ablation test. The results are ordered following the mask size categories: Small (first row), Medium Down (second row), Medium Up (third row), and Large (fourth row). The cases are identified in the ground truth mask image using the nomenclature P C_S, where P refers to the plane, C to the subject’s case number, and S to the slice. This figure shows slight differences in the number of FP and FN pixels: the Dice Loss and W/O A.M. versions present more FP pixels than the proposed model, while the Focal Loss version produces a slightly higher number of FN. Furthermore, the CBAM model introduced more FP than the proposed approach but fewer than the W/O A.M. model, as seen in case AXI 4_80.

4.2. Performance on Different MRI Modalities and Planes

The ISLES 2015 dataset provides the files of FLAIR, DWI, T1, and T2 modalities for each subject’s case. As previously described in Section 3.2.1, these files were processed to obtain the images’ axial, coronal, and sagittal views. The proposed approach was evaluated separately for each one of these MRI modalities and planes. Also, the training, validation, and testing sets for each modality were built using the same case numbers and slices employed in the repeated k-fold validation of the ablation test. The results can be observed in Table 5.
The proposed model generally offered better segmentation results in the FLAIR modality in all planes. The FLAIR modality, followed by DWI, obtained higher IoU and F1 scores in the axial and sagittal planes, whereas in the coronal plane, T2 scored the second-best IoU. Moreover, FLAIR and DWI obtained the lowest rates of N. S. among the modalities in all planes. Details of these results can be interpreted using violin plots.
The results of the k-fold iteration with the median IoU using the FLAIR modality in the axial plane were plotted according to their mask size category and illustrated in Figure 6. In this example, the Small category presents seven N. S. images. It can be observed that, although the model specifically struggles to segment small lesions, the distribution of the IoU and F1 scores shows a high probability of correct segmentation. Moreover, the median is 0.8495 for IoU and 0.9211 for F1-Score, which implies that half of the images with small lesions achieved scores higher than these values. Regarding the other categories, all the images registered IoU and F1 scores higher than 0.38. Overall, the median IoU values for all the categories are higher than 0.85.
Given the difficulty of segmenting small lesions, a detail of the results in this category produced by the model using the other MRI modalities is presented in Figure 7 for further study. The plot presents the results of the same k-fold iteration given in Figure 6, but organized by MRI modality. The violin plot of DWI shows lower medians than FLAIR in IoU and F1 scores. This explains why, although DWI accomplished fewer non-segmented images, as shown in Table 5, its average results are lower than those of FLAIR. A point worth mentioning is that T2 achieved a higher median IoU in small lesions than DWI; nonetheless, its mean decreased due to the considerable number of non-segmented samples in this category.
The results in the coronal and sagittal planes were also illustrated using violin plots to verify the distribution of the segmentation results in Figure 8. A similar behavior to that of the axial plane is observed. There are non-segmented samples. However, the median IoU is above 0.90 in the coronal and sagittal planes. The non-segmented images and some instances with poor results reduce the average values, but most of the segmented images demonstrate the model’s capability to achieve IoU rates higher than 0.8.
The best results in the coronal and sagittal planes were achieved using the FLAIR modality. Therefore, examples of segmented images of each mask size category in the FLAIR modality are presented in Figure 9 and Figure 10. It can be observed that the wrongly classified pixels are located on the edges of the lesion. The model has demonstrated that it can segment various pixel blobs in the image, as shown in Figure 10f. Nevertheless, it struggles with blobs of significantly small size, such as in Figure 9f.

4.3. Performance on the ISLES 2022 Dataset

The images of the ISLES 2022 dataset were acquired at multiple MRI centers with different equipment, making it challenging to capture the underlying trends in the data. Moreover, the distribution of the lesion size presents a high variance, as observed in Figure 2b. The results in the different planes for the DWI and ADC sequences provided in this dataset are presented in Table 6. For this dataset, the proposed model failed to segment many images. However, the segmentation on the DWI images reached an IoU of 0.6961 in the axial plane, although a mean of 121 images could not be segmented. The experiments with the ADC sequence yielded the worst performance, with a mean IoU of 0.5004 across the three planes.
The F1-Scores of the model using the best configuration for the ISLES 2015 dataset were compared against those of the best configuration for ISLES 2022 to explore the proposed model’s performance when changing the key hyperparameters. For notational convenience, the configuration using ($\lambda_{GDL} = 0.3$, $\lambda_{FL} = 0.7$) is named configuration A, and the one using ($\lambda_{GDL} = 0.1$, $\lambda_{FL} = 0.9$) is named configuration B. The results are displayed in Figure 11.
It is noted that the median F1-Scores in the different mask size categories surpass 0.8950 for DWI using both hyperparameter configurations, with configuration B ($\lambda_{GDL} = 0.1$, $\lambda_{FL} = 0.9$) having higher median values. The general median values of configurations A and B were 0.9125 and 0.9198, respectively. Nonetheless, their mean values are diminished by the non-segmented images in both configurations. Furthermore, it can be observed that the mean value of DWI’s Small category is lower in configuration B than in configuration A. This is because configuration B resulted in a higher number of N. S. images (121), while configuration A registered a mean of 115.
A violin plot of the mask sizes of the images not segmented by each configuration applied to the DWI modality is provided in Figure 12. The mean value in the plots shows that although configuration A ($\lambda_{GDL} = 0.3$, $\lambda_{FL} = 0.7$) results in a lower number of N. S. images, the lesions it fails to segment have a larger pixel count than those missed by the model using configuration B ($\lambda_{GDL} = 0.1$, $\lambda_{FL} = 0.9$). Consequently, the latter configuration performs more precise small-lesion segmentation. A point worth mentioning is that the mask sizes of the N. S. images do not surpass 300 pixels, corresponding to a smaller size than the median of the lesions registered in ISLES 2015.
Independently of the hyperparameter configuration, applying the proposed method to ADC images produces many non-segmented images with small lesions. Moreover, the median F1-Score was 0.5366 with configuration A and 0.5874 with configuration B, demonstrating that the model performs poorly on this MRI sequence compared to the DWI modality. However, the general median using configurations A and B on ADC images was 0.8180 and 0.8350, respectively, indicating better performance on larger lesions than on small ones. These results could be due to the lesion’s visibility in both MRI modalities: while DWI measures the water diffusion in the tissue, the ADC image quantifies the magnitude of that diffusion. These measurements translate into differences in the contrast of the lesion with the rest of the tissues, with DWI being the representation with higher contrast.
In order to better appreciate the behavior of the proposed approach on DWI sequences, displayed in Figure 11a,b, the predicted segmentation masks of one example from each mask size category are plotted in Figure 13. In subfigures (f) and (g), it can be observed that the model using configuration A ($\lambda_{GDL} = 0.3$, $\lambda_{FL} = 0.7$) fails to segment some isolated pixels. In contrast, configuration B ($\lambda_{GDL} = 0.1$, $\lambda_{FL} = 0.9$) covers a larger segmentation area, even giving more false positives in subfigure (k).

4.4. Comparison with State-of-the-Art Methods

This section provides a comparative analysis of the proposed approach against current state-of-the-art methods. Table 7 displays the reported results of various approaches reviewed in Section 2. The comparison was performed against the proposed method using the F1-Score, HD, accuracy, precision, sensitivity, and specificity metrics.
The proposed approach achieved the highest F1-Score among all the compared methods and MRI modalities when using the FLAIR sequences. Regarding the DWI images, the proposed method obtained better F1-Scores than the works in [18,23,24,39]. Nonetheless, Zhang et al. [24] obtained the highest accuracy and specificity in DWI and overall; however, the results demonstrate that the proposed method is competitive in these metrics. Although Liu et al. [9] reported the best HD using the combination of FLAIR and DWI sequences, the proposed method achieved close results. Finally, the proposed approach scored higher in the T1 and T2 modalities than Abdmouleh et al. [18] and Kumar et al. [39].
Regarding the ISLES 2022 dataset, Wu et al. [25] achieved the highest F1-Score using the DWI and ADC modalities, followed by Jeong et al. [27] and the proposed method, as can be observed in Table 8.

4.5. Training Time

The proposed model’s training phase was executed with the aid of a GPU. The GPU programming approach leverages the GPU’s processing units to perform the operations involved in the network’s convolutional layers in parallel, reducing the processing time compared to sequential programming. Consequently, the GPU’s hardware architecture directly impacts the time required per epoch.
Extensive neural network architectures need a higher training time due to their number of parameters, and they are usually trained on specialized architectures that can be rented in terms of computing time. Consequently, the training time of an architecture is an essential factor to consider when it will be implemented in production. The proposed model has 7,939,778 parameters, representing a low number compared to other state-of-the-art methods that require specialized computer architectures with GPUs explicitly oriented to data processing for training, as observed in Table 9.
This number of parameters allows the proposed model to be trained in high-end and mid-range general-purpose GPUs. The average time was recorded in Table 10 to assess this characteristic.
The coronal plane of the ISLES 2015 dataset has the highest training time among the planes because its training set, comprising 953 images, is larger than those of the other planes. Nevertheless, the training time does not exceed an hour using the mid-range GPU. On the other hand, the training set of the coronal plane in the ISLES 2022 dataset is around 4.7 times larger than the one in ISLES 2015, and its training takes at most one and a half hours using the high-end GPU.

5. Discussion

The results presented in the previous section reveal several key insights, which are discussed in this section to help readers better understand the proposed approach’s principal features.
As mentioned in the related works (Section 2), the principal challenge in segmenting ischemic stroke lesions in MRI images lies in the class imbalance resulting from the high disparity between the number of pixels labeled as foreground and background in the images. An example of this situation can be found in MRI images depicting small lesions; consequently, diverse state-of-the-art methods struggle to segment them.
Different studies experiment with loss functions that try to diminish the impact of the class imbalance in the training phase, such as Karthik et al. [17] and Zhang et al. [23]. Both explored the Cross-Entropy and Dice loss functions, which weight the class probabilities to mitigate the contribution of the dominant class. In this work, the proposed method was tested utilizing the Generalized Dice loss and the Focal loss functions, the latter being a weighted form of the cross-entropy loss. The results demonstrate that using the Generalized Dice Focal loss improves segmentation results because the number of non-segmented images is reduced. Moreover, the attention module helped increase the IoU of the segmented images by 0.03 and reduce the HD by 1.09, giving more pixel precision and competing with the method of Liu et al. [9], which also employs an attention module but based on a dilated soft mask.
Furthermore, the detail of the IoU and F1 scores of small stroke lesions in Figure 7 demonstrates that the proposed approach reached an IoU greater than 0.8 using the FLAIR, DWI, and T2 sequences. This premise is based on the median values and the distribution of the violin plots presented. Moreover, this behavior is repeated for the segmentation in the coronal and sagittal planes.
Regarding the performance on another dataset, such as ISLES 2022, it can be observed that the average IoU is lower than the one achieved in the ISLES 2015 dataset in the DWI modality. However, the F1-Scores of 0.7517, 0.5833, and 0.5636 for the axial, coronal, and sagittal planes still represent an acceptable output given the value ranges for ISLES 2015 reported in Table 7, where the lowest value for DWI was 0.58. A point worth noting is that Figure 9f and Figure 13f,g,j,k reveal the struggle of the proposed model to segment targets with isolated pixels accompanied by pixel blobs of more significant size. This behavior is also reported in the work of Zhang et al. [23]. However, the proposed model successfully segments diminutive lesions, such as the one presented in Figure 13e,i.
The proposed model’s performance on ISLES 2022 using two combinations of hyperparameters concerning the loss function was analyzed. The weights assigned to the FL and GDL functions directly impacted the number of non-segmented images and the predicted segmentation mask’s F1-Scores. However, the difference in the performance was marginal.
It is worth noting that both configurations failed to segment lesions whose sizes are in the range $[0, 278]$ pixels. Considering the violin plots in Figure 2, it can be observed that this range corresponds to the Small and Medium Down categories of the mask sizes in the axial view of ISLES 2015. Since the proposed model presented N. S. images in these categories, it can be pointed out that its behavior is consistent when segmenting the images from ISLES 2022.
The non-segmentation of small lesions significantly dropped the mean F1-Scores. Nonetheless, the performance of the proposed model is competitive against other state-of-the-art models evaluated using the ISLES 2022 dataset, as shown in Table 8, considering that these methods employed heavier architectures, such as in the works of Wu et al. [25] and Werdiger et al. [26], or performed transfer learning techniques, as in Jeong et al. [27].
Jeong et al. [27] provided the results of their method trained only with the DWI images from ISLES 2022, which can be directly compared with our proposed model. They achieved a mean F1-Score of 0.7641, while the model proposed reached 0.7517. Since the difference is minimal, it can be inferred that the proposed model could accomplish higher segmentation results by employing transfer learning techniques.
Finally, the proposed approach presents efficient training since its 7,939,778 parameters can be trained in less than an hour and a half overall using around 4500 images in the training set, and the training can be performed on general-purpose mid-range GPUs. Furthermore, the learning curve in Figure 4a shows fast convergence, with the learning curves flattening at epoch 125. The version of the proposed model using SGD as the optimizer also converges quickly; nonetheless, its performance is poor compared with the one obtained using AdamW.

6. Conclusions

Early identification of ischemic strokes reduces the risk of permanent damage and increases the chances of recovery by allowing the implementation of effective medical interventions. The identification requires MRI imaging to evaluate the damage in cerebral tissue. However, annotation in the images represents a high workload for radiologists since it is manually performed and prone to potential human error. Therefore, automatic lesion segmentation could improve the diagnosis and treatment planning due to the delivery of a rapid evaluation of the images, which can assist radiologists in reducing their labor complexity and lead to an early diagnosis.
The proposed method significantly improves the segmentation of ischemic stroke lesions in MRI images, particularly addressing the class imbalance challenge. The approach successfully mitigates the influence of background pixels by utilizing the Generalized Dice Focal loss function, leading to enhanced segmentation accuracy. Incorporating an attention module further improves performance, delivering competitive results compared to state-of-the-art methods.
The method’s effectiveness is highlighted by its ability to segment small stroke lesions with a high Intersection over Union (IoU), especially in FLAIR, DWI, and T2 sequences across different planes. Despite a lower average IoU in the ISLES 2022 dataset, the model still achieves acceptable F1-Scores, underscoring its robustness and reliability. These segmentation results can be leveraged as a first screening of the images for radiologists.
Additionally, the proposed model exhibits efficient training due to a reasonable convergence at 200 epochs and relatively low training time, which makes it suitable for production environments. This efficiency highlights the practicality of the proposed approach for ischemic stroke lesion segmentation, which can be deployed using general-purpose GPUs in non-high-tech environments.
Given the model’s ability to segment lesions in the different anatomical planes of the MRI images and its lightweight feature, future research directives could focus on using the proposed model as a backbone to fuse the features of the anatomical planes to mitigate the error in segmenting isolated target pixels.

Author Contributions

Conceptualization, O.C.-C., B.P.G.-S. and V.P.; methodology, B.P.G.-S., J.A.A.-D. and O.C.-C.; software, B.P.G.-S., J.A.A.-D. and O.C.-C.; validation, B.P.G.-S., J.A.A.-D. and V.P.; formal analysis, B.P.G.-S. and J.A.A.-D.; investigation, O.C.-C., B.P.G.-S. and J.A.A.-D.; resources, V.P., R.R.-R., C.C.-R. and S.S.; data curation, B.P.G.-S.; writing—original draft preparation, O.C.-C., J.A.A.-D. and V.P.; writing—review and editing, B.P.G.-S., V.P. and J.A.A.-D.; visualization, B.P.G.-S.; supervision, B.P.G.-S., J.A.A.-D., V.P., R.R.-R. and C.C.-R.; project administration, V.P., R.R.-R., C.C.-R. and S.S.; funding acquisition, V.P., R.R.-R. and C.C.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used to evaluate the model are publicly available. The information for downloading them can be found in their corresponding references. The data and code presented in this study are available for academic purposes upon request from the corresponding author.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (IPN) (Mexico), the Comisión de Operación y Fomento de Actividades Académicas (COFAA) of IPN, and the Consejo Nacional de Humanidades, Ciencias y Tecnologías (Mexico) for their support in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Feigin, V.L.; Norrving, B.; Mensah, G.A. Global Burden of Stroke. Circ. Res. 2017, 120, 439–448. [Google Scholar] [CrossRef] [PubMed]
  2. World Health Organization. Stroke, Cerebrovascular Accident. 2024. Available online: https://www.emro.who.int/health-topics/stroke-cerebrovascular-accident/index.html (accessed on 11 August 2024).
  3. Adams, H.P.; Bendixen, B.H.; Kappelle, L.J.; Biller, J.; Love, B.B.; Gordon, D.L.; Marsh, E.E. Classification of subtype of acute ischemic stroke. Definitions for use in a multicenter clinical trial. TOAST. Trial of Org 10172 in Acute Stroke Treatment. Stroke 1993, 24, 35–41. [Google Scholar] [CrossRef] [PubMed]
  4. Hurford, R.; Sekhar, A.; Hughes, T.A.T.; Muir, K.W. Diagnosis and management of acute ischaemic stroke. Pract. Neurol. 2020, 20, 304–316. [Google Scholar] [CrossRef] [PubMed]
  5. Saver, J.L. Time Is Brain—Quantified. Stroke 2006, 37, 263–266. [Google Scholar] [CrossRef] [PubMed]
  6. Guerrero, R.; Qin, C.; Oktay, O.; Bowles, C.; Chen, L.; Joules, R.; Wolz, R.; Valdés-Hernández, M.; Dickie, D.; Wardlaw, J.; et al. White matter hyperintensity and stroke lesion segmentation and differentiation using convolutional neural networks. NeuroImage Clin. 2018, 17, 918–934. [Google Scholar] [CrossRef]
  7. Zhang, J.; Ta, N.; Fu, M.; Tian, F.H.; Wang, J.; Zhang, T.; Wang, B. Use of DWI-FLAIR Mismatch to Estimate the Onset Time in Wake-Up Strokes. Neuropsychiatr. Dis. Treat. 2022, 18, 355–361. [Google Scholar] [CrossRef]
  8. Li, X.; Su, F.; Yuan, Q.; Chen, Y.; Liu, C.Y.; Fan, Y. Advances in differential diagnosis of cerebrovascular diseases in magnetic resonance imaging: A narrative review. Quant. Imaging Med. Surg. 2023, 13, 2712. [Google Scholar] [CrossRef]
  9. Liu, L.; Kurgan, L.; Wu, F.X.; Wang, J. Attention convolutional neural network for accurate segmentation and quantification of lesions in ischemic stroke disease. Med. Image Anal. 2020, 65, 101791. [Google Scholar] [CrossRef]
  10. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef]
  11. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention, Proceedings of the MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  12. Mahmood, Q.; Basit, A. Automatic Ischemic Stroke Lesion Segmentation in Multi-spectral MRI Images Using Random Forests Classifier. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Springer International Publishing: Cham, Switzerland, 2016; pp. 266–274. [Google Scholar] [CrossRef]
  13. Maier, O.; Menze, B.H.; von der Gablentz, J.; Häni, L.; Heinrich, M.P.; Liebrand, M.; Winzeck, S.; Basit, A.; Bentley, P.; Chen, L.; et al. ISLES 2015—A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal. 2017, 35, 250–269. [Google Scholar] [CrossRef]
  14. Havaei, M.; Davy, A.; Warde-Farley, D.; Biard, A.; Courville, A.; Bengio, Y.; Pal, C.; Jodoin, P.M.; Larochelle, H. Brain tumor segmentation with Deep Neural Networks. Med. Image Anal. 2017, 35, 18–31. [Google Scholar] [CrossRef] [PubMed]
  15. Oktay, O.; Schlemper, J.; Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar] [CrossRef]
  16. Liu, L.; Wu, F.X.; Wang, J. Efficient multi-kernel DCNN with pixel dropout for stroke MRI segmentation. Neurocomputing 2019, 350, 117–127. [Google Scholar] [CrossRef]
  17. Karthik, R.; Gupta, U.; Jha, A.; Rajalakshmi, R.; Menaka, R. A deep supervised approach for ischemic lesion segmentation from multimodal MRI using Fully Convolutional Network. Appl. Soft Comput. 2019, 84, 105685. [Google Scholar] [CrossRef]
  18. Abdmouleh, N.; Echtioui, A.; Kallel, F.; Hamida, A.B. Modified U-Net Architecture based Ischemic Stroke Lesions Segmentation. In Proceedings of the 2022 IEEE 21st International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), Sousse, Tunisia, 19–21 December 2022. [Google Scholar] [CrossRef]
  19. Aboudi, F.; Drissi, C.; Kraiem, T. A Hybrid Model for Ischemic Stroke Brain Segmentation from MRI Images using CBAM and ResNet50-Unet. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 950. [Google Scholar] [CrossRef]
  20. Shah, P.M.; Khan, H.; Shafi, U.; Islam, S.u.; Raza, M.; Son, T.T.; Le-Minh, H. 2D-CNN Based Segmentation of Ischemic Stroke Lesions in MRI Scans. In Advances in Computational Collective Intelligence; Springer International Publishing: Cham, Switzerland, 2020; pp. 276–286. [Google Scholar] [CrossRef]
  21. Kamnitsas, K.; Chen, L.; Ledig, C.; Rueckert, D.; Glocker, B. Multi-scale 3D convolutional neural networks for lesion segmentation in brain MRI. In Proceedings of the Ischemic Stroke Lesion Segmentation, Munich, Germany, 5 October 2015. [Google Scholar]
  22. Kamnitsas, K.; Ledig, C.; Newcombe, V.F.; Simpson, J.P.; Kane, A.D.; Menon, D.K.; Rueckert, D.; Glocker, B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 2017, 36, 61–78. [Google Scholar] [CrossRef]
  23. Zhang, R.; Zhao, L.; Lou, W.; Abrigo, J.M.; Mok, V.C.T.; Chu, W.C.W.; Wang, D.; Shi, L. Automatic Segmentation of Acute Ischemic Stroke From DWI Using 3-D Fully Convolutional DenseNets. IEEE Trans. Med. Imaging 2018, 37, 2149–2160. [Google Scholar] [CrossRef]
  24. Zhang, L.; Song, R.; Wang, Y.; Zhu, C.; Liu, J.; Yang, J.; Liu, L. Ischemic Stroke Lesion Segmentation Using Multi-Plane Information Fusion. IEEE Access 2020, 8, 45715–45725. [Google Scholar] [CrossRef]
  25. Wu, Z.; Zhang, X.; Li, F.; Wang, S.; Huang, L.; Li, J. W-Net: A boundary-enhanced segmentation network for stroke lesions. Expert Syst. Appl. 2023, 230, 120637. [Google Scholar] [CrossRef]
  26. Werdiger, F.; Yogendrakumar, V.; Visser, M.; Kolacz, J.; Lam, C.; Hill, M.; Chen, C.; Parsons, M.W.; Bivard, A. Clinical performance review for 3-D Deep Learning segmentation of stroke infarct from diffusion-weighted images. Neuroimage Rep. 2024, 4, 100196. [Google Scholar] [CrossRef]
  27. Jeong, H.; Lim, H.; Yoon, C.; Won, J.; Lee, G.Y.; de la Rosa, E.; Kirschke, J.S.; Kim, B.; Kim, N.; Kim, C. Robust Ensemble of Two Different Multimodal Approaches to Segment 3D Ischemic Stroke Segmentation Using Brain Tumor Representation Among Multiple Center Datasets. J. Imaging Inform. Med. 2024. [Google Scholar] [CrossRef] [PubMed]
  28. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2020, 18, 203–211. [Google Scholar] [CrossRef] [PubMed]
  29. Umirzakova, S.; Ahmad, S.; Mardieva, S.; Muksimova, S.; Whangbo, T.K. Deep learning-driven diagnosis: A multi-task approach for segmenting stroke and Bell’s palsy. Pattern Recognit. 2023, 144, 109866. [Google Scholar] [CrossRef]
  30. Tian, Y.; Zhang, Y.; Zhang, H. Recent Advances in Stochastic Gradient Descent in Deep Learning. Mathematics 2023, 11, 682. [Google Scholar] [CrossRef]
  31. Loshchilov, I.; Hutter, F. Fixing Weight Decay Regularization in Adam. arXiv 2017, arXiv:1711.05101. [Google Scholar] [CrossRef]
  32. Yeung, M.; Sala, E.; Schönlieb, C.B.; Rundo, L. Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput. Med. Imaging Graph. 2022, 95, 102026. [Google Scholar] [CrossRef]
  33. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer International Publishing: Cham, Switzerland, 2017; pp. 240–248. [Google Scholar] [CrossRef]
  34. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
  35. Cardoso, M.J.; Li, W.; Brown, R.; Ma, N.; Kerfoot, E.; Wang, Y.; Murrey, B.; Myronenko, A.; Zhao, C.; Yang, D.; et al. MONAI: An open-source framework for deep learning in healthcare. arXiv 2022, arXiv:2211.02701. [Google Scholar] [CrossRef]
  36. Hernandez Petzsche, M.R.; de la Rosa, E.; Hanning, U.; Wiest, R.; Valenzuela, W.; Reyes, M.; Meyer, M.; Liew, S.L.; Kofler, F.; Ezhov, I.; et al. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Sci. Data 2022, 9, 762. [Google Scholar] [CrossRef]
  37. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  38. Aboudi, F.; Drissi, C.; Kraiem, T. Efficient U-Net CNN with Data Augmentation for MRI Ischemic Stroke Brain Segmentation. In Proceedings of the 2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), Istanbul, Turkey, 17–20 May 2022. [Google Scholar] [CrossRef]
  39. Kumar, A.; Debnath, A.; Tejaswini, T.; Gupta, S.; Chakraborty, B.; Nandi, D. Automatic Detection of Ischemic Stroke Lesion from Multimodal MR Image. In Proceedings of the 2019 Fifth International Conference on Image Information Processing (ICIIP), Shimla, India, 15–17 November 2019. [Google Scholar] [CrossRef]
Figure 1. Scheme of the proposed model.
Figure 2. Distribution of segmentation masks’ sizes (in pixels) with annotation of the first quartile (Q1), median (Q2), and third quartile (Q3): (a) ISLES 2015, (b) ISLES 2022.
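The size categories used in the tables and figures that follow derive from the quartiles annotated in Figure 2. Below is a minimal sketch of how each ground-truth mask could be binned into the Small, Medium Down, Medium Up, and Large categories; the exact boundary handling (e.g., whether a mask whose size equals Q1 counts as Small) is an assumption, since the paper defines the categories only through the quartiles.

```python
# A minimal sketch (boundary handling is an assumption) of binning masks by lesion
# size into the quartile-based categories of Figure 2.
import numpy as np


def categorize_masks(masks: list) -> list:
    """Assign each binary mask to a size category from its foreground pixel count."""
    sizes = np.array([int(np.count_nonzero(m)) for m in masks])
    q1, q2, q3 = np.percentile(sizes, [25, 50, 75])  # Q1, Q2, Q3 as in Figure 2
    categories = []
    for s in sizes:
        if s <= q1:
            categories.append("Small")
        elif s <= q2:
            categories.append("Medium Down")
        elif s <= q3:
            categories.append("Medium Up")
        else:
            categories.append("Large")
    return categories
```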
Figure 3. F1-Scores resulting from changing the key hyperparameters λ_FL (FL) and λ_GDL (GDL). The combination leading to the best results is highlighted in orange. (a) Experiments performed on FLAIR sequences of ISLES 2015. (b) Experiments performed on the DWI modality of ISLES 2022.
Figure 4. Learning curves comparison: (a) Proposed model, (b) SGD, (c) W/O A.M., (d) CBAM, (e) Dice Loss, (f) Focal Loss.
Figure 5. Visual comparison of the results of the model variants: ground truth masks are displayed in the first column (a,g,m,s). Results of the Proposed model are given in the second column (b,h,n,t), of the Dice Loss model in the third column (c,i,o,u), of the Focal Loss model in the fourth column (d,j,p,v), of the W/O A.M. model in the fifth column (e,k,q,w), and of the CBAM model in the sixth column (f,l,r,x).
Figure 6. Violin plot of the proposed model’s results on FLAIR images using the axial view, where the dot localizes the median, and the white line represents the mean: (a) IoU scores by mask’s size category, (b) F1-Scores by mask’s size category.
Figure 7. Performance of the proposed model in segmenting small lesions on different MRI modalities using the ISLES 2015 dataset (dot and white line represent the median and the mean values): (a) IoU scores by MRI modality, (b) F1-Scores by MRI modality.
Figure 8. Overall performance of the proposed model on different MRI modalities using the ISLES 2015 dataset (dot and white line represent the median and the mean values): (a) IoU scores in the coronal plane, (b) IoU scores in the sagittal plane.
Figure 9. Examples of segmented FLAIR images in the coronal plane by the proposed method (second row) and their corresponding ground truth mask (first row) for mask categories Small (a,e), Medium Down (b,f), Medium Up (c,g), and Large (d,h).
Figure 10. Examples of segmented FLAIR images in the sagittal plane by the proposed method (second row) and their corresponding ground truth mask (first row) for mask categories Small (a,e), Medium Down (b,f), Medium Up (c,g), and Large (d,h).
Figure 11. Violin plot of the proposed model’s results on DWI and ADC images using the axial view, where the dot localizes the median, and the white line represents the mean: (a) F1-Scores by mask size category using DWI and configuration A (FL = 0.7, GDL = 0.3), (b) F1-Scores by mask size category using DWI and configuration B (FL = 0.9, GDL = 0.1), (c) F1-Scores by mask size category using ADC and configuration A (FL = 0.7, GDL = 0.3), (d) F1-Scores by mask size category using ADC and configuration B (FL = 0.9, GDL = 0.1).
Figure 12. Violin plot of the non-segmented images’ mask size in pixels. Mean value is marked as a white line.
Figure 13. Examples of ground truth masks of DWI images in the axial plane (first row) and the segmentation results by the proposed method using λ_GDL = 0.3, λ_FL = 0.7 (second row) and λ_GDL = 0.1, λ_FL = 0.9 (third row) for mask categories Small (a,e,i), Medium Down (b,f,j), Medium Up (c,g,k), and Large (d,h,l).
Table 1. Number of samples for training, validation, and testing using ISLES 2015 in different planes.

| Plane | Mask Size | Training | Validation | Testing | Total |
|---|---|---|---|---|---|
| Axial | Small | 190 | 48 | 98 | 336 |
| Axial | Medium Down | 189 | 48 | 105 | 342 |
| Axial | Medium Up | 190 | 47 | 102 | 339 |
| Axial | Large | 190 | 47 | 102 | 339 |
| Axial | Total | 759 | 190 | 407 | 1356 |
| Coronal | Small | 237 | 59 | 127 | 423 |
| Coronal | Medium Down | 239 | 60 | 128 | 427 |
| Coronal | Medium Up | 238 | 60 | 127 | 425 |
| Coronal | Large | 239 | 59 | 128 | 426 |
| Coronal | Total | 953 | 238 | 510 | 1701 |
| Sagittal | Small | 142 | 36 | 76 | 254 |
| Sagittal | Medium Down | 143 | 36 | 76 | 255 |
| Sagittal | Medium Up | 142 | 36 | 76 | 254 |
| Sagittal | Large | 143 | 35 | 77 | 255 |
| Sagittal | Total | 570 | 143 | 305 | 1018 |
Table 2. Number of samples for training, validation, and testing using ISLES 2022 in different planes.

| Plane | Mask Size | Training | Validation | Testing | Total |
|---|---|---|---|---|---|
| Axial | Small | 670 | 168 | 359 | 1197 |
| Axial | Medium Down | 681 | 170 | 365 | 1216 |
| Axial | Medium Up | 675 | 169 | 362 | 1206 |
| Axial | Large | 677 | 169 | 362 | 1208 |
| Axial | Total | 2703 | 676 | 1448 | 4827 |
| Coronal | Small | 1060 | 264 | 568 | 1892 |
| Coronal | Medium Down | 1166 | 292 | 625 | 2083 |
| Coronal | Medium Up | 1126 | 282 | 603 | 2011 |
| Coronal | Large | 1121 | 280 | 601 | 2002 |
| Coronal | Total | 4473 | 1118 | 2397 | 7988 |
| Sagittal | Small | 854 | 213 | 458 | 1525 |
| Sagittal | Medium Down | 849 | 213 | 455 | 1517 |
| Sagittal | Medium Up | 867 | 217 | 464 | 1548 |
| Sagittal | Large | 857 | 214 | 459 | 1530 |
| Sagittal | Total | 3427 | 857 | 1836 | 6120 |
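The per-category counts in Tables 1 and 2 are consistent with a stratified split of roughly 70% of the slices for training plus validation and 30% for testing, followed by an 80/20 train/validation split. The sketch below shows one way such a split could be produced with scikit-learn; the ratios are inferred from the tables and are therefore assumptions, not the authors' stated procedure.

```python
# One possible stratified split (ratios inferred from Tables 1 and 2, so an
# assumption) that keeps the four mask-size categories balanced across subsets.
from sklearn.model_selection import train_test_split


def split_by_category(slice_ids, categories, seed: int = 0):
    """Return (train, validation, test) slice identifiers stratified by size category."""
    trainval_ids, test_ids, trainval_cat, _ = train_test_split(
        slice_ids, categories, test_size=0.30, stratify=categories, random_state=seed
    )
    train_ids, val_ids = train_test_split(
        trainval_ids, test_size=0.20, stratify=trainval_cat, random_state=seed
    )
    return train_ids, val_ids, test_ids
```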
Table 3. Features description of the model’s versions used for the ablation test.

| Notation | Attention Module | Loss Function | Optimizer |
|---|---|---|---|
| Proposed | AM | GDFL | AdamW |
| W/O A. M. | NO | GDFL | AdamW |
| CBAM | CBAM | GDFL | AdamW |
| Dice Loss | AM | Dice loss | AdamW |
| Focal Loss | AM | Focal loss | AdamW |
| SGD | AM | GDFL | SGD |
Table 4. Performance comparison of the model’s versions used for the ablation experiment in format mean ± standard deviation.

| Metric | Proposed | W/O A. M. | CBAM | Dice Loss | Focal Loss | SGD |
|---|---|---|---|---|---|---|
| IoU | 0.8596 ± 0.1598 | 0.8239 ± 0.1593 | 0.8553 ± 0.1601 | 0.8258 ± 0.1798 | 0.8300 ± 0.1797 | 0.4004 ± 0.2796 |
| F1-Score | 0.9129 ± 0.1362 | 0.8917 ± 0.1364 | 0.9106 ± 0.1345 | 0.8893 ± 0.1570 | 0.8922 ± 0.1549 | 0.5079 ± 0.3244 |
| HD | 4.09 ± 7.67 | 5.19 ± 8.27 | 4.24 ± 8.17 | 4.94 ± 8.97 | 3.77 ± 6.59 | 31.24 ± 23.98 |
| N. S. | 4.10 ± 2.70 | 4.30 ± 2.65 | 3.20 ± 2.27 | 6.10 ± 4.04 | 5.80 ± 1.78 | 86.10 ± 2.98 |
| N. S. Small | 3.80 ± 2.52 | 4.00 ± 2.68 | 2.70 ± 2.28 | 5.80 ± 3.99 | 5.40 ± 1.91 | 70.60 ± 1.50 |
| N. S. M. D. | 0.30 ± 0.90 | 0.30 ± 0.64 | 0.40 ± 0.80 | 0.30 ± 0.90 | 0.40 ± 0.92 | 15.50 ± 2.62 |
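Table 4 reports IoU, F1-Score, the Hausdorff distance (HD), and the number of non-segmented images (N. S., also broken down for the Small and Medium Down categories). The following generic sketch shows how the per-slice overlap metrics can be computed from binary masks; it is not the authors' evaluation code, and the reading of N. S. as slices whose lesion received no predicted pixels is an interpretation.

```python
# Generic per-slice overlap metrics (not the authors' evaluation code) matching
# the IoU and F1-Score columns of Tables 4-6.
import numpy as np


def iou_f1(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Compute IoU and F1-Score (Dice) between a binary prediction and ground truth."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn + eps)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return float(iou), float(f1)


def is_non_segmented(pred: np.ndarray, gt: np.ndarray) -> bool:
    """Plausible 'N. S.' check: the ground truth has a lesion but the prediction is empty."""
    return bool(gt.any() and not pred.any())
```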
Table 5. Performance of the proposed model in different MRI modalities and planes using the ISLES 2015 dataset.

| Plane | Metric | FLAIR | DWI | T1 | T2 |
|---|---|---|---|---|---|
| Axial | IoU | 0.8596 ± 0.1598 | 0.8524 ± 0.1562 | 0.8355 ± 0.1926 | 0.8441 ± 0.1959 |
| Axial | F1-Score | 0.9129 ± 0.1362 | 0.9096 ± 0.1290 | 0.8916 ± 0.1783 | 0.8966 ± 0.1809 |
| Axial | HD | 4.09 ± 7.67 | 4.13 ± 6.94 | 5.12 ± 9.24 | 4.98 ± 8.96 |
| Axial | N. S. | 4.10 ± 2.70 | 2.70 ± 2.05 | 11.20 ± 6.35 | 10.70 ± 4.05 |
| Coronal | IoU | 0.8550 ± 0.1658 | 0.8491 ± 0.1660 | 0.8447 ± 0.1846 | 0.8532 ± 0.1973 |
| Coronal | F1-Score | 0.9094 ± 0.1425 | 0.9058 ± 0.1418 | 0.8995 ± 0.1651 | 0.9017 ± 0.1816 |
| Coronal | HD | 4.74 ± 6.88 | 5.58 ± 9.34 | 5.95 ± 9.67 | 5.20 ± 8.13 |
| Coronal | N. S. | 5.00 ± 2.94 | 5.00 ± 2.16 | 7.67 ± 2.49 | 12.33 ± 2.49 |
| Sagittal | IoU | 0.8350 ± 0.2087 | 0.8238 ± 0.1978 | 0.8040 ± 0.2206 | 0.8038 ± 0.2366 |
| Sagittal | F1-Score | 0.8885 ± 0.1933 | 0.8847 ± 0.1752 | 0.8667 ± 0.2030 | 0.8623 ± 0.2235 |
| Sagittal | HD | 8.20 ± 14.44 | 8.63 ± 14.02 | 10.38 ± 17.49 | 7.60 ± 12.10 |
| Sagittal | N. S. | 8.33 ± 0.94 | 4.33 ± 0.47 | 8.33 ± 2.87 | 13.33 ± 4.19 |
Table 6. Performance of the proposed model in different MRI modalities and planes using the ISLES 2022 dataset.

| Plane | Metric | DWI | ADC |
|---|---|---|---|
| Axial | IoU | 0.6961 ± 0.3414 | 0.5505 ± 0.3666 |
| Axial | F1-Score | 0.7517 ± 0.3470 | 0.6192 ± 0.3884 |
| Axial | HD | 9.90 ± 17.56 | 16.73 ± 23.36 |
| Axial | N. S. | 121 ± 9 | 253 ± 13 |
| Coronal | IoU | 0.5833 ± 0.4000 | 0.4954 ± 0.3918 |
| Coronal | F1-Score | 0.6318 ± 0.4187 | 0.5535 ± 0.4193 |
| Coronal | HD | 10.73 ± 19.32 | 18.08 ± 26.32 |
| Coronal | N. S. | 188 ± 19 | 364 ± 27 |
| Sagittal | IoU | 0.5636 ± 0.3926 | 0.4554 ± 0.3740 |
| Sagittal | F1-Score | 0.6176 ± 0.4147 | 0.5211 ± 0.4103 |
| Sagittal | HD | 14.02 ± 23.21 | 23.73 ± 32.06 |
| Sagittal | N. S. | 133 ± 25 | 318 ± 8 |
Table 7. Comparison of the proposed method with the reported results of state-of-the-art methods on the ISLES 2015 dataset. The best values are given in bold text.

| Method | MRI Modality | F1-Score | HD | Accuracy | Precision | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| Liu et al. [9] | FLAIR | 0.7178 | 3.36 | - | - | - | - |
| Liu et al. [9] | FLAIR-DWI | 0.7639 | 3.19 | - | - | - | - |
| Zhang et al. [23] | DWI | 0.58 | 38.98 | - | 0.60 | 0.68 | - |
| Zhang et al. [24] | DWI | 0.6220 | - | 0.9998 | - | 0.7322 | 0.9997 |
| Karthik et al. [17] | Multi | 0.7008 | - | - | - | - | - |
| Shah et al. [20] | Multi | 0.7156 | - | - | - | - | - |
| Mahmood and Basit [12] | Multi | 0.54 | - | - | 0.67 | 0.5 | - |
| Aboudi et al. [38] | Multi | 0.5577 | - | 0.9996 | 0.9977 | - | - |
| Aboudi et al. [19] | Multi | 0.7960 | - | 0.9956 | 0.9712 | - | - |
| Abdmouleh et al. [18] | FLAIR | 0.8135 | - | 0.9673 | - | 0.8007 | 0.9962 |
| Abdmouleh et al. [18] | DWI | 0.6928 | - | 0.9649 | - | 0.6069 | 0.9967 |
| Abdmouleh et al. [18] | T1 | 0.7000 | - | 0.9651 | - | 0.6301 | 0.9961 |
| Abdmouleh et al. [18] | T2 | 0.7072 | - | 0.965 | - | 0.6443 | 0.9961 |
| Kumar et al. [39] | FLAIR | 0.8289 | - | - | - | - | - |
| Kumar et al. [39] | DWI | 0.7029 | - | - | - | - | - |
| Kumar et al. [39] | T1 | 0.7015 | - | - | - | - | - |
| Kumar et al. [39] | T2 | 0.7368 | - | - | - | - | - |
| Proposed | FLAIR | 0.9129 | 4.09 | 0.9986 | 0.9152 | 0.9240 | 0.9992 |
| Proposed | DWI | 0.9096 | 4.13 | 0.9985 | 0.9084 | 0.9263 | 0.9991 |
| Proposed | T1 | 0.8916 | 5.12 | 0.9984 | 0.8870 | 0.9088 | 0.999 |
| Proposed | T2 | 0.8966 | 4.98 | 0.9984 | 0.8968 | 0.9092 | 0.9991 |
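Table 7 additionally reports pixel-wise Accuracy, Precision, Sensitivity, and Specificity, which follow directly from the confusion counts of the binary segmentation; the F1-Score is the harmonic mean of Precision and Sensitivity. The sketch below illustrates the standard definitions; it is generic rather than the authors' evaluation code.

```python
# Generic definitions (not the authors' code) of the pixel-wise classification
# metrics reported in Table 7.
import numpy as np


def classification_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """Accuracy, Precision, Sensitivity, Specificity, and F1-Score from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    sensitivity = tp / (tp + fn + eps)  # recall / true positive rate
    return {
        "Accuracy": float((tp + tn) / (tp + tn + fp + fn + eps)),
        "Precision": float(precision),
        "Sensitivity": float(sensitivity),
        "Specificity": float(tn / (tn + fp + eps)),  # true negative rate
        "F1-Score": float(2 * precision * sensitivity / (precision + sensitivity + eps)),
    }
```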
Table 8. Comparison of the proposed method with the reported results of state-of-the-art methods on the ISLES 2022 dataset.

| Criterion | Wu et al. [25] | Werdiger et al. [26] | Jeong et al. [27] | Proposed | Proposed | Proposed |
|---|---|---|---|---|---|---|
| F1-Score | 0.8560 | 0.6940 | 0.7869 | 0.7641 | 0.7517 | 0.6192 |
| MRI Modality | Multi | Multi | Multi | DWI | DWI | ADC |
Table 9. Complexity comparison with state-of-the-art models.

| Method | Parameters (M) |
|---|---|
| Wu et al. [25] | 119.0 |
| Werdiger et al. [26] | 22.3 |
| Proposed | 7.9 |
Table 10. Average training time of the proposed model in different datasets and planes.

| Dataset | GPU Model | Axial | Coronal | Sagittal |
|---|---|---|---|---|
| ISLES 2015 | GeForce RTX 3070 | 34 m 32 s | 41 m 08 s | 24 m 55 s |
| ISLES 2015 | GeForce RTX 3090 | 16 m 26 s | 20 m 07 s | 11 m 59 s |
| ISLES 2022 | GeForce RTX 3090 | 53 m 26 s | 1 h 16 m 37 s | 1 h 16 m 39 s |
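Table 10 lists average wall-clock training times per plane and GPU. As a rough reproduction aid, the sketch below outlines a conventional PyTorch training loop with the AdamW optimizer listed in Table 3 over the 200 epochs mentioned in the conclusions; the model constructor, data loaders, and learning rate are placeholders and should be treated as assumptions.

```python
# Conventional training loop (not the authors' code) matching the setup behind
# Table 10: AdamW optimizer, 200 epochs, wall-clock time measured per run.
import time
import torch


def train(model, train_loader, loss_fn, epochs: int = 200, lr: float = 1e-4) -> float:
    """Train the model and return the elapsed wall-clock time in seconds."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # lr is an assumption
    start = time.time()
    for _ in range(epochs):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), masks)
            loss.backward()
            optimizer.step()
        # validation pass omitted for brevity
    return time.time() - start
```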