Brain Tumor Segmentation from Optimal MRI Slices Using a Lightweight U-Net

Hernandez-Gutierrez, Fernando Daniel; Avina-Bravo, Eli Gabriel; Zambrano-Gutierrez, Daniel F.; Almanza-Conejo, Oscar; Ibarra-Manzano, Mario Alberto; Ruiz-Pinales, Jose; Ovalle-Magallanes, Emmanuel; Avina-Cervantes, Juan Gabriel

doi:10.3390/technologies12100183


by Fernando Daniel Hernandez-Gutierrez ¹, Eli Gabriel Avina-Bravo ¹,², Daniel F. Zambrano-Gutierrez ³, Oscar Almanza-Conejo ¹, Mario Alberto Ibarra-Manzano ¹, Jose Ruiz-Pinales ¹, Emmanuel Ovalle-Magallanes ⁴,* and Juan Gabriel Avina-Cervantes ¹,*

¹ Telematics and Digital Signal Processing Research Groups (CAs), Engineering Division, Campus Irapuato-Salamanca, University of Guanajuato, Salamanca 36885, Mexico

² Institute of Advanced Materials for the Sustainable Manufacturing, Tecnológico de Monterrey, Mexico City 14380, Mexico

³ School of Engineering and Sciences, Tecnológico de Monterrey, Av. Eugenio Garza Sada 2501 Sur, Tecnológico, Monterrey 64849, Mexico

⁴ Dirección de Investigación y Doctorado, Facultad de Ingenierías y Tecnologías, Universidad La Salle Bajío, Av. Universidad 602, Col. Lomas del Campestre, León 37150, Mexico

* Authors to whom correspondence should be addressed.

Technologies 2024, 12(10), 183; https://doi.org/10.3390/technologies12100183

Submission received: 15 July 2024 / Revised: 22 September 2024 / Accepted: 27 September 2024 / Published: 1 October 2024


Abstract

The timely detection and accurate localization of brain tumors is crucial in preserving people’s quality of life. Thankfully, intelligent computational systems have proven invaluable in addressing these challenges. In particular, the UNET model can extract essential pixel-level features to automatically identify the tumor’s location. However, known deep learning-based works usually directly feed the 3D volume into the model, which causes excessive computational complexity. This paper presents an approach to boost the UNET network, reducing computational workload while maintaining superior efficiency in locating brain tumors. This concept could benefit portable or embedded recognition systems with limited resources for operating in real time. This enhancement involves an automatic slice selection from the MRI T2 modality volumetric images containing the most relevant tumor information and implementing an adaptive learning rate to avoid local minima. Compared with the original model (7.7 M parameters), the proposed UNET model uses only 2 M parameters and was tested on the BraTS 2017, 2020, and 2021 datasets. Notably, the BraTS2021 dataset provided outstanding binary metric results: 0.7807 for the Intersection Over the Union (IoU), 0.860 for the Dice Similarity Coefficient (DSC), 0.656 for the Sensitivity, and 0.9964 for the Specificity compared to vanilla UNET.

Keywords: convolutional neural networks; semantic segmentation; magnetic resonance image; UNET

1. Introduction

A brain tumor is characterized by the uncontrolled growth of cells, which can exert pressure on the cranial cavity and lead to seizures or even death. The proper functioning of the nervous system is closely correlated with the tumor’s location, type, and size [1]. Consequently, the early detection of abnormal tissues in brain conditions is crucial. The World Health Organization (WHO) categorizes tumors as either malignant or benign with four grades of malignancy. Grades I and II are considered benign, while grades III and IV are classified as malignant. Although benign tumors typically exhibit slow growth, they can still pose a danger if left untreated [2]. According to the WHO, cancers were the third leading cause of death in 2022, behind heart disease and respiratory infections. Cancers are responsible for 9.7 million deaths per year; specifically, Brain and Central Nervous System (CNS) cancers represent almost 250,000 deaths per year [3]. The term “gliomas” refers to primary cerebral tumors that develop in cellular tissues, blood vessels, and other nerves, which are usually benign or noncancerous. Conversely, “meningiomas” originate from the membranes that cover the brain and the surrounding central nervous system; they tend to be malignant or cancerous [4]. With current technological advancements, the brain’s internal structure can be captured during disease detection and analysis with minimal collateral damage. These procedures utilize advanced technologies, such as magnetic resonance imaging (MRI), which employs powerful magnets and radio waves to generate brain images without harmful ionizing radiation.

Even so, delineating or identifying the tumor area poses a challenge for experts, as it relies on the clinician’s expertise and visual acuity to discern the suspected tumors in the MRI images [5]. The scientific community has recognized the crucial role of early tumor detection in saving lives, and substantial efforts have been devoted to developing intelligent systems in healthcare, specifically for brain cancer. Unlike X-rays, MRI allows for visualizing internal body structures without being hazardous. One of the primary uses of MRI in the medical field is to precisely image the internal structures of the brain, specifically the soft tissues. These images present three-dimensional scans of the patient’s brain, which can be viewed in any of its three corresponding image planes: axial, coronal, and sagittal [6], as shown in Figure 1.

However, manual tumor detection and segmentation based on imaging and human interpretation are challenging; for instance, tumors can have a mixture of low- and high-grade characteristics. Images may also be blurred, which can cause tumors to merge with healthy tissue. For these reasons, computer-aided diagnosis systems have been developed as fast, fully automatic diagnostic methods to improve detection accuracy. In the last decade, MRI images processed with deep learning (DL) systems have been utilized to develop precise segmentation frameworks for estimating the location and dimensions of brain tumors. These algorithms employed the UNET architecture as a backbone network; however, utilizing UNET can be challenging due to potential underfitting and overfitting issues and its need for powerful computing resources.

The main contributions of this paper are summarized below:

1. An automatic preprocessing of 3D MRI images to extract the most significant cross-sectional T2 image modality.
2. A lightweight UNET for brain tumor segmentation.
3. Restructuring a modified and lightweight UNET network for biomedical image segmentation and tumor detection, particularly MRIs.
4. Optimizing UNET by reducing convolutional filters for a lighter and portable architecture.
5. Implementing an Adaptive Learning Rate strategy to minimize the cost function optimally.

2. Literature Review

Several studies have examined machine and deep learning techniques for medical image segmentation [7] along with segmenting brain tumors, with the latter gaining prominence. Razzak et al. [8] developed a convolutional neural network called the Two-Pathway-Group Convolutional Neural Network (2PG-CNN), simultaneously incorporating local and global features. That study utilized the BraTS 2013 and 2015 databases with an image adaptation to 2D. They set the learning rate to 0.005 with a decay of 0.1. The 2PG-CNN network represents an improvement over the original Two-Pathway CNN, as it reduces instabilities and overfitting through parameter sharing by implementing equivariance in the network.

Moreover, Hao et al. [9] proposed a hybrid approach that integrates maximum and average pooling within the convolution process during downsampling to prevent feature losses associated with traditional pooling operations. This approach required fine tuning the kernel pooling weights to extract discriminant features and was used to analyze the BraTS 2018 and 2019 databases using Fully Convolutional Networks (FCN), UNET, and UNET++. The results showed significant improvements across benchmarks for similar literature approaches. In the MRI analysis, Aghalari et al. [10] introduced an enhanced version of the UNET model to tackle the problem of linear blocks in the original model. This approach removed slices containing exclusively black pixels across the entire image and omitted the T1 sequence due to its low contrast. Instead, they incorporated other MRI modalities, such as T1C, FLAIR, and T2. Similarly, Walsh et al. [6] introduced a lightweight UNET that effectively trained the network, using Binary Cross-Entropy Loss as the cost function and accommodating variable prediction image sizes. The network effectively converts the 3D BITE database into 2D images, presenting superior performance in detecting brain malformations compared to algorithms such as K-Means, Fuzzy C-Means, LinktNet, and standard UNET.

Meanwhile, Ottom et al. [11] innovatively merged the principles of Adversarial Networks (ANs) and UNETs. So, the skip connection concept and concatenated tensors from the AN were integrated with the UNET encoder–decoder architecture, creating the Z-Net model. The results are promising, as the proposed network demonstrated superior performance to the conventional UNET. However, the Z-Net model required a substantial dataset with reliable ground truths for practical training. Similarly, Ahmad et al. [12] recommended using a non-weighted loss function to minimize false positives and mitigate the impact of large weight hyper-parameters associated with lesions. They introduced an enhanced version of the UNET network named MH UNET, which incorporates dense connected blocks in the encoder–decoder architecture to reduce training parameters. Additionally, they designed a hierarchical block between the encoder and decoder, leading to improved performance that surpassed the results achieved with the BraTS 2018, 2019, and 2020 datasets.

Latif et al. [13] presented a modified version of the conventional UNET and Inception-UNET network architecture, MI-UNET. This approach involved integrating INCEPTION modules into the network and employing varying filter sizes in each phase. The choice of filter size was based on the spatial concentration of the slices to capture features more effectively. In addition, the method used 3D slice-by-slice images (155 slices), incorporating three loss functions to address data imbalance: cross-entropy, weighted cross-entropy, and soft-dice score. Comparatively, the cross-entropy and weighted cross-entropy loss functions yielded superior results to the soft-dice score.

More recently, Montaha et al. [14] proposed selecting the most optimal slice from the 3D images in the BraTS2020 database to improve computational efficiency and reduce memory usage. The 2D UNET model was used to identify tumor regions in the T1 MRI sequence images, testing the Adaptive Moment Estimation (ADAM) optimizer with a learning rate of 0.001. Ranjbarzadeh et al. [15] proposed an optimized CNN with an Improved Chimp Optimization Algorithm (IChOA) integrating Multi-Modality MRI images, using four different MRI sequence images (T1, FLAIR, T1ce, and T2) to enhance brain tumor segmentation accuracy in the BraTS 2018 dataset. A preprocessing stage, including Z-score normalization and manual thresholding, created binary masks to delineate the same object in the original image. This strategy permitted balancing data and removing correlated data to extract 17 relevant features from several brain regions. In the same context, Ghazouani et al. [16] proposed a multi-modality model for brain tumor segmentation on the public BraTS 2021 dataset that combines Transformers and CNN modules as an encoder–decoder structure. The encoder incorporates enhanced local self-attention transformer blocks to improve the embedded local information. Then, the extracted features were fed to the decoder via skip connections. All input 3D images were normalized by Z-score in the four MRI modalities. In the same vein, Yue et al. [17] introduced the ACFNet using three parallel streams, each adopting a similar encoder–decoder structure to the U-Net. Specifically, the upper and lower streams generate coarse predictions from the T1 and T1ce modalities and from the T2 and FLAIR modalities, respectively. In contrast, the middle stream employs an adaptive feature fusion model to bridge the gap between them and improve the prediction. For the training and testing process, they extracted 2D slices from the axial planes in the BraTS 2020 dataset to rebuild the dataset and support a lightweight architecture. Each 2D slice is cropped to $224 \times 224$ to remove redundant background regions and normalized to the range $[0, 1]$.

In summary, Table 1 presents an overview of the various datasets and methodologies used in the recent literature to detect brain tumors, considering different BraTS datasets.

3. Mathematical Background

This section briefly explains the main components of the UNET architecture and the cyclical learning rate paradigm in the optimization process.

3.1. Convolutional Layer

In a Convolutional Neural Network (CNN), the input to a convolutional layer is represented by an n-dimensional array called a tensor, which typically corresponds to an image or a feature map denoted as I. This layer applies a set of learnable filters (kernels) K to the input tensor to produce an output feature map S. Each kernel convolves the input tensor and calculates a scalar product between its weights and a local input region, as shown in Figure 2.

In the case of a single kernel and a single channel input tensor, the convolution process is described by

$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n), \tag{1}$$

where $*$ is the convolution operator. The output feature maps generated by a convolutional layer can extract intricate and diverse patterns and features of the input data. Subsequently, these feature maps are transmitted to the succeeding layer within the network for additional processing.
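As a minimal sketch of Equation (1), the double sum can be evaluated directly with NumPy. The function name `conv2d_full` and the "full" output size are assumptions made for illustration; note also that deep learning frameworks usually compute the unflipped variant (cross-correlation) in their convolutional layers.

```python
import numpy as np

def conv2d_full(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Evaluate S(i, j) = sum_m sum_n I(m, n) K(i - m, j - n) for one channel."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for m in range(h):
        for n in range(w):
            # I(m, n) contributes to every S(i, j) with 0 <= i - m < kh
            # and 0 <= j - n < kw, weighted by K(i - m, j - n).
            out[m:m + kh, n:n + kw] += image[m, n] * kernel
    return out

image = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.array([[0.0, 1.0], [2.0, 3.0]])
feature_map = conv2d_full(image, kernel)  # 3 x 3 output
```

The explicit loops are intended only to mirror the indices of the equation; practical implementations rely on vectorized or GPU kernels.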

3.2. Pooling Layer

CNNs commonly incorporate pooling layers to reduce the spatial dimensions (height and width) of the feature maps produced by the convolutional layers. The main objective of a pooling layer is to reduce the input data’s size through downsampling and minimize the computational workload for subsequent layers. In addition, the pooling layer functions independently on each feature map, using a sliding window across the input feature map. It can reduce overfitting by offering a type of regularization, introducing a level of translation invariance within the input data. The most common methods are max pooling,

$$f_{\max}(x) = \max \{x_{i}\}, \quad \forall i \in \{1, 2, \dots, N\}, \tag{2}$$

which selects the highest intensity value from a region, highlighting edges within an image and avoiding undesirable features of the image background, and average pooling,

$$f_{avg}(x) = E\{|x|\} = \frac{1}{N} \sum_{i = 1}^{N} |x_{i}|, \tag{3}$$

where $x_{i}$ represents the pixels from the kernel region of size N, and $E\{\cdot\}$ is the mathematical expectation.
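The two pooling rules in Equations (2) and (3) can be sketched with non-overlapping windows as follows. The helper name `pool2d` and the choice of a stride equal to the window size are illustrative assumptions.

```python
import numpy as np

def pool2d(feature_map: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Non-overlapping pooling over size x size windows of a 2D feature map."""
    h2, w2 = feature_map.shape[0] // size, feature_map.shape[1] // size
    # Trim to a multiple of the window size, then expose each window on axes 1 and 3.
    blocks = feature_map[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))           # Eq. (2)
    return np.abs(blocks).mean(axis=(1, 3))      # Eq. (3): mean of |x_i|

x = np.arange(16, dtype=float).reshape(4, 4)
pooled_max = pool2d(x, mode="max")
pooled_avg = pool2d(x, mode="avg")
```

Each 2 × 2 window is reduced to a single value, so the height and width of the output are half those of the input.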

3.3. Dropout Layer

A dropout layer is a regularization technique employed in deep neural networks to prevent overfitting, where an input unit of a feature map or feature vector is dropped out with probability p, which is equivalent to sampling using a Bernoulli distribution with parameter p. In a 2D convolutional context, the dropout operation can be considered as a multiplication by a binary mask with entries sampled from a Bernoulli distribution [18]. Thus, the convolutional layer dropout is formally described as follows,

$$S(i, j) = ((I \cdot \delta) * K)(i, j), \tag{4}$$

where I corresponds to the feature map being processed in each layer, $\delta$ is the binary dropout mask, whose entries follow a Bernoulli distribution, and $* K$ denotes the convolution of the masked image $I \cdot \delta$ with the filter K. During training, this strategy turns off each pixel of a layer with probability p. This process allows for the efficient estimation of the average of all possibly trained dropout networks and reduces the possibility of overfitting.
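The masking step of Equation (4) can be sketched as an element-wise product with a Bernoulli mask. The names `dropout_mask` and `apply_dropout` are hypothetical; common frameworks additionally rescale the kept units by 1/(1 - p) during training, which is omitted here to stay close to the equation.

```python
import numpy as np

def dropout_mask(shape, p: float, rng: np.random.Generator) -> np.ndarray:
    """Binary mask delta: each entry is 0 with probability p and 1 otherwise."""
    return (rng.random(shape) >= p).astype(float)

def apply_dropout(feature_map: np.ndarray, p: float, rng: np.random.Generator):
    """Return the masked feature map I * delta together with the mask itself."""
    delta = dropout_mask(feature_map.shape, p, rng)
    return feature_map * delta, delta

rng = np.random.default_rng(0)
features = np.ones((100, 100))
masked, delta = apply_dropout(features, p=0.3, rng=rng)
```

The masked map would then be convolved with the kernel K as in Equation (1); at inference time no mask is applied.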

3.4. Batch Normalization

During the training process, within the intermediate convolutional layers, the weights undergo updates relative to the current input data, resulting in varying weight distributions, promoting the occurrence of Internal Covariate Shifts (ICSs) [19]. To avoid this problem, Batch Normalization (BN) applies a transformation that maintains the mean output ($\mu$) near zero and the standard deviation ($\sigma$) close to 1.

Without BN, the statistics of activations in each layer can also exhibit heavy-tailed behavior. Ergo, some neurons can become highly active while others remain almost dormant, resulting in a highly skewed distribution [20]. The equation for batch normalization is formulated as follows:

$$\mathrm{BN}(y_{i})^{(b)} = \gamma \cdot \frac{y_{i}^{(b)} - \mu(y_{i})}{\sigma(y_{i})} + \beta, \tag{5}$$

where $y_{i}^{(b)}$ represents the value of the output $y_{i}$ on the b-th input of a batch, while $\beta$ and $\gamma$ are the learned parameters regulating the output’s mean and variance.
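Equation (5) can be sketched for a batch of activations as shown below. The small constant `eps` is an assumption added for numerical stability and does not appear in the equation; the statistics are taken over the batch axis.

```python
import numpy as np

def batch_norm(y: np.ndarray, gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> np.ndarray:
    """Normalize each output y_i over the batch axis (axis 0), then scale and shift."""
    mu = y.mean(axis=0)                      # mu(y_i)
    sigma = np.sqrt(y.var(axis=0) + eps)     # sigma(y_i), eps avoids division by zero
    return gamma * (y - mu) / sigma + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # 256 samples, 4 outputs
normalized = batch_norm(batch)
rescaled = batch_norm(batch, gamma=2.0, beta=3.0)
```

With the default parameters, each output has approximately zero mean and unit standard deviation over the batch; gamma and beta then set the scale and offset that the network learns.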

3.5. Activation Functions

Activation functions provide nonlinearity to the model, which can also improve convergence in the training process [21]. However, each function has distinct properties when employed in a structure or a model. A commonly utilized activation function in UNET is the Rectified Linear Unit (ReLU), which is given by

$$\mathrm{ReLU}(x) = \max(0, x). \tag{6}$$

This activation function helps to avoid the vanishing gradient problem during the weight update process. However, its output and gradient are zero for negative inputs, so some neurons may die. Once a neuron has died, it is no longer possible to modify its weights.

Consequently, alternative activation functions were adopted that yield small, nonzero outputs for negative inputs, which enables the weights to keep being updated. The first is the Exponential Linear Unit (ELU), which is defined as follows,

$$\mathrm{ELU}(x) = \begin{cases} x & \text{if } x > 0, \\ \alpha (e^{x} - 1) & \text{otherwise,} \end{cases} \tag{7}$$

where $\alpha$ is a positive parameter, typically $\alpha = 1$. ELU effectively handles negative values and maintains linearity for positive values, contributing to faster convergence than ReLU. The minimal activation on the negative side helps prevent neuron death.

On the other hand, the Scaled Exponential Linear Unit (SELU) [22] function is suitable for deep networks and provides self-normalization. SELU is formally defined by

$$\mathrm{SELU}(x) = \lambda \begin{cases} x & \text{if } x > 0, \\ \alpha (e^{x} - 1) & \text{otherwise,} \end{cases} \tag{8}$$

where $\alpha = 1.67326$ and $\lambda = 1.0507$. The constant $\lambda$ serves to adjust the output activation smoothly. This adjustment maintains a consistent mean and variance, which helps the network to self-normalize. Conversely, $\alpha$ sets the gradient of the exponential component for negative inputs, ensuring that the mean remains near zero. It is noteworthy that this self-normalizing behavior complements batch normalization.
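The three activation functions in Equations (6) to (8) can be sketched in NumPy as follows. The constants are the ones quoted above; the function names and the use of `np.expm1` (to evaluate the exponential term accurately and without overflow) are implementation choices made for this illustration.

```python
import numpy as np

ALPHA_SELU = 1.67326   # alpha from Eq. (8)
LAMBDA_SELU = 1.0507   # lambda from Eq. (8)

def relu(x):
    """Eq. (6): zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)

def elu(x, alpha: float = 1.0):
    """Eq. (7): identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # Clamp the argument at 0 so the exponential is never evaluated for large x.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def selu(x):
    """Eq. (8): ELU with alpha = 1.67326, scaled by lambda = 1.0507."""
    return LAMBDA_SELU * elu(x, ALPHA_SELU)

inputs = np.array([-50.0, -1.0, 0.0, 2.0])
relu_out, elu_out, selu_out = relu(inputs), elu(inputs), selu(inputs)
```

For strongly negative inputs, ReLU returns exactly zero, whereas ELU and SELU saturate at -α and -λα, respectively, so a nonzero gradient survives near the origin on the negative side.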

Cyclical Learning Rate

The cyclical learning rate (CLR) [23] is a method that establishes a dynamic learning rate, fluctuating between lower and higher values at regular intervals during the training process, as illustrated in Figure 3.

The term “step size” refers to the number of iterations in each step, where one step spans a set number of epochs. A cycle consists of two steps: one in which the learning rate increases and one in which it decreases. This method is designed to help the optimizer efficiently reach a minimum point. However, the specific choice of learning rate can significantly impact the convergence speed. Although the primary objective of the optimizer is to determine the values that minimize the error function, there are instances where the optimizer may become trapped in a local minimum, failing to reach the global minimum. Therefore, variable learning rates are utilized to prevent stagnation and the potential of settling for a non-global solution. This approach allows the algorithm to move away from a local minimum and eventually resume training. The equation ruling the weight update process is as follows:

$$\theta_{t + 1} = \theta_{t} - \alpha_{t} \nabla f(\theta_{t}), \tag{9}$$

where $\theta_{t + 1}$ denotes the weights updated from $\theta_{t}$, $\nabla f$ is the gradient of the function f, and $\alpha_{t}$ is the learning rate, which varies cyclically.
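A minimal sketch of the basic triangular policy from the CLR reference [23] is given below, together with one update of Equation (9). The bounds, the step size, and the toy objective f(θ) = θ² are illustrative assumptions and are not the settings used in this work.

```python
import math

def triangular_lr(iteration: int, step_size: int, base_lr: float, max_lr: float) -> float:
    """Learning rate that rises linearly for step_size iterations, then falls back."""
    cycle = math.floor(1 + iteration / (2 * step_size))    # 1-based cycle index
    x = abs(iteration / step_size - 2 * cycle + 1)         # 1 at the ends, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

def sgd_step(theta: float, grad: float, lr: float) -> float:
    """Eq. (9): theta_{t+1} = theta_t - alpha_t * grad f(theta_t)."""
    return theta - lr * grad

# Illustrative values only: cycle between 0.001 and 0.006 with 10 iterations per step.
schedule = [triangular_lr(t, 10, 0.001, 0.006) for t in range(41)]

theta = 1.0
for t in range(41):
    theta = sgd_step(theta, grad=2.0 * theta, lr=schedule[t])  # f(theta) = theta ** 2
```

The schedule starts at the lower bound, reaches the upper bound after one step, and returns to the lower bound after two steps, which completes one cycle.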

3.6. UNET Architecture

The UNET [24] is the baseline model for image segmentation, which is commonly referred to as vanilla UNET. It consists of two main components: an encoder and a decoder. The encoder reduces dimensionality and extracts feature maps at different scales until a latent feature space is reached. The decoder, in the upsampling path, progressively increases the feature map dimension by up-convolutions to produce a final segmentation image. The model also includes connections between the network’s encoder and decoder components, which are known as skip connections. These connections allow for preserving spatial information from the encoder and entering it into the decoder at different scales, avoiding the vanishing gradient problem.

The vanilla UNET consists of four downsampling blocks and their respective four upsampling blocks. Each step has two $3 \times 3$ convolutional layers followed by ReLU activations. The downsampling path applies a $2 \times 2$ max-pooling layer, and the upsampling path applies a $2 \times 2$ up-convolution. The blocks have 64, 128, 256, and 512 kernels, respectively, until a 1024-channel latent space is reached.

4. Materials and Method

4.1. BraTS Datasets

The BraTS datasets include task 1 for segmentation and task 2 for classification. Both tasks involve multi-parametric Magnetic Resonance Imaging (mpMRI) scans stored in Neuroimaging Informatics Technology Initiative (NIfTI) files with the extension .nii.gz. The datasets were obtained from the Brain Tumor Segmentation Challenge (2017, 2020, and 2021) and contain segmentation labels that indicate the presence of a tumor. The labels are as follows: 1 for the Necrotic part (NCR), 2 for peritumoral edematous/invaded tissue (ED), 4 for enhancing tumor (ET), and 0 indicating the absence of a tumor. The main characteristics of each dataset are summarized in Table 2. In neurological imaging, T2 sequences provide a nuanced perspective on structural abnormalities [25]. The comparative images of the T1 and T2 sequences, alongside the reference of the mask indicating the tumor location, are displayed. The tumor is more prominently visible in the T2 sequence, as depicted in Figure 4.

Notably, these datasets comprise four distinct MRI types: native (T1), T2-weighted (T2), post-contrast T1-weighted (T1ce), and fluid-attenuated inversion recovery (FLAIR), with each type containing 155 slices per volume. In particular, the complete tumor is encompassed by the union of the regions labeled 1, 2, and 4.
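As a small sketch of how the labels described above could be merged into a single binary tumor mask, the following NumPy snippet marks every pixel labeled 1, 2, or 4 as tumor. The function name `whole_tumor_mask` and the toy label array are hypothetical.

```python
import numpy as np

TUMOR_LABELS = (1, 2, 4)  # NCR, ED, and ET labels; 0 is background

def whole_tumor_mask(segmentation: np.ndarray) -> np.ndarray:
    """Binary mask that is 1 wherever the label map contains a tumor label."""
    return np.isin(segmentation, TUMOR_LABELS).astype(np.uint8)

labels = np.array([[0, 1, 0],
                   [2, 4, 0]])
mask = whole_tumor_mask(labels)
```

The resulting mask is the binary target that a two-class segmentation network would be trained against.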

4.2. Overall Framework

Figure 5 shows the functional blocks used to implement the proposed architecture. The BraTS databases provided the diverse image modalities required to test the proposed algorithms. These images are preprocessed to improve signal and image conditioning and are then normalized. Next, 70% of the data are randomly chosen for training, and the remainder is used for validation purposes. Stochastic Gradient Descent (SGD) was used as the optimizer to minimize the loss function with a cyclical learning rate to help avoid local minima. Finally, the lightweight UNET focused on tumor detection is tested and evaluated. A highly sensitive strategy to select the dynamic learning rate provided a reliable optimization process during network convergence.

4.3. Data Preprocessing

In contrast to some works in the literature, where the whole 3D sequence was analyzed or where a mixture of image modalities was employed, in this work, the best slice of the T2 voxel image was selected with the help of the ground truth image. This ground truth image contains 155 slices, as shown in Figure 6, which captures the different 3D regions of the brain tumor. This way, the slice with the highest tumor content was selected to minimize the computational cost, obtaining 140, 240, and 277 images for each dataset.
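The slice selection described above can be sketched as choosing the index whose ground-truth mask contains the most tumor pixels. This is a hedged illustration, not the authors' implementation: the function name `best_slice_index`, the assumption that the slice axis is the last one, and the synthetic volume are all hypothetical.

```python
import numpy as np

def best_slice_index(seg_volume: np.ndarray) -> int:
    """Index of the slice (last axis) with the largest number of tumor pixels."""
    tumor_pixels = (seg_volume > 0).sum(axis=(0, 1))  # one count per slice
    return int(np.argmax(tumor_pixels))               # first maximum on ties

# Synthetic ground truth with 5 slices; slice 3 holds the largest tumor region.
seg = np.zeros((8, 8, 5), dtype=np.uint8)
seg[2:4, 2:3, 1] = 1
seg[1:4, 1:3, 3] = 2
t2 = np.random.default_rng(0).random((8, 8, 5))       # stand-in for a T2 volume

k = best_slice_index(seg)
t2_slice, mask_slice = t2[:, :, k], seg[:, :, k]
```

Only the selected T2 slice and its mask are kept per volume, which is what reduces each 155-slice volume to a single 2D training image.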

Afterward, each resulting reduced dataset was normalized to have a mean of zero and a standard deviation of one by the Z-normalization defined as

$$Z = \frac{X - \mu}{\sigma}, \tag{10}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the data X, respectively.

After normalizing the images, each was resized to a standard dimension of 240 × 240 pixels. The dataset was divided into 70% for training and 30% for validation. Within the training set, 80% was used for training the model and 20% was used for testing.

4.4. Proposed Model

The proposed model is illustrated in Figure 7. It is based on the UNET architecture but with some improvements. The initial filter count is reduced, and an adaptive learning rate and activation function are incorporated into each block. The model has four levels, each with two convolution blocks of 3 × 3 pixel filters. The first level’s convolution block has 16 filters, which increase to 32, 64, 128, and 256 in the subsequent levels. Additionally, each level includes two batch normalization blocks (one after each convolution) and a 2 × 2 pixel max pooling, which reduces the dimension by half in each level. Lastly, a dropout layer is added between each convolution and batch normalization pair.

Table 3 compares the parameters utilized in the model, demonstrating that the modified UNET decreases the number of trainable parameters.
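To illustrate why reducing the filter count shrinks the model, the sketch below counts the weights and biases of the 3 × 3 convolutions in an encoder built from double-convolution blocks. It is a rough illustration under stated assumptions (single-channel input, encoder convolutions only, no batch normalization, decoder, or output layer), so it does not reproduce the totals in Table 3; the wider configuration is simply the doubled filter list, used for comparison.

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of one kernel x kernel convolution layer."""
    return (kernel * kernel * c_in + 1) * c_out

def encoder_conv_params(widths, in_channels: int = 1, kernel: int = 3) -> int:
    """Total parameters of two stacked convolutions per level."""
    total, c_in = 0, in_channels
    for width in widths:
        total += conv_params(kernel, c_in, width)    # first convolution of the level
        total += conv_params(kernel, width, width)   # second convolution of the level
        c_in = width
    return total

light = encoder_conv_params([16, 32, 64, 128, 256])   # filter counts of the proposed model
wide = encoder_conv_params([32, 64, 128, 256, 512])   # doubled widths, for comparison
ratio = wide / light
```

Because the parameter count of a convolution grows with the product of its input and output channels, halving every width reduces the convolutional parameters by roughly a factor of four.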

The improvements to the internal structure of the base model were based on various concepts. The primary goal was to prevent neuron death; this problem was resolved by utilizing SELU as the activation function, which maintains a small value in the negative part of the function, and employing the HeNormal initialization of the network weights. The model is available at https://github.com/fdhernandezgutierrez/LUNET_Brain-Tumor (accessed on 27 September 2024).

4.5. Implementation Details

The proposed model was implemented using Python (ver. 3.9.18) and the TensorFlow deep learning framework (ver. 2.15.0). The NiBabel package (ver. 5.0.0) was used to load the medical images. The experiment was conducted on Google Colaboratory, which is a cloud-based platform that provides free access to GPUs specifically designed for deep learning tasks. The hardware configuration used in the experiment included a six-core, 12-thread Intel Xeon processor running at 2.2 GHz with 84 GB of RAM and an NVIDIA A100 graphics processor with 40 GB of memory.

The main goal of the training process is to adjust the model weights so that the error function reaches its minimum value. However, achieving this objective can take considerable time, or the process may even fail to converge. This effect is mainly due to the size of the steps between successive updates, which can cause divergence in any optimization technique.

To address training convergence, the study employed an adaptive learning rate to avoid local minima. This strategy was implemented by using the cyclical learning rate algorithm (CLR) [29]. The learning rate was adjusted between 0.0001 and 0.8 using a triangular pattern with a scaling function of $0.99^{(x - 1)}$. The SGD optimizer is recommended for use with the adaptive learning rate as it performs better in such scenarios. This approach aimed to prevent the model from becoming trapped at a local minimum and instead reach the global minimum. The training process spanned 500 epochs and utilized the SGD optimizer. The step is calculated by dividing the number of training images by the batch size of 64. Binary Cross-Entropy was used as the loss function to measure the difference between the predicted images and binary labels. The scaling applied to the triangular cyclic function (i.e., sawtooth pattern) to produce a slight decay is given by

$$\mathrm{CLR} = 0.99^{(x - 1)}, \tag{11}$$

where x denotes the current iteration or epoch within the cyclic pattern.
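The effect of the scaling in Equation (11) on the learning-rate bounds quoted above can be sketched as follows. Two points are assumptions of this illustration rather than statements from the text: x is taken as a 1-based cycle index, and the division by the batch size is rounded up. The helper names and the image count of 194 are hypothetical.

```python
import math

BASE_LR = 1e-4    # lower bound of the learning rate
MAX_LR = 0.8      # upper bound of the learning rate
BATCH_SIZE = 64

def iterations_per_epoch(n_train_images: int, batch_size: int = BATCH_SIZE) -> int:
    """Training images divided by the batch size (rounded up: an assumption)."""
    return math.ceil(n_train_images / batch_size)

def scale_fn(x: int) -> float:
    """Eq. (11): decay factor applied to the amplitude of the triangle."""
    return 0.99 ** (x - 1)

def peak_lr(cycle: int) -> float:
    """Highest learning rate reached in a given cycle (x assumed to be the cycle index)."""
    return BASE_LR + (MAX_LR - BASE_LR) * scale_fn(cycle)

peaks = [peak_lr(c) for c in (1, 2, 10, 100)]
steps = iterations_per_epoch(194)   # hypothetical number of training images
```

Under these assumptions the first cycle peaks at the upper bound of 0.8, and every later cycle peaks slightly lower, while the lower bound stays fixed at 0.0001.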

5. Numerical Results

After careful consideration, the BraTS 2021 and 2020 databases were mainly studied and discussed. Because of their recent introduction, these datasets have yet to be extensively referenced in the existing literature. In addition, to facilitate comparisons, this study also incorporates the BraTS 2017 database, which is a widely used medical imaging resource for brain tumor segmentation and classification efforts. However, including the BraTS 2017 dataset posed an initial challenge due to its volumetric nature. As a result, preprocessing steps were required to address the observed sharp gray-level variations in the MRI images.

5.1. Evaluation Metrics

The performance of the UNET model is evaluated using several metrics, including the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), Sensitivity, Specificity, and Hausdorff distance (HD). In this respect, the DSC measures the overlap between the predicted segmentation and the ground truth. In contrast, the IoU measures the ratio of the intersection to the union of the predicted segmentation and the ground truth. Sensitivity and Specificity measure the ability of the model to identify positive and negative regions, respectively. The Hausdorff distance measures the maximum distance between two point sets, quantifying the degree of mismatch between them.

5.1.1. Dice Similarity Coefficient

The Dice Similarity Coefficient (DSC) is a metric used to measure similarity or overlap between two sets, particularly in binary segmentation tasks [30]. This coefficient ranges from 0 to 1, with 1 indicating perfect overlap. DSC is formally described by

$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|}, \tag{12}$$

where A represents the segmented image, B represents the ground truth, and $|\cdot|$ denotes the number of pixels in a region. The mean Dice is obtained by averaging this value over the evaluated images.

5.1.2. Intersection over Union

Intersection over Union (IoU) is a technique used to evaluate segmentation performance, measuring the similarity between two samples using the Jaccard Index [31]. It is also invariant to the scale, as tested by Rezatofighi et al. [32], and is calculated as follows

$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}. \tag{13}$$

Here, A corresponds to the ground-truth image, while B corresponds to the predicted segmentation; $A \cap B$ represents the overlap between the two images, $A \cup B$ represents their union, and $|\cdot|$ denotes the number of pixels in a region. The mean IoU is obtained by averaging this value over the evaluated images.

5.1.3. Sensitivity

Sensitivity is a metric that evaluates the ability of a model or algorithm to identify true positives accurately. In image segmentation, Sensitivity measures the proportion of positive pixels correctly identified as positive [33], which is computed by

$$\mathrm{Sensitivity} = \frac{|A \cap B|}{|B|}, \tag{14}$$

where A is the predicted segmentation, B is the ground truth, and $|\cdot|$ denotes the number of pixels in a region.

5.1.4. Specificity

Specificity, also known as the true negative rate, evaluates the ability to identify true negative samples correctly [33]. This binary classification measure is defined by

$$\mathrm{Specificity} = \frac{|A' \cap B'|}{|B'|}, \tag{15}$$

where $A'$ and $B'$ denote the pixels that are not part of the target class (negative pixels) in the predicted segmentation and in the ground truth, respectively, and $|\cdot|$ denotes the number of pixels in a region.
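The four overlap metrics of Sections 5.1.1 to 5.1.4 can be sketched from the pixel-wise confusion counts of two binary masks. The function name `binary_metrics` is hypothetical, the formulas are the standard confusion-matrix forms of these metrics, and the sketch assumes that both masks contain at least one positive and one negative pixel, so that no denominator is zero.

```python
import numpy as np

def binary_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Dice, IoU, Sensitivity, and Specificity for one pair of binary masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = np.sum(pred & truth)     # tumor pixels predicted as tumor
    fp = np.sum(pred & ~truth)    # background pixels predicted as tumor
    fn = np.sum(~pred & truth)    # tumor pixels that were missed
    tn = np.sum(~pred & ~truth)   # background pixels predicted as background
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

pred = np.array([[1, 1, 0, 0]])
truth = np.array([[1, 0, 1, 0]])
scores = binary_metrics(pred, truth)
```

Averaging each entry over all evaluated images gives the mean values of the kind reported for a dataset.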

5.1.5. Hausdorff Distance

The Hausdorff distance (HD) identifies the maximum distance between two sets of surface points, effectively highlighting the outliers or discrepancies. This metric is particularly useful for assessing the worst-case differences between the predicted and ground truth segmentations, which is measured as follows

$$H(A, B) = \max \left\{ \max_{a \in A} \min_{b \in B} d(a, b),\; \max_{b \in B} \min_{a \in A} d(b, a) \right\}, \tag{16}$$

where A and B denote the mask image and the predicted image, respectively, and $d(a, b)$ denotes the Euclidean distance between points a and b in the images.
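Equation (16) can be sketched by brute force on two small point sets. The function name `hausdorff_distance` is hypothetical, and the full pairwise distance matrix used here is only practical for small sets; for binary masks, the point coordinates could be obtained with `np.argwhere(mask)`.

```python
import numpy as np

def hausdorff_distance(points_a, points_b) -> float:
    """Symmetric Hausdorff distance between two sets of 2D points."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # d[i, j] is the Euclidean distance between a[i] and b[j].
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1).max()   # max over a of the distance to the nearest b
    b_to_a = d.min(axis=0).max()   # max over b of the distance to the nearest a
    return float(max(a_to_b, b_to_a))

set_a = [(0, 0), (0, 1)]
set_b = [(0, 0), (3, 0)]
hd = hausdorff_distance(set_a, set_b)
```

In this example the directed distance from the first set to the second is 1, while the point (3, 0) lies 3 units from its nearest neighbor in the first set, so the symmetric distance is dominated by that outlier.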

5.2. Performance Estimation of All Trained Models

Each model was trained five times to make the results statistically reliable. First, the model was trained with a fixed learning rate of 0.0001. As in the base UNET model, the Adam method was used as the optimizer, and the network weights were initialized with the HeNormal distribution. As in all tests, 500 epochs were used. However, after exhaustive experimentation, it was detected that a cyclic learning rate provided reduced variance. According to the statistical results from binary segmentation metrics, this finding resulted in a reliable model, as shown in Figure 8, Figure 9 and Figure 10.

For instance, Figure 8 shows that, for the BraTS 2017 dataset, the cyclical learning rate produced a lower dispersion in the selected binary segmentation metrics.

Notice that the segmentation results for the BraTS 2017 database illustrate the metrics spread across five repetitions.

Similarly, Figure 9 shows a lower dispersion and better metric results for the BraTS 2020 database when using the cyclical learning rate strategy. In particular, the results in Figure 9a present a larger dispersion than those in Figure 9b.

Consequently, the Dice score and IoU indicate a higher confidence level in the repeatability of the experiment, as their distributions are skewed toward values higher than the mean. These outcomes suggest that the results from repeated experiments are more stable when using CLR, as shown by the violin plot.

Finally, the dispersion observed for the BraTS 2021 dataset was notably lower than for the previous databases, even with the implementation of cyclical learning rates, as shown in Figure 10. The variability observed between the cyclical learning rate (CLR) and fixed learning rate models can be attributed to several factors. The CLR strategy, by periodically adjusting the learning rate, enables the model to escape local minima and explore different regions of the loss landscape. For instance, Figure 10b shows that the modified UNET model yielded average values of 0.6070 ± 0.011, 0.721 ± 0.007, 0.872 ± 0.003, and 0.973 ± 0.0003 for IoU, DSC, Sensitivity, and Specificity, respectively.
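Values of the form "mean ± deviation" over repeated runs can be obtained as in the short sketch below. The five DSC values are hypothetical and are not the paper's data, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics

# Hypothetical DSC values from five repetitions (not the paper's data).
dsc_runs = [0.71, 0.72, 0.73, 0.72, 0.72]

mean = statistics.mean(dsc_runs)
std = statistics.stdev(dsc_runs)   # sample standard deviation (n - 1)

print(f"DSC = {mean:.3f} ± {std:.3f}")
```

A smaller deviation across repetitions corresponds to the narrower violin plots discussed for the cyclical learning rate.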

Table 4 summarizes the averaged binary classification metrics obtained on the three BraTS databases used in this study for detecting and segmenting brain tumors. BraTS 2021 has the best IoU, DSC, HD, and Sensitivity, indicating that the model performs best on this dataset. Conversely, BraTS 2020 has the lowest IoU, DSC, and Sensitivity, suggesting that the model struggles more with this dataset. Specificity remains consistently high across all datasets, indicating strong performance in identifying non-tumor regions.

5.3. Segmentation Results

This section presents the segmentation results obtained with the base and modified models, along with the ground truth.

Figure 11 shows the segmented images for the BraTS 2017 dataset. The first column displays the original unprocessed images. The second column shows the images obtained while applying the UNET base model. Next, the third column shows the images obtained by applying the modified lightweight model, and finally, the last column presents the ground truth.

For instance, Figure 11b shows that the UNET base model yielded 0.531, 0.694, 0.999, 0.956, and 50.219 for IoU, DSC, Sensitivity, Specificity, and Hausdorff distance, respectively. In contrast, Figure 11c shows that the proposed model yielded 0.835, 0.910, 0.883, 0.997, and 10.0 for IoU, DSC, Sensitivity, Specificity, and Hausdorff distance, respectively. These results indicate that the proposed model segments this example better than the UNET base model for the BraTS 2017 database.

Figure 12 shows images from the BraTS 2020 dataset. The column layout is the same as in Figure 11. Figure 12j shows the UNET base model results, yielding 0.888, 0.941, 0.998, 0.994, and 5.099 for IoU, DSC, Sensitivity, Specificity, and Hausdorff distance, respectively. Conversely, Figure 12k shows the corresponding values for the modified or lightweight model, yielding 0.954, 0.977, 0.975, 0.999, and 4.0 for IoU, DSC, Sensitivity, Specificity, and Hausdorff distance, respectively.

Lastly, Figure 13 shows images and results from the BraTS 2021 dataset; likewise, the modified lightweight model behaves better than the UNET base model in non-uniform regions. Visually, there is a noticeable difference between the base and modified UNET models. The base model (Figure 13f) achieved 0.895, 0.944, 0.926, 0.996, and 8.062 for IoU, DSC, Sensitivity, Specificity, and Hausdorff distance, respectively. In contrast, the modified model (Figure 13g) achieved improved results of 0.933, 0.965, 0.949, 0.998, and 3.162 for the same metrics.

A small number of unsatisfactory results were also obtained. Figure 14 shows some instances where the proposed model fails to segment the entire unhealthy tissue and incorrectly segments areas where no tumor is present. Figure 14i presents the original image for the mask given in Figure 14l, where the base model (Figure 14j) yielded the reference values of 0.785, 0.879, 0.962, 0.988, and 67.80 for IoU, DSC, Sensitivity, Specificity, and Hausdorff distance, respectively. On the other hand, Figure 14k shows the modified or proposed model, yielding 0.582, 0.736, 0.588, 0.999, and 13 for IoU, DSC, Sensitivity, Specificity, and Hausdorff distance, respectively. It is essential to notice that the Hausdorff distance was higher (worse) for Figure 14j than for Figure 14k, whereas the IoU metric favored Figure 14j. This discrepancy occurs because the IoU measures the overlap between the mask and predicted images, while the Hausdorff distance measures the distance between their corresponding contours.
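The disagreement between IoU and the Hausdorff distance can be reproduced with a small synthetic example. In the sketch below, one hypothetical prediction overlaps the mask perfectly but contains a single stray pixel far away, while another is compact but too small. All shapes and names are illustrative assumptions, and the distance is computed over full pixel sets rather than extracted contours for brevity.

```python
import math


def iou(a, b):
    """Intersection over Union of two pixel sets."""
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two pixel sets (brute force)."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


def square(r0, c0, size):
    """Set of pixel coordinates of a filled square."""
    return {(r, c) for r in range(r0, r0 + size)
            for c in range(c0, c0 + size)}


truth = square(10, 10, 10)            # 10x10 tumor mask (100 pixels)
with_outlier = truth | {(60, 60)}     # perfect overlap plus one stray pixel
undersegmented = square(12, 12, 6)    # compact but too small (36 pixels)

for name, pred in [("outlier", with_outlier), ("small", undersegmented)]:
    print(name, round(iou(truth, pred), 3), round(hausdorff(truth, pred), 2))
```

The prediction with the stray pixel has the higher IoU and also the much larger Hausdorff distance, mirroring the behavior described for Figure 14j and Figure 14k.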

Table 5 summarizes the ablation analysis of the model and compares its numerical results with those of state-of-the-art UNET-based models trained with the BraTS 2021 database. This ablation study details the impact of various components and configurations on the network performance. In particular, it aims to show how using a cyclical learning rate and batch normalization affects the model's performance.

The lightweight proposed model includes two convolutional layers and a dropout layer at each level. This network is trained with the Adam optimizer and uses the HeNormal weight initializer. Under these circumstances, the model starts with 64 filters in the initial layer; this quantity is doubled in each subsequent layer until it reaches 1024 filters in the latent space. Subsequently, a batch normalization layer was added to each convolutional layer, Base (BN), which reduced the values of most metrics. Then, a cyclic learning rate was added to the modified model, still using the Adam optimizer, to avoid being trapped in local optima, Base (BN & CLR), but it gave the worst results. Fortunately, the results were substantially improved by switching to an SGD optimizer while maintaining all previous tuning characteristics (BN & CLR & SGD).
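The filter progression described above, and its effect on model size, can be sketched as follows. The encoder layout (two 3x3 convolutions per level, a single input channel, no decoder) is a simplifying assumption for illustration and is not the authors' exact architecture; the helper names are hypothetical.

```python
def encoder_filters(initial: int, levels: int = 5) -> list:
    """Filters per level when the count doubles at every level."""
    return [initial * 2 ** i for i in range(levels)]


def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def encoder_params(initial: int, in_channels: int = 1) -> int:
    """Parameters of an encoder with two 3x3 convolutions per level."""
    total, c_in = 0, in_channels
    for c_out in encoder_filters(initial):
        total += conv_params(c_in, c_out) + conv_params(c_out, c_out)
        c_in = c_out
    return total


print(encoder_filters(64))                      # [64, 128, 256, 512, 1024]
print(encoder_params(64) / encoder_params(32))  # close to 4
```

Since convolution weights scale with the product of input and output channels, halving the initial number of filters cuts the parameter count by roughly a factor of four. The ratio between the two models in Table 3 (7,771,681 / 1,946,897, about 3.99) is consistent with a reduction of this kind, although this sketch does not establish how the authors obtained it.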

Finally, the proposed model was modified by changing the activation function and adjusting the number of initial filters, Base (BN & CLR & SGD & SeLU). Despite having potentially slower convergence, SGD is often preferred over Adam for its better generalization performance, simplicity, and lower memory usage. Although faster and easier to use, Adam can sometimes lead to overfitting and suboptimal generalization [34]. Meanwhile, SeLU is preferred for deep neural networks because of its self-normalizing properties, which improve the training stability and speed. ReLU, while effective, can lead to issues like dead neurons and requires additional normalization techniques [35].

Table 5 shows that, concerning the DSC metric, the best result corresponds to the model proposed by Mokhtar et al. [36] (0.903), followed by the method proposed by Akbar et al. [37] (0.8933). The lightweight proposed model (0.860) is on par with Raza et al. [40] (0.8601) and close to UNET++ (0.867). Regarding the IoU metric, the lightweight proposed model (0.7807) is second only to UNET++ (0.7876) among the methods that report it. In addition, Table 5 highlights the Base model (BN & CLR & SGD & SeLU) for its balance between performance and efficiency. With a DSC of 0.860 and only 5.3 GFlops, this model achieves competitive results compared to more computationally expensive models such as UNET++ (97.7 GFlops), making it well suited to applications with limited resources.

Table 5. Different metrics comparison for models trained with the BraTS 2021 dataset.

Reference	IoU	DSC	HD	Sensitivity	Specificity	Params ( $\times 10^{6}$ )	GFlops
Sun et al. [38]	-	0.819	2.662	0.811	-	$47.6$	73.2
Xu et al. [39]	0.703	0.788	-	-	-	$2.6$	6.08
Raza et al. [40]	-	0.8601	-	-	-	-	-
Akbar et al. [37]	-	0.8933	15.83	0.9278	-	-	-
Mokhtar et al. [36]	-	0.903	9.9	0.96	0.99	-	-
UNET++	0.78755	0.86705	18.2394	0.86363	0.99629	$31.389$	97.7
Base	0.72527	0.80977	36.37064	0.77456	0.99738	$1.940$	84.5
Base (BN)	0.70029	0.78856	13.5851	0.80518	0.98998	$1.946$	5.3
Base (BN & CLR)	0.00691	0.01353	18.0803	0.14846	0.81738	$1.946$	5.3
Base (BN & CLR & SGD)	0.57465	0.66023	12.2119	0.64639	0.99444	$1.946$	5.3
Base (BN & CLR & SGD & SeLU)	0.78070	0.860	12.0603	0.856	0.9964	$1.946$	5.3
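The efficiency trade-off in Table 5 can be made explicit with a few ratios. The sketch below only re-uses the UNET++ and Base (BN & CLR & SGD & SeLU) rows of the table; the dictionary layout and key names are illustrative.

```python
# Values copied from Table 5 (parameters in millions, cost in GFlops).
models = {
    "UNET++": {"params_m": 31.389, "gflops": 97.7, "dsc": 0.86705},
    "Proposed": {"params_m": 1.946, "gflops": 5.3, "dsc": 0.860},
}

ref, ours = models["UNET++"], models["Proposed"]
gflops_ratio = ref["gflops"] / ours["gflops"]
params_ratio = ref["params_m"] / ours["params_m"]
dsc_gap = ref["dsc"] - ours["dsc"]

print(f"GFlops reduction: {gflops_ratio:.1f}x")
print(f"Parameter reduction: {params_ratio:.1f}x")
print(f"DSC difference: {dsc_gap:.4f}")
```

In other words, the proposed configuration needs roughly 18 times fewer GFlops and 16 times fewer parameters than UNET++ at a cost of less than one DSC point.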

5.4. Discussion

This study presented a strategy to improve the UNET architecture for brain tumor segmentation by integrating cyclical learning rates (CLRs) and selecting the best slices from 3D MRI, resulting in improved performance. On the BraTS 2021 dataset (Table 5), the modified or lightweight model achieved an IoU of 0.78, surpassing the baseline UNET's 0.72, and a Dice coefficient of 0.860 versus 0.810, demonstrating CLR's effectiveness in accelerating convergence and enhancing accuracy and other binary classification metrics. CLR facilitated the escape from local minima and prevented overfitting, leading to a better generalization in segmenting brain tumors.

Despite promising results, limitations include the dataset’s relatively small size and highly demanding computational resources. However, our findings highlight the potential of CLR and smart 2D slice selectors in enhancing deep learning models for medical image analysis with significant implications for clinical practice.
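As a rough illustration of a 2D slice selector, the sketch below picks the slice whose mask contains the most tumor pixels, which is the criterion suggested by the caption of Figure 6. The exact selection rule used by the authors is described elsewhere in the paper, so this criterion, the function name, and the toy volume are assumptions; the sketch also assumes that a ground-truth mask is available, as during training.

```python
def best_slice_index(mask_volume) -> int:
    """Index of the slice whose mask has the most tumor pixels.

    mask_volume is a sequence of 2D binary masks (one per slice).
    """
    areas = [sum(sum(row) for row in mask) for mask in mask_volume]
    return max(range(len(areas)), key=areas.__getitem__)


# Hypothetical 3-slice volume of 2x3 binary masks.
volume = [
    [[0, 0, 0], [0, 1, 0]],   # 1 tumor pixel
    [[1, 1, 0], [0, 1, 1]],   # 4 tumor pixels
    [[0, 1, 0], [0, 1, 0]],   # 2 tumor pixels
]

print(best_slice_index(volume))  # slice 1 has the largest mask area
```

Feeding only the selected slice to a 2D network avoids processing all 155 slices of each $240 \times 240 \times 155$ volume.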

6. Conclusions

This work proposes a lightweight tumor segmentation framework built upon a UNET model by optimizing the convolutional blocks. Three main modifications were implemented: a SeLU activation function, Batch Normalization, and Dropout. One significant contribution of the proposed strategy is automatically selecting the most informative slice of the T2 image modality and using a preprocessing step to normalize the intensities across the input images. Moreover, by incorporating a CLR strategy, the training process boosts its performance, avoiding overfitting and allowing the model to achieve remarkable results in a few epochs. Notably, for the BraTS 2021 dataset, the proposed model and training strategy obtained 0.7807 for IoU, 0.860 for DSC, 0.856 for Sensitivity, and 0.9964 for Specificity (Table 4), surpassing the vanilla UNET in the first three metrics by approximately 6%. Furthermore, the presented study provides a valuable comparison between the vanilla UNET and the proposed approach for the BraTS 2017, 2020, and 2021 datasets. The suggested approach achieves accurate tumor segmentation while maintaining robustness in handling the complexity of MRI data, showcasing its potential to advance the field of brain tumor segmentation. One potential avenue for future work is developing a multimodal UNET that employs additional optimal slices to capture more information while maintaining the model's low computational cost. Future work will also include attention modules to improve feature extraction.

Author Contributions

Conceptualization, M.A.I.-M.; Data curation, O.A.-C. and J.R.-P.; Funding acquisition, J.G.A.-C.; Investigation, F.D.H.-G., D.F.Z.-G. and J.G.A.-C.; Methodology, E.G.A.-B., O.A.-C. and E.O.-M.; Resources, M.A.I.-M.; Software, F.D.H.-G., D.F.Z.-G. and J.R.-P.; Supervision, J.G.A.-C.; Validation, E.G.A.-B., D.F.Z.-G. and M.A.I.-M.; Visualization, O.A.-C. and E.O.-M.; Writing—original draft, F.D.H.-G. and E.G.A.-B.; Writing—review and editing, J.R.-P., E.O.-M. and J.G.A.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the University of Guanajuato CIIC (Convocatoria Institucional de Investigación Científica, UG) project 243/2024 and Grant NUA 143745, and partly by the Mexican Council of Science and Technology CONAHCyT Grant 838509/1080385.

Institutional Review Board Statement

Ethical review and approval were waived for this study.

Informed Consent Statement

No formal written consent was required for this study.

Data Availability Statement

Data are available upon formal request.

Acknowledgments

The authors thank the University of Guanajuato for the facilities and support given to develop this project.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of this study, in the collection, analyses, data interpretation, manuscript writing, or decision to publish the results.

References

1. Hu, A.; Razmjooy, N. Brain tumor diagnosis based on metaheuristics and deep learning. Int. J. Imaging Syst. Technol. 2021, 31, 657–669.
2. Sharif, M.; Amin, J.; Raza, M.; Anjum, M.A.; Afzal, H.; Shad, S.A. Brain tumor detection based on extreme learning. Neural Comput. Appl. 2020, 32, 15975–15987.
3. International Agency for Research on Cancer. IARC Global Cancer Observatory. Available online: https://gco.iarc.who.int/today/en/dataviz/maps-heatmap?cancers=31&types=0&sexes=1&palette=Blues&mode=population (accessed on 29 April 2024).
4. Badža, M.M.; Barjaktarović, M.Č. Classification of Brain Tumors from MRI Images Using a Convolutional Neural Network. Appl. Sci. 2020, 10, 1999.
5. Bhandari, A.; Koppen, J.; Agzarian, M. Convolutional neural networks for brain tumour segmentation. Insights Imaging 2020, 11, 77.
6. Walsh, J.; Othmani, A.; Jain, M.; Dev, S. Using U-Net network for efficient brain tumor segmentation in MRI images. Healthc. Anal. 2022, 2, 100098.
7. Shoaib, M.A.; Lai, K.W.; Chuah, J.H.; Hum, Y.C.; Ali, R.; Dhanalakshmi, S.; Wang, H.; Wu, X. Comparative studies of deep learning segmentation models for left ventricle segmentation. Front. Public Health 2022, 10, 981019.
8. Razzak, M.I.; Imran, M.; Xu, G. Efficient Brain Tumor Segmentation With Multiscale Two-Pathway-Group Conventional Neural Networks. IEEE J. Biomed. Health Inform. 2019, 23, 1911–1919.
9. Hao, K.; Lin, S.; Qiao, J.; Tu, Y. A Generalized Pooling for Brain Tumor Segmentation. IEEE Access 2021, 9, 159283–159290.
10. Aghalari, M.; Aghagolzadeh, A.; Ezoji, M. Brain tumor image segmentation via asymmetric/symmetric UNet based on two-pathway-residual blocks. Biomed. Signal Process. Control. 2021, 69, 102841.
11. Ottom, M.A.; Rahman, H.A.; Dinov, I.D. Znet: Deep Learning Approach for 2D MRI Brain Tumor Segmentation. IEEE J. Transl. Eng. Health Med. 2022, 10, 1800508.
12. Ahmad, P.; Jin, H.; Alroobaea, R.; Qamar, S.; Zheng, R.; Alnajjar, F.; Aboudi, F. MH UNet: A Multi-Scale Hierarchical Based Architecture for Medical Image Segmentation. IEEE Access 2021, 9, 148384–148408.
13. Latif, U.; Shahid, A.R.; Raza, B.; Ziauddin, S.; Khan, M.A. An end-to-end brain tumor segmentation system using multi-inception-UNET. Int. J. Imaging Syst. Technol. 2021, 31, 1803–1816.
14. Montaha, S.; Azam, S.; Rakibul Haque Rafid, A.K.M.; Hasan, M.Z.; Karim, A. Brain Tumor Segmentation from 3D MRI Scans Using U-Net. SN Comput. Sci. 2023, 4, 386.
15. Ranjbarzadeh, R.; Zarbakhsh, P.; Caputo, A.; Tirkolaee, E.B.; Bendechache, M. Brain tumor segmentation based on optimized convolutional neural network and improved chimp optimization algorithm. Comput. Biol. Med. 2024, 168, 107723.
16. Ghazouani, F.; Vera, P.; Ruan, S. Efficient brain tumor segmentation using Swin transformer and enhanced local self-attention. Int. J. Comput. Assist. Radiol. Surg. 2024, 19, 273–281.
17. Yue, G.; Zhuo, G.; Zhou, T.; Liu, W.; Wang, T.; Jiang, Q. Adaptive Cross-Feature Fusion Network with Inconsistency Guidance for Multi-Modal Brain Tumor Segmentation. IEEE J. Biomed. Health Inform. 2023, 1–11.
18. Baldi, P.; Sadowski, P. The dropout learning algorithm. Artif. Intell. 2014, 210, 78–122.
19. Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How Does Batch Normalization Help Optimization? arXiv 2019, arXiv:1805.11604.
20. Bjorck, J.; Gomes, C.P.; Selman, B. Understanding Batch Normalization. arXiv 2018, arXiv:1806.02375.
21. Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108.
22. Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-Normalizing Neural Networks. arXiv 2017, arXiv:1706.02515.
23. Oymak, S. Provable Super-Convergence With a Large Cyclical Learning Rate. IEEE Signal Process. Lett. 2021, 28, 1645–1649.
24. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241.
25. Amin, J.; Sharif, M.; Raza, M.; Saba, T.; Anjum, M.A. Brain tumor detection using statistical and machine learning method. Comput. Methods Programs Biomed. 2019, 177, 69–79.
26. BraTS 2017: Multimodal Brain Tumor Segmentation Challenge. Available online: https://www.med.upenn.edu/sbia/brats2017/data.html (accessed on 2 January 2024).
27. BraTS 2020: Multimodal Brain Tumor Segmentation Challenge. Available online: https://www.med.upenn.edu/cbica/brats2020/ (accessed on 2 January 2024).
28. BraTS 2021: Multimodal Brain Tumor Segmentation Challenge. Available online: https://www.med.upenn.edu/cbica/brats2021/ (accessed on 2 January 2024).
29. Smith, L.N. Cyclical learning rates for training neural networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA, 24–31 March 2017; pp. 464–472.
30. Zou, K.H.; Warfield, S.K.; Bharatha, A.; Tempany, C.M.; Kaus, M.R.; Haker, S.J.; Wells, W.M.; Jolesz, F.A.; Kikinis, R. Statistical validation of image segmentation quality based on a spatial overlap index: Scientific reports. Acad. Radiol. 2004, 11, 178–189.
31. Zanddizari, H.; Nguyen, N.; Zeinali, B.; Chang, J.M. A new preprocessing approach to improve the performance of CNN-based skin lesion classification. Med. Biol. Eng. Comput. 2021, 59, 1123–1131.
32. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
33. Müller, D.; Soto-Rey, I.; Kramer, F. Towards a guideline for evaluation metrics in medical image segmentation. BMC Res. Notes 2022, 15, 210.
34. Zhang, Z. Improved Adam Optimizer for Deep Neural Networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2.
35. Chen, Z.; Zhao, W.; Deng, L.; Ding, Y.; Wen, Q.; Li, G.; Xie, Y. Large-scale self-normalizing neural networks. J. Autom. Intell. 2024, 3, 101–110.
36. Mokhtar, M.; Abdel-Galil, H.; Khoriba, G. Brain Tumor Semantic Segmentation using Residual U-Net++ Encoder-Decoder Architecture. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 1110–1117.
37. Akbar, A.S.; Fatichah, C.; Suciati, N. Single level UNet3D with multipath residual attention block for brain tumor segmentation. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 3247–3258.
38. Sun, J.; Hu, M.; Wu, X.; Tang, C.; Lahza, H.; Wang, S.; Zhang, Y. MVSI-Net: Multi-view attention and multi-scale feature interaction for brain tumor segmentation. Biomed. Signal Process. Control. 2024, 95, 106484.
39. Xu, Q.; Ma, Z.; He, N.; Duan, W. DCSAU-Net: A deeper and more compact split-attention U-Net for medical image segmentation. Comput. Biol. Med. 2023, 154, 106626.
40. Raza, R.; Ijaz Bajwa, U.; Mehmood, Y.; Waqas Anwar, M.; Hassan Jamal, M. dResU-Net: 3D deep residual U-Net based brain tumor segmentation from multimodal MRI. Biomed. Signal Process. Control. 2023, 79, 103861.

Figure 1. The three brain perspective planes are used in medical imaging and 3D-brain reconstruction.

Figure 2. Visualization of a single channel input convolution layer. (a) 16 kernels, (b) 16 feature maps. Different gray tones represent the kernel weights used to obtain the feature maps.

Figure 3. Graphical representation of a cyclic learning rate: showing fluctuations in the learning rate across training iterations, demonstrating characteristic cycles of increases and decreases.

Figure 4. The influence of T1 and T2 contrasts on the image masks: unveiling their different effects and implications for image processing in the BraTS 2021 dataset. (a) T1 sequence, (b) T2 sequence, and (c) Mask.

Figure 5. General description of the proposed method. The dotted box indicates that the original T2 image and the segmented image of the same frame are fed to independent processes. The dashed line represents a sub-process in the Data Processing block.

Figure 6. Displaying multiple slices within a single voxel of the mask image, highlighting the slice with the highest mask content.

Figure 7. UNET network architecture modified from the original.

Figure 8. The violin plot illustrates the model’s behavior across different repetitions. By employing a cyclical learning rate, the variance in the distribution is reduced, resulting in a model with a symmetrical distribution that closely aligns with the actual metrics. Consequently, this produces a more accurate model. (a) Fixed learning rate. (b) Cyclical learning rate for the BraTS 2017 dataset.

Figure 9. The violin plot data produced across five repetitions of the BraTS 2020 dataset using the modified UNET model indicates an enhancement. The median aligns closely with the actual value. (a) Fixed learning rate. (b) Cyclical learning rate.

Figure 10. For the case of the BraTS 2021 dataset, employing the modified model for the violin distribution leads to a decrease in variance across five replicates, achieving consistent results. (a) Fixed learning rate. (b) Cyclical learning rate.

Figure 11. Brain tumor segmentation results using the BraTS 2017 dataset. (a,e,i) Original images; (b,f,j) UNET base; (c,g,k) UNET modified; and (d,h,l) Target images.

Figure 12. Visualization of brain tumor segmentation employing the BraTS 2020 dataset. (a,e,i) Original images; (b,f,j) UNET base; (c,g,k) UNET modified; and (d,h,l) Target images.

Figure 13. Visualization of brain tumor segmentation employing the BraTS 2021 dataset. (a,e,i) Original images; (b,f,j) UNET base; (c,g,k) UNET modified; and (d,h,l) Target images.

Figure 14. Special cases where the model behaved poorly with the BraTS 2021 dataset. (a,e,i) Original images; (b,f,j) UNET base; (c,g,k) UNET modified; and (d,h,l) Target images.

Table 1. Datasets and methods used in the literature for brain tumor detection.

Article	Dataset	Method	Year
Razzak et al. [8]	BraTS 2013 and 2015	Two-Pathway Group CNN	2019
Hao et al. [9]	BraTS 2018 and 2019	Generalized Pooling (FCN, UNET, UNET++)	2021
Walsh et al. [6]	BITE	Lightweight UNET	2022
Ottom et al. [11]	The Cancer Genome Atlas Low-Grade	Z-Net	2022
Aghalari et al. [10]	BraTS 2013 and 2018	Asymmetric/Symmetric UNET based on two-pathway residual blocks	2021
Ahmad et al. [12]	BraTS 2018, 2019 and 2020	MH UNET	2021
Latif et al. [13]	BraTS 2015, 2017 and 2019	MI-UNET	2021
Montaha et al. [14]	BraTS 2020	UNET	2023
Ranjbarzadeh et al. [15]	BraTS 2018	CNN + IChOA	2024
Ghazouani et al. [16]	BraTS 2021	Transformers + CNN	2024
Yue et al. [17]	BraTS 2020	Multi-stream UNET	2024

Table 2. A comparative analysis detailing each database’s structural framework alongside the image dimensions.

Database	Image Size	Training Images	Tested Images
BraTS 2017 [26]	$240 \times 240 \times 155$	140	60
BraTS 2020 [27]	$240 \times 240 \times 155$	240	105
BraTS 2021 [28]	$240 \times 240 \times 155$	277	120

Table 3. Comparison of the number of parameters between the base and the proposed lightweight UNET model.

Network	Total Params	Trainable Params
Original UNET	7,771,681	7,765,601
Our lightweight UNET	1,946,897	1,943,857

Table 4. Tabulation of averaged metrics derived from three distinct databases.

Database	IoU	DSC	HD	Sensitivity	Specificity
BraTS 2017	0.6915	0.7909	24.2216	0.806	0.9924
BraTS 2020	0.6209	0.7139	17.7724	0.6529	0.9983
BraTS 2021	0.7807	0.860	12.0603	0.856	0.9964

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Hernandez-Gutierrez, F.D.; Avina-Bravo, E.G.; Zambrano-Gutierrez, D.F.; Almanza-Conejo, O.; Ibarra-Manzano, M.A.; Ruiz-Pinales, J.; Ovalle-Magallanes, E.; Avina-Cervantes, J.G. Brain Tumor Segmentation from Optimal MRI Slices Using a Lightweight U-Net. Technologies 2024, 12, 183. https://doi.org/10.3390/technologies12100183
