Article

nnSegNeXt: A 3D Convolutional Network for Brain Tissue Segmentation Based on Quality Evaluation

1 School of Instrumentation Science and Opto-Electronics Engineering, Beihang University, Beijing 100191, China
2 Institute of Large-Scale Scientific Facility and Centre for Zero Magnetic Field Science, Beihang University, Beijing 100191, China
3 National Institute of Extremely-Weak Magnetic Field Infrastructure, Hangzhou 310051, China
* Authors to whom correspondence should be addressed.
Bioengineering 2024, 11(6), 575; https://doi.org/10.3390/bioengineering11060575
Submission received: 8 May 2024 / Revised: 28 May 2024 / Accepted: 30 May 2024 / Published: 6 June 2024
(This article belongs to the Special Issue Machine Learning Methods for Biomedical Imaging)

Abstract

Accurate and automated segmentation of brain tissue images can significantly streamline clinical diagnosis and analysis. Manual delineation is laborious and repetitive, while automated techniques encounter challenges stemming from disparities in magnetic resonance imaging (MRI) acquisition equipment and from inaccurate labeling. Existing software packages, such as FSL and FreeSurfer, do not fully replace ground truth segmentation, highlighting the need for an efficient segmentation tool. To better capture the essence of cerebral tissue, we introduce nnSegNeXt, an innovative segmentation architecture built upon the foundations of quality assessment. This framework effectively addresses the challenges posed by missing and inaccurate annotations. To enhance the model’s discriminative capacity, we integrate a 3D convolutional attention mechanism in place of conventional convolutional blocks, enabling simultaneous encoding of contextual information through multiscale convolutional features. Our methodology was evaluated on four multi-site T1-weighted MRI datasets spanning diverse sources, magnetic field strengths, scanning parameters, acquisition times, and neuropsychiatric conditions. Empirical evaluations on the HCP, SALD, and IXI datasets reveal that nnSegNeXt surpasses the esteemed nnUNet, achieving Dice coefficients of 0.992, 0.987, and 0.989, respectively, and demonstrating superior generalizability across four distinct projects with Dice coefficients ranging from 0.967 to 0.983. Additionally, extensive ablation studies corroborate the effectiveness of the proposed model. These findings represent a notable advancement in brain tissue analysis, suggesting that nnSegNeXt holds the promise to significantly refine clinical workflows.

1. Introduction

The segmentation of brain tissue in magnetic resonance imaging (MRI) scans into constituent elements such as white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is instrumental in facilitating the diagnostic process for neurological pathologies like epilepsy, Alzheimer’s disease, and multiple sclerosis. Diseases with psychiatric and neurodegenerative origins often involve changes in cerebral tissue morphology, such as alterations in the volume or configuration of deep gray matter structures, cortical thickness, surface area, and convoluted brain patterns [1]. Therefore, the morphometric analysis of cerebral tissue serves as a critical biomarker for disease diagnosis and acts as an effective diagnostic tool [2,3]. In addition, brain tissue segmentation in MRI scans is valuable for preoperative evaluation, surgical planning [4], and the development of radiation therapy plans [5].
Manual segmentation, although accurate, is laborious, repetitive, and subjective, making it impractical even for experts when dealing with large-scale datasets. In the past, numerous conventional techniques have been proposed for cerebral tissue segmentation, including intensity thresholding [6,7], deformable models [8,9,10], clustering [11,12], and other machine learning algorithms. However, these techniques have faced significant challenges due to the complex structure of the brain, variations in tissue morphology and texture, and inherent features of MRI scans, which have limited their performance [13].
In recent years, deep-learning-based methods, particularly those based on fully convolutional networks (FCNs) [14], have emerged as a robust alternative to traditional machine learning algorithms for cerebral tissue segmentation tasks. Among these methods, the U-Net architecture [15] has gained considerable attention in medical image segmentation. However, as medical data often exist in 3D volumetric form, 3D convolution kernels are necessary. To address this, Cicek et al. [16] extended the U-Net architecture to handle 3D data, resulting in the development of the 3D U-Net for brain tissue segmentation. V-Net [17] utilizes residual connections to accelerate network convergence and provide excellent feature representation. SegNet [18] incorporates non-linear upsampling during decoding to reduce parameters and computational complexity. SegResNet [19] employs a residual encoder–decoder architecture with an auxiliary branch for input data reconstruction. nnU-Net [20,21] demonstrates that minor modifications to the U-Net architecture can yield competitive performance in medical image segmentation.
Attention-based Transformer architectures, along with convolutional networks, have demonstrated promising results in medical image segmentation. Attention U-Net [22] employs attention blocks to refine features before merging them with decoder outputs, while TransUNet [23] integrates a Vision Transformer at critical points to enhance performance. Cao et al. [24] propose a pure Transformer-based network, Swin-UNet, and apply it to medical image segmentation tasks; it utilizes a hierarchical Swin Transformer [25] as the encoder to extract contextual features. TABS [26] introduces a novel CNN-Transformer hybrid architecture to improve brain tissue segmentation, realizing a fine fusion of global and local features through a multiscale feature representation and a two-layer fusion module. UNETR [27] eliminates the need for a CNN-based feature extractor by employing a ViT [28] encoder. nnFormer [29] combines convolutional layers and transformer layers in an interleaved encoder–decoder fashion. Although these attention-based architectures have significantly contributed to image segmentation, many solutions heavily rely on extensive labeled datasets. Additionally, the accuracy of labeling is crucial, and automated segmentation toolkits such as FSL [30] or FreeSurfer [1] cannot perfectly substitute ground truth due to inaccurate labeling and limited generalization capabilities.
Recent studies have explored the use of quality assessment methodologies to enhance the effectiveness of deep convolutional models in medical image analysis. For instance, Roy et al. [31] employed a Bayesian fully convolutional neural network and model uncertainty to modulate brain segmentation quality. They created uncertainty maps and three structure-wise uncertainty indices by generating Monte Carlo samples from the posterior distribution and using dropout during testing. Additionally, Hann et al. [32] developed a quality-control-centric framework for medical image segmentation, utilizing the Dice similarity coefficient prediction methodology to identify optimal segmentations and enhance precision and efficiency. Some researchers have also used deep neural networks to regress evaluation metrics for segmentation tasks. For instance, Li et al. [33] introduced an entropy-weighted Dice loss function to improve subcortical structure segmentation accuracy by training a neural network to better differentiate between foreground and background regions within ambiguous boundary voxels of subcortical structures. However, these approaches require creating training sets for regressors or error map predictors.
To address these limitations, nnSegNeXt presents a novel approach that leverages the edge overlap between input images and their labeled segments (see Figure 1). This overlap is a reliable measure of segmentation quality and is used during training to dynamically adjust the weight assigned to each image, which significantly enhances the overall accuracy of the segmentation: a higher degree of overlap corresponds to a higher level of accuracy in the segmentation results. Furthermore, we enhance the data preprocessing pipeline that generates multi-center labels to further verify the neural network’s accuracy. This additional step improves the robustness of our framework and ensures more precise training results. Consequently, our approach effectively tackles the challenges of missing labels and inaccuracies, improving image segmentation accuracy. Our approach offers the following significant contributions:
  • We present a novel framework for brain tissue segmentation, leveraging a quality evaluation approach. This framework consists of two essential processes: dataset preprocessing and network training.
  • We incorporate a 3D Multiscale Convolutional Attention Module instead of conventional convolutional blocks, enabling simultaneous encoding of contextual information. These attention mechanisms significantly curtail computational overhead while eliciting spatial attention via multiscale convolutional features.
  • We devise a Data Quality Loss metric that appraises label quality on training images, thereby attenuating the impact of label quality on segmentation precision during the training process.
Figure 1. The proposed segmentation framework. The framework is composed of two main stages: preprocessing and network training. During the preprocessing stage, the dataset underwent several processing steps, such as bias field correction, brain extraction, affine registration, and FSL FAST, to produce the corresponding labels. In the network training stage, nnSegNeXt was trained using a weighted loss function on the preprocessed data.

2. Method

2.1. The Proposed Segmentation Framework

The nnSegNeXt framework is presented in Figure 1 and comprises two main stages: preprocessing and network training.

2.1.1. The Preprocessing Stage

In the initial phase of data preparation, the methodology described by Feng et al. [34] was refined and applied to preprocess the brain tissue images. First, bias field correction [35] was applied to remove intensity inhomogeneity. The images were then resampled to a common matrix size of 197 × 233 × 189 voxels to ensure uniformity across all datasets. To avoid discarding essential anatomical structures, the HD-BET technique [20] was employed to separate brain tissue from non-cerebral elements. Next, the FSL FLIRT tool (version 6.0.7.11, Analysis Group, FMRIB, Oxford, UK) [36], with trilinear interpolation, was used to align each dataset to the MNI152 isotropic standard space by affine registration to a 1 mm3 brain template. Finally, FSL FAST was applied to delineate the different brain tissues within the images, a pivotal step for the ensuing analysis. This preprocessing protocol maintained data integrity and consistency, facilitating accurate brain tissue segmentation.
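For illustration, the sketch below chains these steps with SimpleITK’s N4 implementation and command-line calls to HD-BET, FSL FLIRT, and FSL FAST; the file names, template path, and some option values are placeholders and should be checked against the installed tool versions (the intermediate resampling step is omitted for brevity).

```python
import subprocess
import SimpleITK as sitk

def n4_bias_correction(in_path: str, out_path: str) -> None:
    """Step 1: N4 bias field correction via SimpleITK."""
    image = sitk.ReadImage(in_path, sitk.sitkFloat32)
    mask = sitk.OtsuThreshold(image, 0, 1, 200)      # rough foreground mask
    corrected = sitk.N4BiasFieldCorrection(image, mask)
    sitk.WriteImage(corrected, out_path)

def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)

# Placeholder file names; adjust to the actual data layout.
n4_bias_correction("t1.nii.gz", "t1_n4.nii.gz")

# Step 2: skull stripping with HD-BET (options per its documentation).
run(["hd-bet", "-i", "t1_n4.nii.gz", "-o", "t1_brain.nii.gz"])

# Step 3: affine registration to the 1 mm MNI152 template with FSL FLIRT,
# using trilinear interpolation.
run(["flirt", "-in", "t1_brain.nii.gz",
     "-ref", "MNI152_T1_1mm_brain.nii.gz",
     "-out", "t1_mni.nii.gz", "-omat", "t1_to_mni.mat",
     "-interp", "trilinear"])

# Step 4: tissue segmentation with FSL FAST (3 classes: CSF, GM, WM).
run(["fast", "-t", "1", "-n", "3", "-o", "t1_mni", "t1_mni.nii.gz"])
```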

2.1.2. The Network Training Stage

During the network training stage, we trained nnSegNeXt on the preprocessed data using a weighted loss function that considers image quality. The network architecture is detailed in the following section.

2.2. Network Architecture

The nnSegNeXt network, depicted in Figure 2, processes input data X ∈ ℝ^(H×W×D×S). It consists of five stages with downsampling rates of 2, 4, 8, 16, and 32. The shallow stages of the encoder (Stages 1 and 2) employ downsampling and 3D convolutional layers with a 3 × 3 × 3 kernel size, while the deeper stages (Stages 3, 4, and 5) integrate downsampling and a 3D Convolutional Attention Module for capturing global information. The bottleneck uses a 3D Convolutional Attention Module to provide a sufficient receptive field to the decoder, which shares a highly symmetrical architecture with the encoder. Strided deconvolution upsamples low-resolution feature maps to high-resolution ones, with skip connections linking corresponding features in the encoding and decoding paths. In line with the nnU-Net training framework, our approach optimizes network learning through a weighted deep-supervised loss function that incorporates both the low-resolution outputs from the intermediate stages and the output from the final stage; only the final-stage output is used as the final result. By considering the hidden-layer features of different stages during training, this method enhances the network’s training effectiveness and generalization ability. The architecture replaces Batch Normalization (BatchNorm) with Instance Normalization (InstanceNorm) for increased stability. By normalizing features per instance and channel, InstanceNorm allows greater flexibility in handling style variations, which has proven beneficial in our application. The 3D Convolutional Attention Module and the loss function are detailed in the following sections.
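As a rough illustration of these building blocks (not the authors’ code), the following PyTorch sketch shows a plain 3 × 3 × 3 convolutional block with InstanceNorm, strided-convolution downsampling, and strided deconvolution with skip concatenation; the channel widths and the GELU activation are assumptions.

```python
import torch
import torch.nn as nn

class ConvBlock3D(nn.Module):
    """Plain 3x3x3 convolutional block used in the shallow stages (1-2):
    Conv3d -> InstanceNorm3d -> GELU, repeated twice."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch, affine=True),
            nn.GELU(),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch, affine=True),
            nn.GELU(),
        )
    def forward(self, x):
        return self.block(x)

class Down(nn.Module):
    """Strided convolution halves each spatial dimension (downsampling by 2)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.down = nn.Conv3d(in_ch, out_ch, kernel_size=2, stride=2)
    def forward(self, x):
        return self.down(x)

class Up(nn.Module):
    """Strided deconvolution upsamples and concatenates the skip connection."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
    def forward(self, x, skip):
        return torch.cat([self.up(x), skip], dim=1)

# Example: one encoder step on a 128^3 patch with a single input channel.
x = torch.randn(1, 1, 128, 128, 128)
stage1 = ConvBlock3D(1, 32)
down1 = Down(32, 64)
f1 = stage1(x)   # (1, 32, 128, 128, 128)
f2 = down1(f1)   # (1, 64, 64, 64, 64)
print(f1.shape, f2.shape)
```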

2.3. 3D Multiscale Convolutional Attention Module

We have implemented attention mechanisms similar to those utilized in SegNeXt for both the encoder and the decoder. However, we improve on that design by extending the Multiscale Convolutional Attention (MSCA) module of SegNeXt to a three-dimensional Multiscale Convolutional Attention (3DMSCA) module, rather than relying on self-attention mechanisms. Additionally, we use InstanceNorm instead of BatchNorm to address the challenges presented by medical images and modify the sizes of the multiscale convolution kernels to better suit them. Our 3DMSCA module comprises three components, as illustrated in Figure 3: a depth-wise convolution for local information aggregation, a multi-branch depth-wise band convolution for capturing multiscale contexts, and a 1 × 1 × 1 convolution for modeling relationships among channels. The output of the 1 × 1 × 1 convolution serves as attention weights that reweigh the inputs of 3DMSCA. Mathematically, our 3DMSCA can be formulated as follows:
$X_{\mathrm{out}} = \mathrm{Conv}_{1\times 1\times 1}\!\left(\sum_{i=0}^{3}\mathrm{Sca}_i\big(\mathrm{Conv}_D(X_{\mathrm{in}})\big)\right)\otimes X_{\mathrm{in}},$  (1)
where X_in denotes the input feature to the network and X_out represents the corresponding output. The operation ⊗ refers to element-wise matrix multiplication. Conv_D denotes a depth-wise convolution, and Sca_i, i ∈ {0, 1, 2, 3}, represents the i-th branch shown in Figure 3, with Sca_0 corresponding to the identity connection. To approximate standard convolutions with large kernels, we deploy three depth-wise strip convolutions in each branch, following the guidance of [37]. The kernel sizes for the three branches are set to 5, 7, and 11. We prefer depth-wise strip convolutions because of their lightweight nature. Specifically, a standard 3D convolution with a kernel size of 5 × 5 × 5 can be approximated by a set of 5 × 1 × 1, 1 × 5 × 1, and 1 × 1 × 5 convolutions.
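The sketch below is one possible PyTorch realization of the 3DMSCA module described by this equation; the 5 × 5 × 5 kernel of the local-aggregation depth-wise convolution follows the original 2D MSCA and is an assumption here, as is the channel count in the usage example.

```python
import torch
import torch.nn as nn

class MSCA3D(nn.Module):
    """Sketch of the 3D Multiscale Convolutional Attention (3DMSCA) module.

    Branch 0 is the identity; branches 1-3 approximate 5/7/11 cubic kernels
    with triplets of depth-wise strip convolutions (k x 1 x 1, 1 x k x 1,
    1 x 1 x k). The branch sum passes through a 1x1x1 convolution, and the
    result reweights the module input element-wise.
    """
    def __init__(self, channels: int):
        super().__init__()
        # Local aggregation: 5x5x5 depth-wise convolution (kernel size assumed).
        self.conv_d = nn.Conv3d(channels, channels, kernel_size=5,
                                padding=2, groups=channels)
        # Multiscale depth-wise strip-convolution branches.
        self.branches = nn.ModuleList()
        for k in (5, 7, 11):
            pad = k // 2
            self.branches.append(nn.Sequential(
                nn.Conv3d(channels, channels, (k, 1, 1), padding=(pad, 0, 0),
                          groups=channels),
                nn.Conv3d(channels, channels, (1, k, 1), padding=(0, pad, 0),
                          groups=channels),
                nn.Conv3d(channels, channels, (1, 1, k), padding=(0, 0, pad),
                          groups=channels),
            ))
        # Channel mixing that produces the attention weights.
        self.conv_1x1 = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.conv_d(x)
        attn = feat                    # Sca_0: identity branch
        for branch in self.branches:   # Sca_1..3: multiscale branches
            attn = attn + branch(feat)
        attn = self.conv_1x1(attn)
        return attn * x                # element-wise reweighting of the input

# Quick shape check on a small feature map.
m = MSCA3D(channels=32)
y = m(torch.randn(1, 32, 16, 16, 16))
print(y.shape)  # torch.Size([1, 32, 16, 16, 16])
```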

2.4. Loss Function

Our proposed nnSegNeXt loss function consists of two parts: the segmentation loss L seg and the data quality loss L Data . The segmentation loss L seg adopts a weighted deep-supervised loss function and is composed of Dice and multi-class cross-entropy loss between the predicted and ground truth labels. The Dice loss is a widely used metric for the evaluation of segmentation algorithms, as it measures the overlap between the predicted and ground truth labels [38]. The multi-class cross-entropy loss, on the other hand, penalizes the differences between the predicted probabilities and the ground truth labels [39]. The Dice loss and the multi-class cross-entropy loss are defined as follows:
$\mathrm{dice}(P, G) = 1 - \frac{1}{K}\sum_{k}\frac{2\sum_{i\in\Omega} P_k(i)\,G_k(i)}{\sum_{i\in\Omega} P_k(i)^2 + \sum_{i\in\Omega} G_k(i)^2}, \qquad \mathrm{cross\text{-}entropy}(P, G) = -\frac{1}{N}\sum_{i\in\Omega}\sum_{k} G_k^{\,i}\log\big(P_k^{\,i}\big),$  (2)
where P and G are the predicted and ground truth labels, respectively. k ∈ K indexes the classes, which comprise background, GM, WM, and CSF. P_k^i is the predicted probability of the k-th class for pixel i, while G_k^i is the corresponding ground truth label. Ω denotes all the pixels in the predicted segmentation result P and its corresponding ground truth G, and N is the number of pixels in Ω.
The overall segmentation loss is then given by the following:
$L_{\mathrm{Seg}}(P, G) = \sum_{s} w_s \cdot \big[\mathrm{dice}(P_s, G) + \mathrm{cross\text{-}entropy}(P_s, G)\big],$  (3)
where s ∈ S indexes the stages. Due to the size difference between P_s and G, the low-resolution prediction P_s is upsampled to the same size as G for loss calculation. w_s denotes the weight of the s-th stage output prediction, with weights assigned according to resolution in ascending order as [0.03125, 0.0625, 0.125, 0.25, 0.5].
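For concreteness, a minimal PyTorch sketch of the per-stage Dice plus cross-entropy term and the deep-supervision weighting is given below; the trilinear interpolation mode and the helper names (soft_dice_loss, seg_loss) are illustrative assumptions rather than the authors’ implementation.

```python
import torch
import torch.nn.functional as F

STAGE_WEIGHTS = [0.03125, 0.0625, 0.125, 0.25, 0.5]  # ascending resolution

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Multi-class soft Dice loss: 1 minus the mean per-class Dice overlap
    between softmax probabilities and the one-hot ground truth."""
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 4, 1, 2, 3).float()
    dims = (0, 2, 3, 4)
    inter = (probs * onehot).sum(dims)
    denom = (probs ** 2).sum(dims) + (onehot ** 2).sum(dims)
    return 1.0 - (2.0 * inter / (denom + eps)).mean()

def seg_loss(stage_logits: list[torch.Tensor], target: torch.Tensor) -> torch.Tensor:
    """Weighted deep-supervised segmentation loss over all decoder stages.
    Low-resolution predictions are upsampled to the label size first."""
    total = 0.0
    for w, logits in zip(STAGE_WEIGHTS, stage_logits):
        if logits.shape[2:] != target.shape[1:]:
            logits = F.interpolate(logits, size=target.shape[1:],
                                   mode="trilinear", align_corners=False)
        total = total + w * (soft_dice_loss(logits, target)
                             + F.cross_entropy(logits, target))
    return total

# Example with random data: 5 stage outputs for a 4-class problem, 32^3 labels.
target = torch.randint(0, 4, (1, 32, 32, 32))
stage_logits = [torch.randn(1, 4, s, s, s) for s in (2, 4, 8, 16, 32)]
print(seg_loss(stage_logits, target))
```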
To evaluate the accuracy of preprocessed image labels, our method is based on edge extraction and a comparison of edge overlap. Specifically, we employ the Canny operator [40] to extract edges from both the original input patches and their corresponding labeled patches. We then compare the degree of overlap between the edges to obtain quality weight scores for the labels, denoted as W Data . Patches with a higher edge overlap are considered to have more accurate labels and should receive more attention during subsequent training to enhance the precision of the segmentation results. The degree of edge overlap is quantitatively measured using the Dice metric, and the segmentation weight scores are used to guide the network training. This allows us to assess the accuracy of the image labels and optimize the performance of deep learning models. The quality score of input patches, W Data , is calculated using the following equation:
$W_{\mathrm{Data}}(E_I, E_L) = \frac{2\sum_{i\in\Omega} E_I(i)\,E_L(i)}{\sum_{i\in\Omega} E_I(i)^2 + \sum_{i\in\Omega} E_L(i)^2},$  (4)
where E I and E L represent the edge maps of the input patch and labeled patch, respectively. The overlap of the two edge maps allows us to evaluate the accuracy of the image labels. The data quality loss L Data is defined as the product of the quality score W Data and the cross-entropy loss, and is used to guide the network training process, as follows:
$L_{\mathrm{Data}}(P, G) = W_{\mathrm{Data}}(E_I, E_L) \cdot \mathrm{cross\text{-}entropy}(P, G).$  (5)
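A minimal sketch of the edge-overlap weight is shown below, assuming a slice-wise application of the 2D Canny detector from scikit-image to the 3D patches; the paper does not specify this detail, so the helper names and the per-slice normalization are illustrative.

```python
import numpy as np
from skimage.feature import canny

def edge_map_3d(volume: np.ndarray) -> np.ndarray:
    """Slice-wise 2D Canny edges of a 3D patch (one way to apply a 2D detector
    to volumetric data; the exact variant used in the paper may differ)."""
    edges = np.zeros(volume.shape, dtype=bool)
    for z in range(volume.shape[-1]):
        sl = volume[..., z].astype(float)
        if sl.max() > sl.min():                              # skip empty slices
            sl = (sl - sl.min()) / (sl.max() - sl.min())
            edges[..., z] = canny(sl)
    return edges

def data_quality_weight(image_patch: np.ndarray, label_patch: np.ndarray) -> float:
    """Dice overlap between the edge maps of an image patch and its label
    patch, used as the per-patch weight W_Data."""
    e_i = edge_map_3d(image_patch)
    e_l = edge_map_3d(label_patch.astype(float))
    inter = np.logical_and(e_i, e_l).sum()
    denom = e_i.sum() + e_l.sum()
    return 2.0 * inter / denom if denom > 0 else 0.0

# Example with random data (placeholders for a real T1 patch and its label map).
rng = np.random.default_rng(0)
img = rng.random((64, 64, 64))
lab = (img > 0.5).astype(np.uint8)
print(data_quality_weight(img, lab))
```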
Finally, the total loss L Total , which incorporates both the image quality loss and the segmentation loss, is expressed as follows:
$L_{\mathrm{Total}} = L_{\mathrm{Seg}} + \lambda \cdot L_{\mathrm{Data}},$  (6)
where λ represents the trade-off parameter that weighs the importance of each component.

3. Experiments

3.1. Datasets

We conducted our initial experiment by collecting MRI scan data from a diverse cohort of healthy subjects representing different age groups from three distinct datasets: HCP [41], SALD [42], and IXI (https://brain-development.org/ixi-dataset/, accessed on 5 May 2024). Although all datasets employed the MPRAGE sequence, discrepancies existed in other scanning parameters. Specifically, the datasets had varying field strengths, with HCP and SALD utilizing 3T scans, while IXI used 1.5T scans. Furthermore, different scanners were used to obtain the datasets, with a Philips scanner employed for the IXI dataset rather than the Siemens scanners used for HCP and SALD. In addition, these datasets differed in specific scan parameter characteristics, such as repetition/echo time and flip angles. Moreover, to evaluate the model’s generalizability, we employed the IBSR dataset (https://www.nitrc.org/projects/ibsr, accessed on 5 May 2024), a labeled dataset widely utilized in brain tissue segmentation tasks. However, this dataset contained only 18 instances with a voxel size of 0.875 × 1.5 × 0.875 mm. We additionally trained and evaluated the network on this dataset to highlight the superiority of the proposed method. A detailed overview of the demographic details and acquisition parameters for all four datasets is provided in Table 1. The HCP, SALD, and IXI datasets contained 200, 251, and 224 scans, respectively, all partitioned into training and test sets at a 4:1 ratio. This distribution facilitated a comprehensive evaluation of our model across diverse datasets and scanning parameters, thereby enhancing the robustness and generalizability of our findings.

3.2. Evaluation Metrics

In evaluating the segmentation performance of various methods, we conducted our experiment utilizing the Dice coefficient [38] and the 95th percentile of the Hausdorff distance [43]. The Dice coefficient (DC) measures the degree of overlap between the predicted segmentation outcome and the ground truth, and is represented as a percentage ranging from 0% (indicating a complete mismatch) to 100% (representing a perfect match), as depicted in the following Equation  (7):
$DC(G, P) = \frac{2\,|G \cap P|}{|G| + |P|} \cdot 100\%,$  (7)
where P denotes the predicted segmentation result and G signifies the ground truth. The Hausdorff distance (HD) quantifies the distance between the predicted segmentation result and the ground truth. However, the conventional HD is exceedingly sensitive to outliers; we therefore utilized the 95th percentile of the HD for outlier suppression. The 95th percentile of the HD is defined as follows:
$h_{95}(P, G) = \mathop{K^{\mathrm{th}}_{95}}_{p \in P}\ \min_{g \in G}\lVert g - p\rVert, \qquad HD_{95}(G, P) = \max\big\{h_{95}(P, G),\ h_{95}(G, P)\big\},$  (8)
where p denotes an element of the predicted segmentation result P, g represents an element of the ground truth G, and K^th_95 denotes the 95th percentile operator taken over p ∈ P. A smaller HD value indicates greater proximity between the segmentation prediction and the ground truth, thus reflecting superior segmentation performance.
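The following NumPy/SciPy sketch computes both metrics for binary masks; the distance-transform-based HD95 is a common implementation choice (libraries such as MedPy provide an equivalent routine) and assumes both masks are non-empty.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient (in %) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 100.0 * 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 100.0

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric 95th-percentile Hausdorff distance via distance transforms.
    Assumes both masks contain at least one foreground voxel."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    # Distance of every predicted-foreground voxel to the ground-truth mask,
    # and vice versa.
    dt_gt = distance_transform_edt(~gt, sampling=spacing)
    dt_pred = distance_transform_edt(~pred, sampling=spacing)
    d_pred_to_gt = dt_gt[pred]
    d_gt_to_pred = dt_pred[gt]
    return max(np.percentile(d_pred_to_gt, 95), np.percentile(d_gt_to_pred, 95))

# Toy example: two slightly shifted spheres.
zz, yy, xx = np.mgrid[:64, :64, :64]
a = (zz - 32) ** 2 + (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
b = (zz - 34) ** 2 + (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
print(dice_coefficient(a, b), hd95(a, b))
```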

3.3. Implementation Details

The experiments were conducted using PyTorch (version 2.2.0) [44] on an NVIDIA RTX 3060 GPU with 12 GB of memory. To ensure fair comparisons, all U-shaped fully convolutional neural networks (FCNNs) utilized five scales of feature maps and maintained a similar number of feature channels at each stage along the encoding and decoding paths. Instead of providing the entire MRI volumes as input to the networks, the images were cropped to patches of 128 × 128 × 128 voxels. Training continued until the model’s performance on the validation set ceased to improve, with loss computation excluding background voxels. The initial learning rate was set to 0.01, and the “poly” decay strategy described in Equation (9) was employed. The weight decay was set to 3 × 10⁻⁵. To demonstrate the effectiveness of the proposed network, 5-fold cross-validation was conducted with 500 training epochs, where one epoch comprised 250 iterations. The default optimizer was SGD with a momentum of 0.99. For the remaining hyperparameters, the trade-off parameter λ in Equation (6) was set to 1, and standard data augmentations, such as axial flips and rotations, were applied during training to enhance performance.
$\mathrm{lr} = \mathrm{initial\_lr} \times \left(1 - \frac{\mathrm{epoch\_id}}{\mathrm{max\_epoch}}\right)^{0.9}.$  (9)
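A minimal sketch of the reported optimization settings (SGD, momentum 0.99, weight decay 3 × 10⁻⁵, initial learning rate 0.01, poly decay with power 0.9) is given below; the stand-in model and the use of LambdaLR are illustrative choices, not the authors’ training code.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Conv3d(1, 4, kernel_size=3, padding=1)   # stand-in for nnSegNeXt
max_epoch = 500

optimizer = SGD(model.parameters(), lr=0.01, momentum=0.99, weight_decay=3e-5)
# Poly schedule: lr = initial_lr * (1 - epoch / max_epoch) ** 0.9
scheduler = LambdaLR(optimizer,
                     lr_lambda=lambda epoch: (1 - epoch / max_epoch) ** 0.9)

for epoch in range(3):                 # shortened loop for illustration
    # ... 250 training iterations per epoch would go here ...
    scheduler.step()
    print(epoch, optimizer.param_groups[0]["lr"])
```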

3.4. Results

In this section, we examine the performance and generality of our model. We first present a comparative analysis of our model’s performance against state-of-the-art CNN-based and Transformer-based models. We then discuss the generality of our model, again in comparison with the top CNN-based and Transformer-based models, and extend our validation to the IBSR dataset. Subsequently, statistical validation is provided through paired t-tests with a Bonferroni correction to determine the significance of the improvements of nnSegNeXt over nnUNet. Additionally, we report the findings of an ablation study conducted on the HCP, SALD, and IXI datasets.

3.4.1. Model Performance

We compared nnSegNeXt with state-of-the-art CNN-based models using the HCP, SALD, and IXI datasets. Table 2 demonstrates that nnSegNeXt consistently outperformed other CNN models in both the Dice coefficient and HD95 for gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) tissue types across all datasets. Notably, on the HCP dataset, nnSegNeXt achieved the highest Dice score of 0.992 and the lowest HD95 value of 0.277. Similar trends were observed for the SALD and IXI datasets, highlighting nnSegNeXt as a superior model for accurate brain tissue segmentation with remarkable generalization capability across diverse datasets. Additionally, Figure 4 provides qualitative comparisons across all methods, and Figure 5 displays exemplary segmentation outputs for the performance testing of all datasets.
Table 3 demonstrates that the nnSegNeXt model consistently outperforms other models concerning Dice and HD95 scores. On the HCP dataset, the nnSegNeXt model achieved Dice coefficients of 0.991 for gray matter (GM) and 0.994 for white matter (WM), surpassing other models significantly. Furthermore, on the SALD and IXI datasets, nnSegNeXt demonstrated superior results compared with other models in terms of Dice coefficients and HD95 metrics for GM, WM, and CSF. Specifically, on the SALD dataset, nnSegNeXt exhibited GM Dice coefficient and HD95 values of 0.984 and 0.546, WM Dice coefficient and HD95 values of 0.991 and 0.285, and CSF Dice coefficient and HD95 values of 0.986 and 0.486. Additionally, qualitative comparisons across all methods can be found in Figure A4 in the Appendix A, while Figure 5 displays the visualization of representative segmentation outputs for all models.

3.4.2. Model Generality

The generality of nnSegNeXt was evaluated by comparing its performance with those of other CNN-based models on brain tissue segmentation across multiple datasets, including HCP → SALD, SALD → HCP, HCP → IXI, and SALD → IXI. Table 4 indicates that the nnSegNeXt model consistently outperformed other models, demonstrating superior segmentation across all four datasets with higher Dice coefficients, smaller HD95 values, and overall better average performance. For example, in the HCP → IXI and SALD → IXI experiments, the nnSegNeXt model achieved average Dice coefficients of 0.937 and 0.910, respectively, surpassing other models in terms of HD95 values. A qualitative comparison of the models’ generality is depicted in Figure A1 in Appendix A, and representative segmentation outputs for all models are displayed in Figure A2 in Appendix A.
Table 5 demonstrates nnSegNeXt’s consistent outperformance of other Transformer-based models across multiple datasets. Specifically, on the HCP → SALD dataset, nnSegNeXt achieved an impressive Dice score of 0.967 for GM segmentation, surpassing all other models. Additionally, for WM and CSF segmentation, nnSegNeXt attained the highest Dice scores of 0.974 and 0.982, respectively. The comparative results of model generality and representative segmentation outputs are depicted in Figure A3 and Figure A4 in Appendix A.

3.5. Validation on IBSR Dataset

We conducted additional validation using the publicly available labeled brain tissue segmentation dataset IBSR to confirm the validity of the model. Despite the limited data and the labeling difference in the sulcal CSF regions, which are labeled as GM in IBSR but as CSF in the other datasets, our method displayed superior performance compared with leading segmentation frameworks such as nnUNet and nnFormer, even on this low-quality dataset. Table 6 presents a performance comparison of nnSegNeXt with other leading models on the IBSR dataset. nnSegNeXt achieved the highest Dice scores of 0.944, 0.922, and 0.796 for GM, WM, and CSF segmentations, respectively. These results demonstrate the substantial advantages of nnSegNeXt in accurately segmenting brain tissues. The comparative results of model performance and representative segmentation outputs are illustrated in Figure A5 and Figure A6 in Appendix A.

3.5.1. Comparison with nnUNet

In this section, we present a comparative analysis between nnSegNeXt and the renowned top-tier 3D medical image segmentation model, nnUNet. As the average performance metrics in Table 7 show, nnSegNeXt consistently demonstrates superior average performance. For instance, nnSegNeXt surpasses nnUNet across all three public datasets, achieving higher Dice and lower HD95 values, with average DSC values of 0.992, 0.987, and 0.989, respectively. The term “Meandiff” refers to the average performance difference between nnSegNeXt and nnUNet. A positive “Meandiff” for the DSC indicates enhanced segmentation precision by nnSegNeXt, while negative values for the HD95 score suggest superior edge delineation capabilities. This indicates that nnSegNeXt may offer more accurate object boundary delineation under the HD95 metric.
To further substantiate the performance superiority of nnSegNeXt over nnUNet, we employed paired t-tests with a Bonferroni correction [45] to calculate the p-values for the nnSegNeXt and nnUNet methods across the HCP, SALD, and IXI datasets. As shown in Table 7, we present two sets of p-values for both HD95 and DSC across the three public datasets. The significantly low p-values (well below 0.05) confirm the statistically significant performance improvement of nnSegNeXt over nnUNet.
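A sketch of this statistical test is shown below, using SciPy’s paired t-test and a manual Bonferroni adjustment; the per-subject scores and the number of comparisons (six, i.e., 2 metrics × 3 datasets) are placeholders for illustration.

```python
import numpy as np
from scipy.stats import ttest_rel

# Placeholder per-subject Dice scores for the two models on one dataset.
rng = np.random.default_rng(42)
dice_nnsegnext = rng.normal(0.992, 0.001, size=40)
dice_nnunet = rng.normal(0.989, 0.001, size=40)

n_comparisons = 6                                     # assumed: 2 metrics x 3 datasets
t_stat, p_value = ttest_rel(dice_nnsegnext, dice_nnunet)
p_corrected = min(p_value * n_comparisons, 1.0)       # Bonferroni adjustment
print(f"t = {t_stat:.3f}, corrected p = {p_corrected:.3g}")
```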

3.5.2. Ablation Study

We conducted ablation experiments and evaluated the performance on three different datasets using the Dice similarity coefficient (DSC) as the default evaluation metric, as shown in Table 8. The most basic baseline model excluded the MSCAN layer and L_Data. Subsequently, we replaced the convolutional layers in the deeper network stages with the MSCAN layer, which resulted in a noteworthy improvement in segmentation accuracy of 0.6%, 0.6%, and 0.4% on the respective datasets. This approach also achieved a higher average DSC compared with SegResNet and TransBTS, as observed in the previous experiments. However, when we attempted to replace all the convolutional layers, accuracy decreased, which we attribute to the attention blocks initially struggling to efficiently capture spatial dependencies within large medical image volumes. Moreover, the features from Stages 1 and 2 contained excessive low-level information, hindering performance. In contrast, convolutional layers excel at capturing local features and preserving spatial information, which is crucial in medical imaging. Therefore, we retained the convolutional layers in the initial stages. Additionally, we experimented with the L_Data loss function and identified its significant impact on the overall performance of nnSegNeXt. In conclusion, our ablation study highlights the crucial role of the MSCAN and L_Data components in the effectiveness of the nnSegNeXt architecture, suggesting its potential as a superior and more efficient method for brain tissue segmentation based on quality assessment.

4. Discussion

In this work, we investigated the issue of label dependency in medical image segmentation tasks within clinical settings. The variability in label quality, influenced by differences in scanning devices, software processing environments, and the expertise of annotators, presents a challenge for the selection of training strategies. In response to this issue, we introduced the nnSegNeXt framework, which aims to enhance segmentation accuracy by optimizing the data preprocessing and training procedures.
The quality of data annotation is intrinsically connected to the dependability of training models. Zhang et al. [46] provide a framework for the prediction of segmentation errors and the assessment of segmentation quality for Whole-Heart Segmentation, thereby advancing the precision and trustworthiness of automated segmentation technologies. Zhang et al. [47] improved the quality of crowdsourced labels using noise correction methods and assessed their impact on learning models. Marmanis et al. [48] proposed a trainable deep convolutional neural network that enhances segmentation quality by integrating semantic segmentation and edge detection. Cheng et al. [49] proposed a new segmentation evaluation metric—boundary IoU, concentrating on the improvement of boundary quality to augment segmentation precision. Zhu et al. [50] introduced a brain tumor segmentation approach that fuses semantic and edge features, realized through the design of a graph-convolution-based multi-feature inference block. Unlike other methods, our approach assesses label quality by extracting edges from the training data and incorporates a 3D Multiscale Convolutional Attention Module and a quality loss function, effectively increasing segmentation precision.
Despite its strengths, nnSegNeXt’s performance is somewhat influenced by the dataset’s image quality, suggesting an area for future optimization. Additionally, expanding the model’s testing on more diverse datasets could further its generalization capabilities and explore potential clinical applications. Future research should endeavor to broaden the utilization of this methodology to additional challenges in semantic segmentation, such as the delineation of brain tumors and skin lesions.

5. Conclusions

In this study, we presented nnSegNeXt, a novel framework for brain tissue segmentation designed to address the challenges of missing and inaccurate labels. The essence of nnSegNeXt lies in its innovative substitution of traditional convolutional blocks with three-dimensional Multiscale Convolutional Attention Modules. This design choice enables the model to encode contextual information more effectively, enhancing its ability to focus on relevant features for more accurate segmentation. Moreover, nnSegNeXt incorporates a data quality loss function, which significantly reduces the model’s reliance on the quality of the training dataset, bolstering the model’s versatility and robustness across various scenarios. The results revealed that nnSegNeXt achieved superior segmentation accuracy compared with various CNN and Transformer-based methods, demonstrating its effectiveness for medical image segmentation. This endeavor could significantly improve segmentation accuracy and efficiency, particularly in clinical settings where rapid and precise imaging analysis is crucial for timely diagnosis and treatment planning.

Author Contributions

Conceptualization, Y.L. and D.W.; methodology, Y.L., C.S., and D.W.; software, Y.L. and C.S.; analysis, Y.L.; resources, X.N. and Y.G.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.G. and D.W.; visualization, Y.L.; supervision, X.N. and Y.G.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Joint Funds of the National Natural Science Foundation of China (No. U23A20434); Innovation Program for Quantum Science and Technology, Hefei National Laboratory, Hefei 230088, China (No. 2021ZD0300500/2021ZD0300503); and Development and Application of Extremely Weak Magnetic Field Measurement Technology Based on Atomic Magnetometer (No. 2022-189-181).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The code is publicly available and accessible at https://github.com/Liuyuchen0224/nnSegNeXt (accessed on 5 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Qualitative results of generality comparison with state-of-the-art CNN-based models on the HCP, SALD, and IXI datasets. (a) HCP → SALD, (b) SALD → HCP, (c) HCP → IXI, (d) SALD → IXI. Boxplots showing Dice scores for different brain MR tissues using the proposed nnSegNeXt and existing segmentation methods. The convention HCP → SALD signifies that the HCP dataset is utilized as the training set, while the SALD dataset is deployed for the subsequent inference process.
Figure A2. Qualitative results of performance comparison with state-of-the-art Transformer-based models on the (a) HCP, (b) SALD, and (c) IXI datasets. Boxplots showing Dice scores for different brain MR tissues using the proposed nnSegNeXt and existing segmentation methods.
Figure A3. Qualitative results of generality comparison with state-of-the-art Transformer-based models on the HCP, SALD, and IXI datasets. (a) HCP → SALD, (b) SALD → HCP, (c) HCP → IXI, (d) SALD → IXI. Boxplots showing Dice scores for different brain MR tissues using the proposed nnSegNeXt and existing segmentation methods. The convention HCP → SALD signifies that the HCP dataset is utilized as the training set, while the SALD dataset is deployed for the subsequent inference process.
Figure A4. Visualization of generality comparison with state-of-the-art models on the HCP, SALD, and IXI datasets. Red indicates gray matter (GM), green indicates white matter (WM), and blue indicates cerebrospinal fluid (CSF). Zoom-in regions are provided below each image.
Figure A5. Qualitative results of performance comparison with state-of-the-art models on the IBSR dataset. Boxplots showing Dice scores for different brain MR tissues using the proposed nnSegNeXt and existing segmentation methods.
Figure A6. Visualization of performance comparison with state-of-the-art models on the IBSR dataset. Red indicates gray matter (GM), green indicates white matter (WM), and blue indicates cerebrospinal fluid (CSF). Zoom-in regions are provided below each image.

References

  1. Fischl, B.; Salat, D.H.; Busa, E.; Albert, M.; Dieterich, M.; Haselgrove, C.; van der Kouwe, A.; Killiany, R.; Kennedy, D.; Klaveness, S.; et al. Whole Brain Segmentation: Automated Labeling of Neuroanatomical Structures in the Human Brain. Neuron 2002, 33, 341–355. [Google Scholar] [CrossRef] [PubMed]
  2. Igual, L.; Soliva, J.C.; Gimeno, R.; Escalera, S.; Vilarroya, O.; Radeva, P. Automatic Internal Segmentation of Caudate Nucleus for Diagnosis of Attention-Deficit/Hyperactivity Disorder. In Proceedings of the Image Analysis and Recognition: 9th International Conference, ICIAR 2012, Aveiro, Portugal, 25–27 June 2012; pp. 222–229. [Google Scholar]
  3. Li, D.J.; Huang, B.L.; Peng, Y. Comparisons of Artificial Intelligence Algorithms in Automatic Segmentation for Fungal Keratitis Diagnosis by Anterior Segment Images. Front. Neurosci. 2023, 17, 1195188. [Google Scholar] [CrossRef] [PubMed]
  4. Kikinis, R.; Shenton, M.; Iosifescu, D.; McCarley, R.; Saiviroonporn, P.; Hokama, H.; Robatino, A.; Metcalf, D.; Wible, C.; Portas, C.; et al. A Digital Brain Atlas for Surgical Planning, Model-Driven Segmentation, and Teaching. IEEE Trans. Vis. Comput. Graph. 1996, 2, 232–241. [Google Scholar] [CrossRef]
  5. Pitiot, A.; Delingette, H.; Thompson, P.M.; Ayache, N. Expert Knowledge-Guided Segmentation System for Brain MRI. NeuroImage 2004, 23, S85–S96. [Google Scholar] [CrossRef] [PubMed]
  6. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
  7. Maitra, M.; Chatterjee, A. A Novel Technique for Multilevel Optimal Magnetic Resonance Brain Image Thresholding Using Bacterial Foraging. Measurement 2008, 41, 1124–1134. [Google Scholar] [CrossRef]
  8. Kass, M.; Witkin, A.; Terzopoulos, D. Snakes: Active Contour Models. Int. J. Comput. Vision 1988, 1, 321–331. [Google Scholar] [CrossRef]
  9. Cootes, T.F.; Taylor, C.J.; Cooper, D.H.; Graham, J. Active Shape Models-Their Training and Application. Comput. Vis. Image Underst. 1995, 61, 38–59. [Google Scholar] [CrossRef]
  10. Cootes, T.; Edwards, G.; Taylor, C. Active Appearance Models. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 681–685. [Google Scholar] [CrossRef]
  11. Chuang, K.S.; Tzeng, H.L.; Chen, S.; Wu, J.; Chen, T.J. Fuzzy C-Means Clustering with Spatial Information for Image Segmentation. Comput. Med. Imaging Graph. 2006, 30, 9–15. [Google Scholar] [CrossRef]
  12. Deoni, S.C.L.; Rutt, B.K.; Parrent, A.G.; Peters, T.M. Segmentation of Thalamic Nuclei Using a Modified K-Means Clustering Algorithm and High-Resolution Quantitative Magnetic Resonance Imaging at 1.5 T. NeuroImage 2007, 34, 117–126. [Google Scholar] [CrossRef] [PubMed]
  13. Kruggel, F.; Turner, J.; Muftuler, L.T. Impact of Scanner Hardware and Imaging Protocol on Image Quality and Compartment Volume Precision in the ADNI Cohort. NeuroImage 2010, 49, 2123–2133. [Google Scholar] [CrossRef] [PubMed]
  14. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  15. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  16. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016, Athens, Greece, 17–21 October 2016; Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 424–432. [Google Scholar]
  17. Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar]
  18. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
  19. Myronenko, A. 3D MRI Brain Tumor Segmentation Using Autoencoder Regularization. In Lecture Notes in Computer Science, Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Granada, Spain, 16 September 2018; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 311–320. [Google Scholar]
  20. Isensee, F.; Schell, M.; Pflueger, I.; Brugnara, G.; Bonekamp, D.; Neuberger, U.; Wick, A.; Schlemmer, H.P.; Heiland, S.; Wick, W.; et al. Automated Brain Extraction of Multisequence MRI Using Artificial Neural Networks. Hum. Brain Mapp. 2019, 40, 4952–4964. [Google Scholar] [CrossRef] [PubMed]
  21. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A Self-Configuring Method for Deep Learning-Based Biomedical Image Segmentation. Nat. Methods 2021, 18, 203–211. [Google Scholar] [CrossRef] [PubMed]
  22. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  23. Chen, B.; Liu, Y.; Zhang, Z.; Lu, G.; Kong, A.W.K. TransAttUnet: Multi-Level Attention-Guided U-Net With Transformer for Medical Image Segmentation. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 8, 55–68. [Google Scholar] [CrossRef]
  24. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Computer Vision—ECCV 2022 Workshops; Springer Nature: Cham, Switzerland, 2023; Volume 13803, pp. 205–218. [Google Scholar]
  25. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  26. Rao, V.M.; Wan, Z.; Arabshahi, S.; Ma, D.J.; Lee, P.Y.; Tian, Y.; Zhang, X.; Laine, A.F.; Guo, J. Improving Across-Dataset Brain Tissue Segmentation for MRI Imaging Using Transformer. Front. Neuroimaging 2022, 1, 1023481. [Google Scholar] [CrossRef] [PubMed]
  27. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. UNETR: Transformers for 3D Medical Image Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584. [Google Scholar]
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30. [Google Scholar]
  29. Zhou, H.Y.; Guo, J.; Zhang, Y.; Han, X.; Yu, L.; Wang, L.; Yu, Y. nnFormer: Volumetric Medical Image Segmentation via a 3D Transformer. IEEE Trans. Image Process. 2023, 32, 4036–4045. [Google Scholar] [CrossRef] [PubMed]
  30. Jenkinson, M.; Beckmann, C.F.; Behrens, T.E.J.; Woolrich, M.W.; Smith, S.M. FSL. NeuroImage 2012, 62, 782–790. [Google Scholar] [CrossRef]
  31. Roy, A.G.; Conjeti, S.; Navab, N.; Wachinger, C. Inherent Brain Segmentation Quality Control from Fully ConvNet Monte Carlo Sampling. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2018, Granada, Spain, 16–20 September 2018; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 664–672. [Google Scholar]
  32. Hann, E.; Biasiolli, L.; Zhang, Q.; Popescu, I.A.; Werys, K.; Lukaschuk, E.; Carapella, V.; Paiva, J.M.; Aung, N.; Rayner, J.J.; et al. Quality Control-Driven Image Segmentation Towards Reliable Automatic Image Analysis in Large-Scale Cardiovascular Magnetic Resonance Aortic Cine Imaging. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Shenzhen, China, 13–17 October 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 750–758. [Google Scholar]
  33. Li, X.; Wei, Y.; Wang, L.; Fu, S.; Wang, C. MSGSE-Net: Multi-Scale Guided Squeeze-and-Excitation Network for Subcortical Brain Structure Segmentation. Neurocomputing 2021, 461, 228–243. [Google Scholar] [CrossRef]
  34. Feng, X.; Tustison, N.J.; Patel, S.H.; Meyer, C.H. Brain Tumor Segmentation Using an Ensemble of 3D U-Nets and Overall Survival Prediction Using Radiomic Features. Front. Comput. Neurosci. 2020, 14, 25. [Google Scholar] [CrossRef] [PubMed]
  35. Sled, J.; Zijdenbos, A.; Evans, A. A Nonparametric Method for Automatic Correction of Intensity Nonuniformity in MRI Data. IEEE Trans. Med. Imaging 1998, 17, 87–97. [Google Scholar] [CrossRef] [PubMed]
  36. Smith, S.M.; Jenkinson, M.; Woolrich, M.W.; Beckmann, C.F.; Behrens, T.E.J.; Johansen-Berg, H.; Bannister, P.R.; De Luca, M.; Drobnjak, I.; Flitney, D.E.; et al. Advances in Functional and Structural MR Image Analysis and Implementation as FSL. NeuroImage 2004, 23, S208–S219. [Google Scholar] [CrossRef] [PubMed]
  37. Peng, C.; Zhang, X.; Yu, G.; Luo, G.; Sun, J. Large Kernel Matters—Improve Semantic Segmentation by Global Convolutional Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4353–4361. [Google Scholar]
  38. Dice, L.R. Measures of the amount of ecologic association between species. Ecology 1945, 26, 297–302. [Google Scholar] [CrossRef]
  39. De Boer, P.T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67. [Google Scholar] [CrossRef]
  40. Ding, L.; Goshtasby, A. On the Canny Edge Detector. Pattern Recognit. 2001, 34, 721–725. [Google Scholar] [CrossRef]
  41. Van Essen, D.C.; Smith, S.M.; Barch, D.M.; Behrens, T.E.J.; Yacoub, E.; Ugurbil, K. The WU-Minn Human Connectome Project: An Overview. NeuroImage 2013, 80, 62–79. [Google Scholar] [CrossRef] [PubMed]
  42. Wei, D.; Zhuang, K.; Ai, L.; Chen, Q.; Yang, W.; Liu, W.; Wang, K.; Sun, J.; Qiu, J. Structural and Functional Brain Scans from the Cross-Sectional Southwest University Adult Lifespan Dataset. Sci. Data 2018, 5, 180134. [Google Scholar] [CrossRef]
  43. Beauchemin, M.; Thomson, K.; Edwards, G. On the Hausdorff Distance Used for the Evaluation of Segmentation Results. Can. J. Remote Sens. 1998, 24, 3–8. [Google Scholar] [CrossRef]
  44. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2019; Volume 32. [Google Scholar]
  45. Armstrong, R.A. When to use the Bonferroni correction. Ophthalmic Physiol. Opt. 2014, 34, 502–508. [Google Scholar] [CrossRef] [PubMed]
  46. Zhang, R.; Chung, A.C.S. A Fine-Grain Error Map Prediction and Segmentation Quality Assessment Framework for Whole-Heart Segmentation. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Shenzhen, China, 13–17 October 2019; pp. 550–558.
  47. Zhang, J.; Sheng, V.S.; Li, T.; Wu, X. Improving crowdsourced label quality using noise correction. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1675–1688. [Google Scholar] [CrossRef] [PubMed]
  48. Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172. [Google Scholar] [CrossRef]
  49. Cheng, B.; Girshick, R.; Dollár, P.; Berg, A.C.; Kirillov, A. Boundary IoU: Improving object-centric image segmentation evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15334–15342. [Google Scholar]
  50. Zhu, Z.; He, X.; Qi, G.; Li, Y.; Cong, B.; Liu, Y. Brain tumor segmentation based on the fusion of deep semantics and edge information in multimodal MRI. Inf. Fusion 2023, 91, 376–387. [Google Scholar] [CrossRef]
Figure 2. Architectural design of nnSegNeXt. The neural network comprises four encoder layers, four decoder layers, and a bottleneck layer. Additionally, we utilize deep supervision at every decoder layer, accompanied by reduced loss weights at lower resolutions. The dashed box shows the downsampling, convolutional, and upsampling layers. We emphasize that InstanceNorm replaces the original BatchNorm to improve stability.
Figure 3. Illustration of the proposed 3DMSCA. We implement a depth-wise convolution with a kernel size of l × m × n and d. We extract multiscale features through convolutions and apply them as attention weights to reweigh the input of 3DMSCA.
Figure 4. Qualitative results of performance comparison with state-of-the-art CNN-based models on the (a) HCP, (b) SALD, and (c) IXI datasets. Boxplots showing Dice scores for different brain MR tissues using the proposed nnSegNeXt and existing segmentation methods.
Figure 5. Visualization of model performance on HCP, SALD, and IXI. Red indicates gray matter (GM), green indicates white matter (WM), and blue indicates cerebrospinal fluid (CSF). Zoom-in regions are provided below each image.
Table 1. Demographic details and acquisition parameters of the HCP, SALD, IXI, and IBSR datasets.

| Scan Parameters | HCP | SALD | IXI | IBSR |
|---|---|---|---|---|
| Scanner | Siemens Skyra | Siemens TrioTim | Philips Intera | - |
| Field Strength | 3T | 3T | 1.5T | 3T |
| Sequence | MPRAGE | MPRAGE | MPRAGE | MPRAGE |
| Voxel Size (mm) | 1.0 × 1.0 × 1.0 | 1.0 × 1.0 × 1.0 | 1.0 × 1.0 × 1.0 | 0.875 × 1.5 × 0.875 |
| TR/TE (ms) | 2400/2.14 | 1900/2.52 | 9.81/4.60 | - |
| FA (degrees) | 8 | 90 | 8 | - |
| Number of Scans (Train/Test) | 160/40 | 200/51 | 179/45 | 15/3 |
| Age Range (years) | 22-35 | 19-80 | 7-71 | - |
Table 2. Performance comparison with other CNN-based models on brain tissue segmentation. Bold text indicates the best performance; when models perform equally well, both values are shown in bold. ↑ marks metrics where higher is better; ↓ marks metrics where lower is better.
Columns: Dataset, Model, GM (Dice ↑, HD95 ↓), WM (Dice ↑, HD95 ↓), CSF (Dice ↑, HD95 ↓), Average (Dice ↑, HD95 ↓).
HCP
UNet 0.964 ± 0.006 1.272 ± 0.473 0.978 ± 0.003 0.753 ± 0.187 0.953 ± 0.010 1.684 ± 0.392 0.965 ± 0.006 1.236 ± 0.351
SegNet 0.966 ± 0.010 1.213 ± 0.410 0.980 ± 0.003 0.726 ± 0.168 0.955 ± 0.036 1.595 ± 0.367 0.967 ± 0.016 1.178 ± 0.315
VoxResNet 0.980 ± 0.003 0.698 ± 0.351 0.987 ± 0.003 0.625 ± 0.189 0.976 ± 0.004 0.835 ± 0.251 0.981 ± 0.004 0.719 ± 0.264
SegResNet 0.983 ± 0.002 0.588 ± 0.208 0.989 ± 0.002 0.575 ± 0.156 0.978 ± 0.005 0.758 ± 0.242 0.983 ± 0.003 0.640 ± 0.202
nnUNet 0.989 ± 0.001 0.425 ± 0.185 0.992 ± 0.001 0.352 ± 0.147 0.986 ± 0.002 0.425 ± 0.157 0.989 ± 0.001 0.401 ± 0.163
nnSegNeXt (ours) 0.991 ± 0.001 0.300 ± 0.166 0.994 ± 0.001 0.175 ± 0.138 0.990 ± 0.002 0.205 ± 0.133 0.992 ± 0.001 0.227 ± 0.146
SALD
UNet 0.966 ± 0.011 1.245 ± 0.487 0.979 ± 0.007 0.732 ± 0.223 0.974 ± 0.010 0.912 ± 0.262 0.973 ± 0.009 0.963 ± 0.324
SegNet 0.967 ± 0.003 1.173 ± 0.312 0.981 ± 0.001 0.652 ± 0.194 0.975 ± 0.006 0.876 ± 0.272 0.974 ± 0.003 0.900 ± 0.259
VoxResNet 0.969 ± 0.007 1.095 ± 0.299 0.982 ± 0.004 0.624 ± 0.165 0.974 ± 0.007 0.912 ± 0.288 0.975 ± 0.006 0.877 ± 0.251
SegResNet 0.973 ± 0.006 0.947 ± 0.290 0.986 ± 0.002 0.495 ± 0.173 0.974 ± 0.009 0.915 ± 0.266 0.977 ± 0.006 0.786 ± 0.243
nnUNet 0.977 ± 0.003 0.843 ± 0.245 0.987 ± 0.004 0.463 ± 0.155 0.980 ± 0.008 0.685 ± 0.172 0.981 ± 0.005 0.664 ± 0.191
nnSegNeXt (ours) 0.984 ± 0.002 0.546 ± 0.139 0.991 ± 0.001 0.346 ± 0.139 0.986 ± 0.002 0.486 ± 0.134 0.987 ± 0.002 0.459 ± 0.137
IXI
UNet 0.958 ± 0.018 1.480 ± 0.434 0.976 ± 0.008 0.856 ± 0.267 0.964 ± 0.020 1.274 ± 0.301 0.966 ± 0.016 1.203 ± 0.334
SegNet 0.931 ± 0.068 2.437 ± 0.515 0.956 ± 0.041 1.587 ± 0.479 0.943 ± 0.055 2.545 ± 0.497 0.943 ± 0.055 2.190 ± 0.497
VoxResNet 0.965 ± 0.014 1.320 ± 0.445 0.981 ± 0.004 0.664 ± 0.164 0.968 ± 0.023 1.164 ± 0.339 0.972 ± 0.013 1.049 ± 0.316
SegResNet 0.969 ± 0.016 1.092 ± 0.280 0.980 ± 0.010 0.721 ± 0.219 0.977 ± 0.020 0.832 ± 0.276 0.975 ± 0.015 0.882 ± 0.258
nnUNet 0.981 ± 0.005 0.675 ± 0.162 0.988 ± 0.003 0.450 ± 0.121 0.986 ± 0.005 0.439 ± 0.143 0.985 ± 0.004 0.521 ± 0.142
nnSegNeXt (ours) 0.986 ± 0.003 0.454 ± 0.140 0.991 ± 0.002 0.382 ± 0.125 0.990 ± 0.003 0.261 ± 0.146 0.989 ± 0.003 0.366 ± 0.137
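The Dice and HD95 values reported in Tables 2–7 can be computed from predicted and reference label maps with standard formulations. Below is a minimal numpy/scipy sketch that assumes integer tissue labels (here 1 = GM, 2 = WM, 3 = CSF) and isotropic 1 mm spacing by default; the helper names are illustrative, not the authors' evaluation code.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice overlap between two binary masks (higher is better)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric surface distance in mm (lower is better)."""
    def surface(mask):
        # Boundary voxels: mask minus its erosion.
        return np.logical_xor(mask, binary_erosion(mask))
    # Distance from every voxel to the nearest boundary voxel of each mask.
    d_to_gt = distance_transform_edt(~surface(gt), sampling=spacing)
    d_to_pred = distance_transform_edt(~surface(pred), sampling=spacing)
    pred_to_gt = d_to_gt[surface(pred)]
    gt_to_pred = d_to_pred[surface(gt)]
    return np.percentile(np.concatenate([pred_to_gt, gt_to_pred]), 95)

def evaluate(pred_labels: np.ndarray, gt_labels: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> dict:
    """Per-tissue (Dice, HD95) scores for one subject; label encoding is an assumption."""
    scores = {}
    for name, lab in {"GM": 1, "WM": 2, "CSF": 3}.items():
        p, g = pred_labels == lab, gt_labels == lab
        scores[name] = (dice(p, g), hd95(p, g, spacing))
    return scores
```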
Table 3. Performance comparison with other transformer-based models on brain tissue segmentation. Bold text indicates the best performance; when models perform equally well, both values are shown in bold. ↑ marks metrics where higher is better; ↓ marks metrics where lower is better.
Columns: Dataset, Model, GM (Dice ↑, HD95 ↓), WM (Dice ↑, HD95 ↓), CSF (Dice ↑, HD95 ↓), Average (Dice ↑, HD95 ↓).
HCP
Attention UNet 0.976 ± 0.007 0.854 ± 0.266 0.986 ± 0.003 0.518 ± 0.265 0.970 ± 0.009 1.058 ± 0.220 0.977 ± 0.006 0.810 ± 0.250
Swin-UNet 0.984 ± 0.004 0.532 ± 0.142 0.991 ± 0.002 0.357 ± 0.185 0.981 ± 0.008 0.661 ± 0.195 0.985 ± 0.005 0.517 ± 0.174
UNETR 0.985 ± 0.003 0.525 ± 0.151 0.993 ± 0.002 0.325 ± 0.178 0.979 ± 0.006 0.775 ± 0.215 0.986 ± 0.004 0.542 ± 0.181
TransBTS 0.978 ± 0.006 0.772 ± 0.159 0.989 ± 0.008 0.368 ± 0.182 0.968 ± 0.011 1.120 ± 0.330 0.978 ± 0.008 0.753 ± 0.224
TABS 0.986 ± 0.004 0.482 ± 0.132 0.992 ± 0.006 0.324 ± 0.189 0.983 ± 0.006 0.536 ± 0.202 0.987 ± 0.006 0.447 ± 0.174
nnFormer 0.989 ± 0.002 0.376 ± 0.130 0.994 ± 0.003 0.175 ± 0.180 0.986 ± 0.004 0.425 ± 0.145 0.990 ± 0.003 0.325 ± 0.152
nnSegNeXt (ours) 0.991 ± 0.001 0.321 ± 0.125 0.994 ± 0.001 0.175 ± 0.150 0.990 ± 0.002 0.336 ± 0.195 0.992 ± 0.001 0.277 ± 0.157
SALD
Attention UNet 0.959 ± 0.020 1.465 ± 0.310 0.977 ± 0.007 0.801 ± 0.182 0.968 ± 0.017 1.132 ± 0.329 0.968 ± 0.015 1.133 ± 0.274
Swin-UNet 0.973 ± 0.007 1.048 ± 0.373 0.986 ± 0.004 0.518 ± 0.150 0.979 ± 0.010 0.752 ± 0.171 0.979 ± 0.007 0.773 ± 0.231
UNETR 0.980 ± 0.005 0.716 ± 0.162 0.990 ± 0.003 0.724 ± 0.175 0.984 ± 0.007 0.514 ± 0.210 0.985 ± 0.005 0.651 ± 0.182
TransBTS 0.962 ± 0.017 1.344 ± 0.384 0.980 ± 0.007 1.315 ± 0.133 0.970 ± 0.009 1.074 ± 0.312 0.971 ± 0.011 1.244 ± 0.276
TABS 0.978 ± 0.005 0.743 ± 0.145 0.990 ± 0.003 0.318 ± 0.137 0.979 ± 0.006 0.737 ± 0.185 0.982 ± 0.004 0.599 ± 0.156
nnFormer 0.980 ± 0.005 0.723 ± 0.152 0.991 ± 0.002 0.306 ± 0.132 0.982 ± 0.006 0.637 ± 0.181 0.984 ± 0.004 0.555 ± 0.155
nnSegNeXt (ours) 0.984 ± 0.002 0.546 ± 0.135 0.991 ± 0.001 0.285 ± 0.141 0.986 ± 0.002 0.486 ± 0.165 0.987 ± 0.002 0.439 ± 0.147
IXI
Attention UNet 0.947 ± 0.042 1.857 ± 0.353 0.967 ± 0.032 1.147 ± 0.320 0.961 ± 0.024 1.417 ± 0.325 0.958 ± 0.033 1.474 ± 0.333
Swin-UNet 0.969 ± 0.024 1.092 ± 0.210 0.984 ± 0.016 0.512 ± 0.151 0.972 ± 0.021 0.981 ± 0.298 0.975 ± 0.020 0.862 ± 0.220
UNETR 0.973 ± 0.012 0.948 ± 0.295 0.987 ± 0.005 0.469 ± 0.148 0.976 ± 0.017 0.846 ± 0.251 0.979 ± 0.011 0.754 ± 0.231
TransBTS 0.961 ± 0.018 1.427 ± 0.145 0.982 ± 0.004 0.624 ± 0.162 0.964 ± 0.028 1.315 ± 0.335 0.969 ± 0.017 1.122 ± 0.214
TABS 0.976 ± 0.009 0.842 ± 0.185 0.985 ± 0.005 0.516 ± 0.152 0.982 ± 0.014 0.668 ± 0.173 0.981 ± 0.009 0.675 ± 0.170
nnFormer 0.979 ± 0.006 0.764 ± 0.176 0.989 ± 0.004 0.374 ± 0.137 0.984 ± 0.008 0.575 ± 0.152 0.984 ± 0.006 0.571 ± 0.155
nnSegNeXt (ours) 0.986 ± 0.003 0.554 ± 0.150 0.991 ± 0.002 0.282 ± 0.155 0.990 ± 0.003 0.361 ± 0.140 0.989 ± 0.003 0.399 ± 0.148
Table 4. Generalizability comparison with other CNN-based models on brain tissue segmentation. The convention HCP → SALD signifies that HCP is used as the training set and SALD as the test set for inference. Bold text indicates the best performance; when models perform equally well, both values are shown in bold. ↑ marks metrics where higher is better; ↓ marks metrics where lower is better.
Columns: Project (train → test), Model, GM (Dice ↑, HD95 ↓), WM (Dice ↑, HD95 ↓), CSF (Dice ↑, HD95 ↓), Average (Dice ↑, HD95 ↓).
HCP → SALD
UNet 0.932 ± 0.069 2.447 ± 0.451 0.955 ± 0.044 1.551 ± 0.321 0.946 ± 0.032 1.832 ± 0.489 0.944 ± 0.048 1.943 ± 0.420
SegNet 0.937 ± 0.067 2.220 ± 0.414 0.955 ± 0.051 1.544 ± 0.305 0.956 ± 0.032 1.615 ± 0.476 0.949 ± 0.050 1.793 ± 0.398
VoxResNet 0.950 ± 0.056 1.775 ± 0.381 0.960 ± 0.050 1.446 ± 0.398 0.972 ± 0.011 0.984 ± 0.263 0.961 ± 0.039 1.402 ± 0.347
SegResNet 0.951 ± 0.043 1.741 ± 0.375 0.963 ± 0.039 1.326 ± 0.392 0.974 ± 0.009 0.973 ± 0.258 0.963 ± 0.030 1.347 ± 0.342
nnUNet 0.964 ± 0.024 1.272 ± 0.332 0.975 ± 0.014 0.952 ± 0.245 0.975 ± 0.014 0.883 ± 0.212 0.972 ± 0.017 1.036 ± 0.263
nnSegNeXt (ours) 0.967 ± 0.021 1.168 ± 0.345 0.974 ± 0.015 0.948 ± 0.232 0.982 ± 0.008 0.460 ± 0.155 0.974 ± 0.015 0.859 ± 0.244
SALD → HCP
UNet 0.959 ± 0.006 1.524 ± 0.492 0.966 ± 0.009 1.245 ± 0.310 0.962 ± 0.007 1.364 ± 0.300 0.962 ± 0.007 1.378 ± 0.367
SegNet 0.976 ± 0.003 0.842 ± 0.258 0.982 ± 0.002 0.624 ± 0.175 0.973 ± 0.006 0.975 ± 0.285 0.977 ± 0.003 0.814 ± 0.239
VoxResNet 0.965 ± 0.006 1.236 ± 0.362 0.976 ± 0.004 0.898 ± 0.289 0.960 ± 0.011 1.413 ± 0.320 0.967 ± 0.007 1.182 ± 0.324
SegResNet 0.966 ± 0.007 1.225 ± 0.367 0.974 ± 0.004 0.916 ± 0.278 0.964 ± 0.011 1.272 ± 0.315 0.968 ± 0.007 1.138 ± 0.320
nnUNet 0.980 ± 0.003 0.715 ± 0.210 0.988 ± 0.001 0.575 ± 0.134 0.972 ± 0.008 0.980 ± 0.256 0.980 ± 0.004 0.757 ± 0.200
nnSegNeXt (ours) 0.982 ± 0.002 0.624 ± 0.211 0.986 ± 0.002 0.482 ± 0.150 0.981 ± 0.005 0.601 ± 0.163 0.983 ± 0.003 0.569 ± 0.175
HCP → IXI
UNet 0.921 ± 0.094 1.052 ± 0.293 0.946 ± 0.120 1.239 ± 0.310 0.937 ± 0.049 2.231 ± 0.406 0.935 ± 0.088 1.507 ± 0.336
SegNet 0.929 ± 0.129 1.044 ± 0.279 0.948 ± 0.132 1.222 ± 0.305 0.952 ± 0.054 1.715 ± 0.390 0.943 ± 0.105 1.327 ± 0.325
VoxResNet 0.939 ± 0.102 1.067 ± 0.263 0.954 ± 0.118 1.202 ± 0.262 0.961 ± 0.027 1.364 ± 0.275 0.951 ± 0.083 1.211 ± 0.267
SegResNet 0.945 ± 0.069 1.015 ± 0.254 0.963 ± 0.079 1.130 ± 0.292 0.960 ± 0.026 1.464 ± 0.284 0.956 ± 0.058 1.203 ± 0.277
nnUNet 0.957 ± 0.045 0.954 ± 0.227 0.974 ± 0.041 0.901 ± 0.234 0.965 ± 0.025 1.257 ± 0.228 0.966 ± 0.037 1.037 ± 0.230
nnSegNeXt (ours) 0.959 ± 0.053 0.936 ± 0.230 0.974 ± 0.048 0.806 ± 0.189 0.969 ± 0.027 1.068 ± 0.235 0.967 ± 0.043 0.937 ± 0.218
SALD → IXI
UNet 0.950 ± 0.021 1.756 ± 0.391 0.968 ± 0.017 1.128 ± 0.310 0.964 ± 0.026 1.264 ± 0.363 0.961 ± 0.021 1.383 ± 0.355
SegNet 0.965 ± 0.015 1.243 ± 0.288 0.980 ± 0.007 0.814 ± 0.275 0.971 ± 0.029 1.028 ± 0.285 0.972 ± 0.017 1.009 ± 0.282
VoxResNet 0.957 ± 0.016 1.549 ± 0.377 0.975 ± 0.009 0.885 ± 0.289 0.964 ± 0.030 1.251 ± 0.299 0.965 ± 0.019 1.228 ± 0.321
SegResNet 0.956 ± 0.018 1.568 ± 0.374 0.976 ± 0.009 0.837 ± 0.278 0.960 ± 0.031 1.416 ± 0.284 0.964 ± 0.019 1.274 ± 0.187
nnUNet 0.967 ± 0.012 1.153 ± 0.313 0.984 ± 0.004 0.528 ± 0.234 0.967 ± 0.024 1.164 ± 0.226 0.972 ± 0.013 0.948 ± 0.257
nnSegNeXt (ours) 0.967 ± 0.016 1.144 ± 0.315 0.981 ± 0.008 0.681 ± 0.250 0.974 ± 0.015 0.904 ± 0.244 0.974 ± 0.013 0.910 ± 0.270
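The cross-project protocol of Tables 4 and 5 (e.g., HCP → SALD) trains on one site and runs unchanged inference on another. A schematic helper is sketched below; the fit, predict, test_cases, and metric callables are placeholders for project-specific loaders, training routines, and the metric sketch given earlier, not part of the authors' code.

```python
from typing import Callable, Iterable, Tuple
import numpy as np

ProjectPair = Tuple[str, str]

def cross_project_eval(
    fit: Callable[[str], object],                          # trains a model on the named source project
    predict: Callable[[object, np.ndarray], np.ndarray],   # runs inference with that model
    test_cases: Callable[[str], Iterable[Tuple[np.ndarray, np.ndarray]]],  # yields (image, labels)
    metric: Callable[[np.ndarray, np.ndarray], dict],      # e.g., per-tissue Dice/HD95
    pairs: Iterable[ProjectPair] = (("HCP", "SALD"), ("SALD", "HCP"), ("HCP", "IXI"), ("SALD", "IXI")),
) -> dict:
    """Train on the source project only and evaluate, without fine-tuning, on the target project,
    mirroring the HCP -> SALD convention of Tables 4 and 5."""
    results = {}
    for source, target in pairs:
        model = fit(source)
        results[(source, target)] = [
            metric(predict(model, image), labels) for image, labels in test_cases(target)
        ]
    return results
```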
Table 5. Generalizability comparison with other transformer-based models on brain tissue segmentation. The convention HCP → SALD signifies that HCP is used as the training set and SALD as the test set for inference. Bold text indicates the best performance; when models perform equally well, both values are shown in bold. ↑ marks metrics where higher is better; ↓ marks metrics where lower is better.
Columns: Project (train → test), Model, GM (Dice ↑, HD95 ↓), WM (Dice ↑, HD95 ↓), CSF (Dice ↑, HD95 ↓), Average (Dice ↑, HD95 ↓).
HCP → SALD
Attention UNet 0.949 ± 0.060 1.812 ± 0.412 0.968 ± 0.041 1.139 ± 0.315 0.962 ± 0.025 1.344 ± 0.262 0.960 ± 0.042 1.432 ± 0.330
Swin-UNet 0.951 ± 0.052 1.716 ± 0.385 0.963 ± 0.052 1.331 ± 0.324 0.978 ± 0.017 0.792 ± 0.243 0.964 ± 0.034 1.280 ± 0.317
UNETR 0.954 ± 0.050 1.622 ± 0.371 0.966 ± 0.042 1.246 ± 0.317 0.974 ± 0.012 0.984 ± 0.216 0.965 ± 0.035 1.284 ± 0.301
TransBTS 0.939 ± 0.074 2.117 ± 0.452 0.953 ± 0.070 1.676 ± 0.333 0.963 ± 0.020 1.348 ± 0.316 0.952 ± 0.055 1.714 ± 0.367
TABS 0.951 ± 0.062 1.739 ± 0.362 0.963 ± 0.049 1.334 ± 0.317 0.979 ± 0.016 0.761 ± 0.174 0.964 ± 0.042 1.278 ± 0.284
nnFormer 0.958 ± 0.035 1.488 ± 0.358 0.967 ± 0.025 1.113 ± 0.305 0.979 ± 0.011 0.705 ± 0.186 0.968 ± 0.024 1.102 ± 0.283
nnSegNeXt (ours) 0.967 ± 0.021 1.164 ± 0.314 0.974 ± 0.015 0.952 ± 0.045 0.982 ± 0.008 0.660 ± 0.148 0.974 ± 0.015 0.925 ± 0.169
SALD → HCP
Attention UNet 0.963 ± 0.005 1.306 ± 0.328 0.974 ± 0.005 0.912 ± 0.253 0.959 ± 0.010 1.452 ± 0.262 0.965 ± 0.007 1.223 ± 0.281
Swin-UNet 0.971 ± 0.006 1.025 ± 0.269 0.980 ± 0.004 0.694 ± 0.152 0.966 ± 0.009 1.210 ± 0.285 0.972 ± 0.006 0.976 ± 0.235
UNETR 0.972 ± 0.005 0.984 ± 0.258 0.977 ± 0.005 0.806 ± 0.135 0.975 ± 0.009 0.876 ± 0.198 0.975 ± 0.006 0.889 ± 0.197
TransBTS 0.958 ± 0.010 1.484 ± 0.262 0.968 ± 0.008 1.135 ± 0.219 0.960 ± 0.010 1.415 ± 0.256 0.962 ± 0.009 1.345 ± 0.246
TABS 0.973 ± 0.007 0.947 ± 0.249 0.978 ± 0.009 0.772 ± 0.185 0.970 ± 0.008 1.068 ± 0.274 0.974 ± 0.008 0.929 ± 0.236
nnFormer 0.976 ± 0.004 0.862 ± 0.263 0.984 ± 0.003 0.635 ± 0.163 0.973 ± 0.007 0.948 ± 0.189 0.978 ± 0.005 0.815 ± 0.205
nnSegNeXt (ours) 0.982 ± 0.002 0.629 ± 0.194 0.986 ± 0.002 0.483 ± 0.128 0.981 ± 0.005 0.640 ± 0.147 0.983 ± 0.003 0.584 ± 0.156
HCP → IXI
Attention UNet 0.936 ± 0.076 2.244 ± 0.467 0.959 ± 0.110 1.401 ± 0.319 0.951 ± 0.035 1.731 ± 0.378 0.948 ± 0.074 1.792 ± 0.388
Swin-UNet 0.944 ± 0.102 1.931 ± 0.471 0.960 ± 0.117 1.471 ± 0.347 0.965 ± 0.038 1.232 ± 0.263 0.956 ± 0.074 1.545 ± 0.360
UNETR 0.945 ± 0.086 1.952 ± 0.453 0.962 ± 0.102 1.362 ± 0.319 0.963 ± 0.031 1.382 ± 0.289 0.957 ± 0.073 1.565 ± 0.354
TransBTS 0.929 ± 0.107 2.508 ± 0.494 0.948 ± 0.161 1.842 ± 0.391 0.953 ± 0.030 1.663 ± 0.326 0.943 ± 0.100 2.004 ± 0.404
TABS 0.952 ± 0.057 1.728 ± 0.418 0.968 ± 0.087 1.186 ± 0.313 0.967 ± 0.027 1.104 ± 0.264 0.962 ± 0.057 1.339 ± 0.332
nnFormer 0.953 ± 0.063 1.615 ± 0.431 0.969 ± 0.071 1.132 ± 0.288 0.968 ± 0.023 1.126 ± 0.279 0.963 ± 0.052 1.291 ± 0.333
nnSegNeXt (ours) 0.959 ± 0.053 1.436 ± 0.368 0.974 ± 0.048 0.906 ± 0.289 0.969 ± 0.027 1.068 ± 0.258 0.967 ± 0.043 1.137 ± 0.305
SALD → IXI
Attention UNet 0.945 ± 0.052 1.915 ± 0.383 0.967 ± 0.026 1.168 ± 0.338 0.958 ± 0.053 1.485 ± 0.306 0.956 ± 0.043 1.523 ± 0.342
Swin-UNet 0.956 ± 0.045 1.562 ± 0.374 0.969 ± 0.006 1.092 ± 0.283 0.967 ± 0.032 1.165 ± 0.282 0.964 ± 0.030 1.273 ± 0.313
UNETR 0.957 ± 0.011 1.523 ± 0.369 0.971 ± 0.008 1.025 ± 0.251 0.966 ± 0.028 1.221 ± 0.247 0.965 ± 0.016 1.256 ± 0.289
TransBTS 0.945 ± 0.032 1.951 ± 0.432 0.969 ± 0.019 1.022 ± 0.241 0.957 ± 0.029 1.527 ± 0.293 0.957 ± 0.026 1.500 ± 0.322
TABS 0.962 ± 0.024 1.344 ± 0.276 0.981 ± 0.009 0.661 ± 0.173 0.968 ± 0.028 1.102 ± 0.314 0.970 ± 0.020 1.036 ± 0.254
nnFormer 0.965 ± 0.019 1.236 ± 0.251 0.983 ± 0.008 0.593 ± 0.152 0.971 ± 0.026 1.011 ± 0.281 0.973 ± 0.018 0.947 ± 0.228
nnSegNeXt (ours) 0.967 ± 0.016 1.163 ± 0.238 0.981 ± 0.008 0.660 ± 0.183 0.974 ± 0.015 0.904 ± 0.258 0.974 ± 0.013 0.909 ± 0.226
Table 6. Performance comparison on the IBSR dataset. Bold text indicates the best performance; when models perform equally well, both values are shown in bold. ↑ marks metrics where higher is better; ↓ marks metrics where lower is better.
Columns: Model, GM (Dice ↑, HD95 ↓), WM (Dice ↑, HD95 ↓), CSF (Dice ↑, HD95 ↓), Average (Dice ↑, HD95 ↓).
UNet 0.939 ± 0.002 2.187 ± 0.103 0.914 ± 0.006 1.805 ± 0.076 0.765 ± 0.005 2.276 ± 0.038 0.872 ± 0.004 2.089 ± 0.073
SegNet 0.938 ± 0.001 2.276 ± 0.038 0.914 ± 0.004 1.943 ± 0.168 0.766 ± 0.001 2.138 ± 0.038 0.873 ± 0.002 2.119 ± 0.082
VoxResNet 0.941 ± 0.001 2.187 ± 0.103 0.919 ± 0.003 1.805 ± 0.076 0.781 ± 0.005 1.821 ± 0.016 0.880 ± 0.003 1.938 ± 0.065
SegResNet 0.943 ± 0.004 1.805 ± 0.076 0.922 ± 0.011 1.520 ± 0.022 0.792 ± 0.005 1.715 ± 0.057 0.886 ± 0.007 1.680 ± 0.052
nnUNet 0.943 ± 0.004 1.805 ± 0.076 0.922 ± 0.011 1.715 ± 0.057 0.790 ± 0.005 1.609 ± 0.076 0.885 ± 0.007 1.710 ± 0.070
Attention UNet 0.940 ± 0.001 1.805 ± 0.076 0.920 ± 0.003 1.414 ± 0.000 0.769 ± 0.009 2.049 ± 0.079 0.876 ± 0.004 1.756 ± 0.052
Swin-UNet 0.942 ± 0.002 1.813 ± 0.022 0.921 ± 0.004 1.614 ± 0.037 0.785 ± 0.006 1.834 ± 0.067 0.883 ± 0.004 1.754 ± 0.042
UNETR 0.941 ± 0.001 1.911 ± 0.016 0.918 ± 0.004 1.715 ± 0.057 0.787 ± 0.003 1.715 ± 0.057 0.882 ± 0.003 1.781 ± 0.044
TransBTS 0.942 ± 0.003 1.805 ± 0.076 0.922 ± 0.008 1.520 ± 0.022 0.786 ± 0.001 1.959 ± 0.103 0.884 ± 0.004 1.761 ± 0.067
TABS 0.941 ± 0.004 1.913 ± 0.048 0.921 ± 0.009 1.492 ± 0.041 0.790 ± 0.005 1.729 ± 0.081 0.884 ± 0.005 1.711 ± 0.057
nnFormer 0.941 ± 0.003 1.805 ± 0.076 0.920 ± 0.009 1.488 ± 0.038 0.794 ± 0.006 1.626 ± 0.022 0.885 ± 0.006 1.610 ± 0.046
nnSegNeXt (ours) 0.944 ± 0.005 1.715 ± 0.057 0.922 ± 0.015 1.488 ± 0.119 0.796 ± 0.009 1.626 ± 0.022 0.887 ± 0.010 1.569 ± 0.066
Table 7. Performance comparison with nnUNet. Bold text indicates the best performance. ↑ marks metrics where higher is better; ↓ marks metrics where lower is better. The mean difference is reported as nnSegNeXt minus nnUNet.
HCP
nnUNet: average Dice 0.989 ± 0.001, average HD95 0.401 ± 0.163
nnSegNeXt: average Dice 0.992 ± 0.001, average HD95 0.227 ± 0.146
Mean diff.: 0.0034 (Dice), −0.174 (HD95); p-values: 0.0002 (Dice), 0.0045 (HD95)
SALD
nnUNet: average Dice 0.981 ± 0.005, average HD95 0.664 ± 0.191
nnSegNeXt: average Dice 0.987 ± 0.002, average HD95 0.459 ± 0.137
Mean diff.: 0.0055 (Dice), −0.205 (HD95); p-values: 0.0001 (Dice), 0.0001 (HD95)
IXI
nnUNet: average Dice 0.985 ± 0.004, average HD95 0.521 ± 0.142
nnSegNeXt: average Dice 0.989 ± 0.003, average HD95 0.366 ± 0.137
Mean diff.: 0.0041 (Dice), −0.155 (HD95); p-values: <0.005 (Dice), <0.005 (HD95)
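Table 7 reports mean differences and p-values between nnUNet and nnSegNeXt on the same test subjects. The table does not restate the test used, so the sketch below assumes a paired comparison of per-subject scores with the Wilcoxon signed-rank test from scipy; the toy data are illustrative only, not the study's measurements.

```python
import numpy as np
from scipy.stats import wilcoxon

def paired_comparison(scores_a: np.ndarray, scores_b: np.ndarray):
    """Paired comparison of per-subject metric values from two models on the same test set.
    Returns the mean difference (b - a) and a Wilcoxon signed-rank p-value (an assumed test choice)."""
    diff = np.asarray(scores_b) - np.asarray(scores_a)
    _, p = wilcoxon(scores_b, scores_a)
    return diff.mean(), p

# Toy example with synthetic per-subject Dice values (not the study's data):
rng = np.random.default_rng(0)
nnunet_dice = 0.985 + 0.004 * rng.standard_normal(40)
nnsegnext_dice = nnunet_dice + 0.004 + 0.002 * rng.standard_normal(40)
mean_diff, p_value = paired_comparison(nnunet_dice, nnsegnext_dice)
```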
Table 8. Impact of the different modules used in nnSegNeXt. Bold text indicates superior performance. nnSegNeXt w/o L_Data denotes nnSegNeXt without the data quality loss. nnSegNeXt w/o 3DMSCA denotes the replacement of 3DMSCA with the convolutional layer in Stages 3, 4, and 5. nnSegNeXt w/o Conv denotes the replacement of the convolutional layer with 3DMSCA in Stages 1 and 2.
Columns: Architecture, Dice on HCP, Dice on SALD, Dice on IXI.
nnSegNeXt w/o 3DMSCA and L_Data: 0.985, 0.978, 0.981
nnSegNeXt w/o L_Data: 0.991, 0.985, 0.986
nnSegNeXt w/o Conv and L_Data: 0.989, 0.983, 0.985
nnSegNeXt w/o 3DMSCA: 0.989, 0.982, 0.983
nnSegNeXt: 0.992, 0.987, 0.989
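The ablation toggles two ingredients: which stages use 3DMSCA rather than plain convolution (Stages 3 to 5 in the full model, with plain convolution kept in Stages 1 and 2) and whether the data quality loss is enabled. A small configuration sketch of how such variants might be enumerated is given below; the names and structure are illustrative, not the authors' code.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class VariantConfig:
    """One ablation variant: which stages use 3DMSCA instead of plain convolution,
    and whether the data quality loss term is enabled."""
    msca_stages: Tuple[int, ...]
    use_data_quality_loss: bool

VARIANTS = {
    "nnSegNeXt w/o 3DMSCA and L_Data": VariantConfig(msca_stages=(), use_data_quality_loss=False),
    "nnSegNeXt w/o L_Data":            VariantConfig(msca_stages=(3, 4, 5), use_data_quality_loss=False),
    "nnSegNeXt w/o Conv and L_Data":   VariantConfig(msca_stages=(1, 2, 3, 4, 5), use_data_quality_loss=False),
    "nnSegNeXt w/o 3DMSCA":            VariantConfig(msca_stages=(), use_data_quality_loss=True),
    "nnSegNeXt (full)":                VariantConfig(msca_stages=(3, 4, 5), use_data_quality_loss=True),
}
```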