Abstract
Pulmonary lobe segmentation is vital for clinical diagnosis and treatment. Deep neural network-based pulmonary lobe segmentation methods have developed rapidly. However, challenges remain: pulmonary fissures are often unclear or incomplete, especially in the right lung with its three lobes, which leads to relatively poor results. To address this issue, this study proposes a novel method, called nmPLS-Net, to segment pulmonary lobes effectively using nmODE. Benefiting from its nonlinearity and memory capacity, we construct an encoding network based on nmODE to extract features of the entire lung and dependencies between features. We then build a decoding network based on edge segmentation, which segments pulmonary lobes while focusing on effectively detecting pulmonary fissures. Experimental results on two datasets demonstrate that the proposed method achieves accurate pulmonary lobe segmentation.
Keywords:
pulmonary lobe segmentation; neural memory ordinary differential equation; multi-task learning
MSC:
68T07
1. Introduction
Pulmonary lobe segmentation plays a crucial role in clinical practice, including assisting in diagnosing pulmonary diseases and planning lung surgeries. Relying on manual pulmonary lobe segmentation is time-consuming and labor-intensive. Furthermore, manual segmentation results can be influenced by subjective factors. Thus, an effective automatic method is vital to ensure high efficiency and repeatability.
Although automatic pulmonary lobe segmentation algorithms have made significant progress, several challenges remain in their practical clinical application. The right lung comprises three lobes and is more complex to segment than the left lung; in particular, the right middle lobe is sandwiched between the two other lobes. Moreover, unclear or incomplete fissures pose another challenge for segmenting pulmonary fissures. Existing methods rarely consider the crucial aspect of learning correlations between features and cannot effectively extract the overall information shared across samples, which is essential for resolving the abovementioned issues.
To address these issues, this study proposes a novel pulmonary lobe segmentation method using nmODE [], called nmPLS-Net. Firstly, this method benefits from the robust nonlinear expressive and memory capabilities of nmODE, and the constructed model can extract more complex and rich pulmonary features to support the precise segmentation of pulmonary lobes. Then, due to the significant challenges in identifying pulmonary fissures, nmPLS-Net leverages a fissure-enhanced associative learning approach, focusing on learning detailed features of pulmonary fissures. Finally, this study utilizes two datasets to evaluate the proposed method’s effectiveness. The experimental results demonstrate that the proposed method achieves competitive results compared to other state-of-the-art methods. The main contributions of this study can be summarized as follows:
- The main contribution of this study is the combination of nmODE with convolutional neural networks for pulmonary lobe segmentation. This combination leverages nmODE’s high nonlinearity and memory capacity to better identify global pulmonary features and the relationships between these features;
- This study employs a novel fissure-enhanced associative learning approach to make the model focus on detail identification near pulmonary fissures;
- This study conducted experiments on two datasets, and our network achieved competitive overall segmentation results, significantly outperforming previous works in segmenting the right middle pulmonary lobe.
2. Related Works
2.1. Medical Image Processing
Deep learning has found widespread application in medical image processing due to its simplicity and efficiency. Early deep learning image segmentation networks used fully convolutional networks (FCNs) [], demonstrating good performance due to their ability to extract hierarchical features and to process inputs of arbitrary sizes end-to-end. However, an FCN under-utilizes the features at different resolutions obtained from different layers, especially in large-resolution medical images. U-Net [], with its multi-level encoder-decoder architecture and skip connections, addresses this limitation and has gained popularity in medical image processing. Shoaib et al. [] used several different state-of-the-art convolutional neural networks to perform left ventricle segmentation, achieving commendable results. Zhao et al. [] conducted a detailed investigation and analysis of deep learning methods for the neuroimaging diagnosis of Alzheimer’s disease.
The introduction of the transformer [] architecture had a profound impact on the field of deep learning. Built on the transformer, the vision transformer (ViT) has found widespread application in image processing. The swin-transformer [] proposed by Liu et al. addressed the computational challenges associated with ViT in image processing by incorporating a sliding window and partial attention mechanism. The swin-transformer v2 [] further improved upon the swin-transformer. TransUNet [], developed by Chen et al., combines the strengths of U-Net and the transformer to process medical images. Shen et al. [] proposed a boundary-guided transformer that can simultaneously identify rectum and tumor regions in sagittal MR images of rectal cancer patients. For a comprehensive overview of related research, Liu et al. offered a survey []. Transformer-based approaches have shown superior performance in many aspects; however, due to their large parameter counts, traditional 3D convolutional neural networks remain more practical for this study.
Pulmonary lobe segmentation typically uses the entire CT scan as input. Three-dimensional U-Net structures are commonly employed for this task, such as V-Net []. Based on V-Net, Imran et al. [] introduced a progressive dense V-Net, enhancing performance through feature fusion. Ferreira et al. [] proposed FRV-Net, utilizing regularization techniques to achieve better segmentation results, even with limited data. Tang et al. [] employed a mixed loss function to enhance the model’s focus on challenging pixels in pulmonary lobes. While the previously mentioned methods have achieved good results, they often have a large number of parameters. To obtain a more efficient network, Lee et al. [] introduced an efficient PLS-Net for pulmonary lobe segmentation. They leveraged 3D depth-wise separable convolution to effectively reduce parameters while employing dilated residual modules to enhance the network’s receptive field without increasing the parameter count. Despite significantly reducing the model’s parameter count and improving the receptive field, PLS-Net exhibits subpar performance at the pulmonary lobe edges and with incomplete pulmonary fissures. To address the issue of incomplete fissures, Xie et al. [] harnessed relational modeling for pulmonary lobe segmentation, effectively capturing relationships and contextual information among pulmonary lobes. However, this approach comes at the cost of a substantially higher parameter count than PLS-Net, resulting in relatively slower computational speeds due to its intricate process. Fan et al. [] introduced a learnable interpolation and extrapolation network designed to handle incomplete pulmonary fissures and enhance segmentation performance. Nevertheless, this method necessitates five processing stages, including lung segmentation, pulmonary fissure segmentation, pulmonary fissure completion, pulmonary lobe segmentation, and post-processing, significantly diminishing its efficiency. Liu et al. [] took the enhancement of PLS-Net a step further with RPLS-Net by incorporating pulmonary lobe boundary segmentation as an auxiliary task, thus focusing on challenging edge pixels. This approach achieved superior results compared to PLS-Net without significantly increasing the parameter count. However, it is worth noting that RPLS-Net cannot learn correlations between features and heavily relies on an expanded receptive field to gather global information. This limitation leads to relatively poorer performance on the right middle pulmonary lobe compared to other pulmonary lobes.
2.2. Neural Ordinary Differential Equations
In recent years, deep learning has made remarkable strides across various applications. One of the most renowned neural networks in the realm of deep learning is Res-Net [], celebrated for its robustness due to the incorporation of residuals. Research conducted by Haber et al. [] has revealed that Res-Net can be viewed as a discrete counterpart of certain differential equations. The concept of neural ordinary differential equations (NODEs) was introduced by Chen et al. [], shifting the paradigm from discrete neural networks to continuous differential equations. The core concept in Res-Net, the residual, can be likened to a discrete differential equation form. Denote the output of the $l$-th layer in a neural network as $y_l$, conventionally. In that case, the output of the subsequent layer ($l+1$) can be represented as $y_{l+1} = f(y_l)$, where $f$ encompasses the computational processes of a single layer, including linear transformations, regularization, and nonlinear activation functions. In the case of Res-Net, the output of the $(l+1)$-th layer is defined as $y_{l+1} = y_l + f(y_l)$. If $l$ is regarded as a time step $t$ and the output is interpreted as a function $y(t)$ evolving over time steps, with the step interval of 1 being analogous to a time interval $\Delta t$, the model’s computations can be conceptualized as the equation $y(t + \Delta t) = y(t) + \Delta t\, f(y(t), t)$. As the number of layers in the model approaches infinity, i.e., as $\Delta t$ tends to zero, it arrives at the NODE expression, the differential equation illustrated in Equation (1):

$\frac{dy(t)}{dt} = f(y(t), t). \quad (1)$

Obtaining the exact solution to this equation is nearly impossible. In practice, by providing an initial value $y(t_0)$ to the abovementioned differential equation, an ODE solver library can be leveraged to approximate the solution as $y(t_1) = y(t_0) + \int_{t_0}^{t_1} f(y(t), t)\, dt$.
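To make the connection between Equation (1) and an actual layer concrete, the following sketch approximates a NODE forward pass with a fixed-step explicit Euler solver. It is illustrative only: the small MLP used as $f$, the integration interval, and the step count are assumptions, and practical NODE implementations typically rely on adaptive solvers from dedicated libraries.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Right-hand side f(y, t) of Equation (1); a small MLP as an illustrative choice."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, t: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(y)

def odeint_euler(func: nn.Module, y0: torch.Tensor, t0=0.0, t1=1.0, steps=20) -> torch.Tensor:
    """Approximate y(t1) from y(t0) = y0 with fixed-step explicit Euler: y <- y + dt * f(t, y)."""
    dt = (t1 - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        y = y + dt * func(torch.tensor(t), y)
        t += dt
    return y

# Usage: treat the input features as the initial value, as in a standard NODE.
func = ODEFunc(dim=16)
y0 = torch.randn(4, 16)        # batch of feature vectors
y1 = odeint_euler(func, y0)    # NODE "layer" output
```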
Compared to traditional neural networks, neural ordinary differential equations exhibit more nonlinearity and fewer parameters. Subsequently, many related studies have emerged. Dupont et al. [] pointed out that the NODE, when using the data input as the initial value, preserves the spatial structure of the input, and the authors provided examples illustrating the NODE’s limitations in this regard. However, this issue can be addressed by inputting the data through external input variables instead of using them as initial values for the differential equation []. For more related methods, please refer to Refs. [,].
3. Methods
3.1. Problem Formulation
The main task of this study is a multi-class segmentation problem. The input of the constructed model is a whole CT image, denoted by $X \in \mathbb{R}^{D \times H \times W}$, and the labels are denoted by $Y$. $D$, $H$, and $W$ denote the depth, height, and width of an input image, respectively. $F$ denotes the constructed model. $\hat{Y}$ and $\hat{Y}_f$ denote the outputs of the main task and the fissure-enhanced associative learning task, respectively. $\mathcal{L}$ denotes the loss function used to optimize the constructed model.
3.2. Model
The architecture of the proposed model in this study was inspired by PLS-Net [] and RPLS-Net []. Figure 1 illustrates the workflow of the proposed method. The reason for choosing PLS-Net as the backbone network for this study is that, compared to commonly used backbone networks for pulmonary lobe segmentation, such as V-Net, PLS-Net has fewer parameters and yields better results. Additionally, using the same backbone network and dataset as in Ref. [] ensures a fair and accurate comparison of the performance of the nmODE and FEAL modules.
Figure 1.
The detailed structure of nmPLS-Net; the modules of the backbone network encoder and the detailed structure of the nmODE module are displayed on the right. The color coding for each module is as shown in the figure. Within the DRDB module, DRDB × 1, DRDB × 2, and DRDB × 4 represent one, two, and four consecutive DRDB modules, respectively. Encoder1, Encoder2, and Encoder3 in the figure are denoted as $E_1$, $E_2$, and $E_3$ in this paper. Similarly, Decoder1, Decoder2, and Decoder3 are denoted as $D_1$, $D_2$, and $D_3$.
The model takes the original CT image $X$ as input and produces the pulmonary lobe result $\hat{Y}$ and the fissure-enhanced associative learning result $\hat{Y}_f$. This process can be represented as $(\hat{Y}, \hat{Y}_f) = F(X)$.
The model’s encoder consists of three hierarchical stages, denoted as $E_1$, $E_2$, and $E_3$. Similarly, the decoder comprises three corresponding stages, namely $D_1$, $D_2$, and $D_3$, and a classifier for generating results. Skip connections link the encoder and decoder stages at the same level.
The model’s encoder comprises Dilated Separate Convolutions (DS Conv) and Dilated Residual Dense Blocks (DRDB). The principle of dilated separate convolution is similar to regular depth-wise separable convolution []. Additionally, dilated separate convolution employs dilated convolutions [] to increase the receptive field, thus enlarging the receptive field without increasing the parameter count. The Dilated Residual Dense Block, written as DRDB for simplicity, consists of dilated separate convolutions, and its structure is shown in Figure 1. This module combines the results of convolutions with different receptive fields in a residual manner to achieve a wide range of receptive field combinations. In each encoder stage, the dilation rate for the dilated separate convolution is set to 1, indicating no dilation, and the convolution stride is set to 2 for down-sampling. In the DRDB module, the dilation rates for the four dilated separate convolutions are set to different values from top to bottom, following the recommendation in []. The input for each encoder stage is concatenated with the original image down-sampled to the corresponding size to compensate for the information loss during down-sampling. In stage $E_3$, the nmODE module is introduced to process the features extracted by its DRDB module. The input to the nmODE module is the output feature of the DRDB module, and the output is the solution of the differential equation in the nmODE module. The following section provides a detailed description of the nmODE module.
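For illustration, the sketch below shows how a 3D depth-wise separable dilated convolution (DS Conv) and a DRDB-style block could be implemented. It is a sketch only: the normalization, activation, channel bookkeeping, and the dilation sequence (1, 2, 4, 8) are assumptions; the exact structure follows Figure 1 and the cited recommendation.

```python
import torch
import torch.nn as nn

class DSConv3d(nn.Module):
    """3D depth-wise separable convolution with optional dilation (DS Conv sketch)."""
    def __init__(self, in_ch, out_ch, stride=1, dilation=1):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False)
        self.norm = nn.InstanceNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.norm(self.pointwise(self.depthwise(x))))

class DRDB(nn.Module):
    """DRDB sketch: four DS Convs with assumed increasing dilations, dense
    concatenation of intermediate features, and a residual connection."""
    def __init__(self, channels, growth=12, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = channels
        for d in dilations:
            self.convs.append(DSConv3d(ch, growth, dilation=d))
            ch += growth
        self.fuse = nn.Conv3d(ch, channels, kernel_size=1, bias=False)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))
```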
The model’s decoder consists of Dilated Separate Convolutions (DS Conv) and Depth-wise Separable Deconvolutions (DS DeConv). The structure of the dilated separate convolution is the same as that used in the encoder, with a dilation rate of 1. Depth-wise separable deconvolution performs up-sampling with depth-wise separable transposed convolutions. The outputs from the corresponding encoder layers, transmitted through skip connections, are used to supplement the feature information missing after up-sampling. The classifier comprises two convolution modules responsible for generating segmentation results from the model outputs. Within the classifier, a fissure-reinforcement method is introduced, with further information available in Section 3.4.
3.3. Neural Memory Ordinary Differential Equation (nmODE)
Neural differential equations exhibit a higher degree of nonlinearity and better robustness, and the architecture of the NODE is depicted in Figure 2a. However, regular NODEs have some limitations, such as their inability to represent mappings like $x \mapsto -x$. This limitation primarily arises because a NODE treats the data as initial values, which preserves the spatial structure of the input data []. Furthermore, differential equations can describe dynamical systems, and research suggests that attractors within dynamical systems are associated with memory [,]. However, traditional neural differential equations often fail to harness the memory capacity offered by attractors.
Figure 2.
Architecture of NODE and nmODE. (a) The typical NODE architecture, where the initial value of the differential equation, $y(0)$, is derived from the data itself or from the output of the previous layer, denoted as $x$. The output is the numerical solution of the differential equation, denoted as $y(T)$. (b) The nmODE architecture, where the initial value of the differential equation, $y(0)$, is fixed at 0. Data or the output of the previous layer serves as an external input $x$ into the module. The output is the same as in the typical NODE, namely the numerical solution of the differential equation.
Yi [] proposed a new type of neural differential equation, called the neural memory ordinary differential equation (nmODE), which not only addresses the inherent limitations of traditional neural differential equations but also fully harnesses the memory capacity offered by dynamical systems. The nmODE treats the input data as external parameters rather than using them as the initial conditions of the ordinary differential equation, employing a fixed initial value instead. By doing so, nmODE not only avoids the problem mentioned above but also separates the functionality of neurons into learning and memory components. Learning only occurs in the learning part, while the memory part is responsible for mapping the input to its global attractor, establishing a mapping from the input space to the memory space []. The structure of nmODE is illustrated in Figure 2b, where $\gamma(x)$ represents the external input and the learning part occurs within the transformation $\gamma$. In our model, $x$ represents the output of the DRDB module within the encoder stage, labeled as $E_3$. The formula representation of the nmODE architecture is shown in Equation (2). Based on the nmODE architecture, Yi [] also proposed a novel and efficient implementation, structured as shown in Equation (3). Furthermore, Yi demonstrated in Ref. [] that a key property of this implementation is that it has a unique global attractor for each input, defining a mapping that transforms an external input $x$ into an output within the memory space.
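As a rough illustration of how such a module can be embedded into a 3D encoder, the sketch below assumes the nmODE form in which the data enter as an external term $\gamma(x)$, the state starts from $y(0) = 0$, and the memory dynamics take the commonly cited form $\dot{y} = -y + \sin^2(y + \gamma(x))$. The 1 × 1 × 1 convolution for $\gamma$ and the fixed-step Euler integration are stand-ins for the paper’s learning part and ODE solver, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class NmODEModule(nn.Module):
    """Sketch of an nmODE block: the learning part gamma(x) is a 1x1x1 convolution,
    the memory part evolves y' = -y + sin^2(y + gamma(x)) from y(0) = 0 (assumed form),
    and the output is the approximate attractor state y(T)."""
    def __init__(self, channels: int, steps: int = 20, t_end: float = 1.0):
        super().__init__()
        self.gamma = nn.Conv3d(channels, channels, kernel_size=1)  # learning part
        self.steps, self.dt = steps, t_end / steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gamma(x)               # external input to the memory dynamics
        y = torch.zeros_like(g)         # fixed initial value y(0) = 0
        for _ in range(self.steps):     # explicit Euler in place of an ODE solver
            y = y + self.dt * (-y + torch.sin(y + g) ** 2)
        return y

# Usage on the deepest encoder features (shape: batch x channels x D x H x W).
feats = torch.randn(1, 64, 8, 16, 16)
out = NmODEModule(channels=64)(feats)   # same shape as the input features
```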
In pulmonary lobe segmentation, the right lung with three lobes is more challenging than the left lung, and cases with unclear or incomplete boundaries between pulmonary lobes can make segmentation even more difficult. Human annotators rely on the overall pulmonary structure and dependencies between objects when performing manual segmentation. Similarly, learning better global features and capturing dependencies between features can help address these challenges for neural network models.
In this study, an nmODE module was embedded in the encoder stage to further refine the features extracted by the encoder. The powerful nonlinear expressive capability of nmODE enables effective learning of feature representations in complex conditions. Embedding the nmODE module in PLS-Net leverages nmODE’s memory capacity and strong nonlinear capabilities to obtain better global features and learn the relationships between features. Across different samples, similar features are mapped to nearby attractors in the nmODE memory space, allowing nmODE to memorize similar information between samples and enabling the model to capture overall pulmonary characteristics. The nmODE module can simultaneously learn and remember dependencies between features during the mapping process. For individual samples, nmODE’s high nonlinearity and memory capabilities refine and fuse information from different channels of features more effectively, resulting in more global feature information. The primary reasons for placing the nmODE module in the last stage of the encoder are two-fold. Firstly, the lowest-level stage boasts the largest channel dimension and the most expansive receptive field. A higher channel dimension facilitates a more comprehensive exploration of profound connections among features. Moreover, a larger receptive field diminishes interference from superfluous fine-grained details and steers the input features toward a more globally contextualized state within the nmODE module. Second, it is for performance considerations. The lowest-level output has the lowest resolution, and placing the nmODE module here does not significantly increase memory usage or reduce model computation speed during training.
3.4. Fissure-Enhanced Associative Learning
Previous work [] proposed an auxiliary task involving segmenting the entire pulmonary lobe edge. Complete lung segmentation is relatively straightforward, and our observations during training show that the network naturally tends to distinguish the outer pulmonary contour first. Emphasizing the model’s attention on external boundaries is therefore unnecessary. Instead, this study aims to direct the model’s focus toward the information within the pulmonary fissures, which are the more challenging areas.
This study introduces a novel approach called Fissure-Enhanced Associative Learning (FEAL) to encourage the model to focus on voxels near pulmonary fissures and integrate pulmonary fissure information during classification. The associated loss is computed using the pulmonary fissure segmentation results as a secondary model output. This integration enables the model to identify voxels corresponding to pulmonary fissure locations accurately. Within the classifier, the FEAL module merges the pulmonary fissure segmentation output with the decoder’s output, employing a convolutional layer to combine the fissure information with the decoder’s output features and make corresponding predictions. Previously, in multitask learning approaches for pulmonary lobe segmentation, the model primarily relied on auxiliary tasks to encourage the decoder to prioritize pulmonary fissures. However, during the final classification step, the classifier independently handled pulmonary fissure segmentation. In contrast, our fissure-enhanced associative learning approach incorporates classification information during the classification process, further associating decoder features with pulmonary fissure information.
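The sketch below illustrates one plausible form of such a fissure-enhanced classifier head: a fissure branch produces the auxiliary output, and its prediction is concatenated with the decoder features before the lobe classification convolution. The layer choices and channel sizes are hypothetical; only the fusion idea follows the description above.

```python
import torch
import torch.nn as nn

class FEALClassifier(nn.Module):
    """Sketch of a fissure-enhanced classifier: a fissure head predicts a fissure map
    as the auxiliary output, and the lobe head re-uses that map, concatenated with the
    decoder features, when predicting the pulmonary lobes."""
    def __init__(self, in_ch: int, num_lobe_classes: int = 6):
        super().__init__()
        self.fissure_head = nn.Conv3d(in_ch, 1, kernel_size=1)            # auxiliary output
        self.lobe_head = nn.Sequential(
            nn.Conv3d(in_ch + 1, in_ch, kernel_size=3, padding=1),        # fuse fissure info
            nn.ReLU(inplace=True),
            nn.Conv3d(in_ch, num_lobe_classes, kernel_size=1),            # lobe logits
        )

    def forward(self, decoder_feats: torch.Tensor):
        fissure_logits = self.fissure_head(decoder_feats)
        fused = torch.cat([decoder_feats, torch.sigmoid(fissure_logits)], dim=1)
        lobe_logits = self.lobe_head(fused)
        return lobe_logits, fissure_logits   # main and FEAL outputs

# Usage with hypothetical decoder features.
feats = torch.randn(1, 16, 8, 16, 16)
lobes, fissure = FEALClassifier(in_ch=16)(feats)
```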
A simple and efficient algorithm can be used to automatically generate pulmonary fissure labels from the label mask of CT images without any manual intervention. Because the values in CT label masks are limited to a few categories (e.g., a pulmonary lobe segmentation mask may have only 6 classes, with integer values ranging from 0 to 5), we determine whether a point belongs to a pulmonary fissure by sliding a window and checking whether the values inside the window belong to two neighboring target categories simultaneously. The process is outlined in pseudocode in Algorithm 1. The pulmonary fissure segmentation results obtained by the algorithm are shown in Figure 3. This algorithm only needs to scan the image once and has a time complexity of $O(D \times H \times W)$, assuming the input size is $D \times H \times W$. Compared to the methods used in previous work, which involved image processing with the scikit-image library, Gaussian filtering, and thresholding, this algorithm is simpler and more efficient. Additionally, it can mitigate the adverse effects of partial mislabeling. For example, in Figure 3, a small portion of the right upper lobe is incorrectly labeled as the left lower lobe, and this error is ignored in the generated pulmonary fissure mask.
Algorithm 1. Pulmonary Fissure Label Generation Algorithm.
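A vectorized sketch of the idea behind Algorithm 1 is given below: a voxel is marked as fissure when a small neighborhood around it contains both lobes of a neighboring pair. The window size and the list of neighboring lobe pairs are assumptions and would need to match the actual label convention.

```python
import numpy as np
from scipy.ndimage import maximum_filter

# Hypothetical label convention: 0 = background, 1..5 = the five pulmonary lobes.
# Pairs of lobes that share a fissure (assumed; adjust to the actual label mapping).
NEIGHBOR_PAIRS = [(1, 2), (2, 3), (1, 3), (4, 5)]

def fissure_mask(label: np.ndarray, window: int = 3) -> np.ndarray:
    """Mark a voxel as fissure if a window^3 neighborhood around it contains both
    lobes of any neighboring pair -- an equivalent, vectorized form of the
    sliding-window check described for Algorithm 1."""
    fissure = np.zeros_like(label, dtype=bool)
    for a, b in NEIGHBOR_PAIRS:
        near_a = maximum_filter((label == a).astype(np.uint8), size=window) > 0
        near_b = maximum_filter((label == b).astype(np.uint8), size=window) > 0
        fissure |= near_a & near_b
    return fissure

# Usage: labels is a 3D integer lobe mask loaded from the dataset.
labels = np.random.randint(0, 6, size=(64, 128, 128))
fissures = fissure_mask(labels)          # boolean mask of fissure voxels
```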
Figure 3.
(a) Lung CT image and its mask, with an annotation error circled in blue. (b) Pulmonary fissures separated by Algorithm 1. In this visual representation, distinct lung regions are color-coded: yellow for the right upper lobe, green for the right middle lobe, brown for the right lower lobe, blue for the left upper lobe, red for the left lower lobe, and white for pulmonary fissures.
3.5. Loss Functions
The loss function used for training is a mixed loss function consisting of two components: $\mathcal{L}_{lobe}$ for calculating the pulmonary lobe segmentation loss and $\mathcal{L}_{FEAL}$ for calculating the FEAL loss. A hybrid loss function consisting of dice loss and focal loss is used as the primary loss $\mathcal{L}_{lobe}$. The calculation formula for the dice loss $\mathcal{L}_{dice}$ in the hybrid loss is given in Equation (4):

$\mathcal{L}_{dice} = 1 - \frac{1}{C}\sum_{c=1}^{C}\frac{2\sum_{i=1}^{N} g_{i,c}\, p_{i,c} + \epsilon}{\sum_{i=1}^{N} g_{i,c} + \sum_{i=1}^{N} p_{i,c} + \epsilon}. \quad (4)$

In this equation, $C$ represents the number of pulmonary lobe categories, and $N$ represents the number of points in a single sample. $g_{i,c}$ and $p_{i,c}$, respectively, denote the ground truth and the probability of point $i$ being predicted as category $c$. $\epsilon$ is the smoothing coefficient; in this study, its value is set to 1. The dice loss is calculated separately for each category. However, there is a significant difference in the number of foreground and background pixels for each category. Therefore, the focal loss is introduced to balance the positive and negative samples. The formula for the focal loss $\mathcal{L}_{focal}$ used in the hybrid loss is given in Equation (5):

$\mathcal{L}_{focal} = -\frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{N}\alpha_c\,(1 - p_{i,c})^{\gamma}\, g_{i,c}\log p_{i,c}. \quad (5)$

In this equation, $C$, $N$, $g_{i,c}$, and $p_{i,c}$ are the same as in Equation (4). $\alpha_c$ represents the weight for each category, and $\gamma$ is a hyperparameter. $\mathcal{L}_{lobe}$ is as shown in Equation (6), where $w_1$ and $w_2$ are the weight coefficients for the two types of losses:

$\mathcal{L}_{lobe} = w_1\,\mathcal{L}_{dice} + w_2\,\mathcal{L}_{focal}. \quad (6)$
Since positive samples account for only a small portion of the pulmonary fissure labels, leading to severe class imbalance, the focal loss is used as the loss function for the fissure-enhanced associative learning output. The formula is given in Equation (7):

$\mathcal{L}_{FEAL} = -\frac{1}{N}\sum_{i=1}^{N}\alpha\left[g_i\,(1 - p_i)^{\gamma}\log p_i + (1 - g_i)\, p_i^{\gamma}\log(1 - p_i)\right]. \quad (7)$

In this context, $\hat{Y}_f$ and $Y_f$ represent the predictions and ground truth for the pulmonary fissure, while $p_i$ and $g_i$ represent the prediction and ground truth for each pixel. $\alpha$ and $\gamma$ are hyperparameters.
The overall loss function is given in Equation (8), where $\lambda$ represents the weight of the fissure-enhanced associative learning loss:

$\mathcal{L} = \mathcal{L}_{lobe} + \lambda\,\mathcal{L}_{FEAL}. \quad (8)$
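The following sketch assembles the loss terms along the lines of Equations (4)–(8). The weights w1, w2, and lam are placeholders, and the exact focal-loss variant is an assumption; only the overall structure follows the text.

```python
import torch
import torch.nn.functional as F

def dice_loss(probs, target_onehot, eps=1.0):
    # Equation (4): per-class dice averaged over C classes; eps is the smoothing term.
    dims = (0, 2, 3, 4)
    inter = (probs * target_onehot).sum(dims)
    denom = probs.sum(dims) + target_onehot.sum(dims)
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def focal_loss(probs, target_onehot, alpha=1.0, gamma=2.0):
    # Equation (5): focal loss on the softmax probabilities of the lobe classes.
    pt = (probs * target_onehot).sum(1).clamp_min(1e-6)
    return (alpha * (1.0 - pt) ** gamma * -pt.log()).mean()

def binary_focal_loss(logits, target, alpha=1.0, gamma=2.0):
    # Equation (7): focal loss for the sparse fissure labels of the FEAL output.
    p = torch.sigmoid(logits).clamp(1e-6, 1.0 - 1e-6)
    pt = torch.where(target > 0.5, p, 1.0 - p)
    return (alpha * (1.0 - pt) ** gamma * -pt.log()).mean()

def total_loss(lobe_logits, lobe_target, fissure_logits, fissure_target,
               w1=1.0, w2=1.0, lam=1.0):                                     # placeholder weights
    # lobe_target: integer class indices (B, D, H, W); fissure_target: float mask like fissure_logits.
    probs = F.softmax(lobe_logits, dim=1)
    onehot = F.one_hot(lobe_target, probs.shape[1]).permute(0, 4, 1, 2, 3).float()
    l_lobe = w1 * dice_loss(probs, onehot) + w2 * focal_loss(probs, onehot)  # Eq. (6)
    return l_lobe + lam * binary_focal_loss(fissure_logits, fissure_target)  # Eq. (8)
```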
4. Experiments
4.1. Experimental Settings
Dataset: A base dataset consistent with Ref. [] was used in this study, which comprises 32 chest CT scan images with a maximum slice thickness of 1 mm. The pulmonary fissure labels used for fissure-enhanced associative learning were generated from the pulmonary lobe labels using the fissure label generation algorithm. To validate the effectiveness of our method, experiments were also conducted on the publicly available LUNA16 dataset, as presented in Ref. [], which contains 50 cases with unclear or incomplete pulmonary fissures and lesions. All CT data were resampled to a (1 mm, 1 mm, 1 mm) spacing and clipped to Hounsfield unit (HU) values between −1000 and 400. The data were then normalized. A 20-voxel-wide region of uninformative background near each CT image’s boundary was cropped to improve training efficiency. For example, a CT of size was cropped to after this process.
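A preprocessing sketch along these lines is shown below; the loading library (SimpleITK), the interpolation order, and the exact HU clipping bounds are assumptions rather than the authors’ implementation.

```python
import numpy as np
import SimpleITK as sitk
from scipy.ndimage import zoom

def preprocess(ct_path: str, hu_min: float = -1000.0, hu_max: float = 400.0,
               crop: int = 20) -> np.ndarray:
    """Sketch of the preprocessing described in the text: resample to 1 mm isotropic
    spacing, clip HU values (bounds assumed), normalize to [0, 1], and crop a
    20-voxel border of uninformative background."""
    image = sitk.ReadImage(ct_path)
    array = sitk.GetArrayFromImage(image).astype(np.float32)    # (D, H, W)
    # SimpleITK spacing is (x, y, z); the array axes are (z, y, x).
    sx, sy, sz = image.GetSpacing()
    array = zoom(array, zoom=(sz, sy, sx), order=1)             # -> 1 mm spacing
    array = np.clip(array, hu_min, hu_max)
    array = (array - hu_min) / (hu_max - hu_min)                # normalize
    return array[crop:-crop, crop:-crop, crop:-crop]            # boundary crop
```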
Model and Training: Our model and training code were implemented using the PyTorch library []. The code of the model can be found at https://github.com/EdewagaPoe/nmPLS-Net-Segmenting-Pulmonary-Lobes-using-nmODE.git (accessed on 4 November 2023). We employed PyTorch’s built-in automatic mixed precision (AMP) training to save GPU memory. Due to the non-uniform input sizes, the batch size was fixed at 1. The optimizer used was the Adam optimizer [] with a learning rate of and a weight decay of . The dilated residual convolution module of the model used dilation rates and growth rates of and 12, respectively, following the recommendations in Ref. []. The hyperparameters in the overall loss function, namely $w_1$, $w_2$, and $\lambda$, were set to , , and , respectively. The focal loss weights $\alpha$ in Equations (5) and (7) were both set to 1, and the focusing parameters $\gamma$ were set to 2. The training was performed on an NVIDIA TITAN RTX GPU with 24 GB of VRAM. We trained the model for 100 epochs and selected the best-performing result on the validation set for testing.
Evaluation Metrics: This study used the dice coefficient (DC) as an evaluation metric for segmentation performance, as shown in Equation (9):

$DC = \frac{2\sum_{i=1}^{N} p_i\, g_i}{\|P\|_1 + \|G\|_1}. \quad (9)$

Here, $P$ represents the output processed through the argmax function, $G$ the corresponding ground truth, $\|\cdot\|_1$ denotes the L1 norm, $N$ is the total number of voxels in the output, $p_i$ is the predicted value for voxel $i$, and $g_i$ is the corresponding ground truth. The Jaccard index is also used as an evaluation metric for segmentation performance, as shown in Equation (10):

$Jaccard = \frac{\sum_{i=1}^{N} p_i\, g_i}{\|P\|_1 + \|G\|_1 - \sum_{i=1}^{N} p_i\, g_i}. \quad (10)$
The average symmetric surface distance (ASSD) is used as another evaluation metric to measure the segmentation accuracy of pulmonary lobe boundaries. The formula for ASSD is expressed in Equation (11):

$ASSD = \frac{1}{|S_P| + |S_G|}\left(\sum_{v \in S_P}\min_{u \in S_G}\|v - u\|_2 + \sum_{v \in S_G}\min_{u \in S_P}\|v - u\|_2\right). \quad (11)$

In this equation, $v$ and $u$ represent voxels, $S$ represents a set of surface voxels, $\|\cdot\|_2$ denotes the L2 norm, $S_P$ and $S_G$ represent the sets of surface voxels for the prediction and the ground truth, respectively, and $|\cdot|$ indicates the number of elements in a set.
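The sketch below computes these metrics for a single class from binary masks; the surface extraction via binary erosion and the distance transforms are one common way to realize Equation (11), not necessarily the implementation used in this study.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_erosion

def dice_jaccard(pred: np.ndarray, gt: np.ndarray):
    """Equations (9) and (10) for one class, with pred/gt as binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dc = 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)
    jc = inter / (pred.sum() + gt.sum() - inter + 1e-8)
    return dc, jc

def assd(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)):
    """Equation (11): average symmetric surface distance between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    surf_p = pred & ~binary_erosion(pred)          # surface voxels of the prediction
    surf_g = gt & ~binary_erosion(gt)              # surface voxels of the ground truth
    dist_to_g = distance_transform_edt(~surf_g, sampling=spacing)
    dist_to_p = distance_transform_edt(~surf_p, sampling=spacing)
    d_pg = dist_to_g[surf_p]                       # pred surface -> nearest gt surface
    d_gp = dist_to_p[surf_g]                       # gt surface -> nearest pred surface
    return (d_pg.sum() + d_gp.sum()) / (len(d_pg) + len(d_gp))
```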
4.2. Quantitative Experiment Results
This study conducted experiments on the base dataset using three networks: PLS-Net, RPLS-Net, and nmPLS-Net. The specific experimental results are shown in Table 1a. In this table and the subsequent tables, UR, MR, LR, UL, and LL represent the metrics for the upper right, middle right, lower right, upper left, and lower left lobes, respectively. The average dice coefficient achieved by nmPLS-Net is 0.9570, the average ASSD is 1.1133, the average precision is 0.9578, the average recall is 0.9615, and the average Jaccard index is 0.9185, surpassing the compared networks. The experimental results show that nmPLS-Net exhibits a significant improvement in overall segmentation performance, especially a substantial enhancement in the segmentation of the right middle pulmonary lobe. At the same time, nmPLS-Net achieves the best ASSD value, which means that the network’s segmentation results along the pulmonary lobe edges are more precise. This performance improvement can be attributed to two key factors. Firstly, the nmODE module’s memory capacity and high nonlinearity enhance the network’s capability to capture global features and learn dependencies between features, enabling the model to predict results from a more comprehensive and interrelated perspective. Secondly, the fissure-enhanced associative learning approach concentrates the model’s training on voxels near pulmonary fissures and associates them with the fissure information, resulting in enhanced segmentation performance for challenging fissure voxels. We also trained and validated on another publicly available dataset []. As shown in Table 1b, the results are consistent with those on the base dataset. Our network still achieves the best performance in both overall segmentation and the segmentation of the right middle lobe, demonstrating the effectiveness and reliability of our approach.
Table 1.
Comparison of results on different datasets. UR, MR, LR, UL, and LL represent the metrics for the upper right, middle right, lower right, upper left, and lower left lobes, respectively. The best results in the subtables are highlighted in red, and the second-best results are highlighted in blue.
Furthermore, this study conducted training on the training set of the base dataset and subsequently validated the model using 20 randomly selected CT scans from the LUNA16 dataset, as presented in Table 2. Notably, nmPLS-Net continues to exhibit superior performance in the validation results, particularly in the right lung. While the overall performance decreases compared to directly training on the LUNA16 dataset, several factors may contribute to this phenomenon. Firstly, our training dataset comprises only 20 CT scans, in contrast to the 40 CT scans in the training set of the LUNA16 dataset, potentially impacting the model’s generalization ability. Secondly, inconsistencies in data sources and the subjective nature of manual annotations have led to disparities in data distribution between the training and validation sets, contributing to the observed performance drop. However, it is worth highlighting that our model demonstrates resilient performance under these circumstances, especially in segmenting the right middle lobe. This result underscores the enhancing effect of nmPLS-Net’s memory capabilities and the memorized features on the model’s generalization and robustness.
Table 2.
Comparison between the proposed method and previous work, trained on the base dataset and tested on 20 CT volumes from the LUNA16 dataset. The best results in the table are highlighted in red, and the second-best results are highlighted in blue.
4.3. Ablation Experiment
Ablation experiments were conducted to analyze the contributions of various components in the methodology for performance improvement. The results of the ablation experiments are shown in Table 3. In cases where the nmODE module and fissure-enhanced associative learning were not used, the model reverted to PLS-Net, so separate experiments for this scenario were not conducted again. It can be observed that both the nmODE module and fissure-enhanced associative learning contribute to the segmentation performance of the model. The nmODE module, in particular, exhibits the most significant improvement in segmentation performance, with the performance boost primarily evident in the right middle lobe.
Table 3.
Ablation experiments, trained and tested on the base dataset. A check mark in the first two columns indicates that the corresponding module is used. FEAL represents fissure-enhanced associative learning. The best results in the table are highlighted in red, and the second-best results are highlighted in blue.
Although adding fissure-enhanced associative learning on top of the nmODE module does not substantially improve overall segmentation performance, the enhancement in the segmentation performance of the right middle lobe is notable, indicating that fissure-enhanced associative learning does have an effect, aligning with the purpose of their incorporation into our methodology.
The results of the ablation experiments on the loss functions are presented in Table 4. Only the mean values of each metric are listed to simplify the table. The experimental results indicate that using a mixed loss function for pulmonary lobe segmentation tasks can lead to some improvement, although it has minimal overall impact. It can also be observed that dice or cross-entropy, as the loss function for the FEAL module, is less effective than not using the FEAL module. This is primarily due to the extreme class imbalance in the labels used by FEAL, where both dice and cross-entropy struggle to effectively calculate the loss contributions of the sparsely represented positive samples.
Table 4.
Ablation experiments for the loss function, trained and tested on the base dataset. mDC, mASSD, mPrecision, mRecall, and mJaccard-Index are the mean dice coefficient (DC), mean average symmetric surface distance (ASSD), mean precision, mean recall, and mean Jaccard index, respectively. The content within the brackets in the table header represents the loss functions used for the pulmonary lobe segmentation task and the FEAL task, respectively ($\mathcal{L}_{lobe}$, $\mathcal{L}_{FEAL}$). The best results in the table are highlighted in red, and the second-best results are highlighted in blue.
4.4. Qualitative Analysis
The quantitative evaluation above validates the effectiveness of nmPLS-Net in segmenting the right pulmonary lobes and in cases of unclear pulmonary fissures. Figure 4 showcases the segmentation outcomes achieved by our network and the comparative networks across five distinct CT scans. Among the five CT cases, the first represents a normal lung with intact and clear pulmonary fissures; the remaining four contain partially or entirely unclear pulmonary fissures. It can be observed that in the CTs of rows 2, 3, and 5, nmPLS-Net consistently demonstrates superior performance, whereas the other networks perform poorly, with some results displaying noticeable segmentation errors. While the segmentation results in the CT scan of the third row are generally suboptimal for all networks, nmPLS-Net still provides the closest approximation to the ground truth. These cases underscore nmPLS-Net’s adeptness at handling scenarios with unclear or incomplete pulmonary fissures.
Figure 4.
The segmentation results of five CT scans produced by nmPLS-Net and the comparative networks. In this visual representation, distinct lung regions are color-coded: yellow for the right upper lobe, green for the right middle lobe, brown for the right lower lobe, blue for the left upper lobe, and red for the left lower lobe. The first column shows the original images, and the subsequent four columns show the segmentation results for the ground truth, PLS-Net, RPLS-Net, and nmPLS-Net, respectively.
To further substantiate the efficacy of nmODE, an analysis was conducted on the feature maps generated by the model for six CT scans. These feature maps include those produced immediately before entering the nmODE module, as well as those of the same dimensions produced after passing through the nmODE module. Principal component analysis (PCA) was applied to project each channel of these feature maps onto a two-dimensional plane, and the outcomes are presented in Figure 5. Figure 5a provides a visual representation of the features before entering the nmODE module, while Figure 5b showcases the features after processing by the nmODE module. Each color in the visualizations corresponds to the same CT scan across both figures. It can be observed that the original features, after being processed by nmODE, exhibit a more pronounced overall similarity in shape, indicating that nmODE’s memory capacity can indeed extract common features among different samples. Furthermore, the distribution of features before processing by the nmODE module is more concentrated, whereas the feature distribution after nmODE processing exhibits a heightened level of uniformity and reveals stronger inter-feature correlations. These findings underscore the nmODE module’s adeptness at learning intricate feature correlations, enhancing its capacity to capture meaningful relationships within the data.
Figure 5.
Visualized feature-maps of 6 CT volumes. In (a,b), the same colors represent the output features of the same CT volumes and each scatterplot in the figure is obtained by projecting the various channels of the model’s encoder output features onto a two-dimensional plane after PCA dimension reduction. (a) shows the scatterplot of the feature mapping obtained before processing with the nmODE module, while (b) shows the scatterplot of the feature mapping obtained after processing with the nmODE module.
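The channel-wise projection shown in Figure 5 can be reproduced along the following lines; the feature shapes, the shared PCA basis, and the plotting details are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_channel_pca(feature_maps, ax, colors):
    """Project each channel of each CT's encoder feature map (C x D x H x W)
    to a 2D point via PCA, as in Figure 5; one color per CT volume."""
    flat = [fm.reshape(fm.shape[0], -1) for fm in feature_maps]    # (C, D*H*W) per CT
    pca = PCA(n_components=2).fit(np.concatenate(flat, axis=0))    # shared 2D basis
    for fm2d, color in zip(flat, colors):
        pts = pca.transform(fm2d)                                  # (C, 2)
        ax.scatter(pts[:, 0], pts[:, 1], s=8, color=color)

# Usage with hypothetical features of 6 CT volumes (channels x spatial dims).
feats = [np.random.randn(64, 8, 16, 16) for _ in range(6)]
fig, ax = plt.subplots()
plot_channel_pca(feats, ax, colors=plt.cm.tab10.colors[:6])
```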
5. Discussion and Conclusions
This study proposes a 3D fully convolutional neural network, nmPLS-Net, which combines nmODE (neural memory ordinary differential equation) with multitask learning to perform end-to-end segmentation of pulmonary lobes in chest CT scans. The memory capacity and high nonlinearity of the nmODE module enhance the extraction of global features, improving the model’s performance in segmenting the overall lung and handling complex scenarios in the right lung. Additionally, fissure-enhanced associative learning guides the model’s attention toward challenging voxels within pulmonary fissures. The model’s performance is validated on two distinct datasets, and the experimental results demonstrate that nmPLS-Net achieves notably superior results in right pulmonary lobe segmentation compared to previous methods. However, incorporating nmODE requires an ODE solver, which may slightly reduce computational speed. This problem can be addressed in future work through knowledge distillation, an area we plan to explore for further improvements.
Author Contributions
Conceptualization, Z.Y. and P.D.; methodology, P.D.; software, P.D. and H.N.; validation, P.D.; data curation, P.D. and H.N.; writing—original draft, P.D. and H.N.; writing—review and editing, Z.Y. and X.X.; supervision, Z.Y. and X.X.; All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Major Science and Technology Projects of China under Grant 2018AAA0100201, the National Natural Science Foundation of China under Grant 62106163, Major Science and Technology Project from the Science & Technology Department of Sichuan Province under Grant 2020YFG0473, the Natural Science Foundation Project of Sichuan Province under Grant 2023YFG0283 and the CAAI-Huawei MindSpore Open Fund under Grant 21H1235.
Data Availability Statement
The base dataset is public. The LUNA16 dataset presented in this study is openly available at https://github.com/deep-voxel/automatic_pulmonary_lobe_segmentation_using_deep_learning and via https://doi.org/10.1109/ISBI.2019.8759468, reference number [] (all accessed on 10 April 2019).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Yi, Z. nmODE: Neural memory ordinary differential equation. Artif. Intell. Rev. 2023, 56, 14403–14438. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings, Part III 18, Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Shoaib, M.A.; Lai, K.W.; Chuah, J.H.; Hum, Y.C.; Ali, R.; Dhanalakshmi, S.; Wang, H.; Wu, X. Comparative studies of deep learning segmentation models for left ventricle segmentation. Front. Public Health 2022, 10, 981019. [Google Scholar] [CrossRef] [PubMed]
- Zhao, Z.; Chuah, J.H.; Lai, K.W.; Chow, C.O.; Gochoo, M.; Dhanalakshmi, S.; Wang, N.; Bao, W.; Wu, X. Conventional machine learning and deep learning in Alzheimer’s disease diagnosis using neuroimaging: A review. Front. Comput. Neurosci. 2023, 17, 1038636. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 12009–12019. [Google Scholar]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar]
- Shen, J.; Lu, S.; Qu, R.; Zhao, H.; Zhang, L.; Chang, A.; Zhang, Y.; Fu, W.; Zhang, Z. A boundary-guided transformer for measuring distance from rectal tumor to anal verge on magnetic resonance images. Patterns 2023, 4, 100711. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Zhang, Y.; Wang, Y.; Hou, F.; Yuan, J.; Tian, J.; Zhang, Y.; Shi, Z.; Fan, J.; He, Z. A survey of visual transformers. arXiv 2023, arXiv:2111.06091. [Google Scholar] [CrossRef] [PubMed]
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 565–571. [Google Scholar]
- Imran, A.A.Z.; Hatamizadeh, A.; Ananth, S.P.; Ding, X.; Terzopoulos, D.; Tajbakhsh, N. Automatic segmentation of pulmonary lobes using a progressive dense V-network. In Proceedings of the International Workshop on Deep Learning in Medical Image Analysis, Granada, Spain, 20 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 282–290. [Google Scholar]
- Ferreira, F.T.; Sousa, P.; Galdran, A.; Sousa, M.R.; Campilho, A. End-to-end supervised lung lobe segmentation. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–8. [Google Scholar]
- Tang, H.; Zhang, C.; Xie, X. Automatic pulmonary lobe segmentation using deep learning. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1225–1228. [Google Scholar]
- Lee, H.; Matin, T.; Gleeson, F.; Grau, V. Efficient 3D fully convolutional networks for pulmonary lobe segmentation in CT images. arXiv 2019, arXiv:1909.07474. [Google Scholar]
- Xie, W.; Jacobs, C.; Charbonnier, J.P.; Van Ginneken, B. Relational modeling for robust and efficient pulmonary lobe segmentation in CT scans. IEEE Trans. Med Imaging 2020, 39, 2664–2675. [Google Scholar] [CrossRef] [PubMed]
- Fan, X.; Xu, X.; Feng, J.; Huang, H.; Zuo, X.; Xu, G.; Ma, G.; Chen, B.; Wu, J.; Huang, Y.; et al. Learnable interpolation and extrapolation network for fuzzy pulmonary lobe segmentation. IET Image Process. 2023, 17, 3258–3270. [Google Scholar] [CrossRef]
- Liu, J.; Wang, C.; Guo, J.; Shao, J.; Xu, X.; Liu, X.; Li, H.; Li, W.; Yi, Z. RPLS-Net: Pulmonary lobe segmentation based on 3D fully convolutional networks and multi-task learning. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 895–904. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Haber, E.; Ruthotto, L. Stable architectures for deep neural networks. Inverse Probl. 2017, 34, 014004. [Google Scholar] [CrossRef]
- Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural Ordinary Differential Equations. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31. [Google Scholar]
- Dupont, E.; Doucet, A.; Teh, Y.W. Augmented Neural ODEs. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Chen, R.T.Q.; Amos, B.; Nickel, M. Learning Neural Event Functions for Ordinary Differential Equations. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021. [Google Scholar]
- Zhang, T.; Yao, Z.; Gholami, A.; Gonzalez, J.E.; Keutzer, K.; Mahoney, M.W.; Biros, G. ANODEV2: A coupled neural ODE framework. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
- Poucet, B.; Save, E. Attractors in memory. Science 2005, 308, 799–800. [Google Scholar] [CrossRef]
- Wills, T.J.; Lever, C.; Cacucci, F.; Burgess, N.; O’Keefe, J. Attractor dynamics in the hippocampal representation of the local environment. Science 2005, 308, 873–876. [Google Scholar] [CrossRef] [PubMed]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).