Article

DEHA-Net: A Dual-Encoder-Based Hard Attention Network with an Adaptive ROI Mechanism for Lung Nodule Segmentation

Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(4), 1989; https://doi.org/10.3390/s23041989
Submission received: 3 January 2023 / Revised: 31 January 2023 / Accepted: 7 February 2023 / Published: 10 February 2023

Abstract

Measuring pulmonary nodules accurately can help the early diagnosis of lung cancer, which can increase the survival rate among patients. Numerous techniques for lung nodule segmentation have been developed; however, most of them either rely on the 3D volumetric region of interest (VOI) input by radiologists or use a 2D fixed region of interest (ROI) for all the slices of a computed tomography (CT) scan. These methods only consider the presence of nodules within the given VOI, which limits the networks’ ability to detect nodules outside the VOI and can also encompass unnecessary structures in the VOI, leading to potentially inaccurate segmentation. In this work, we propose a novel approach for 3D lung nodule segmentation that utilizes a 2D region of interest (ROI) provided by a radiologist or a computer-aided detection (CADe) system. Concretely, we developed a two-stage lung nodule segmentation technique. First, we designed a dual-encoder-based hard attention network (DEHA-Net) in which the full axial slice of a thoracic computed tomography (CT) scan, along with an ROI mask, was considered as input to segment the lung nodule in the given slice. The output of DEHA-Net, the segmentation mask of the lung nodule, was fed into the adaptive region of interest (A-ROI) algorithm to automatically generate the ROI masks for the surrounding slices, which eliminated the need for any further input from radiologists. After extracting the segmentation along the axial axis, in the second stage, we further investigated the lung nodule along the sagittal and coronal views by employing DEHA-Net. All the estimated masks were fed into a consensus module to obtain the final volumetric segmentation of the nodule. The proposed scheme was rigorously evaluated on the lung image database consortium and image database resource initiative (LIDC/IDRI) dataset, and an extensive analysis of the results was performed.
The quantitative analysis showed that the proposed method not only improved upon the existing state-of-the-art methods in terms of the dice score but also showed significant robustness against different types, shapes, and dimensions of lung nodules. The proposed framework achieved an average dice score, sensitivity, and positive predictive value of 87.91%, 90.84%, and 89.56%, respectively.

1. Introduction

Lung cancer is the deadliest cancer type, and early detection is crucial for potentially life-saving treatment. The accurate quantification of pulmonary nodules, which may be associated with various conditions but are often indicative of lung cancer, is essential for the continuous monitoring of lung nodule volume to assess the malignancy and predict the likelihood of lung cancer [1,2]. However, the manual segmentation of nodules, which represents a necessary step in calculating their volume, is a laborious and time-consuming process that can also introduce variability between and within observers [3].
Computer-aided diagnosis (CAD) systems can significantly enhance the productivity of radiologists by assisting them in overcoming the challenges associated with the manual segmentation of pulmonary nodules. CAD systems consist of two subsystems, i.e., computer-aided detection (CADe) [4] and computer-aided diagnosis (CADx) [5]. The CADe system aims to distinguish between nodules and other structures, such as tissues and blood vessels. The CADx system then evaluates the detected nodules and determines whether they are benign or malignant tumors. The primary goal of these CAD systems is to improve the accuracy and efficiency of cancer diagnosis by radiologists. They are designed to assist decision making by providing additional information and reducing the time needed to interpret CT images. This work is focused on developing the CADx system for accurate lung nodule segmentation, which is challenging due to variable shapes, different sizes, and complicated surrounding tissues in the lung region. Various automatic segmentation frameworks for nodule quantification have been devised; such techniques consist of traditional image-processing-based methods and deep-learning-based approaches [6]. However, the significant heterogeneity of lung nodules, particularly the variations in the shape and contrast of lung nodules, hinders the development of a robust nodule segmentation framework. These variations, both within and between nodules, as well as the visual similarity between nodules and their surrounding non-nodule tissue, necessitate the use of a 3D volume of interest (VOI) as input to estimate the shape of the nodule accurately. Figure 1 demonstrates the intra-nodule and inter-nodule variations, showcasing the diversity between the forms of different nodules and the variations present within individual nodules. Providing a 3D VOI is quite a time-consuming and laborious task, as the radiologist has to specify the region of interest at each slice containing the nodule. 
A few studies have resolved this issue by utilizing a fixed ROI for all the slices; this approach requires only one ROI input from the user, which significantly reduces the time and hassle. However, employing a fixed ROI adds redundant non-nodular regions to the input ROI, leading to poor segmentation performance.
To address the issues related to using 3D VOIs and fixed ROIs as input, in our previous work [5], we proposed a novel approach using dynamic ROIs for the accurate volumetric segmentation of pulmonary nodules. To determine the dynamic ROIs, we proposed an adaptive region of interest (A-ROI) algorithm that utilizes a single 2D ROI provided by radiologists [5] to estimate the dynamic ROIs in the surrounding slices. This approach begins by segmenting the nodule in the initially provided ROI by employing a residual UNet and then utilizes the segmentation mask to determine the ROIs for the surrounding slices to extend the nodule segmentation in both directions. Concretely, the A-ROI algorithm dynamically adjusts the position and size of the bounding box for the adjacent slices to investigate the penetration of the nodule into the other slices. The technique demonstrated exceptional performance and outperformed the previous state-of-the-art methods. However, this previous approach required cropping the ROI, which can cause problems due to the inconsistent size of nodules, for instance, if the ROIs are smaller or larger than the normalized dimensions used as input to the network. Similarly, the mask obtained after inference must be resized to match the original cropped ROI size, introducing error when interpolation is used to achieve the target dimensions. To address these issues, we propose a dual-encoder-based architecture that takes two inputs, the original slices and the ROI mask, eliminating the need to rescale the ROIs before and after inference. The A-ROI algorithm was then used to further produce ROI masks for surrounding slices for which ROIs were not provided. Specifically, the A-ROI algorithm was applied along the axial plane to provide an initial estimation of nodule shape, which was then used to extract a 3D VOI from the scan automatically.
The extracted VOI was further utilized to create the coronal and sagittal views of the nodule, and the slices from these views were analyzed using two different dual-encoder-based architectures. Finally, a consensus module was employed to ensemble the three predictions from axial, coronal, and sagittal view models. Several experiments were performed on the LIDC dataset [7] to demonstrate the effectiveness of the proposed technique in terms of overall performance and robustness relative to the variations in the type and size of lung nodules.

2. Related Work

An accurate assessment of lung nodules is essential for evaluating their potential malignancy and the likelihood that they are indicative of lung cancer. Consequently, numerous researchers have made extensive efforts to devise efficient nodule segmentation frameworks to assist radiologists. These studies can be classified into two categories, i.e., conventional image-processing-based methods and advanced deep-learning-based techniques [6].
Jamshid et al. [8] proposed a framework that segmented the nodule by employing region-growing techniques, such as contrast-based region growing and fuzzy connectivity region growing, and created a volumetric mask using a local adaptive segmentation algorithm that distinguishes between foreground and background regions within a specified window size. While the algorithm demonstrated good performance for isolated nodules, it could not effectively segment attached ones. Using geodesic influence zones in a multi-threshold image representation, Stefano et al. [9] offered a user-interactive algorithm that meets the fusion-segregation criterion based on both gray-level similarity and object shape. They extended their work in another study [10] by eliminating the need for user interaction. A correction procedure was then performed based on a 3D local shape analysis, allowing for the refinement of an initial nodule segmentation to distinguish possible vessels from the nodule itself without requiring input from the user. Rendon et al. [11] used morphological and threshold approaches to eliminate extraneous structures from a given ROI. The last step was to use a support vector machine (SVM) to classify each pixel in the resulting space.
Although classical image-processing-based techniques can achieve accurate lung nodule segmentation, such techniques are susceptible to the types of nodules. In contrast, recent deep-learning-based methods have made vast inroads into many medical imaging applications such as disease classification [12] and segmentation [13,14], including lung nodule segmentation [15]. The introduction of the UNet [16] architecture for medical image segmentation, in particular, has dramatically enhanced the performance of various crucial tasks, such as tumor segmentation [17]. As a result, there has been an increased focus on using deep learning for lung nodule segmentation. In [18], Tyagi et al. proposed a 3D conditional generative adversarial network (GAN) for lung nodule segmentation. They utilized the UNet architecture as the backbone of the GAN and employed a simple classification network as a discriminator, incorporating spatial squeeze and channel excitation modules to differentiate between real and fake segmentations. Similarly, Wang et al. [19] developed a method for nodule segmentation called central-focused convolutional neural networks (CF-CNNs). This approach uses a volumetric patch centered around the voxel of interest as input to the model. In addition, the same team [20] also published a multi-view CNN that can perform nodule segmentation using input from different views (axial, coronal, and sagittal) of the same voxel. One potential limitation of this method is that the patch extraction process is the same for all nodules, which could lead to incorrect segmentation if the nodule is larger than the patch. By using skip connections in the encoder and decoder paths, Tong et al. [21] enhanced the performance of UNet for nodule segmentation; however, the model was only intended for 2D segmentation. Hancock et al.
[22] put forth a variation on the standard level-set image segmentation technique in which, as opposed to being manually designed, the velocity function is learned from data using regression machine learning techniques. They reported slightly improved performance when they applied this approach to the segmentation of lung nodules. Chen et al. [23] proposed an end-to-end multi-task learning framework that consists of joint classification and multi-channel segmentation networks. Both networks utilized the same latent representation learned by a common encoder branch, improving lung segmentation performance. The study also incorporated an enhanced version of the patches by using the Otsu and SLIC methods. To extract local characteristics and detailed contextual information from lung nodules, Liu et al. [24] used a residual-block-based dual-path network, which significantly improved performance. However, they also employed a fixed VOI, which restricts the nodule search and lowers 3D segmentation performance. To avoid this issue, Chen et al. [25] proposed a fast multi-crop guided attention (FMGA) network for lung nodule segmentation by incorporating 2D- and 3D-cropped ROIs. They applied a greedy search algorithm to explore the penetration of lung nodules into the surrounding slices. Their framework also exploited a customized loss function, enabling the network to focus on improving the segmentation of nodule borders. Their results demonstrated the robustness of the proposed framework; however, the scheme failed to improve upon state-of-the-art methods in terms of the overall dice score.
In our previous work [5], we addressed the limitations of a fixed volume of interest (VOI) by introducing the concept of an adaptive 2D region of interest (ROI) in each slice, which significantly improved the ability to utilize deep learning. Most notably, cropped ROIs were fed to a deep residual UNet [26], which demonstrated promising performance along with several limitations. In particular, due to the heterogeneity of lung nodules, numerous variations in dimensions are possible, which makes it difficult to find optimal input dimensions for the network. Consequently, the cropped ROI has to be repeatedly resized, by upsampling or downsampling, which affects the performance of the framework. One possible alternative is to train various models with different input dimensions. However, this comes with an immense increase in computational cost, which hinders the solution’s real-time clinical application. For instance, Zhang et al. [27] proposed a multi-scale segmentation squeeze-and-excitation UNet with a conditional random field to segment the nodule in a given volume of interest. They extracted VOIs at four different scales, trained four different networks, and finally applied a conditional random field to merge the four predictions. Their framework increased the computational complexity and only covered four scales defined according to the dimensions available in the given dataset, which is insufficient to cover the possible diversity in the size of lung nodules in real clinical applications. To overcome the aforementioned issues, in this work, we propose a dual-encoder-based architecture that incorporates the ROI mask as a hard-attention input, which enables the framework to avoid pre- and post-inference resizing and leads to improved performance.

3. Materials and Methods

3.1. Dataset

In this work, we used the lung image database consortium and image database resource initiative (LIDC-IDRI) database [7,28], which is the largest publicly available resource for lung CT scans. The dataset was created to facilitate the development of computer-aided systems for lung nodule identification, categorization, and quantification. The LIDC-IDRI comprises 1018 diagnostic and screening thoracic CT scans for lung cancer from 1010 individuals with annotated lesions. Each thoracic CT scan underwent a two-phase annotation process performed by four qualified radiologists. As in earlier studies [5,29,30], we considered nodules with a minimum diameter of 3 mm and annotations from all four radiologists. Due to the variability among the four radiologists, the ground-truth border for pulmonary nodule segmentation was created using a 50% consensus criterion [31], employing the Python module pyLIDC. A total of 893 nodules from the LIDC dataset were selected and randomly distributed into training, validation, and test sets of 40%, 5%, and 55%, respectively.

3.2. Data Pre-Processing

The pre-processing of CT images can significantly improve the network’s performance by reducing the influence of noise and irrelevant tissues. Normalizing the image can reduce the network’s dependence on parameter initialization, smoothing the optimization process and, subsequently, enhancing the probability of convergence. Concretely, grayscale thresholding was applied to normalize the intensity range, which helped to suppress irrelevant, redundant information. This enabled the network to pay attention to the relevant tissue and reduced the complexity of the input data, making the network’s training more efficient and effective.
We also normalized the intensity values, ranging from 0 to 1, by using the window center and window width tag from corresponding DICOM files [32]. The normalization can be defined as follows:
$$I_n = \frac{I - W_{Min}}{W_{Max} - W_{Min}},$$
$$W_{Min} = W_C - W_W/2,$$
$$W_{Max} = W_C + W_W/2,$$
where $I$, $I_n$, $W_C$, and $W_W$ represent the original image, the normalized image, the window center, and the window width, respectively. The values of the window center and window width are extracted from the DICOM tags [32].
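As an illustration, the windowing normalization above can be sketched in NumPy. The function name `window_normalize` is ours, and the clipping to [0, 1] is our assumption; the paper only specifies the linear mapping:

```python
import numpy as np

def window_normalize(img, wc, ww):
    """Map a CT slice to [0, 1] using the DICOM window center (wc) / width (ww).

    Implements I_n = (I - W_Min) / (W_Max - W_Min); clipping to [0, 1]
    is an assumption on our part.
    """
    w_min = wc - ww / 2.0
    w_max = wc + ww / 2.0
    out = (img.astype(np.float32) - w_min) / (w_max - w_min)
    return np.clip(out, 0.0, 1.0)
```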
The LIDC collection includes scans obtained from numerous sites and scanners. Consequently, it has a variety of pixel spacings and slice thicknesses. These variables are crucial for nodule appearance; in particular, slice thickness significantly impacts the coronal and sagittal views. In most LIDC scans, the slice thickness, which ranges from 0.45 mm to 5.0 mm, is larger than the pixel spacing. Therefore, to enhance the visibility of nodules in all three views, the slice thickness was reduced to the corresponding pixel spacing by upsampling the scan along the z-axis. The pixel spacing remained unchanged, as it was less than 1 mm for each scan, producing an axial view of the nodules at a reasonable resolution.
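The z-axis upsampling described above can be approximated as follows. This is a minimal sketch using integer nearest-neighbor repetition; the paper does not state which interpolation scheme was used, and `resample_z` is a hypothetical helper:

```python
import numpy as np

def resample_z(volume, slice_thickness, pixel_spacing):
    """Upsample along the z-axis so slice spacing approximates pixel spacing.

    Integer repetition is a nearest-neighbor stand-in for whatever
    interpolation the original pipeline applied.
    """
    factor = max(1, int(round(slice_thickness / pixel_spacing)))
    return np.repeat(volume, factor, axis=0)
```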
In contrast to previous studies [19,20,21,33], which produced the training samples by employing the constant margin scheme, in this work, we utilized the ROI with random margins on each side as in [5]. To train our DEHA-Net architecture, we generated ROI masks by using ground-truth nodule masks. To enforce our model to learn about the absence of lung nodules in a given slice, we also included two non-nodular slices from both sides of each nodule.

3.3. Dual-Encoder-Based Hard Attention Network with Adaptive ROI Mechanism

The proposed framework utilized a novel dual-encoder-based hard attention network (DEHA-Net) with an adaptive ROI (A-ROI) mechanism; the overall framework is illustrated in Figure 2. In the first stage, segmentation was carried out along the axial axis using DEHA-Net, starting from a 2D ROI provided manually by a radiologist or by a computer-aided detection (CADe) system. The A-ROI algorithm was applied to generate the ROIs for the remaining surrounding slices, which enabled the investigation of the nodule along the axial view in order to reconstruct its 3D mask. In the second stage, the 3D mask constructed after the axial analysis was exploited to generate the ROIs along the sagittal and coronal views. Then, we applied the proposed DEHA-Net along the sagittal and coronal views with the predefined ROIs generated from the 3D mask obtained at the first stage. Finally, a consensus module was utilized to produce the final 3D segmentation mask of the nodule. It is important to note that no resizing was performed anywhere in the pipeline, thus eliminating the issues associated with rescaling the given input and the network output. This enabled our network to achieve improved performance and made it more robust to size variations across nodules. The following subsections describe the details of the proposed DEHA-Net and the A-ROI algorithm.

Dual-Encoder-Based Hard Attention Network

Lung nodules vary widely in shape and dimension, making it difficult to fix a single suitable input dimension for the network. To overcome this issue, we designed a dual-encoder-based hard attention network (DEHA-Net) that takes two inputs, i.e., the slice containing the nodule and the ROI mask, to accurately segment the nodule in the given slice. Specifically, the ROI mask provided hard attention, which enabled the network to focus only on the provided region of interest. The proposed DEHA-Net consisted of two encoders and one decoder branch, as demonstrated in Figure 3. Each encoder was connected to the decoder with residual connections from four different levels. Both encoders had identical architectures, consisting of four levels. At the $n$th level, there was a convolution layer with $32 \times n^2$ filters and a kernel size of 3 × 3, followed by a rectified linear unit (ReLU) activation to add non-linearity. After the ReLU, there was a batch normalization layer and then a max-pooling layer, which compressed the information. These four layers constituted a single level of an encoder.
The first encoder extracted features from the CT slice, while the second encoder enforced the hard attention learned from the ROI mask of the nodule; its primary purpose was to maintain the network’s focus on the nodule’s location. The decoder output the segmentation mask of the nodule for the current slice, from which the ROIs for the next and previous slices were derived. The decoder likewise consisted of four levels, each comprising a concatenation layer followed by a convolution layer. After that, a ReLU activation was applied for non-linearity, followed by a batch normalization layer, and finally, the features were upsampled. In the last level of the decoder, the upsampling layer was replaced by a convolution layer with a single filter and SoftMax activation. Each concatenation layer of the decoder concatenated the features from the corresponding level of both encoders and the previous decoder level before passing them to the succeeding layers.
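Under the stated structure (two identical four-level encoders with $32 \times n^2$ filters at level $n$, and a decoder that concatenates features from both encoders), a minimal Keras sketch might look as follows. The exact layer ordering, skip-connection wiring, and output activation are our assumptions; the paper states a single-filter SoftMax head, whereas a sigmoid is used here since the mask has a single channel:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def encoder_level(x, filters):
    # one encoder level: conv -> ReLU -> batch norm -> max pool
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    skip = x                                  # pre-pooling features feed the decoder
    x = layers.MaxPooling2D(2)(x)
    return x, skip

def build_deha_net(input_shape=(512, 512, 1), base=32, levels=4):
    slice_in = layers.Input(input_shape)      # full CT slice
    roi_in = layers.Input(input_shape)        # hard-attention ROI mask
    x1, x2, skips = slice_in, roi_in, []
    for n in range(1, levels + 1):
        f = base * n ** 2                     # 32 * n^2 filters at level n
        x1, s1 = encoder_level(x1, f)
        x2, s2 = encoder_level(x2, f)
        skips.append((s1, s2))
    x = layers.Concatenate()([x1, x2])
    for n in range(levels, 0, -1):            # decoder mirrors the encoders
        x = layers.UpSampling2D(2)(x)
        s1, s2 = skips[n - 1]
        x = layers.Concatenate()([x, s1, s2]) # fuse both encoders' features
        x = layers.Conv2D(base * n ** 2, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    # single-filter output head (sigmoid here; the paper states SoftMax)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model([slice_in, roi_in], out)
```

Because the full slice and the mask share spatial dimensions, no cropping or rescaling is needed on either side of inference, which is the point of the dual-encoder design.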

3.4. Adaptive ROI Algorithm

The adaptive ROI (A-ROI) algorithm was proposed in [5]; it enables the network to investigate nodule presence in the surrounding slices without requiring ROIs from the user. Concretely, the A-ROI algorithm exploits the segmentation mask of the nodule in the current ($n$th) slice generated by the network to estimate the ROI for the successive slices (i.e., $n \pm 1$). In this work, we employed the A-ROI algorithm to complement the proposed DEHA-Net in performing the 3D segmentation of lung nodules. A-ROI utilizes a hyperparameter $R_T \in (0, 1)$ to moderate the margins around the nodule in the generated ROI masks.
The full impact of the A-ROI algorithm is demonstrated in Figure 4. Two constant ROIs are shown in red and blue in the first row, which remain fixed throughout all the slices: the one with tight margins failed to cover the nodule in the surrounding slices, while the other, with wider margins, added redundant area, thus confusing the network. By contrast, the second row shows the dynamic ROI produced by the A-ROI algorithm. The columns show the different slices; (a) represents the slice where the user provides the first ROI, and (b–f) demonstrate the adjacent slices.
The proposed framework for generating the 3D segmentation mask of lung nodules along the axial view is described in Algorithm 1, which illustrates the steps followed to investigate nodule penetration along the axial axis. The ROI provided by the radiologist or CADe system in the $n_i$th slice is represented by $RoI_{n_i}$ and is used as $RoI_n$ to initiate the segmentation. The normalized slice $I_n$ and the provided ROI are fed to DEHA-Net, denoted by $\Theta$. The segmentation mask of the nodule generated by DEHA-Net is then inputted into the A-ROI algorithm to produce the ROI mask for the next slice, which can lie in either direction, i.e., forward or backward. The same cycle is repeated until the next ROI mask becomes blank.
Algorithm 1: The algorithmic steps followed in the proposed framework for nodule investigation along the axial view.
1: $n = n_i$, $RoI_n = RoI_{n_i}$
2: while $RoI_n > 0$ do
3:     $Seg_n = \Theta(I_n, RoI_n)$
4:     $n \leftarrow n \pm 1$
5:     $RoI_n \leftarrow \mathrm{AROI}(Seg_n, R_T)$
6: end while
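A minimal Python sketch of this loop, with toy stand-ins for both DEHA-Net and the A-ROI step (the real A-ROI adjusts the bounding box as described in [5]; the `a_roi` helper here merely dilates the mask's bounding box by a relative margin $R_T$, and both function names are ours):

```python
import numpy as np

def a_roi(seg_mask, r_t):
    """Toy A-ROI stand-in: bounding box of the mask, dilated by margin r_t."""
    ys, xs = np.nonzero(seg_mask)
    if ys.size == 0:
        return np.zeros_like(seg_mask)        # blank ROI terminates the loop
    my = int(r_t * (ys.max() - ys.min() + 1)) + 1
    mx = int(r_t * (xs.max() - xs.min() + 1)) + 1
    roi = np.zeros_like(seg_mask)
    roi[max(ys.min() - my, 0):ys.max() + my + 1,
        max(xs.min() - mx, 0):xs.max() + mx + 1] = 1
    return roi

def segment_axial(volume, n_i, roi_init, net, r_t=0.3, direction=1):
    """Propagate segmentation from slice n_i in one direction until the ROI is blank."""
    masks, n, roi = {}, n_i, roi_init
    while roi.sum() > 0 and 0 <= n < volume.shape[0]:
        seg = net(volume[n], roi)             # DEHA-Net stand-in: Theta(I_n, RoI_n)
        masks[n] = seg
        n += direction                        # n <- n +/- 1
        roi = a_roi(seg, r_t)                 # RoI_n <- AROI(Seg_n, R_T)
    return masks
```

Running the loop once with `direction=1` and once with `direction=-1` from the seed slice covers the whole nodule along the axial axis.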

3.5. Ensembling Mechanism

The proposed framework utilized a consensus module to ensemble the segmentation results obtained from the axial, sagittal, and coronal axes. The consensus value $E_i$ of the $i$th voxel is calculated as follows:
$$E_i = \mathrm{thr}\left(\sum_{j=1}^{K} S_i^j, \tau\right),$$
$$\mathrm{thr}(g, \tau) = \begin{cases} 1, & \text{if } g \geq \tau, \\ 0, & \text{otherwise}, \end{cases}$$
where $S_i^j$ represents the prediction for the $i$th voxel from the $j$th model, and $K$ denotes the number of models, which in our case was three, i.e., axial, sagittal, and coronal. $\tau$ is a threshold determined on the validation set.
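The voting in the consensus module can be sketched as follows (the function name `consensus` and the default threshold are our choices for illustration):

```python
import numpy as np

def consensus(preds, tau=2):
    """Voxel-wise vote over K binary predictions (axial, sagittal, coronal)."""
    votes = np.sum(np.stack(preds), axis=0)   # sum_j S_i^j at each voxel
    return (votes >= tau).astype(np.uint8)    # thr(., tau)
```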

4. Experimental Setup and Implementation Details

4.1. Loss Function

To train the proposed DEHA-Net, we utilized the dice similarity coefficient (DSC) [34] loss, which can be defined as follows:
$$\mathcal{L}_{\mathrm{DSC}} = \frac{1}{N} \sum_{i=1}^{N} \left( 1 - \frac{2\left|\Theta(I_i, RoI_i) \cap S_{g_i}\right|}{\left|\Theta(I_i, RoI_i)\right| + \left|S_{g_i}\right|} \right),$$
where $\Theta$, $S_{g_i}$, and $N$ represent the model, the ground-truth segmentation mask, and the number of samples in the training set, respectively. We used stochastic gradient descent (SGD) to train our network.
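A NumPy sketch of the per-sample soft dice loss follows; during training, the actual loss operates on network outputs inside Keras and is averaged over the batch, so this stand-alone version (with an assumed smoothing `eps`) is for illustration only:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Soft dice loss for one sample: 1 - 2|P.T| / (|P| + |T|)."""
    inter = np.sum(pred * target)             # soft intersection
    return 1.0 - 2.0 * inter / (np.sum(pred) + np.sum(target) + eps)
```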

4.2. Implementation Details and Training Strategy

We used the Keras [35] framework to implement the proposed DEHA-Net and minimized the loss in Equation (6) with an SGD scheme. The model was trained on an Nvidia Tesla V100 Tensor Core GPU with 12,821 images of size 512 × 512 and a batch size of 8. Training was initiated from random weights, with an initial learning rate of 0.001, a momentum of 0.9, and learning rate decay. We used early stopping with a patience of 10 epochs to avoid overfitting.

4.3. Performance Measures

We considered three evaluation metrics to rigorously evaluate the performance of the proposed framework:
  • Dice Similarity Coefficient: We used the dice similarity coefficient (DSC) [19,36], which measures the degree of overlap between the ground-truth mask and the predicted mask. The DSC values range from 0 to 1, where 1 and 0 indicate complete overlap and no overlap, respectively. It can be defined as follows:
    $$DSC = \frac{2\left|Y \cap Y'\right|}{\left|Y\right| + \left|Y'\right|},$$
    where $Y$ and $Y'$ are the predicted segmentation mask and the reference segmentation mask, respectively.
  • Sensitivity: To measure the pixel classification performance of the proposed framework, we used sensitivity (SEN), which can be defined as follows:
    $$SEN = \frac{\left|Y \cap Y'\right|}{\left|Y'\right|}.$$
  • Positive Predictive Value (PPV): To measure the correctness of the segmentation area produced by the proposed framework, we used the positive predictive value (PPV), which can be defined as follows:
    $$PPV = \frac{\left|Y \cap Y'\right|}{\left|Y\right|}.$$
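The three metrics can be computed jointly for a pair of binary masks, e.g. (the helper name `seg_metrics` is ours):

```python
import numpy as np

def seg_metrics(pred, gt):
    """DSC, sensitivity, and PPV for binary masks (pred = Y, gt = Y')."""
    tp = np.sum(pred & gt)                    # |Y intersect Y'|
    dsc = 2.0 * tp / (pred.sum() + gt.sum())
    sen = tp / gt.sum()                       # fraction of ground truth recovered
    ppv = tp / pred.sum()                     # fraction of prediction that is correct
    return dsc, sen, ppv
```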

5. Results and Discussion

5.1. Overall Performance Analysis

We evaluated our proposed framework on the parameters described in Section 4.3 and compared its performance with previously published studies. Table 1 summarizes the results achieved by using our framework on the test set along with the reported performance of existing studies. It demonstrates that our proposed architecture outperforms the existing methods in terms of the dice score while also having the lowest standard deviation, which depicts its robustness against the variations in the type and size of lung nodules. In comparison with our previous work [5], which utilized the cropped ROI input, our current approach offers improved performance with a lower standard deviation. This can be attributed to the incorporation of ROI masks into a dual-encoder-based architecture, which eliminates the necessity to crop and normalize the input slice. It also signifies the effectiveness of the incorporation of the A-ROI algorithm in the proposed scheme to estimate the ROI masks for the surrounding slices of a given input slice.

5.2. Robustness Analysis

The LIDC dataset includes annotations that describe various characteristics of nodules, such as their subtlety, internal structure, calcification, sphericity, margin, lobulation, spiculation, texture, and malignancy. These characteristics represent different levels of difficulty in detecting the boundaries of nodules. To evaluate the effectiveness of our method, we divided the test data into groups based on each characteristic and analyzed the results for each group. Table 2 presents the dice scores for each group, which demonstrate that our framework performs consistently in each group and that promising results can be obtained on all types of lung nodules. This can be attributed to the hard attention mechanism, which enables the proposed DEHA-Net to focus only on the given ROI region while leveraging the surrounding information to distinguish the nodule.
Further, to illustrate the robustness of the proposed method, a histogram of the distribution of the dice scores on the test set of the LIDC dataset is shown in Figure 5. The majority of test instances have a score of over 85%, which demonstrates the strong performance of our proposed method.

5.3. Qualitative Analysis

To elaborate on the difference in performance between this framework and our previous work [5], we performed a visual analysis of the results. Figure 6 shows the visual results with axial views on randomly selected nodules of different sizes and types. The results demonstrate that the incorporation of hard attention with the ROI mask in the model significantly improves the segmentation performance. It can also be observed that the resizing of the cropped slice disturbs the boundary of the segmented nodule, which is critical in determining the exact dimensions of the nodule and, subsequently, the malignancy level. The proposed DEHA-Net enables our framework to utilize the full slice without losing minor details of the given input image, which are crucial for the accurate segmentation of lung nodules.

6. Conclusions

In this work, we proposed a novel two-stage framework that used a 2D slice along with a seed region of interest (ROI), covering the nodule area, to produce the 3D segmentation of the nodule. To segment the nodule in the given slice, we proposed a novel dual-encoder-based hard attention network (DEHA-Net), which utilized the adaptive region of interest (A-ROI) algorithm to estimate the ROIs for the surrounding slices. In contrast to previous studies in which a cropped patch of a given slice is inputted to the network, the proposed DEHA-Net leverages complete 2D contextual information by taking the entire slice as input. This helps DEHA-Net learn meaningful features that better distinguish between nodular and non-nodular voxels. In the second stage, after obtaining the 3D segmentation of the nodule from the axial slices, the framework followed the same segmentation scheme for the sagittal and coronal views. Finally, a consensus module was employed to process the results from all three axes to obtain the refined segmentation mask. An extensive evaluation of the proposed framework was performed on the lung image database consortium and image database resource initiative (LIDC/IDRI) dataset, which is the largest publicly available dataset. The quantitative and qualitative results were presented and analyzed, demonstrating that the technique shows excellent performance, outperforming the existing state-of-the-art methods in terms of the dice similarity score. Furthermore, our results reveal that the framework is significantly robust to the various types and sizes of nodules. Future work includes reducing the framework’s computational complexity to optimize its execution time.

Author Contributions

Conceptualization, M.U.; methodology, M.U.; validation, M.U. and Y.-G.S.; writing—original draft preparation, M.U. and Y.-G.S.; writing—review and editing, M.U. and Y.-G.S.; supervision, Y.-G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by HealthHub, Seoul, Republic of Korea.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this study are from the publicly available Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) dataset, which can be accessed at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254 (accessed on 5 February 2023). The architecture source code is available on request.

Acknowledgments

We are grateful to Abdullah Shahid of the AI R&D department, HealthHub, for supporting the experimental analysis and validation of the proposed technique. We also thank the managing editor of the physical section and the academic editor for inviting us to submit this paper and for facilitating a smooth editorial process.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mozley, P.D.; Bendtsen, C.; Zhao, B.; Schwartz, L.H.; Thorn, M.; Rong, Y.; Zhang, L.; Perrone, A.; Korn, R.; Buckler, A.J. Measurement of tumor volumes improves RECIST-based response assessments in advanced lung cancer. Transl. Oncol. 2012, 5, 19. [Google Scholar] [CrossRef]
  2. Devaraj, A.; van Ginneken, B.; Nair, A.; Baldwin, D. Use of volumetry for lung nodule management: Theory and practice. Radiology 2017, 284, 630–644. [Google Scholar] [CrossRef] [PubMed]
  3. Moltz, J.H.; Bornemann, L.; Kuhnigk, J.M.; Dicken, V.; Peitgen, E.; Meier, S.; Bolte, H.; Fabel, M.; Bauknecht, H.C.; Hittinger, M.; et al. Advanced segmentation techniques for lung nodules, liver metastases, and enlarged lymph nodes in CT scans. IEEE J. Sel. Top. Signal Process. 2009, 3, 122–134. [Google Scholar] [CrossRef]
  4. Usman, M.; Rehman, A.; Shahid, A.; Latif, S.; Byon, S.S.; Lee, B.D.; Kim, S.H.; Shin, Y.G. MEDS-Net: Self-Distilled Multi-Encoders Network with Bi-Direction Maximum Intensity projections for Lung Nodule Detection. arXiv 2022, arXiv:2211.00003. [Google Scholar]
  5. Usman, M.; Lee, B.D.; Byon, S.S.; Kim, S.H.; Lee, B.i.; Shin, Y.G. Volumetric lung nodule segmentation using adaptive roi with multi-view residual learning. Sci. Rep. 2020, 10, 12839. [Google Scholar] [CrossRef] [PubMed]
  6. Wu, J.; Qian, T. A survey of pulmonary nodule detection, segmentation and classification in computed tomography with deep learning techniques. J. Med. Artif. Intell. 2019, 2. [Google Scholar] [CrossRef]
  7. Armato, S.G., III; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Zhao, B.; Aberle, D.R.; Henschke, C.I.; Hoffman, E.A.; et al. The lung image database consortium (LIDC) and image database resource initiative (IDRI): A completed reference database of lung nodules on CT scans. Med. Phys. 2011, 38, 915–931. [Google Scholar] [CrossRef] [PubMed]
  8. Dehmeshki, J.; Amin, H.; Valdivieso, M.; Ye, X. Segmentation of pulmonary nodules in thoracic CT scans: A region growing approach. IEEE Trans. Med. Imaging 2008, 27, 467–480. [Google Scholar] [CrossRef]
  9. Diciotti, S.; Picozzi, G.; Falchini, M.; Mascalchi, M.; Villari, N.; Valli, G. 3-D segmentation algorithm of small lung nodules in spiral CT images. IEEE Trans. Inf. Technol. Biomed. 2008, 12, 7–19. [Google Scholar] [CrossRef]
  10. Diciotti, S.; Lombardo, S.; Falchini, M.; Picozzi, G.; Mascalchi, M. Automated segmentation refinement of small lung nodules in CT scans by local shape analysis. IEEE Trans. Biomed. Eng. 2011, 58, 3418–3428. [Google Scholar] [CrossRef]
  11. Rendon-Gonzalez, E.; Ponomaryov, V. Automatic Lung nodule segmentation and classification in CT images based on SVM. In Proceedings of the 2016 9th International Kharkiv Symposium on Physics and Engineering of Microwaves, Millimeter and Submillimeter Waves (MSMW), Kharkiv, Ukraine, 20–24 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–4. [Google Scholar]
  12. Ullah, Z.; Usman, M.; Gwak, J. MTSS-AAE: Multi-task semi-supervised adversarial autoencoding for COVID-19 detection based on chest X-ray images. Expert Syst. Appl. 2023, 216, 119475. [Google Scholar] [CrossRef]
  13. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.; Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef]
  14. Guo, Y.; Liu, Y.; Georgiou, T.; Lew, M.S. A review of semantic segmentation using deep neural networks. Int. J. Multimed. Inf. Retr. 2018, 7, 87–93. [Google Scholar] [CrossRef]
  15. Rocha, J.; Cunha, A.; Mendonça, A.M. Comparison of Conventional and Deep Learning Based Methods for Pulmonary Nodule Segmentation in CT Images. In Proceedings of the EPIA Conference on Artificial Intelligence, Vila Real, Portugal, 3–6 September 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 361–371. [Google Scholar]
  16. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  17. Ullah, Z.; Usman, M.; Jeon, M.; Gwak, J. Cascade multiscale residual attention cnns with adaptive roi for automatic brain tumor segmentation. Inf. Sci. 2022, 608, 1541–1556. [Google Scholar] [CrossRef]
  18. Tyagi, S.; Talbar, S.N. CSE-GAN: A 3D conditional generative adversarial network with concurrent squeeze-and-excitation blocks for lung nodule segmentation. Comput. Biol. Med. 2022, 147, 105781. [Google Scholar] [CrossRef] [PubMed]
  19. Wang, S.; Zhou, M.; Liu, Z.; Liu, Z.; Gu, D.; Zang, Y.; Dong, D.; Gevaert, O.; Tian, J. Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation. Med. Image Anal. 2017, 40, 172–183. [Google Scholar] [CrossRef]
  20. Wang, S.; Zhou, M.; Gevaert, O.; Tang, Z.; Dong, D.; Liu, Z.; Tian, J. A multi-view deep convolutional neural networks for lung nodule segmentation. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju Island, Republic of Korea, 11–15 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1752–1755. [Google Scholar]
  21. Tong, G.; Li, Y.; Chen, H.; Zhang, Q.; Jiang, H. Improved U-NET network for pulmonary nodules segmentation. Optik 2018, 174, 460–469. [Google Scholar] [CrossRef]
  22. Hancock, M.C.; Magnan, J.F. Lung nodule segmentation via level set machine learning. arXiv 2019, arXiv:1910.03191. [Google Scholar]
  23. Chen, W.; Wang, Q.; Yang, D.; Zhang, X.; Liu, C.; Li, Y. End-to-End Multi-Task Learning for Lung Nodule Segmentation and Diagnosis. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6710–6717. [Google Scholar]
  24. Liu, H.; Cao, H.; Song, E.; Ma, G.; Xu, X.; Jin, R.; Jin, Y.; Hung, C.C. A cascaded dual-pathway residual network for lung nodule segmentation in CT images. Phys. Medica 2019, 63, 112–121. [Google Scholar] [CrossRef]
  25. Chen, Q.; Xie, W.; Zhou, P.; Zheng, C.; Wu, D. Multi-Crop Convolutional Neural Networks for Fast Lung Nodule Segmentation. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 6, 1190–1200. [Google Scholar] [CrossRef]
  26. Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual u-net. IEEE Geosci. Remote. Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef]
  27. Zhang, B.; Qi, S.; Wu, Y.; Pan, X.; Yao, Y.; Qian, W.; Guan, Y. Multi-Scale Segmentation Squeeze-and-Excitation UNet with Conditional Random Field for Segmenting Lung Tumor from CT Images. Comput. Methods Programs Biomed. 2022, 222, 106946. [Google Scholar] [CrossRef] [PubMed]
  28. McNitt-Gray, M.F.; Armato III, S.G.; Meyer, C.R.; Reeves, A.P.; McLennan, G.; Pais, R.C.; Freymann, J.; Brown, M.S.; Engelmann, R.M.; Bland, P.H.; et al. The Lung Image Database Consortium (LIDC) data collection process for nodule detection and annotation. Acad. Radiol. 2007, 14, 1464–1474. [Google Scholar] [CrossRef]
  29. Feng, X.; Yang, J.; Laine, A.F.; Angelini, E.D. Discriminative localization in CNNs for weakly-supervised segmentation of pulmonary nodules. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec City, QC, Canada, 10–14 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 568–576. [Google Scholar]
  30. Wu, B.; Zhou, Z.; Wang, J.; Wang, Y. Joint learning for pulmonary nodule segmentation, attributes and malignancy prediction. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1109–1113. [Google Scholar]
  31. Kubota, T.; Jerebko, A.K.; Dewan, M.; Salganicoff, M.; Krishnan, A. Segmentation of pulmonary nodules of various densities with morphological approaches and convexity models. Med. Image Anal. 2011, 15, 133–154. [Google Scholar] [CrossRef]
  32. Softneta. DICOM Library—Anonymize, Share, View DICOM Files ONLINE. Available online: https://www.dicomlibrary.com/dicom/dicom-tags/ (accessed on 28 January 2023).
  33. Amorim, P.H.; de Moraes, T.F.; da Silva, J.V.; Pedrini, H. Lung Nodule Segmentation Based on Convolutional Neural Networks Using Multi-orientation and Patchwise Mechanisms. In Proceedings of the ECCOMAS Thematic Conference on Computational Vision and Medical Image Processing, Porto, Portugal, 16–18 October 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 286–295. [Google Scholar]
  34. Zou, K.H.; Warfield, S.K.; Bharatha, A.; Tempany, C.M.; Kaus, M.R.; Haker, S.J.; Wells III, W.M.; Jolesz, F.A.; Kikinis, R. Statistical validation of image segmentation quality based on a spatial overlap index1: Scientific reports. Acad. Radiol. 2004, 11, 178–189. [Google Scholar] [CrossRef]
  35. Chollet, F. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 5 February 2023). [Google Scholar]
  36. Jung, J.; Hong, H.; Goo, J.M. Ground-glass nodule segmentation in chest CT images using asymmetric multi-phase deformable model and pulmonary vessel removal. Comput. Biol. Med. 2018, 92, 128–138. [Google Scholar] [CrossRef] [PubMed]
  37. Cao, H.; Liu, H.; Song, E.; Hung, C.C.; Ma, G.; Xu, X.; Jin, R.; Lu, J. Dual-branch residual network for lung nodule segmentation. Appl. Soft Comput. 2020, 86, 105934. [Google Scholar] [CrossRef]
  38. Maqsood, M.; Yasmin, S.; Mehmood, I.; Bukhari, M.; Kim, M. An efficient DA-net architecture for lung nodule segmentation. Mathematics 2021, 9, 1457. [Google Scholar] [CrossRef]
  39. Zhou, Z.; Gou, F.; Tan, Y.; Wu, J. A cascaded multi-stage framework for automatic detection and segmentation of pulmonary nodules in developing countries. IEEE J. Biomed. Health Inform. 2022, 26, 5619–5630. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Illustration of different types of lung nodules. The intra-nodule diversity can be noticed in columns (a–d), whereas inter-nodule variations are demonstrated in rows (i–iv).
Figure 2. The proposed framework consists of two stages. In the first stage, the user or CADe system provides the ROI along the axial axis, and the DEHA-Net (dual-encoder-based hard attention network with self-hard attention) and adaptive ROI algorithm are used to determine the ROIs in the surrounding slices to perform 3D segmentation. In the second stage, the sagittal and coronal views are created to segment the nodule. Finally, three segmentation predictions are fed into the consensus module to produce the final 3D segmentation mask.
Figure 3. Illustration of the proposed dual-encoder-based hard attention network (DEHA-Net) architecture, which consists of two encoder blocks and one decoder block to incorporate the hard attention for lung nodule segmentation.
Figure 4. Constant and adaptive regions of interest (ROIs) are depicted in a series of consecutive slices containing a nodule from column (a–f). The segmentation of the nodule starts from column (a) with the manual ROI and ends at column (f). The blue and red bounding boxes represent constant ROIs, while the green boxes represent adaptive ROIs. (Figure credit: Usman et al. [5]).
Figure 5. Dice similarity score distribution obtained on the LIDC testing set.
Figure 6. Visual comparison of the previous cropped-slice-based approach using Res-UNet with the proposed full-slice input-based approach using DEHA-Net.
Table 1. The quantitative results of our proposed scheme along with previously published studies in terms of mean ± standard deviation of all quantitative measures used in this study. The best performance is indicated in bold, and – indicates the absence of a value.

Authors, Year                 DSC (%)          SEN (%)          PPV (%)
Wang et al., 2017 [19]        82.15 ± 10.76    92.75 ± 12.83    75.84 ± 13.14
Tong et al., 2018 [21]        73.6 ± –         –                –
Liu et al., 2019 [24]         81.58 ± 11.05    87.30 ± 14.30    79.71 ± 13.59
Chen et al., 2020 [23]        86.43 ± –        –                –
Cao et al., 2020 [37]         82.74 ± 10.20    89.35 ± 11.79    79.64 ± 13.34
Usman et al., 2020 [5]        87.55 ± 10.58    91.62 ± 8.47     88.24 ± 9.52
Chen et al., 2021 [25]        81.32 ± –        92.33 ± –        74.78 ± –
Maqsood et al., 2021 [38]     81 ± –           –                –
Zhang et al., 2022 [27]       85.1 ± 7.10      82.7 ± 10.8      90 ± 10.7
Tyagi et al., 2022 [18]       80.74 ± –        85.46 ± –        80.56 ± –
Zhou et al., 2022 [39]        86.75 ± 10.58    89.07 ± 8.31     83.26 ± 10.21
Our Method, 2023              87.91 ± 6.27     90.84 ± 8.22     89.56 ± 10.07
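For reference, the three metrics reported in Table 1, namely the Dice similarity coefficient (DSC), sensitivity (SEN), and positive predictive value (PPV), follow directly from voxel-wise true/false positive and negative counts. A minimal, self-contained sketch (it assumes both masks are non-empty, so no zero-division guard is included):

```python
import numpy as np

def seg_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """DSC, sensitivity, and positive predictive value for binary masks (in %)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # correctly segmented voxels
    fp = np.logical_and(pred, ~gt).sum()   # over-segmented voxels
    fn = np.logical_and(~pred, gt).sum()   # missed voxels
    return {
        "DSC": 100.0 * 2 * tp / (2 * tp + fp + fn),
        "SEN": 100.0 * tp / (tp + fn),
        "PPV": 100.0 * tp / (tp + fp),
    }

# Example: prediction hits 3 of 4 ground-truth voxels and adds 1 false positive.
gt = np.array([1, 1, 1, 1, 0, 0], dtype=bool)
pred = np.array([1, 1, 1, 0, 1, 0], dtype=bool)
m = seg_metrics(pred, gt)
print(m)  # DSC = 75.0, SEN = 75.0, PPV = 75.0
```

In a volumetric setting, the same function applies unchanged to 3D boolean arrays.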
Table 2. Mean dice score for various types of nodules from the LIDC-IDRI testing set.

                      Characteristic Score
Characteristics       1            2            3            4            5            6
Calcification         –            –            85.99 [18]   91.25 [42]   85.98 [27]   87.77 [405]
Internal structure    87.98 [487]  78.04 [3]    –            84.13 [2]    –            –
Lobulation            91.07 [201]  86.09 [164]  84.79 [78]   85.08 [31]   87.54 [18]   –
Malignancy            89.18 [39]   87.76 [114]  79.45 [163]  89.14 [98]   91.02 [78]   –
Margin                92.08 [9]    89.81 [37]   79.25 [78]   82.99 [232]  92.97 [136]  –
Sphericity            –            88.77 [38]   83.22 [153]  91.61 [218]  90.24 [83]   –
Spiculation           92.42 [257]  82.69 [165]  85.17 [32]   80.39 [14]   83.56 [24]   –
Subtlety              80.3 [4]     88.96 [22]   82.88 [131]  91.99 [238]  86.03 [97]   –
Texture               80.47 [11]   85.73 [18]   87.1 [26]    82.27 [107]  90.17 [330]  –
