Article

A Segmentation Method Based on PDNet for Chest X-rays with Targets in Different Positions and Directions

1 School of Computer Science and Engineering, Macau University of Science and Technology, Avenida WaiLong, Taipa, Macau, China
2 School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(8), 5000; https://doi.org/10.3390/app13085000
Submission received: 20 February 2023 / Revised: 9 April 2023 / Accepted: 14 April 2023 / Published: 16 April 2023

Abstract

To assess the impact of the relative displacement between machines and subjects, the machine angle, and the fine-tuning of the subject posture on the segmentation accuracy of chest X-rays, this paper proposes a Position and Direction Network (PDNet) for chest X-rays with different angles and positions that provides more comprehensive information for cardiac image diagnosis and guided surgery. PDNet was implemented as follows. First, the extended database images were fed to a traditional segmentation network for training to show that the network does not have linear invariant characteristics. Then, we evaluated the performance of masks in the middle layers of the network and added a weight mask that identifies the position and direction of the object in the middle layer, thus improving the accuracy of segmenting targets at different positions and angles. Finally, the active-shape model (ASM) was used to postprocess the network segmentation results, allowing the model to be effectively applied to chest X-rays of 2048 × 2048 pixels or higher resolution. An experimental comparison of the LinkNet, ResNet, U-Net, and DeepLab networks before and after the improvement shows that the segmentation accuracies (MIoU) of the improved networks are 5%, 6%, 20%, and 13% higher, respectively. Their loss differences are 11.24%, 21.96%, 18.53%, and 13.43%, and the F-scores also show that the improved networks are more stable.

1. Introduction

Chest X-ray (CXR) images are the most basic and commonly used images in the diagnosis and screening of heart and lung diseases. CXR images are readily available and have diagnostic and therapeutic potential for the heart, lungs, airways, spine, etc. [1,2]. Moreover, CXR images involve low radiation doses and play crucial roles in acute triage and longitudinal monitoring [3,4]. In recent years, coronavirus disease (COVID-19) has spread worldwide and become a global pandemic [5]. Efficient and effective COVID-19 detection using chest X-rays helps in early detection and in reducing the spread of the disease [6], and the prevalence of COVID-19 has further increased the importance of CXR analysis [7,8].
Cardiac information has high application value in the field of cardiovascular disease diagnosis, such as the diagnosis of valvular heart disease [9]. Automatic CXR segmentation technology has long been applied in medical image research; for example, it can be used to segment heart-, lung-, and chest-related objects in an image [10,11]. In traditional image segmentation, the commonly used methods are thresholding and edge detection, which can use prior knowledge to segment specific objects in the X-ray image [12]. Driven by the development of computer technology, deep convolutional network technology has also been applied to image segmentation, showing significant advantages [13]. According to incomplete statistics from Scopus, more than 150 research studies on such issues had been published by the end of the 20th century, and the number has since increased to 331 [14]. Most of these works have focused on segmenting single organs, with cardiac segmentation being one of the most important tasks. Moreover, this task is more difficult than the others, as reflected in the lower quality of the final segmentation results [15]. The remainder of this article is structured as follows. Section 2 reviews the recent state of research on CXR segmentation. Section 3 lists and explains the basic theories utilized in this article. Section 4 describes our proposed theory and method. Section 5 shows the results of our method and a comparison with other methods, and Section 6 presents future research directions.

2. Related Work

2.1. Image Segmentation Method

There are many methods of image segmentation, which can be divided into the following categories [16,17].
(1)
Rule-based methods: These methods segment images by applying certain rules, including threshold-based, edge-based, and region-based rules [18,19,20]. These methods are simple and easy to implement, and both their advantages and their disadvantages stem from this simplicity: because rule-based approaches are so simple, segmentation rules are difficult to design for complex tasks.
(2)
Shape-based methods [21,22]: These methods summarize the information of segmented objects or images to outline a general shape and use this information as a prior model to segment the image. Such segmentation methods include the active-shape model (ASM) and active appearance model algorithms. Shape-based methods work well for segmenting images with clear edges and regular shapes, and are somewhat less effective for images with weak borders and irregular shapes.
(3)
Atlas-based methods: These methods use atlases (i.e., previously segmented images) for target image registration [23]. Atlas-based algorithms rely on prior knowledge, and their rules are not particularly suitable for segmenting targets with large differences.
(4)
Graph-based methods: The image is mainly divided into discrete components, and then these components are connected in a certain form based on conditional random fields to form an overall result [24,25]. This type of clustering algorithm is fast and simple to implement. Moreover, these algorithms are suitable for tasks that do not require high precision.
(5)
Machine learning-based methods [26]: Traditional image segmentation methods are usually based on handcrafted features (e.g., SIFT) and classifiers (e.g., k-nearest neighbor (k-NN) or artificial neural networks). However, with the development of convolutional neural networks (CNNs), the paradigm has shifted to end-to-end methods [27,28]. Usually, large networks such as CNNs contain many nodes and parameters between the input image to be segmented and the output segmentation result. By training and adjusting the structure and parameters of these nodes, an image segmentation network with strong recognition ability can be obtained.
Although great progress has been made in automatically segmenting the heart in CXR images, some limitations still exist, such as the effectiveness of the segmentation method decreasing when the angle and displacement of CXRs change. Furthermore, the need to use downsampling algorithms leads to irregularities and inaccuracies in the segmented edges in CXRs, which reduces the applicability of these methods in clinical settings [29,30].

2.2. Organ Segmentation

(1)
Segmentation of the diaphragm
Self-supervised learning has been used to segment the diaphragm in four-chamber cardiac X-ray films, where learning the characteristics shared by the heart and lungs improves the accuracy of the model; a three-dimensional convolutional neural network (CNN) was subsequently used to further improve the segmentation performance [31]. In other work, three CNNs are adopted for feature extraction, classification, and segmentation, and the accuracy and robustness of the model are improved by lung localization, intensity standardization, and lung edge removal in the preprocessing stage; data augmentation methods are also introduced to increase the diversity and quantity of the data used by a CNN for automatic segmentation of the diaphragm, thereby improving the generalization performance and stability of the model [32].
(2)
Segmentation of the rib
Girard et al. use a deep neural network with multiple convolutional and pooling layers to extract features from chest X-ray images and then use those features to segment the ribs [33]. To obtain the rib segmentation result, Gupta et al. use graph cuts to minimize the energy between a binary background model and the rib model [34], and Gerasimon et al. improve automatic rib segmentation in chest radiographs using a structured random forest [35].
(3)
Segmentation of the clavicle
A graph cut model is established using spatial and color features within a specified region of interest after preprocessing the chest radiographs [36]; rib localization and segmentation are then achieved by graph cut optimization. A method based on deep convolutional neural networks (DCNNs) improves the accuracy of the algorithm with a dense prediction strategy and receptive field expansion [37]. An improved automatic rib segmentation algorithm using a structured random forest is also presented, in which precise rib segmentation is achieved through anchor-point localization and morphological operations; it extracts features from the shape and texture of ribs in chest radiographs and establishes a structured random forest classifier [38].
(4)
Segmentation of lung fields
Since the outbreak of COVID-19, automatic segmentation of lung fields for judging the degree of infection through chest radiographs has become a hot topic. Zhao J. et al. establish an automatic lung field segmentation method for chest radiographs using deep learning and transfer learning from ImageNet [39]. The method utilizes a convolutional neural network (CNN) and Mask R-CNN, achieving high segmentation accuracy for both healthy and TB-infected cases. The literature [40] proposes a novel method for lung field segmentation using a deep convolutional neural network (DCNN) with residual connections, which improves accuracy and reduces the computational cost through residual connections in the network architecture. Additionally, the literature [41] introduces a modified atrous convolutional neural network (ACNN) for lung field segmentation, using dilated convolutions and multi-scale feature aggregation to improve lung field segmentation performance.
(5)
Segmentation of heart
For detecting and segmenting the heart in chest X-ray images, deep convolutional neural network (CNN) architectures are often used [42]. Yadav, A. et al. employ an end-to-end training process that utilizes both the image and its corresponding segmentation map to train the network. The framework comprises four pretrained models, namely, ResNet-50, VGG-16, Inception-ResNet-v2, and DenseNet-201, and it obtains high accuracy in detecting and segmenting the heart in chest X-ray images. In the literature [43], the algorithm using a deep CNN includes two phases: (1) heart detection, a binary classification task; and (2) heart segmentation, a pixel-wise classification task. The model is trained end-to-end using a fully convolutional network (FCN) architecture based on the VGG-16 model and achieves high accuracy for both heart detection and segmentation. Chen, Y. et al. also propose a two-stage deep-learning algorithm for detecting and segmenting the heart in chest X-ray images [44]. In the first stage, the authors use a Faster region-based Convolutional Neural Network (R-CNN) to detect the heart; in the second stage, a convolutional encoder–decoder network is used for the segmentation task. They create a new dataset of 2692 chest X-ray images, compare the results with other state-of-the-art algorithms, and report high accuracy for both detection and segmentation tasks.
(6)
Segmentation of multi-organs
Y.-T. Tseng et al. use the U-Net model and an enhanced training set for multi-organ segmentation in chest X-ray images [45]. The model was trained on an input image size of 128 × 128 and validated on a test dataset of 200 patients. Data augmentation methods, including random rotation, horizontal flipping, and vertical flipping, are used to increase the diversity of the training set, and the final results show that the algorithm has good prediction accuracy for multiple organs. Deep-learning methods are also used for multi-organ segmentation in other modalities. Y. Liang et al. use a cascading structure containing multiple U-Net networks for cardiac CT scans, each of which segments a specific structure, and the results are finally cascaded together to obtain a complete structure segmentation [46]. The algorithm uses a large amount of data for training and techniques such as data augmentation to increase the diversity of the training set; the final results show that it performs well in cardiac structure segmentation and measurement, including the left ventricular myocardium, right ventricular myocardium, pericardium, atria, and ventricles. A. Anthimopoulos et al. use only limited training data to detect multiple organs in chest X-rays with deep-learning methods [47]. A convolutional neural network is used together with data augmentation and transfer learning to improve the performance of the model. The results show that convolutional neural networks perform better than traditional machine learning methods when the dataset is limited, but more data and deeper network structures may be required to achieve better results for the detection of some organs.
Each of the above algorithms has made new breakthroughs in addressing specific issues. In chest X-rays, it is important to consider changes in the organs, which can be caused by patient posture, the angle of incidence, disease, and other factors. Solving the problem of positional changes and rotation in image segmentation is very valuable for evaluating and diagnosing subsequent changes in patients’ conditions, but relatively little literature is available. The common assumption that the network has linear invariant characteristics was not verified in our subsequent experiments. Therefore, we propose a novel network with respect to position and direction to improve the accuracy of heart segmentation under changes in image position and direction on small databases.
In this paper, we use convolutional neural networks (CNNs) to solve the problem of segmenting the heart in CXR images. This paper has three main goals. First, the relationship between pixels is used instead of the pixels themselves as the network input for training, which addresses the low robustness of networks trained with small amounts of data. Then, we introduce the position and direction adjustment network (PDNet), a new architecture that incorporates different masks in the intermediate layers so that objects with different positions and orientations in CXRs can be segmented. Finally, the network segmentation results are postprocessed to reduce the network’s high hardware and software requirements to handle high-pixel CXR image segmentation problems.

3. Basic Theory

3.1. Convolutional Neural Networks

A convolutional neural network (CNN) is a deep-learning algorithm established to simulate the nervous system; a typical case is shown in Figure 1 [27]. A CNN generally includes an input layer, convolution layers, downsampling layers (also known as pooling layers), connection layers, and an output layer.
The input of the network is every pixel value within an image. A good network model is obtained through the iterative adjustment of its parameters. The convolution calculation of multiple convolutional layers is shown in Equations (1) and (2):
$H_i = f(H_{i-1} \otimes W_i + b_i)$  (1)
$H_i = \mathrm{downsampling}(H_{i-1})$  (2)
where $W_i$ denotes the parameters of the $i$th-layer convolution kernel, $b_i$ is the offset (bias) vector, $\otimes$ denotes the convolution operation, and $H_i$ is the feature map. $f(\cdot)$ is the excitation function, and $\mathrm{downsampling}(\cdot)$ is the downsampling function. $W$ and $b$ are optimized in each successive iteration. The main frameworks of the subsequent networks are primarily formed by convolution, downsampling, and connection layers. The basic networks used in this paper (LinkNet, ResNet, U-Net, and DeepLab) can also be characterized in this manner.
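As a concrete illustration of Equations (1) and (2), the following is a minimal PyTorch sketch of one convolution-plus-downsampling block; the channel counts, kernel size, and activation are illustrative assumptions rather than the exact configuration of the networks used in this paper.

```python
# A minimal PyTorch sketch of Equations (1) and (2): one convolution layer
# H_i = f(H_{i-1} (*) W_i + b_i) followed by a downsampling (pooling) layer.
import torch
import torch.nn as nn

class ConvDownBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # W_i and b_i are the kernel weights and bias of this layer
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)          # excitation function f()
        self.down = nn.MaxPool2d(kernel_size=2)   # downsampling()

    def forward(self, h_prev: torch.Tensor) -> torch.Tensor:
        h = self.act(self.conv(h_prev))  # Equation (1)
        return self.down(h)              # Equation (2)

# Example: a single-channel 128 x 128 CXR tensor
x = torch.randn(1, 1, 128, 128)
print(ConvDownBlock(1, 16)(x).shape)  # torch.Size([1, 16, 64, 64])
```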

3.2. ASM

ASM [30] determines a potential trend of the feature points according to the distribution law present in the training images and searches for the target position in the most likely direction. For each calibration point, it calculates the average texture $\overline{G_{ij}}$ and the covariance $S_{G_{ij}}$ through the global shape model, which constitute the local texture model of that point. When this model is used to search for the best candidate for a feature point in an unknown image, $m$ grayscale values are sampled along the normal direction. In this way, the search for the best matching position is transformed into finding the match between the sampled grayscale vector and the model vector. The corresponding Mahalanobis distance is calculated with Equation (3):
$d(G_{ij}') = (G_{ij}' - \overline{G_{ij}})^{T} S_{G_{ij}}^{-1} (G_{ij}' - \overline{G_{ij}})$  (3)
Here, $G_{ij}'$ is a normalized texture vector obtained by sampling near point $j$ of the unknown image, and the superscript $-1$ denotes the inverse operation. The point corresponding to the minimum value of $d(G_{ij}')$ is the best candidate point.
ASM first uses marked feature points for training to generate a grayscale model. As shown in Figure 2, the grayscale model (outer line) is used to find the target feature points in the target image (image object), and candidate points are calculated iteratively along the specified direction (the normal to the model boundary). ASM is simple, fast, and requires very little training data. However, the method still has several drawbacks, including an over-reliance on the initial values and a tendency to converge to local optima. In this paper, ASM is applied to the image segmentation postprocessing task in order to exploit its advantages while avoiding its shortcomings.
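The following is a minimal sketch, in Python, of how the Mahalanobis distance of Equation (3) selects the best candidate point for one landmark; the function and variable names are our own, and the local texture model (mean and covariance) is assumed to have been estimated beforehand.

```python
# Minimal sketch of Equation (3): the Mahalanobis distance between a sampled,
# normalized texture profile g and the local texture model (mean g_bar,
# covariance S) of one ASM landmark. The candidate with the smallest distance
# becomes the new landmark position.
import numpy as np

def mahalanobis_distance(g: np.ndarray, g_bar: np.ndarray, S: np.ndarray) -> float:
    diff = g - g_bar
    return float(diff.T @ np.linalg.inv(S) @ diff)

def best_candidate(candidates: np.ndarray, g_bar: np.ndarray, S: np.ndarray) -> int:
    """Return the index of the candidate profile with the minimum distance."""
    dists = [mahalanobis_distance(c, g_bar, S) for c in candidates]
    return int(np.argmin(dists))
```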

4. Proposed Method

In this paper, we evaluate the target location detection performance and modify the mask parameters of the U-Net architecture to identify targets at different locations. To the best of our knowledge, no other work in the literature has investigated convolving location information into the segmentation network kernels for CXRs, even though such information is a key element of state-of-the-art image segmentation networks [31]. In addition, in terms of the feature map computation, this paper proposes to adjust the filter’s field of view to capture multiscale contextual information (i.e., changing the dilation rate from 2 to 3) without reducing the spatial dimension of the feature map before the pooling layers. A dilated 3 × 3 kernel has the same field of view as a 5 × 5 kernel but uses only nine parameters, which reduces the required resources without decreasing accuracy (see the sketch below). Network training with high-resolution or large-volume databases often requires particularly high machine performance and is not universally applicable, while training with lower-resolution images or a smaller database cannot meet the high accuracy requirements of medical images and easily loses edge information. Therefore, this paper proposes to train on a small database with reduced data and to use the ASM algorithm for postprocessing: the segmented image is used as the initial outline of the ASM algorithm so that a better segmentation result can be obtained with fewer iterations. The contributions of the proposed method can be summarized as follows: (a) from principle to experiment, it is demonstrated that segmenting images with a network does not exhibit linear invariance; (b) PDNet is designed to mitigate the reduction in segmentation accuracy caused by displacement and rotation of the image; (c) the location recognition layer is integrated with the existing layers to ensure that the network is not overly burdened; (d) to process images with large pixel counts, postprocessing is performed on the images segmented by the network.
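The following short sketch illustrates the dilated-convolution idea mentioned above: a 3 × 3 kernel with dilation 2 covers the same 5 × 5 field of view as a dense 5 × 5 kernel while keeping only nine weights per channel pair. The channel counts are illustrative assumptions, not the configuration used in the paper.

```python
# Dilated 3 x 3 convolution vs. dense 5 x 5 convolution: same field of view,
# far fewer parameters. Channel counts are illustrative only.
import torch
import torch.nn as nn

x = torch.randn(1, 16, 128, 128)

dense_5x5   = nn.Conv2d(16, 16, kernel_size=5, padding=2)              # 5*5 weights per pair
dilated_3x3 = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)  # 3*3 weights per pair

print(dense_5x5(x).shape, dilated_3x3(x).shape)   # both: torch.Size([1, 16, 128, 128])
print(sum(p.numel() for p in dense_5x5.parameters()),
      sum(p.numel() for p in dilated_3x3.parameters()))  # 6416 vs. 2320 parameters
```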

4.1. Multidimensional Feature Extraction

The size of the convolution kernel in the network has long been an important parameter for adjustment. Researchers have adjusted the size of the convolution kernel to remove unnecessary information and extract more accurate features. However, for small databases, image features are difficult to capture with a single-size convolution kernel. Therefore, in this paper, the convolution block in U-Net was modified to obtain more comprehensive features. The specific method is to convolve the image through multichannel and multitype convolution kernels (as shown in Figure 1). Considering the applicability of the general machine configuration and the increased complexity of the optimization process for large networks, the original image and the mask image were both reduced to 128 × 128 for processing. In addition, the problem that the network dimension changes due to the increase in the size of the convolution kernel was solved by adding a normalization layer.
Figure 3b shows the original convolution block of U-Net, and Figure 3a shows the convolution block used in this paper. Compared with the original convolution block, the convolution block in this paper increases the number of convolution channels and changes the convolution kernel. The feature extraction in different channels is represented by convolution Ij (I is variable) in the figure. The different types of convolution kernels are represented by convolution iJ (J is variable). Then, we used the concat module to fuse the feature maps generated by each convolution block, which was used as the input for the next convolution layer. To reduce the network’s dependence on the initial parameters, after the convolution result passes through the concat module but before the result is used as the next network input, a batch normalization (BN) layer was added. The main function of the BN layer is to normalize the data in each batch.
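A hedged sketch of such a multichannel, multi-kernel convolution block is given below: parallel branches with different kernel sizes are fused by concatenation and followed by batch normalization. The exact branch widths and kernel sizes of the block in Figure 3a are not reproduced here; the values below are assumptions.

```python
# Sketch of a multi-branch convolution block: several kernel sizes in parallel,
# fused by concat and normalized by BN before feeding the next layer.
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    def __init__(self, in_ch: int, branch_ch: int = 16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)                 # different convolution kernel types
        ])
        self.bn = nn.BatchNorm2d(branch_ch * 3)  # keeps the fused dimensions consistent
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]    # feature maps per channel/kernel
        return self.act(self.bn(torch.cat(feats, dim=1)))  # concat + BN

x = torch.randn(1, 1, 128, 128)
print(MultiKernelBlock(1)(x).shape)  # torch.Size([1, 48, 128, 128])
```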

4.2. PDNet

4.2.1. Theoretical Basis

CNNs and absolute positions are rarely discussed together. Since specific tasks have different requirements, it is generally believed that CNNs are translation-invariant (for classification tasks) or translation-equivariant (for segmentation and detection tasks). The network is assumed not to need the absolute position of an object, and position information has typically been used only as prior knowledge for coordinate conversion and in pre- and postprocessing.
According to the literature, U-Net is more suitable for medical image segmentation than other frameworks. However, the experimental results showed that for small databases and high-resolution images, even if the network is trained to enhance the network segmentation ability through rotation, scaling and other methods to increase the amount of data, the segmentation accuracy of displaced and rotated images is not ideal. Moreover, the absolute position information is valuable in many tasks. For example, information about the object and the absolute position can distinguish between multiple instances.
The neural network attempts to describe the image according to Equation (4):
$Y = WX + B$  (4)
where X represents the input image, Y represents the segmentation result, and (W, B) represents the relationship between the image and the result.
The image is stored in the computer in pixels. During network training, pixels were used as the unit of X. However, the pixel set cannot describe the relationship between the image and the object, as shown in Equation (5):
$S(X) \neq F(X)$  (5)
where $S(X)$ represents the set of pixels of the imaged object, $F(X)$ represents the relationship between the object pixels, and $X$ represents the image itself. The network attempts to train stable weights $(W, B)$ that describe the relationship $F$ as closely as possible. However, $(W, B)$ is not truly equal to $F$; it is only an approximation. Therefore, the relationship $F$ does not necessarily have linear invariance, as shown in Equations (6) and (7):
$F(D(X) + dD(X)) \neq F(D(X)) + F(dD(X))$  (6)
$F(A \cdot D(X)) \neq A \cdot F(D(X))$  (7)
In these formulas, $F(\cdot)$ represents the image relationship function, $D(X)$ represents the coordinates of $X$, $dD(X)$ represents an increase or decrease in the coordinates of $X$, and $A$ represents a coefficient. The symbol $\neq$ denotes that the two sides of the equation are not always equal.
In addition, in the standard convolution approach, the feature map produced by the CNN convolutions is eventually collapsed into a feature vector, so its position information is lost. Even though U-Net and other networks no longer represent their feature maps as a single column, the convolution kernel does not change as the image position and rotation angle change, so the influence of the position and rotation angle on the segmentation result cannot be eliminated.
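The following small, illustrative check (with random, untrained weights) shows the effect described by Equations (6) and (7): after a pooling layer, a 5-pixel input shift corresponds to a 2.5-pixel shift of the feature map, so no integer shift of the original feature map reproduces the shifted one exactly. The layer sizes are assumptions chosen only for illustration.

```python
# Illustrative check: with zero padding and 2 x 2 pooling, the feature map of a
# shifted image is in general not an exact shift of the original feature map.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

x = torch.randn(1, 1, 64, 64)
x_shift = torch.roll(x, shifts=5, dims=-1)   # shift the input by 5 pixels

with torch.no_grad():
    y, y_shift = net(x), net(x_shift)

# A 5-pixel shift before 2 x 2 pooling has no exact integer counterpart afterwards
# (and torch.roll also wraps around the border), so the difference is nonzero.
diff = (torch.roll(y, shifts=2, dims=-1) - y_shift).abs().max()
print(float(diff))
```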

4.2.2. Network Performance at Different Positions

The raw image features learned by the convolutional network can be visualized by class activation maps (CAMs) of salient regions. To effectively assess the performance in salient regions, we compared the affected regions of the network under different weight masks for the same image. To analyze the correlation between location information and the network, the masks were set to standardized location maps, and a randomization test was carried out on this basis. The results are shown in Figure 4. The ground truth is mainly a gradient mask in the vertical and horizontal directions; the model was verified with these two masks, which proved that it can learn absolute position information along each axis. In this paper, horizontal-stripe and vertical-stripe masks (HS and VS) were also created, and a Gaussian filter was applied in the design of another type of mask, yielding the Gaussian-distribution (G) center map shown in the figure.
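A hedged sketch of how such position ground-truth masks can be generated is given below: horizontal and vertical gradients, horizontal and vertical stripes (HS, VS), and a Gaussian center map (G). The mask size, stripe period, and Gaussian width are assumptions for illustration only.

```python
# Synthetic position masks: gradients (H, V), stripes (HS, VS), Gaussian center (G).
import numpy as np

def position_masks(size: int = 128, period: int = 16, sigma: float = 0.25):
    coords = np.linspace(0.0, 1.0, size)
    v = np.tile(coords, (size, 1))            # gradient along the horizontal axis
    h = v.T                                   # gradient along the vertical axis
    vs = ((np.arange(size) // period) % 2).astype(float)
    vs = np.tile(vs, (size, 1))               # vertical stripes
    hs = vs.T                                 # horizontal stripes
    yy, xx = np.meshgrid(coords - 0.5, coords - 0.5, indexing="ij")
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))  # Gaussian center map
    return {"H": h, "V": v, "HS": hs, "VS": vs, "G": g}

masks = position_masks()
print({name: m.shape for name, m in masks.items()})
```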
In this study, the gradient ground truth mask can be regarded as a random tag, and the corresponding input image and ground truth value are not correlated. The image content will not affect the extraction of position information, so any image data set can be selected according to the requirements during the research. In this paper, synthetic images were also generated to verify the proposed assumptions.
The influence of the feature map on typical features shows that the mask has different effects at various positions in the image. Moreover, this influence is not linear, although the results show a positive correlation.
To better evaluate the performance of these masks in each network layer, the convolution results of each layer were compared. Due to space limitations, only a few feature maps were compared for observation (shown in Figure 5).
The feature maps of the above layers show that the retained features differ due to the different convolution kernels. Furthermore, the feature comparison charts indicate that the weights retained by the mask are not irregular. Important image information is preserved and iterated in the feature map, including horizontal and vertical information, texture information, and boundary information.
A comparison of the feature maps of the same layer for the vertical repetition map (HS) and the horizontal repetition map (VS) shows that horizontal and vertical information is retained. Moreover, by comparing the feature maps of the same layer for the vertical image (H) and the horizontal image (V), we can see that the texture information is preserved. Comparing the central Gaussian map (G) with the HS and VS maps shows that the boundary information is preserved. More complete information is retained in the larger feature maps, and the peak information is preserved. Regardless of the direction of the image, the extracted features of similar image textures are comparable, and the features describing the image direction are retained. The image features with changed orientations (HS and VS, H and V) have symmetric relationships, even if they are not in the same layer. Due to the convolution and downsampling layers, the overall image information is difficult to preserve, and some position information and other relevant information is lost.

4.2.3. Detail of PDNet

In order to avoid network failure caused by individual feature differences, direction and location information are added to the image. These two types of multi-level features can be modeled through the transformation of the encoding module Fpd. The proposed adjustment network (PDNet) consists of two key modules, a convolutional encoder network and an encoding module, as shown in Figure 6. The former extracts the features at each abstraction level, while the latter transforms the position and direction information into multi-scale feature information and sets the corresponding weights in the encoder network, which supports the adjustment of the position and direction.
The Fpd calculation used in this paper includes a weight parameter, which is obtained by computing the translation and rotation parameters that are most similar to the template, combined with the multi-feature extraction detailed in Section 4.1 so that the network does not become too large. The similarity is calculated from the overlap between the classification result and the template. Pixel-level classification is performed by grouping pixels whose colors fall within a certain range, which is formulated as follows:
$D = \sqrt{\left(\dfrac{d_c}{m}\right)^2 + \left(\dfrac{d_s}{S}\right)^2}$  (8)
where $d_c$ is the color distance, $d_s$ is the spatial distance, and $m$ is a fixed constant. $S = \sqrt{N/K}$, where $N$ is the number of pixels and $K$ is the number of presegmented pixel classes.
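The following is a minimal sketch of Equation (8), the combined color/spatial distance used for pixel-level classification (in the style of SLIC superpixel clustering); the values of m, N, and K below are illustrative assumptions.

```python
# Combined color/spatial distance of Equation (8): D = sqrt((d_c/m)^2 + (d_s/S)^2),
# with S = sqrt(N / K).
import numpy as np

def combined_distance(d_color: np.ndarray, d_space: np.ndarray,
                      m: float, n_pixels: int, k_classes: int) -> np.ndarray:
    S = np.sqrt(n_pixels / k_classes)
    return np.sqrt((d_color / m) ** 2 + (d_space / S) ** 2)

# Example: distances of three candidate pixels to one cluster center
d_c = np.array([10.0, 25.0, 5.0])   # color distances
d_s = np.array([3.0, 1.0, 8.0])     # spatial distances
print(combined_distance(d_c, d_s, m=10.0, n_pixels=128 * 128, k_classes=100))
```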

4.3. Postprocessing

CXR images usually have 2048 × 2048 pixels. If all of the image pixels are used as inputs to train the network, the hardware requirements are very high, which is not practical in real applications. If the network is instead trained with smaller images, the reduced resolution must be recovered by postprocessing in real-world segmentation tasks. Postprocessing methods include averaging the results of multiple networks, level set-based methods, domain probability-based methods, etc. Because these postprocessing methods are not sensitive to edges, especially when downsampling causes details to be lost during training, they are not suitable for medical image segmentation.
This paper uses the ASM algorithm for postprocessing. ASM is itself a segmentation method, and it is generally considered too complicated, with overly long iteration times, so it is rarely used for postprocessing. However, both these problems and the accuracy of the algorithm depend on the initial contour. Given an initial contour close to the edge, the ASM algorithm can combine the gradient and contour information to reach the optimal solution in fewer steps, and the enlarged segmentation result can serve as this initial contour. Therefore, we used the ASM algorithm as our postprocessing method, as sketched below.
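The following is a hedged sketch of the first half of this postprocessing step: the low-resolution mask predicted by the network is upscaled to the original resolution and its contour is extracted to initialize the ASM search. The helper asm_refine is hypothetical and stands in for the ASM refinement, which is not implemented here.

```python
# Upscale a 128 x 128 predicted mask and extract its largest contour as the
# initial contour for ASM refinement at full resolution.
import cv2
import numpy as np

def initial_contour_from_mask(mask_128: np.ndarray, target_size: int = 2048) -> np.ndarray:
    """Upscale a binary 128 x 128 prediction and return its largest contour (n_points, 2)."""
    big = cv2.resize(mask_128.astype(np.uint8), (target_size, target_size),
                     interpolation=cv2.INTER_NEAREST)
    contours, _ = cv2.findContours(big, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    return max(contours, key=cv2.contourArea).squeeze(1)

# contour = initial_contour_from_mask(predicted_mask)
# refined = asm_refine(full_resolution_image, contour)   # hypothetical ASM step
```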

5. Experiment and Results Description

To verify that the network proposed in this paper can quickly and accurately segment the heart in chest X-rays, experiments were carried out on the JSRT image dataset. Although the principle was explained using U-Net, LinkNet, ResNet, U-Net, and DeepLab were all used, before and after transformation, for the experiments and result comparisons in order to verify the effectiveness of PDNet.

5.1. Dataset Preprocessing

The JSRT image dataset (http://db.jsrt.or.jp/eng-01.php, accessed on 1 September 2018) was developed and published by the Japanese Society of Radiological Technology and has a high reputation and influence in the field of medical imaging. It contains 247 normal chest X-ray images and 93 diseased chest X-ray images, which are clinical images digitized using structural non-uniformity technology. Each image has a corresponding mask that annotates the position and contour of the lungs and heart, as well as ground truths and contours of the affected area. According to the relevant data, the images come from 14 hospitals around the world, and the results have been confirmed by imaging doctors. The JSRT database is widely used for algorithm evaluation and performance comparison in the field of medical image processing. The information contained in the JSRT database is provided anonymously and strictly complies with relevant laws and regulations to protect the privacy and security of participants. Each image is 2048 × 2048 pixels, as shown in Figure 6 above, and is associated with a standard segmentation.
To meet the needs of network training, verify the segmentation of rotated and displaced images by the network, and compare the results with those of other networks, the data were preprocessed as follows. First, the original CXR data were converted into image data. If the original X-ray image is used for training, the network convolution kernel is too small relative to the image, substantial computation is needed, and the training difficulty increases; however, if the network convolution kernel is too large, image features are easily lost, and it is difficult to develop an effective segmentation model with a small dataset. Therefore, for the experimental data in this paper, the original and segmented images were uniformly reduced to 128 × 128 for training. The segmentation result is classified by binarization: the value of the cardiac area is 1, and the value of the background area is 0. Since we also evaluated the network’s ability to segment targets at different positions and angles, the targets in the test set were shifted by 10 and 20 pixels and rotated by 10 and 20 degrees. Finally, the dataset was divided into two parts: one part was used for network training, and the other part was used for network testing. Furthermore, the training set was randomly divided into training and validation sets at a ratio of 7:3. A sketch of these preprocessing steps is given below.
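The sketch below illustrates the preprocessing described above under stated assumptions: resizing images and masks to 128 × 128, binarizing the mask, shifting and rotating test images, and splitting the training data 7:3. The helper names and file handling are our own, not the authors' actual pipeline.

```python
# Preprocessing sketch: resize to 128 x 128, binarize masks, shift/rotate test
# targets, and split the training data 7:3 into training and validation sets.
import random
import numpy as np
import cv2

def preprocess_pair(image: np.ndarray, mask: np.ndarray, size: int = 128):
    img = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
    msk = cv2.resize(mask, (size, size), interpolation=cv2.INTER_NEAREST)
    return img, (msk > 0).astype(np.uint8)      # heart = 1, background = 0

def shift_and_rotate(image: np.ndarray, dx: int, dy: int, angle: float) -> np.ndarray:
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    M[:, 2] += (dx, dy)                         # combine rotation and translation
    return cv2.warpAffine(image, M, (w, h))

def split_train_val(samples: list, ratio: float = 0.7, seed: int = 0):
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * ratio)
    return samples[:cut], samples[cut:]
```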

5.2. Experimental Environment

The experiments were conducted with the deep-learning framework PyTorch. To ensure that the network is universally applicable, a PC with a common configuration (a 2.3 GHz Core i5 CPU with 8 GB of memory and an NVIDIA GeForce GTX 1050 Ti graphics card) was used for training. During training, to obtain an optimal model that converges without overfitting on the small amount of data, the loss on the validation set was not forced to be excessively small. Due to the limitations of the dataset and the experimental environment, the batch size was set to 1 during training.

5.3. Evaluation Indicators

Many evaluation metrics have been applied in the field of medical image segmentation to evaluate the network prediction results from different perspectives. For CXR segmentation, this paper uses the Dice coefficient (Dice), precision (Precision), recall (Recall), and F1 score (F1 Score) as evaluation indicators. The evaluation indices are calculated as follows. The cardiac area is regarded as a positive sample, and noncardiac areas are regarded as negative samples. According to the real and predicted categories of the sample, the sample is classified as a true positive (TP), false positive (FP), true negative (TN) or false negative (FN) sample.
(1) Precision: The precision is the probability that a sample predicted as positive is truly positive, i.e., the proportion of true positives (TP) among all samples predicted as positive (TP plus FP). The precision is formulated as
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$  (9)
In the cardiac segmentation problem, the precision represents the proportion of true heart tissue within the predicted cardiac region.
(2) Recall: The recall is the probability that a truly positive sample is predicted as positive, i.e., the proportion of true positives (TP) among all actual positive samples (positive samples predicted as positive (TP) plus positive samples predicted as negative (FN)). The recall is formulated as
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$  (10)
In the CXR segmentation problem, the recall differs from the precision: it is the proportion of the true cardiac region that is successfully predicted.
(3) F1 Score: The F1 score is the harmonic mean of the precision and recall, which is formulated as follows:
$F1\ \mathrm{Score} = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$  (11)
The precision indicates the probability that the prediction is correct in the predicted cardiac region. The precision is evaluated based on the predicted cardiac area, ignoring the true cardiac area. The recall focuses on how many samples in the real cardiac region are correctly predicted. The recall is evaluated based on the real cardiac region, ignoring the incorrectly predicted samples in the predicted region. Thus, the precision and recall are not sufficiently comprehensive. In cardiac prediction tasks, the precision and recall are equally important. The F1 score treats recall and precision equally, considering both the predicted and real cardiac areas. Thus, the final evaluation results are more comprehensive.
(4) MIoU (Dice coefficient): The Dice coefficient is an evaluation index that combines the precision and recall. Equation (12) shows that the Dice coefficient considers both the predicted and real cardiac areas. The evaluation results are relatively comprehensive. Therefore, the Dice coefficient was also used as a network evaluation index during the experiments.
$\mathrm{MIoU}(GT, PR) = \dfrac{2\,|GT \cap PR|}{|GT| + |PR|}$  (12)
In the above formula, GT (ground truth) represents the labelled image, PR (prediction) represents the predicted image, and |GT| and |PR| are the numbers of pixels in the ground-truth and predicted regions, respectively.
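A minimal sketch of how the indicators in Equations (9)–(12) can be computed from binary ground-truth and prediction masks is given below; the small epsilon added to the denominators is our own guard against division by zero.

```python
# Precision, recall, F1, and Dice/MIoU from binary ground-truth (GT) and
# prediction (PR) masks, following Equations (9)-(12).
import numpy as np

def segmentation_metrics(gt: np.ndarray, pr: np.ndarray, eps: float = 1e-7) -> dict:
    gt, pr = gt.astype(bool), pr.astype(bool)
    tp = np.logical_and(gt, pr).sum()
    fp = np.logical_and(~gt, pr).sum()
    fn = np.logical_and(gt, ~pr).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    dice = 2 * tp / (gt.sum() + pr.sum() + eps)   # Equation (12)
    return {"Precision": precision, "Recall": recall, "F1": f1, "Dice/MIoU": dice}
```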

5.4. Experiment and Result Analysis

5.4.1. Verification Experiment

To verify that the network does not have translation invariance or linear transformation invariance, and to obtain more intuitive observations, the following images were segmented by U-Net after being displaced and rotated, and the network performance before and after database enhancement was compared. The results are presented in Figure 7.
(a)
Displacement
The neural network was trained using the original database images and slightly jittered images as training data. The segmentation results of the images in the test database generated by the trained network were compared with the segmentation results of the network trained on the 5-pixel and 10-pixel displacement databases for the same displacement, as shown in Figure 8.
(b)
Rotation
The neural network was also trained using the original database images and slightly jittered images as training data. Then, the trained network was applied to segment the rotated images in the test database. The results are shown in Figure 9.
The above results show that, although the network can better segment displaced and rotated images after more data are added to the database, the segmentation effect is still poor, especially for larger displacements and rotations. This result verifies the theory proposed in Section 4.2.1. Thus, to better segment the displaced and rotated images, further experiments are needed.
It is generally believed that the linear enhancement of images, that is, adding rotated and displaced data to the training database, enables networks to segment images with different displacements and rotation angles. Therefore, we investigated this approach and added data with random displacements and rotations, and the results are presented as follows.
According to the experimental results, although image enhancement somewhat improves the image segmentation accuracy, the qualitative analysis shows that the accuracy is still poor.
Thus, images with random displacements and rotations were added to the original training dataset, and the trained network then segmented the displaced images. Because the amount of data is too small, after tens of thousands of training steps, the model still did not converge. The training parameters were MIoU = 0.707, Precision = 0.941, Recall = 0.761, and F1 score = 0.798.

5.4.2. Experiment with the New Network Architecture

Network Performance

To verify the segmentation effect of the PDNet network on cardiac images in different positions and directions, location information was added to the LinkNet, ResNet, U-Net and DeepLab networks to form PDNet, and PDNet was used to segment the displaced and rotated images. The model performance was observed, and the Dice coefficient, precision, recall, Dice loss, and F1 score were used to evaluate the performance of the improved network. The losses of the improved PDNet network on the training set were recorded, and the corresponding changes were plotted. Figure 10 shows plots of the loss and MIoU versus the number of training epochs, demonstrating that each network is improved by introducing PDNet during network training.
The loss curves in Figure 10 show that PDNet with location information has a stable declining trend during training. The accuracy of the MIoU curve of the model indicates that PDNet is stable and converges, showing the generalizability of the network.
Figure 11 shows the cardiac segmentation results of the displaced and rotated X-ray images obtained by PDNet, which is composed of various networks and position information.
Three groups of X-ray images with different rotations and displacements were selected from the test set, and the heart was segmented by PDNet with location information. The cardiac segmentation results are shown in Figure 11. Figure 11a1–a3 shows the segmentation results of the original X-ray image by various networks. Figure 11b1–b3 shows the image displaced by 10 pixels and its segmentation results. Figure 11c1–c3 shows the image displaced by 20 pixels and its segmentation results. Figure 11d1–d3 presents the image rotated by 10 degrees and its segmentation results. Figure 11e1–e3 depicts the image rotated by 20 degrees and its segmentation results. The first column shows the original image, and the second column depicts the ground truth or displaced/rotated image, with the white region indicating the cardiac area. The third column shows the cardiac area segmented by PDNet modified based on LinkNet, the fourth column presents the cardiac area segmented by PDNet modified based on ResNet, the fifth column shows the cardiac area segmented by PDNet modified based on U-Net, and the sixth column shows the cardiac area segmented by PDNet modified based on DeepLab. According to the figure, the accuracy and shape of the PDNet segmentation results, regardless of the underlying network, are similar to those of the ground truth. Compared with the results obtained by the unmodified networks, PDNet obtains considerably improved results, which shows that PDNet is effective. In particular, for images with larger displacements and rotation angles, a better model was obtained on the small database than in the previous experiment, even without data enhancement. The different PDNet models obtain segmentation results that are close to the ground truth after the images are displaced and rotated. The effectiveness of the model is shown not by a single specific network but by the comparison of the PDNet variants built from the various networks, which obtain essentially consistent and effective segmentation results. More comprehensive quantitative comparisons are presented in Table 1.
The PQ column represents the parameter quantity of each network. The PMIoU column shows the cardiac segmentation results reported in the literature [50], which serve as the baseline data for this article. When using these networks in this article, some parameters were unified for the convenience of comparison; after parameter unification, the preliminary results are similar to the reference data, as can be seen by comparing the MIoU and PMIoU columns. To demonstrate the effectiveness of the proposed method, commonly used network architectures were used instead of networks modified for specific tasks.
In the table, ξ represents one of the four networks, which is LinkNet, ResNet, U-Net, or DeepLab. The ξ -row shows the test set segmentation results of the original network after training with the original database without displacement and rotation. The ξ -DR row shows the segmentation results of the test set when the original network was trained on the enhanced dataset, which included random displacements within 20 pixels and random rotations within 20 degrees. The PD- ξ -No row presents the test set segmentation results of PDNet modified on the basis of the ξ network after training. The PD- ξ -DR row displays the test set segmentation results of PDNet modified based on the ξ network when images with random displacements within 20 pixels and random rotations within 20 degrees were added to the training set.
The precision and recall are both positive indicators. Equations (9) and (10) indicate that the precision focuses on the whole predicted positive area, while the recall focuses on the correctly segmented area. Regardless of which region is compared, the unmodified networks trained on the enhanced data have the worst precision and recall, with values slightly above 80%, while the results of the other networks exceed 90%. Compared with the precision and recall of the original networks on the databases without displacement or rotation, the results of the modified networks do not differ significantly between the original database and the rotated and displaced databases, indicating that the modified network behaves as if it were linearly invariant.
The segmentation should be controlled so that the segmented region remains within the ground truth area with as similar a shape as possible; the positive samples alone are not sufficient to illustrate the network segmentation accuracy. According to Equation (12), the MIoU is a better metric for assessing accuracy. By comparing the MIoU of each network, we find that the results of the PDNet-improved networks differ considerably from those of the original networks. On the data without displacement or rotation ( ξ -No and PD- ξ -No), the MIoU of the modified PDNet is approximately the same as that of the unmodified network, differing by only about 1%. However, on the rotated and displaced database, the MIoUs of the data-enhanced networks and the transformed networks ( ξ -DR and PD- ξ -DR) differ substantially: the differences are 5%, 6%, 20%, and 13%, i.e., at least 5% and at most 20%. After the position information is added to the original networks, PDNet’s performance on the rotated and displaced data returns to the original high level. By comparing the ξ -No and PD- ξ -DR rows, we find that the modified networks perform better than the original networks. This advantage cannot be achieved by simply increasing the amount of data, which again indicates, and is verified experimentally, that the network does not have linear invariance.
The convergence of the networks can be seen from the Dice loss in the table. By comparing the ξ -No and ξ -DR rows, we find that all networks converge when trained on the database without rotation or displacement. When the networks are trained on the displaced and rotated data, that is, after data enhancement, the losses are 11.24%, 21.96%, 18.53%, and 13.43%, with a minimum of 11.24% and a maximum of 21.96%; the validation set results show that these networks did not converge. However, with the modified PDNet, regardless of which network is used as the basis, the model converges whether trained on the original database or on the rotated/displaced data, as shown by the PD- ξ -DR and PD- ξ -No rows. The F1 scores also illustrate this point: the F1 score of PDNet is above 0.9, with a highest value of 0.95, while the F1 score of the unmodified networks is generally above 0.8, with a lowest value of 0.81. One unmodified network reaches an F1 score of 0.91, but its loss is 0.21. A comparison of these data indicates that the original network does not have rotation invariance; even after the database is enhanced, it does not perform well on small databases. The common belief that such networks have linear invariance may be an illusion caused by training on large amounts of data. Thus, position information should be added to the network to increase its stability and robustness.

5.4.3. Postprocessing Experiment

The accurate segmentation of the cardiac region is important for diagnosis and subsequent image registration. Therefore, it is necessary to complete the segmentation task in images with a higher pixel resolution. If the network is trained on the original 2048 × 2048 X-ray images, the hardware requirements are too high; even the labeled ground truth image is 1024 × 1024. All of the networks in this paper were trained and tested on images scaled down to 128 × 128 pixels, so the contour of the region predicted by the network must be mapped back onto the higher-resolution image to label the heart. Therefore, it is necessary to postprocess the images segmented by PDNet. If the binary region is directly enlarged, the predicted segmentation edge is distorted and difficult to fit to the real edge, yielding an insufficient segmentation result. The segmented images are postprocessed using the ASM algorithm as follows, taking the last network as an example.
As shown in Figure 12, since the image is large, only the key areas of the image are extracted to improve the visualization. The green line in Figure 12 shows the directly enlarged outline of the cardiac region segmented by the PDNet network. The blue line in Figure 12 shows the postprocessed prediction result of the ASM algorithm. The red line in Figure 12 shows the ground truth cardiac region. A comparison indicates that although all of the image areas are similar, the postprocessed curve is closer to the ground truth than the enlarged outline; there is no jagged shape, and the contour fits the heart well. Therefore, the use of the ASM algorithm for postprocessing can achieve accurate segmentation of cardiac regions in large images.

6. Conclusions

In this work, PDNet is proposed to address the problem of cardiac region segmentation in X-ray images. PDNet first adds different types of convolution kernels to obtain more detailed feature maps. Then, different mask parameters are added to the convolution block to strengthen the recognition of different positions and rotation angles and to improve the efficiency of feature learning. Finally, the ASM algorithm is used to postprocess the segmentation results, allowing the network to handle large, high-resolution X-ray images. With these improvements, the cardiac segmentation performance of various networks is significantly enhanced for large X-ray images with different positions and rotation angles, and the segmentation results are more accurate than those of traditional networks. The network is simple to use and can be applied to input X-ray images on computers with typical hardware configurations. The network accurately segments and labels the cardiac region in the image, which provides a new solution for cardiac region diagnosis in the future. This research also has certain application limitations and needs to be improved in the following aspects. First, the accuracy of data annotation should be improved and the sample size expanded; due to confidentiality constraints, medical image data are difficult to collect. Moreover, the displacement and rotation of the X-ray images were produced synthetically rather than by acquiring real images, which may ignore spatial structure information.

Author Contributions

Conceptualization, X.T.; methodology, X.W.; software, J.L.; validation, Y.Z.; writing—original draft preparation, X.W.; writing—review and editing, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Hebei Provincial Natural Science Foundation (No. F2022210023).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available in JSRT image data set at http://db.jsrt.or.jp/eng-01.php, accessed on 20 February 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Naskinova, I. On convolutional neural networks for chest X-ray classification. In Proceedings of the IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2021; Volume 1031, p. 012075.
  2. Van Ginneken, B.; Romeny, B.T.H.; Viergever, M.A. Computer-aided diagnosis in chest radiography: A survey. IEEE Trans. Med. Imaging 2001, 20, 1228–1241.
  3. NHS England. NHS England leads the National Health Service (NHS). In Diagnostic Imaging Dataset Annual Statistical Release; Technical Report; National Health Service Digital: London, UK, 2018.
  4. Laserson, J.; Lantsman, C.D.; Cohen-Sfady, M.; Tamir, I.; Goz, E.; Brestel, C.; Bar, S.; Atar, M.; Elnekave, E. TextRay: Mining clinical reports to gain a broad understanding of chest X-rays. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, 16–20 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 553–561.
  5. Tahsin Meem, A.; Monirujjaman Khan, M.; Hossain, M.T.; Islam, S.; Haque, A. Prediction of COVID-19 based on chest X-ray images using deep learning with CNN. Comput. Syst. Sci. Eng. 2022, 41, 1223–1240.
  6. Abbas, A.; Abdelsamea, M.M.; Gaber, M.M. Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl. Intell. 2021, 51, 854–864.
  7. Kumar, S.; Mallik, A. COVID-19 detection from chest X-rays using trained output-based transfer learning approach. Neural Process. Lett. 2022, 56, 1–24.
  8. Rahman, T.K.; Khandakar, A.Y.S.; Siddiquee, M.M.; Islam, K.R.; Islam, M.M. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput. Biol. Med. 2021, 132, 104319.
  9. Maduskar, P.; Hogeweg, L.; de Jong, P.A.; Peters-Bax, L.; Dawson, R.; Ayles, H.; Sánchez, C.I.; van Ginneken, B. Cavity contour segmentation in chest radiographs using supervised learning and dynamic programming. Med. Phys. 2014, 41, 071912.
  10. Priya, R.K.; Bimani, A.A.; Bhupathyraaj, M.; Ahamed, S.; Arputhanantham, S.S.; Chacko, S. Fuzzy-entropic approach on chest X-ray region of interest segmentation-heart position shifting using differential evolution optimization and multi-level segmentation technique with cloud computing. Soft Comput. 2022, 27, 1639–1650.
  11. Zaidi, S.Z.Y.; Akram, M.U.; Jameel, A.; Alghamdi, N.S. Lung segmentation-based pulmonary disease classification using deep neural networks. IEEE Access 2021, 9, 125202–125214.
  12. Yang, W.; Liu, Y.; Lin, L.; Yun, Z.; Lu, Z.; Feng, Q.; Chen, W. Lung field segmentation in chest radiographs from boundary maps by a structured edge detector. IEEE J. Biomed. Health Inform. 2017, 22, 842–851.
  13. Ait Nasser, A.; Akhloufi, M.A. A review of recent advances in deep learning models for chest disease detection using radiography. Diagnostics 2023, 13, 159.
  14. Hogeweg, L.; Sánchez, C.I.; de Jong, P.A.; Maduskar, P.; van Ginneken, B. Clavicle segmentation in chest radiographs. Med. Image Anal. 2012, 16, 1490–1502.
  15. Dai, W.; Dong, N.; Wang, Z.; Liang, X.; Zhang, H.; Xing, E.P. SCAN: Structure correcting adversarial network for organ segmentation in chest X-rays. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20–24 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 263–273.
  16. Minaee, S.; Boykov, Y.Y.; Porikli, F.; Plaza, A.J.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542.
  17. Matsuyama, E. A novel method for automated lung region segmentation in chest X-ray images. J. Biomed. Sci. Eng. 2019, 12, 165–175.
  18. Xu, X.; Tian, H.; Zhang, X.; Qi, L.; He, Q.; Dou, W. DisCOV: Distributed COVID-19 detection on X-ray images with edge-cloud collaboration. IEEE Trans. Serv. Comput. 2022, 15, 1206–1219.
  19. Rahman, M.F.; Tseng, T.L.B.; Pokojovy, M.; Qian, W.; Totada, B.; Xu, H. An automatic approach to lung region segmentation in chest X-ray images using adapted U-Net architecture. In Proceedings of the Medical Imaging 2021: Physics of Medical Imaging, Online, 15 February 2021; SPIE: Bellingham, WA, USA, 2021; Volume 11595, pp. 894–901.
  20. Prabhakaran, N.; Prasad, S.A.; Kamali, M.; Sabarinathan, C.; Chandra, I.; Prabhu, V. Predictive analysis of COVID-19 symptoms with CXR imaging and optimize the X-ray imaging using segmentation thresholding algorithm-an evolutionary approach for bio-medical diagnosis. Int. J. Pharmacol. 2022, 18, 644–656.
  21. Afzali, A.; Babapour Mofrad, F.; Pouladian, M. Inter-patient modelling of 2D lung variations from chest X-ray imaging via Fourier descriptors. J. Med. Syst. 2018, 42, 233.
  22. Mansoor, A.; Cerrolaza, J.J.; Perez, G.; Biggs, E.; Okada, K.; Nino, G.; Linguraru, M.G. A generic approach to lung field segmentation from chest radiographs using deep space and shape learning. IEEE Trans. Biomed. Eng. 2019, 67, 1206–1220.
  23. Cabezas, M.; Oliver, A.; Lladó, X.; Freixenet, J.; Cuadra, M.B. A review of atlas-based segmentation for magnetic resonance brain images. Comput. Methods Programs Biomed. 2011, 104, e158–e177.
  24. Mao, C.; Yao, L.; Luo, Y. ImageGCN: Multi-relational image graph convolutional networks for disease identification with chest X-rays. IEEE Trans. Med. Imaging 2022, 41, 1990–2003.
  25. Qi, B.; Zhao, G.; Wei, X.; Du, C.; Pan, C.; Yu, Y.; Li, J. GREN: Graph-regularized embedding network for weakly-supervised disease localization in X-ray images. IEEE J. Biomed. Health Inform. 2022, 26, 5142–5153.
  26. Oliveira, H.; Mota, V.; Machado, A.M.; dos Santos, J.A. From 3D to 2D: Transferring knowledge for rib segmentation in chest X-rays. Pattern Recognit. Lett. 2020, 140, 10–17.
  27. Novikov, A.A.; Lenis, D.; Major, D.; Hladůvka, J.; Wimmer, M.; Bühler, K. Fully convolutional architectures for multiclass segmentation in chest radiographs. IEEE Trans. Med. Imaging 2018, 37, 1865–1876.
  28. Cao, F.; Zhao, H. Automatic lung segmentation algorithm on chest X-ray images based on fusion variational auto-encoder and three-terminal attention mechanism. Symmetry 2021, 13, 814.
  29. Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Martinez-Gonzalez, P.; Garcia-Rodriguez, J. A survey on deep learning techniques for image and video semantic segmentation. Appl. Soft Comput. 2017, 57, 115–130.
  30. Chondro, P.; Yao, C.Y.; Ruan, S.J.; Chien, L.C. Low order adaptive region growing for lung segmentation on plain chest radiographs. Neurocomputing 2017, 267, 259–270.
  31. Ma, K.; Zhang, J.; Wang, S.; Kong, D.; Zhou, S. Fully automated segmentation of the diaphragm in four-chamber-view chest X-ray images using deep learning. Med. Image Anal. 2021, 67, 101825.
  32. Zhang, J.; Ma, K.; Wang, S.; Kong, D.; Zhou, S.; Xie, Y. Deep learning-based fully automatic segmentation of the diaphragm in chest radiographs. J. Digit. Imaging 2020, 33, 169–178.
  33. Girard, A.; Phang, R.; Cloutier, A.; Kim, A.; Cheriet, F. Automated rib segmentation in chest radiographs using deep learning techniques. IEEE Trans. Med. Imaging 2020, 39, 710–720.
  34. Gupta, A.; Jaiswal, S.; Gupta, S. Rib cage segmentation in chest radiographs using graph cuts. J. Digit. Imaging 2019, 32, 855–862.
  35. Gerasimon, K.; Rasmy, L.; El-Kholi, N.; El-Kenawy, E. Improved automatic rib segmentation in chest radiographs using a structured random forest. J. Med. Syst. 2019, 43, 153.
  36. Liu, C.; Qi, X.; Ruan, S.; Zhao, W. An accurate and interpretable convolutional neural network for automatic clavicle segmentation on chest radiographs. IEEE J. Biomed. Health Inform. 2020, 24, 289–298.
  37. Gao, S.; Zhang, J.; Xu, Y.; Liu, Y.; Zhang, S.; Zhou, S. Clavicle segmentation in chest X-ray images based on deep convolutional neural networks and generate-chop-refine framework. Med. Image Anal. 2021, 73, 102144.
  38. Wang, N.; Cai, R.; Li, B.; Li, L.; Zhou, Y.; Huang, C.; Chen, Q. A novel data augmentation framework for clavicle segmentation in chest radiographs. J. Digit. Imaging 2020, 33, 708–717.
  39. Zhao, J.; Liu, Z.; Qu, H.; Luo, J.; Dai, K.; Wang, J. Automatic segmentation of lung fields from chest radiographs with deep learning: Transfer learning from ImageNet. J. Digit. Imaging 2021, 34, 324–335.
  40. Wang, B.; Chen, X.; Chen, H.; Huang, Y.; Liu, J. Fast and accurate lung field segmentation in chest radiographs using deep convolutional neural networks. Med. Phys. 2020, 47, 3663–3673.
  41. Li, J.; Zhang, J.; Huang, H.; Zhang, Y. Lung field segmentation in chest radiographs using modified atrous convolutional neural network. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 1341–1349.
  42. Yadav, A.; Bhatia, R. A deep learning framework for automatic detection and segmentation of heart in chest X-ray images. Med. Image Anal. 2021, 68, 101905.
  43. Islam, M.A.; Islam, M.T.; Mahmud, M.S.; Khandakar, A. A novel approach for accurate heart detection and segmentation in chest radiographs using deep learning models. Comput. Biol. Med. 2020, 121, 103801.
  44. Chen, Y.; Zhang, Y.; Zhang, J.; Xia, Y.; Sun, X. Automatic detection and segmentation of heart from chest radiographs using a two-stage deep learning approach. IEEE Trans. Med. Imaging 2020, 39, 2946–2957.
  45. Tseng, Y.-T.; Shih, Y.-Y.; Tsai, J.-H. Multi-organ segmentation in chest X-ray images using U-Net with an augmented training set. IEEE Access 2021, 9, 3037–3046.
  46. Liu, J.M.; Cai, J.Z.; Chellamuthu, K.; Bagheri, M.; Lu, L.; Summers, R.M. Cascaded coarse-to-fine convolutional neural networks for pericardial effusion localization and segmentation on CT scans. In Proceedings of the 15th IEEE International Symposium on Biomedical Imaging (ISBI), Washington, DC, USA, 4–7 April 2018; pp. 1092–1095.
  47. Anthimopoulos, A.; Christodoulidis, L.; Ebner, S. Multi-organ detection in chest radiographs with limited training data. Sci. Rep. 2020, 10, 1965.
  48. Yu, X.; Huang, J.; Liu, H.; Qian, X.; Wang, S. 3D deep dilated-convolutional neural network for cardiac MRI volume segmentation. J. Healthc. Eng. 2021, 6, 1–14.
  49. Yan, K.; Wang, X.; Lu, L.; Xia, J.; Yang, L. A heart segmentation method based on improved ResNet and U-Net. J. X-ray Sci. Technol. 2019, 27, 629–641.
  50. Zhu, X.; Yao, Q.; Wang, Y.; Pan, Y. An automatic deep learning-based method for cardiac magnetic resonance image segmentation. Magn. Reson. Imaging 2020, 73, 78–85.
Figure 1. The structure of a convolutional neural network [27].
Figure 2. Basic principle of the ASM.
Figure 3. Convolution block. (a) Convolution block used in this paper. (b) Original convolution block.
Figure 4. Gradient-like ground-truth position maps. (a–e) Different weight masks and their weighted graphs.
Figure 5. Feature maps. (a–e) Different weight masks in the feature maps of the middle layer.
Figure 6. PDNet network framework.
Figure 7. Segmentation results with the original network.
Figure 8. Segmentation results for the displaced images with the original network.
Figure 9. Segmentation results for the rotated images with the original network.
Figure 10. Loss and MIoU of the four networks during training.
Figure 11. Three groups (A–C) of segmentation results.
Figure 12. Postprocessed segmentation results (green line: directly enlarged outline; blue line: postprocessed result; red line: ground truth).
Table 1. Evaluation results of the networks (values in %; PQ: number of parameters; PMIoU: MIoU reported in the cited reference).

| Network | PQ | PMIoU | Precision | Recall | MIoU | F1 Score | Dice Loss |
|---|---|---|---|---|---|---|---|
| LinkNet-No | 16.7 M | 88.2 [48] | 94.95 | 93.43 | 89 | 94.16 | 5.98 |
| LinkNet-DR | | | 91.29 | 87.33 | 81.03 | 89.1 | 11.24 |
| PD-LinkNet-No | | | 94.07 | 93.34 | 88.07 | 93.62 | 6.43 |
| PD-LinkNet-DR | | | 92.65 | 92.88 | 86.43 | 92.67 | 7.54 |
| ResNet-No | 25.5 M | 89.1 [49] | 94.89 | 94.04 | 89.5 | 94.45 | 16.7 |
| ResNet-DR | | | 92.55 | 90.55 | 84.35 | 91.44 | 21.96 |
| PD-ResNet-No | | | 90.97 | 95.14 | 86.88 | 92.91 | 8.34 |
| PD-ResNet-DR | | | 94.82 | 94.83 | 90.05 | 94.74 | 5.51 |
| U-Net-No | 34.4 M | 89.4 [49] | 95.38 | 92.43 | 88.51 | 93.87 | 6.18 |
| U-Net-DR | | | 79.4 | 84.52 | 70 | 81.52 | 18.53 |
| PD-U-Net-No | | | 95.05 | 92.42 | 88.1 | 93.64 | 6.37 |
| PD-U-Net-DR | | | 94.92 | 94.7 | 90.06 | 94.74 | 5.26 |
| DeepLab-No | 64.7 M | 91.5 [50] | 93.64 | 94.68 | 89.01 | 94.14 | 5.94 |
| DeepLab-DR | | | 85.99 | 87.84 | 77.34 | 86.73 | 13.43 |
| PD-DeepLab-No | | | 95.16 | 93.15 | 88.86 | 94.07 | 5.94 |
| PD-DeepLab-DR | | | 94.9 | 95.42 | 90.68 | 95.09 | 4.91 |
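
For readers who wish to recompute indices like those in Table 1, the snippet below is a minimal sketch (not the authors' evaluation code) of how precision, recall, MIoU, F1 score, and Dice loss are conventionally derived from a predicted binary mask and its ground truth. The helper name `segmentation_metrics`, the NumPy-based implementation, and the treatment of MIoU as the average of foreground and background IoU are illustrative assumptions.

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-8):
    """Per-image metrics for binary (0/1) segmentation masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)

    tp = np.logical_and(pred, gt).sum()      # foreground predicted and present
    fp = np.logical_and(pred, ~gt).sum()     # foreground predicted but absent
    fn = np.logical_and(~pred, gt).sum()     # foreground missed
    tn = np.logical_and(~pred, ~gt).sum()    # background correctly predicted

    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)

    # Mean IoU over the foreground and background classes (assumption).
    iou_fg = tp / (tp + fp + fn + eps)
    iou_bg = tn / (tn + fp + fn + eps)
    miou = (iou_fg + iou_bg) / 2

    dice = 2 * tp / (2 * tp + fp + fn + eps)
    dice_loss = 1 - dice

    return {"precision": precision, "recall": recall, "f1": f1,
            "miou": miou, "dice_loss": dice_loss}

if __name__ == "__main__":
    # Toy usage example with a synthetic mask and a slightly perturbed prediction.
    rng = np.random.default_rng(0)
    gt = (rng.random((256, 256)) > 0.5).astype(np.uint8)
    pred = gt.copy()
    pred[:10] = 0  # perturb the prediction
    print(segmentation_metrics(pred, gt))
```

Values computed this way are reported as percentages in Table 1; averaging over the test set gives the per-network figures.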
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

