Article

Enhanced Deep-Learning-Based Automatic Left-Femur Segmentation Scheme with Attribute Augmentation

by Kamonchat Apivanichkul 1, Pattarapong Phasukkit 1,2,*, Pittaya Dankulchai 3, Wiwatchai Sittiwong 3 and Tanun Jitwatcharakomol 3

1 School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
2 King Mongkut Chaokhunthahan Hospital (KMCH), King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
3 Division of Radiation Oncology, Department of Radiology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
* Author to whom correspondence should be addressed.
Sensors 2023, 23(12), 5720; https://doi.org/10.3390/s23125720
Submission received: 25 April 2023 / Revised: 27 May 2023 / Accepted: 14 June 2023 / Published: 19 June 2023
(This article belongs to the Special Issue Artificial Intelligence-Based Applications in Medical Imaging)

Abstract
This research proposes augmenting cropped computed tomography (CT) slices with data attributes to enhance the performance of a deep-learning-based automatic left-femur segmentation scheme. The data attribute is the lying position for the left-femur model. In the study, the deep-learning-based automatic left-femur segmentation scheme was trained, validated, and tested using eight categories of CT input datasets for the left femur (F-I–F-VIII). The segmentation performance was assessed by the Dice similarity coefficient (DSC) and intersection over union (IoU), and the similarity between the predicted 3D reconstruction images and ground-truth images was determined by the spectral angle mapper (SAM) and structural similarity index measure (SSIM). The left-femur segmentation model achieved the highest DSC (88.25%) and IoU (80.85%) under category F-IV (using cropped and augmented CT input datasets with large feature coefficients), with a SAM of 0.117–0.215 and an SSIM of 0.701–0.732. The novelty of this research lies in the use of attribute augmentation in medical image preprocessing to enhance the performance of the deep-learning-based automatic left-femur segmentation scheme.

1. Introduction

1.1. Background

Medical image segmentation involves segmenting or annotating regions of interest in a medical image such as a conventional X-ray, computed tomography (CT) scan, magnetic resonance imaging (MRI) scan, or ultrasound image. The image is divided into distinct regions corresponding to anatomical structures of interest, a step required in many medical applications, particularly cancer radiotherapy planning [1,2].
In radiotherapy planning, image segmentation (or contour delineation) is required to distinguish between tumors and organs at risk (OARs), which are healthy organs or tissues that may be adversely affected by the radiation treatment. This is achieved in order to determine the optimal radiation dose and direction of the beam in the treatment to shrink tumors or eradicate cancer cells while sparing the nearby healthy tissue [3,4,5,6]. As a result, contour delineation is vital for effective cancer treatment [7,8]. Evidence shows that 48.3% of all cancer cases are treated by radiotherapy [9], suggesting that image segmentation is of vital importance in the treatment of cancers.
Contour delineation or segmentation was traditionally carried out by manually tracing the boundaries before the development of algorithm-driven semi-automatic technology for medical image segmentation. However, the disadvantage of the manual and semi-automatic medical image segmentation methods is the overreliance on physicians’ experience, making the results subject to intra- or inter-observer variability [10,11,12].
Manual and semi-automatic contour delineation is time-consuming and labor-intensive because one scan dataset (i.e., one medical image) consists of hundreds of slices. The contouring time varies depending on factors such as tumor volume, the number of OARs, the complexity of the regions of interest, and the beam angle [7], and ranges from 30 min to 12 h depending on the cancer type and stage, e.g., 5–10 h for manual contouring of abnormal lesions in neuroimaging scans [13], 1.5–3 h for head and neck radiotherapy [14], and 3 h for intensity-modulated radiation therapy treatment planning [15]. The lengthy contouring or segmentation time results in a backlog of cancer cases, making it difficult to provide timely treatment to all patients who need it.
To address the issue of lengthy contouring time, a fully automatic medical image segmentation scheme based on deep learning technology has been developed. Specifically, deep-learning-based automatic medical image segmentation improves efficiency (i.e., shortened radiotherapy planning time) and reliability (i.e., improved segmentation accuracy) while reducing the workload of physicians [16].
Deep learning plays a significant role in the Fourth Industrial Revolution (i.e., Industry 4.0), specifically in healthcare [17]. Of particular relevance is the convolutional neural network (CNN), a class of artificial neural networks commonly utilized in medical image segmentation because it can be applied to images of varying quality, shape, and size. Moreover, a CNN comprises blocks of operation layers (e.g., convolutional, pooling, and transposed convolution layers), which yield excellent pattern recognition [18]. A CNN automatically extracts the features relevant to a required task from the training dataset by iteratively adjusting its weights with backpropagation. As a result, CNN-based outcomes are superior to manual outcomes [19]. In addition to CNN-based methods, transformer-based methods have also emerged as prominent approaches in medical image segmentation. These methods leverage the attention mechanism to selectively weigh the importance of different parts of the input data. In [20], it was identified that the standard measurement of the distance from the tumor’s lowest boundary to the anal verge is insufficient. As a result, a novel method was proposed to automatically measure the distance to anal verge (DTAV), accompanied by the design of a boundary-guided transformer for accurate rectum and tumor segmentation. In [21], recognizing the transformer model’s ability to capture extensive global information and its reliance on pre-training on large-scale datasets, a hybrid CNN-transformer network (HCTNet) was proposed, consisting of transformer encoder blocks (TEBlocks) in the encoder and a spatial-wise cross-attention (SCA) module in the decoder. Furthermore, ref. [22] proposed a hierarchical hybrid vision transformer (H2Former) that integrates the merits of CNNs, multi-scale channel attention, and transformers for medical segmentation.

1.2. Related Works

Despite the high reliability of manual segmentation of medical images, the method is subject to intra- and inter-observer variability. In response to this issue, semi-automatic segmentation techniques have been developed that integrate mathematical algorithms such as thresholding, level sets [23], seeded region growing, localized region-based active contour models, K-means, and other clustering-based methods [24]. While these techniques have improved efficiency and reliability [25,26], they still suffer from inter-observer variability and are unsuited to certain medical images.
In order to address these limitations, deep learning models have been integrated into automatic medical image segmentation technology. Preprocessing plays a critical role in enhancing the performance of deep-learning-based automatic segmentation by transforming data into a format that is more readily processed [27]. The preprocessing methods proposed to enhance the performance of deep-learning-based automatic segmentation include window leveling, filtering, matching, histogram techniques [28], T1, FLAIR (skull stripping) [29], wavelet decomposition, local binary patterns [30], region of interest (ROI) selection, bias field correction, resampling methods [31], normalization [28,29,30,31], and crop ROI [32,33].
Evidence shows that the segmentation accuracy is significantly enhanced through a combination of preprocessing techniques [28,29,31,34]. Therefore, this study utilizes both conventional and novel preprocessing techniques to enhance image contrast and detect target organs, including window leveling, histogram projection, cropping, and attribute augmentation.
However, the conventional preprocessing techniques are less useful in certain cases, such as detecting or locating the target organs when the Hounsfield units (HU) of the organs of interest are similar or identical to those of the nearby organs and/or when the target organs are paired organs, e.g., femurs, kidneys, and lungs. As a result, attribute augmentation can be used to enhance the performance of the deep-learning-based automatic left-femur segmentation scheme. There are many previous works that have used data augmentation techniques and attribute-aware methods to improve object recognition and segmentation performance in various domains [35,36,37,38,39,40]. In [35], the authors aimed to improve the performance of object recognition and segmentation tasks by introducing a novel attribute-aware feature encoding (AFE) module within a multi-task network while enhancing semantic attributes through the integrated approach of attribute-aware feature encoding and the regularization of feature encoding via auxiliary attribute learning. To overcome the limitations posed by occlusions, varying lighting conditions, and objects with similar visual appearances, ref. [36] proposed a method that combines attribute-aware techniques and data augmentation to boost the performance of semantic segmentation methods. To address the scarcity of annotated data in the medical domain and overcome the associated challenges of high cost and effort in manual annotation, [37] introduced the cycle-consistent cross-domain medical image segmentation (CyCMIS) method, which leverages cycle-consistent techniques and diverse image augmentation to enhance the transferability, robustness, and generalization of segmentation models, enabling more accurate and reliable segmentation results even with limited labeled data in the target domain.
Addressing the inherent challenges of capturing significant variations in the size, shape, texture, and color of skin lesions, ref. [38] introduced a method that combines multi-scale convolutional neural networks (CNNs) and domain-specific augmentations, involving specific transformations and enhancements applied to skin lesion images to simulate realistic variations, with the goal of enhancing the segmentation of skin lesions and their attributes. Aiming to enhance the generalization capability of deep learning models, improve segmentation performance, and address the limited availability of annotated training data, the authors of [39] developed an innovative approach that combines K-means clustering, deep learning techniques, and synthetic data augmentation. This approach involves generating synthetic data to augment the limited annotated data and improve the segmentation performance of deep learning models. To develop and evaluate an algorithm for bone segmentation on whole-body CT using a convolutional neural network (CNN), ref. [40] proposed a method that utilizes a CNN along with novel data augmentation techniques, including conventional methods, mixup, and random image cropping and patching (RICAP).
In contrast, this research proposes an innovative data augmentation method that combines attribute-aware techniques, multiple regression theory [41,42,43], and deep learning models for medical image segmentation. The proposed method enhances the diversity and realism of synthetic images by utilizing a domain-specific, knowledge-driven data augmentation strategy.
Of particular interest is the U-Net model, which achieves high segmentation performance and is used in both architectural [33,34] and non-architectural aspects [13,28,29,30,31] of medical image segmentation.
The research methodology of this study follows [28,29,30,31] with minor modifications. However, unlike [28,29,30,31], this research relies on the lower abdominal CT scans of Thai patients, with the permission of the Siriraj Institutional Review Board.
Specifically, a notable correlation was observed between lying posture and the position of the left–right femur, indicating that the patient’s lying position can influence the femur’s positioning. The lying posture, including supine and prone postures, is utilized as data attributes in the experiments. This research proposes augmenting cropped CT slices with data attributes during image preprocessing to improve the performance of the deep-learning-based automatic left-femur segmentation scheme. In the study, the deep-learning-based automatic left-femur segmentation scheme was trained, validated, and tested using eight categories of CT input datasets (F-I–F-VIII).
The segmentation performance of the deep-learning-based automatic left-femur segmentation scheme was determined by the Dice similarity coefficient (DSC) and intersection over union (IoU). The similarity between the predicted 3D reconstruction images of the left femur and the ground-truth images was measured by the spectral angle mapper (SAM) and structural similarity index measure (SSIM). The novelty of this research lies in the use of attribute augmentation in medical image preprocessing to enhance the performance of the deep-learning-based automatic left-femur segmentation scheme.
The organization of this research paper is as follows: Section 1 is the introduction. Section 2 describes the U-Net segmentation model for the left femur. Section 3 details the experimental dataset and data preprocessing. Section 4 deals with the segmentation performance of the proposed deep-learning-based automatic left-femur segmentation scheme and the image similarity metrics, and Section 5 compares and discusses the experimental results. The conclusions are provided in Section 6.

2. Deep-Learning-Based Automatic Left-Femur Segmentation Scheme: U-Net Segmentation Model

The U-Net segmentation model for the left femur comprises a five-layer, fully connected convolutional neural network (Figure 1). The U-Net is a type of CNN architecture consisting of a series of distinct operation layers, e.g., convolutional and pooling layers. The operation layers transform the input volume (e.g., the input image or another feature map) into an output volume (e.g., mask images and the feature maps) through a differentiable function. Figure 1 illustrates the U-Net architecture, which contains two paths: the contraction path (left side) and the expansion path (right side).
As shown in Figure 1, the CT input datasets are first entered into the contraction path and convolved and maximum-pooled (i.e., undergoing blocks of series of operation layers). Specifically, one block of series of operation layers consists of two consecutive convolutional layers and one max-pooling layer.
In each operation layer, the input datasets or feature maps are first padded so that the center element of the 3 × 3 kernel can be placed over every pixel in the source with a stride of 1. Each source pixel is then replaced with a weighted sum of the respective source pixel ($x_i$) and its neighboring pixels, where the kernel weights ($w_i$) are learnable parameters. The bias ($b$) is added to the weighted sum ($\sum_{i=0}^{j} w_i x_i$) before the activation function ($f_A$) is applied to obtain $Z$, as expressed in Equation (1). The activation function between convolutional layers is the rectified linear unit (ReLU), as expressed in Equation (2).
$$Z = f_A\!\left( \sum_{i=0}^{j} w_i x_i + b \right) \quad (1)$$
where $j$ is the size of the kernel.
$$f_{ReLU}(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \geq 0 \end{cases} \quad (2)$$
The aforementioned convolution process is repeated to produce a new feature map before max-pooling, given a 2 × 2 filter and a stride of 1, to downsample and avoid overfitting. In the contraction path, the feature map resolution of each block of operation layers is halved, while the number of channels is doubled.
Meanwhile, the expansion path of the U-Net architecture restores spatial information for high-resolution feature maps and extracts features. In the expansion path, one block of series of operation layers consists of one transposed convolution, one concatenation, and two consecutive convolutional layers. The transposed convolution layer (3 × 3 convolution kernel and a stride of 2) doubles the feature map resolution and halves the channels. Moreover, the feature map (output) of the transposed convolution layer is concatenated with the corresponding feature map from the contraction path to compensate for missing features (i.e., skip connection). The process is repeated until the resolution of the feature map is identical to that of the CT input dataset.
In the final operation layer (represented by the yellow arrowhead), the feature map of the last convolutional layer is convolved with a 1 × 1 kernel to reduce the 64 feature channels to one output channel or class (i.e., the left femur). In addition, the sigmoid activation function (Equation (3)) maps each output pixel to a probability, which is then binarized to 0 or 1, where 0 denotes non-target organs (depicted in black) and 1 denotes the target organ (i.e., the left femur; depicted in white).
$$f_{sigmoid}(x) = \frac{1}{1 + e^{-x}} \quad (3)$$
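The paper does not publish code, so the following is a minimal sketch of the five-level U-Net described above, written in PyTorch (an assumption; the framework used by the authors is not stated). The channel widths, padding choices, and the stride-2 max-pooling used here to halve the resolution are illustrative choices consistent with Figure 1 and Equations (1)–(3), not the authors’ exact implementation.

```python
# Minimal 5-level U-Net sketch (illustrative; not the authors' implementation).
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two consecutive 3x3 convolutions with ReLU, as in Equations (1)-(2).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_ch=1, base_ch=64):
        super().__init__()
        chs = [base_ch * 2**i for i in range(5)]   # 64, 128, 256, 512, 1024
        self.encoders = nn.ModuleList()
        prev = in_ch
        for c in chs:                              # contraction path
            self.encoders.append(double_conv(prev, c))
            prev = c
        self.pool = nn.MaxPool2d(2)                # 2x2 max-pool (stride 2) halves the resolution
        self.upconvs = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for c in reversed(chs[:-1]):               # expansion path
            # 3x3 transposed convolution with stride 2 doubles the resolution
            self.upconvs.append(nn.ConvTranspose2d(prev, c, kernel_size=3, stride=2,
                                                   padding=1, output_padding=1))
            self.decoders.append(double_conv(prev, c))   # input = upsampled (c) + skip (c)
            prev = c
        self.head = nn.Conv2d(prev, 1, kernel_size=1)    # 1x1 conv -> one class (left femur)

    def forward(self, x):
        skips = []
        for i, enc in enumerate(self.encoders):
            x = enc(x)
            if i < len(self.encoders) - 1:
                skips.append(x)                    # keep feature map for the skip connection
                x = self.pool(x)
        for up, dec, skip in zip(self.upconvs, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))   # concatenate with the contraction feature map
        return torch.sigmoid(self.head(x))         # Equation (3): per-pixel probability

# Example (input spatial dimensions should be divisible by 16, e.g., padded crops):
# probs = UNet()(torch.zeros(1, 1, 256, 256))
```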

3. Experimental Dataset and Data Preprocessing

Table 1 tabulates the initial experimental datasets, which are CT-slice datasets in 8-bit PNG format from 120 CT scans of the lower abdomen of 120 patients (60 male and 60 female) aged 60–80 years with lower abdominal diseases, including colorectal cancer, cervical cancer, prostate cancer, rectosigmoid cancer, and rectal cancer.
The CT-slice datasets are used to train, validate, and test the deep-learning-based automatic left-femur segmentation scheme. In radiotherapy of the lower abdomen, the OARs of radiation exposure are the bladder, left–right femur, prostate, rectum, and small bowel. Specifically, the target OAR of the deep-learning-based segmentation scheme is the left femur.
In this research, a CT scan consists of a series of CT slices (cross-sectional images) in the axial (horizontal), coronal (frontal), and sagittal (longitudinal) planes. The 3D CT slices were acquired using SOMATOM Confidence® 32-slice CT simulator (Siemens, Germany) in the HELIX operation (helical scanning mode) with 120 kV, 250 mA, and 3 mm slice thickness. The use of the CT scans was reviewed and approved by the Siriraj Institutional Review Board with Certificate of Approval (COA) no. Si 315/2021.
Prior to preprocessing, the 120 CT scans were annotated to create an extensive dataset of source image–ground-truth mask image pairs and verified by radiologists. As shown in Figure 2, the data preprocessing entails four major steps: (i) assigning xyz coordinates of the bounding box for cropping, and this step involves window leveling and histogram projection; (ii) contrast enhancement using window leveling and cropping the CT slices; (iii) augmenting the cropped CT slices with attributes; and (iv) dividing the preprocessed datasets into three groupings: datasets for training, validation, and testing the deep-learning-based left-femur segmentation scheme. In this study, the purpose of cropping is to resize the image while retaining the image quality, whereas the aim of attribute augmentation is to enhance the predicted result of the deep-learning-based left-femur segmentation scheme.

3.1. Assigning xyz Coordinates of the Bounding Box for Cropping

In the femur cropping, the grayscale of each CT slice of the 120 CT scans was adjusted by window leveling and normalization to distinguish the target OARs from the surrounding tissues and organs [44,45]. Window leveling is a viewer setting that affects the range of grayscale or Hounsfield unit (HU) values in the image. Window leveling and normalization are carried out by adjusting the window level (WL), which is the midpoint of the range of HU values, and the window width (WW), which is the measure of the range between the minimum and maximum HU ($HU_{min}$ and $HU_{max}$). $HU_{min}$ and $HU_{max}$ can be calculated by Equations (4) and (5), respectively [46].
$$HU_{min} = WL - 0.5 - \frac{WW - 1}{2} \quad (4)$$
$$HU_{max} = WL - 0.5 + \frac{WW - 1}{2} \quad (5)$$
In this research, WL and WW are 300 HU and 400 HU, which are the optimal WL and WW for preserving the entire HU range of bone [47,48] (Figure 3). Given a WW of 400 HU, an HU less than or equal to $HU_{min}$ (i.e., ≤100 HU) is mapped to $y_{min}$, and an HU greater than or equal to $HU_{max}$ (i.e., ≥499 HU) is mapped to $y_{max}$. In this study, $y_{min}$ and $y_{max}$ are 0 and 255, respectively. The output value ($y_i$) for an HU between $HU_{min}$ and $HU_{max}$, which lies in the range of 0 to 255, is calculated by Equation (6) [46].
$$y_i = \left( \frac{x_i - (WL - 0.5)}{WW - 1} + 0.5 \right) \times (y_{max} - y_{min}) + y_{min} \quad (6)$$
where $y_i$ is the output value in the range of 0 to 255 (i.e., 8-bit), and $x_i$ is the input HU, with $i$ denoting the pixel index over the resolution of the CT slice. Since the preprocessed CT slice is in 8-bit PNG format, the range of the output ($y_i$) is 0–255.
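As a concrete illustration of Equations (4)–(6), the following NumPy sketch maps raw HU values to 8-bit grayscale; the function name and the clipping convention for values outside the window are our own assumptions.

```python
# Illustrative window-leveling and normalization step (Equations (4)-(6)).
import numpy as np

def window_level(hu_image, wl=300.0, ww=400.0, y_min=0, y_max=255):
    """Map raw HU values to 8-bit grayscale using window level (WL) and width (WW)."""
    hu_min = wl - 0.5 - (ww - 1) / 2          # Equation (4)
    hu_max = wl - 0.5 + (ww - 1) / 2          # Equation (5)
    x = hu_image.astype(np.float64)
    # Equation (6): linear mapping of HU values inside the window
    y = ((x - (wl - 0.5)) / (ww - 1) + 0.5) * (y_max - y_min) + y_min
    y[x <= hu_min] = y_min                    # values at or below the window -> 0
    y[x >= hu_max] = y_max                    # values at or above the window -> 255
    return y.astype(np.uint8)

# Example: bone window used for locating the femur (WL = 300 HU, WW = 400 HU)
# png_slice = window_level(ct_slice_hu, wl=300, ww=400)
```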
The final task in assigning the xyz coordinates of the bounding box for femur cropping is to identify the coordinates by histogram projection to locate the target organ (i.e., the femur). Several deep-learning-based preprocessing techniques exist to detect or locate organs of interest, e.g., bone fracture detection [49]. In the histogram projection, the vertical histogram projection is calculated by summing the HU values ($HU_r$) over all rows of a given column of the CT slice (Equation (7)), and the horizontal histogram projection by summing the HU values ($HU_c$) over all columns of a given row of the CT slice (Equation (8)) [50,51,52].
$$\text{Vertical Histogram Projection} = \frac{\sum_{r} HU_r}{S_c} \quad (7)$$
$$\text{Horizontal Histogram Projection} = \frac{\sum_{c} HU_c}{S_c} \quad (8)$$
where $c$ is the number of columns, $r$ is the number of rows, and $S_c$ is a scaling value used to normalize the outputs of the vertical and horizontal histogram projections. In this research, $S_c$ is set to 100.
Given the minimum thresholds for the vertical and horizontal histogram projections of 50 HU in the axial plane and 5 HU in the coronal and sagittal planes, the potential coordinates of all CT slices in all planes were determined (i.e., xy, xz, and zy coordinates for the axial, coronal, and sagittal planes). The smallest xy, xz, and zy coordinates (among the potential coordinates) of each CT slice were selected, and the axis coordinates of identical plane were averaged for the xyz coordinates of the CT scans. We obtained 120 xyz coordinates, corresponding to 120 CT scans of patients with lower abdominal cancers.
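A minimal sketch of this histogram-projection step (Equations (7) and (8)) is shown below; the helper name and the exact thresholding convention are our reading of the text, not the authors’ code.

```python
# Sketch: locate the femur bounding box in one window-leveled slice via projections.
import numpy as np

def projection_bounds(slice_8bit, threshold, scale=100.0):
    """Return (min, max) column and row indices whose scaled projections exceed a threshold."""
    vertical = slice_8bit.sum(axis=0) / scale     # Equation (7): sum over rows, per column
    horizontal = slice_8bit.sum(axis=1) / scale   # Equation (8): sum over columns, per row
    cols = np.where(vertical > threshold)[0]
    rows = np.where(horizontal > threshold)[0]
    return (cols.min(), cols.max()), (rows.min(), rows.max())

# Axial slices use a 50 HU threshold; coronal and sagittal slices use 5 HU.
# (x_min, x_max), (y_min, y_max) = projection_bounds(axial_slice, threshold=50)
```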
Figure 4 shows, as an example, the horizontal and vertical histogram projection and the xy coordinates of the femur in the axial plane. Figure 5 shows the workflow for assigning the xyz coordinates of the bounding box for femur cropping.

3.2. Contrast Enhancement and Femur Cropping

To enhance the contrast, the CT slices are window-leveled with a WL of 120 HU and a WW of 336 HU, which are the optimal WL and WW for CT image contrast [53], and normalized into the range of 0 to 255.
Figure 6a,b show the original CT slice in the axial plane and the corresponding CT image in 8-bit PNG format after window leveling and normalization. As seen in Figure 6b, the window leveling and normalization noticeably improve the image quality.
The xyz coordinates (from Section 3.1) are applied to the contrast-enhanced CT slices to delineate the bounding box for femur cropping, as shown in Figure 7a. The purpose of cropping is to resize the image while retaining the image quality. The cropped CT slices of the femur measure 360 × 200 pixels in the axial plane, 360 × 110 pixels in the coronal plane, and 200 × 110 pixels in the sagittal plane. Figure 7b depicts, as an example, the bounding box for cropping (represented by the red rectangle) and the cropped cube comprising the cropped CT slices in the axial, coronal, and sagittal planes.
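A minimal cropping sketch under these assumptions (volume stored as slices × rows × columns, bounding box taken from Section 3.1) is given below; the function and variable names are illustrative.

```python
# Illustrative cropping of the contrast-enhanced volume with the averaged bounding box.
import numpy as np

def crop_volume(volume_zyx, z_bounds, y_bounds, x_bounds):
    """Crop a CT volume (slices, rows, cols) to the femur bounding box."""
    (z0, z1), (y0, y1), (x0, x1) = z_bounds, y_bounds, x_bounds
    return volume_zyx[z0:z1, y0:y1, x0:x1]

# Example: after contrast enhancement (WL = 120 HU, WW = 336 HU), axial crops are 360 x 200 pixels:
# cropped = crop_volume(enhanced_volume, (z0, z1), (y0, y0 + 200), (x0, x0 + 360))
```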

3.3. Cropped CT Slices Augmented with Attributes (Feature Addition)

Attribute augmentation (or feature addition), based on multivariate regression in machine learning [41,42,43], is utilized to enhance the segmentation performance of the deep learning algorithm. In this research, the attribute augmented to the cropped CT slices is lying position (supine and prone posture), which was selected based on expert suggestion from one of the co-authors, who is also an oncologist. Lying position was included as a data attribute because the position of the patient during CT scanning can affect the shape and position of internal organs and target volumes, such as the femur and rectum (e.g., [54,55]). Accounting for these differences can potentially improve segmentation accuracy, and by including lying position as a data attribute, the model can capture these variations. Attribute data are expected to affect the segmentation performance of the deep learning U-Net model.
Figure 8 shows the cropped CT slice of the femur before and after attribute augmentation.
In order to differentiate between CT slices of different lying positions (when patients entered the CT scanner), the feature coefficients of the lying-position attribute are categorized into three cases: small (1 and 2 for supine and prone posture, respectively), large (5 and 10), and excessively large (10 and 20). Since the prone position is unique to rectal cancer, where the lesion is located at the back, it is treated as a special case that influences the position and characteristics of the left–right femur, as shown in Figure 9. This can potentially confuse the segmentation model. Therefore, the coefficient value of the prone posture was assigned to be higher than that of the supine posture, allowing for a clear differentiation between the special case and the normal case. Essentially, the feature coefficients were varied to investigate the effect of coefficient values on the performance of the U-Net segmentation model.
Figure 10 shows, as an example, the augmentation of the attribute to the cropped CT slices, where the white and yellow matrices represent the cropped CT slice and the attribute, respectively. The attribute augmentation is performed by appending an M × 1 matrix to the last column of the cropped CT slice, where M is the number of rows. The attribute-augmented CT slices are subsequently converted into CT input datasets in 8-bit PNG format.
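The following sketch illustrates this attribute-augmentation step; the coefficient pairs follow the text, while the function name and dictionary layout are our own.

```python
# Sketch: append an M x 1 column holding the lying-position coefficient (Figure 10).
import numpy as np

POSTURE_COEFFS = {           # (supine, prone) pairs for the three experiment cases
    "small": (1, 2),         # categories F-III / F-VI
    "large": (5, 10),        # categories F-IV / F-VII
    "excessive": (10, 20),   # categories F-V / F-VIII
}

def augment_with_posture(cropped_slice, posture, case="large"):
    """Append a constant column encoding the lying position to a cropped CT slice."""
    supine_coeff, prone_coeff = POSTURE_COEFFS[case]
    value = prone_coeff if posture == "prone" else supine_coeff
    column = np.full((cropped_slice.shape[0], 1), value, dtype=cropped_slice.dtype)
    return np.concatenate([cropped_slice, column], axis=1)   # M x (N + 1) output

# Example: augmented = augment_with_posture(cropped_slice, posture="prone", case="large")
```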

3.4. Training, Validation, and Testing Datasets of the Deep-Learning-Based Left-Femur Segmentation Scheme

The initial CT scans of the lower abdomen (before preprocessing) belong to 120 patients with lower abdominal cancers. The proportion of input datasets for training, validation, and testing the deep-learning-based automatic left-femur segmentation scheme is 60:20:20. Specifically, out of 120 CT scans, 72 CT scans were used for training, 24 CT scans for validation, and 24 CT scans for testing.
In addition, the deep-learning-based segmentation scheme was trained, validated, and tested under eight categories of CT input datasets, namely F-I–F-VIII for the left-femur segmentation.
Category F-I refers to the uncropped and non-augmented CT-image input datasets of the left femur; F-II to the cropped CT-image input datasets of the left femur (without attribute augmentation); and F-III, F-IV, and F-V to the cropped and augmented CT-image input datasets of the left femur with small (1 and 2 for supine and prone posture), large (5 and 10), and excessively large feature coefficients (10 and 20), respectively; and F-VI, F-VII, and F-VIII to the uncropped and augmented CT-image input datasets of the left femur with small (1 and 2 for supine and prone posture), large (5 and 10), and excessively large feature coefficients (10 and 20), respectively.
In the training, the weights ($w_i$) and biases ($b$) of the U-Net model for left-femur segmentation were optimized by gradient descent given the binary cross-entropy loss function, a learning rate (α) of 0.001, and a maximum of 5000 epochs. Training was terminated when the IoU [56] failed to improve for 50 consecutive epochs. Table 2 tabulates the hyperparameters of the U-Net models for left-femur segmentation.
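A hypothetical training-loop sketch reflecting this setup is shown below. The optimizer choice (plain SGD), the data loaders, and the evaluate_iou helper are assumptions; the paper specifies only gradient descent optimization with the stated loss, learning rate, epoch limit, and early-stopping rule.

```python
# Training-loop sketch: BCE loss, lr = 0.001, up to 5000 epochs,
# early stopping after 50 epochs without IoU improvement.
import torch

model = UNet()                                            # from the sketch in Section 2
criterion = torch.nn.BCELoss()                            # binary cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # learning rate alpha = 0.001

best_iou, patience, epochs_without_improvement = 0.0, 50, 0
for epoch in range(5000):
    model.train()
    for images, masks in train_loader:                    # assumed DataLoader of (CT, float mask) pairs
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()

    val_iou = evaluate_iou(model, val_loader)             # assumed helper returning mean validation IoU
    if val_iou > best_iou:
        best_iou, epochs_without_improvement = val_iou, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:        # stop after 50 stagnant epochs
            break
```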
The experiments were carried out using the APEX system of CMKL University, Thailand. APEX is a high-performance computing platform and storage infrastructure for AI workloads, with 1920 vCPU cores, 48 A100 GPUs with 1.92 TB of total GPU memory, 30 petaFLOPS of AI compute, 7.5 TB of system memory, 3 PB of storage, and a 200 Gb/s interconnect.

4. Segmentation Performance and Image Similarity Metrics

The segmentation performance of the deep-learning-based automatic left-femur segmentation scheme was assessed by DSC (Equation (9)) and IoU (Equation (10)). Specifically, DSC focuses on the prediction performance (i.e., segmentation accuracy) on average of the deep learning model, whereas IoU focuses on the worst prediction performance of the algorithmic model [57].
$$DSC(A, B) = \frac{2\left| A \cap B \right|}{\left| A \right| + \left| B \right|} \quad (9)$$
$$IoU(A, B) = \frac{\left| A \cap B \right|}{\left| A \cup B \right|} = \frac{\left| A \cap B \right|}{\left| A \right| + \left| B \right| - \left| A \cap B \right|} \quad (10)$$
where $A$ is the ground-truth CT image region, and $B$ is the predicted region.
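For reference, Equations (9) and (10) can be computed for binary masks as in the following NumPy sketch; the helper names are our own.

```python
# Segmentation metrics for binary masks (Equations (9)-(10)).
import numpy as np

def dice_coefficient(a, b, eps=1e-8):
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks a (ground truth) and b (prediction)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)

def intersection_over_union(a, b, eps=1e-8):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks a (ground truth) and b (prediction)."""
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / (np.logical_or(a, b).sum() + eps)
```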
In addition, 3D reconstruction images were rendered based on the predicted results from the deep-learning-based segmentation models for a visually distinct comparison. The similarity between the 3D reconstruction images and ground-truth images was evaluated by SAM (Equation (11)) and SSIM (Equation (12)). Both metrics are image-similarity measures that quantify the degree of visual and semantic similarity of a pair of images.
$$\text{SAM}(\alpha) = \cos^{-1}\!\left( \frac{\sum_{i=1}^{nb} t_i r_i}{\sqrt{\sum_{i=1}^{nb} t_i^2}\,\sqrt{\sum_{i=1}^{nb} r_i^2}} \right) \quad (11)$$
where t is the predicted 3D reconstruction image pixel spectrum, and r is the 3D ground-truth image pixel spectrum in an n-dimensional feature space, nb is the number of bands in the 3D image, and α is the angle between the two spectra of the predicted 3D model reconstruction image and 3D ground-truth image. A small α indicates similarity between the predicted 3D reconstruction image and the 3D ground-truth image [58].
The SSIM is a perceptual metric that quantifies image-quality degradation caused by processing such as data compression or by losses in data transmission. The SSIM algorithm extracts three key features from an image: luminance ($\mu$), contrast ($\sigma$), and structure; the comparison between the two images (i.e., the predicted 3D reconstruction image and the 3D ground-truth image) is performed on the basis of these three features. The SSIM is mathematically expressed in Equation (12) [59,60].
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \quad (12)$$
where $\mu_x$ and $\mu_y$ are the luminance of the predicted 3D reconstruction image ($x$) and the 3D ground-truth image ($y$), $\sigma_{xy}$ is the covariance between the two images, $\sigma_x$ and $\sigma_y$ are the contrast of the predicted 3D reconstruction image ($x$) and the 3D ground-truth image ($y$), and $c_1$ and $c_2$ are constants that avoid an undefined value when $\mu_x^2 + \mu_y^2$ or $\sigma_x^2 + \sigma_y^2$ approaches zero.
The SSIM is in the range of −1 to 1, where −1 indicates that both images (i.e., predicted 3D reconstruction image and 3D ground-truth image) are dissimilar, and 1 indicates that both images are identical.
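The two similarity metrics can be computed as in the sketch below, which implements SAM (Equation (11)) directly and delegates SSIM (Equation (12)) to scikit-image; the helper names and the flattening of each 3D volume into a single spectrum are our own assumptions.

```python
# Image-similarity metrics: SAM implemented directly, SSIM via scikit-image.
import numpy as np
from skimage.metrics import structural_similarity

def spectral_angle(t, r, eps=1e-12):
    """Spectral angle (radians) between predicted (t) and ground-truth (r) pixel spectra."""
    t, r = t.ravel().astype(np.float64), r.ravel().astype(np.float64)
    cos_alpha = np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r) + eps)
    return float(np.arccos(np.clip(cos_alpha, -1.0, 1.0)))

# Example comparison of a predicted 3D reconstruction against the ground truth:
# sam = spectral_angle(predicted_volume, ground_truth_volume)
# ssim = structural_similarity(predicted_volume, ground_truth_volume,
#                              data_range=ground_truth_volume.max() - ground_truth_volume.min())
```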

5. Results and Discussion

This section discusses the segmentation performance of the proposed deep-learning-based automatic left-femur segmentation scheme. The testing of the deep-learning-based segmentation scheme was carried out under eight categories of the input datasets of CT images, namely F-I–F-VIII for the left-femur segmentation.
Specifically, category F-I refers to the uncropped and non-augmented CT-image input datasets of the left femur; F-II to the cropped CT-image input datasets of the left femur (without attribute augmentation); and F-III, F-IV, and F-V to the cropped and augmented CT-image input datasets of the left femur with small (1 and 2 for supine and prone posture), large (5 and 10), and excessively large feature coefficients (10 and 20), respectively; and F-VI, F-VII, and F-VIII to the uncropped and augmented CT-image input datasets of the left femur with small (1 and 2 for supine and prone posture), large (5 and 10), and excessively large feature coefficients (10 and 20), respectively.

5.1. Performance of the U-Net Femur Segmentation Model

Table 3 tabulates the DSC and IoU of the U-Net left-femur segmentation model under the eight categories of the CT-image input datasets: categories F-I–F-VIII.
Under dataset category F-I, the DSC and IoU of the left-femur segmentation model are 37.76% and 23.90%, respectively, and under category F-II they are 67.96% and 52.32%. The DSC and IoU are 61.37% and 45.46%, 88.25% and 80.85%, and 72.54% and 57.93% under categories F-III, F-IV, and F-V, respectively, and 51.37% and 41.25%, 48.62% and 35.18%, and 45.27% and 31.30% under categories F-VI, F-VII, and F-VIII, respectively. The segmentation performance of the left-femur segmentation model decreases as the feature coefficients increase beyond a certain limit, as indicated by the DSC and IoU under category F-V compared to category F-IV.
The DSC and IoU under categories F-VI, F-VII, and F-VIII are poorer than those under categories F-II, F-III, F-IV, and F-V. This finding could be attributed to the substantially larger CT image size under categories F-VI, F-VII, and F-VIII (512 × 512 pixels) compared to categories F-II–F-V (360 × 200 pixels). Specifically, the optimal feature coefficients for the U-Net left-femur segmentation model, given the dataset under category F-IV, are 5 for supine posture and 10 for prone posture, as evidenced by a DSC of 88.25% and an IoU of 80.85%.
Figure 11a–h, as an example, compare the performance of the U-Net left-femur segmentation model under dataset categories F-I, F-II, F-III, F-IV, F-V, F-VI, F-VII, and F-VIII, where the left, middle, and right columns are the CT image of the femur, the corresponding ground truth, and the predicted segmentation image. In Figure 11a (under dataset category F-I), the U-Net left-femur segmentation model displays the incomplete left femur compared to the ground-truth image. In Figure 11b (under dataset category F-II), the U-Net segmentation model displays the right femur (i.e., the non-target organ) in addition to the left femur (the target organ).
In Figure 11c (under dataset category F-III), the U-Net segmentation model displays small sections of the right femur, while certain sections of the left femur are missing. In Figure 11d (under dataset category F-IV), the U-Net segmentation model displays only the left femur (i.e., the target organ), and the predicted result closely resembles the ground-truth image. Meanwhile, in comparison with category F-IV, the predicted result under dataset category F-V is less complete, with certain sections of the left femur missing (Figure 11e). In Figure 11f, the U-Net segmentation model displays the incomplete left femur and certain sections of the right femur. In Figure 11g, the head of the left femur is missing, and in Figure 11h, the femur head of the target organ (left femur) is missing, and some sections of the right femur appear on the segmented image.

5.2. Comparison and Similarity of 3D Reconstruction Images

Table 4 compares the predicted 3D reconstruction images of the left femur and the ground-truth images and the corresponding SAM and SSIM under categories F-I, F-II, F-III, F-IV, F-V, F-VI, F-VII, and F-VIII.
The SAM and SSIM are 0.209–0.257 and 0.433–0.544 under category F-I; 0.135–0.236 and 0.658–0.714 under category F-II; 0.120–0.222 and 0.668–0.729 under category F-III; 0.117–0.215 and 0.701–0.732 under category F-IV; 0.121–0.223 and 0.675–0.729 under category F-V; 0.146–0.225 and 0.572–0.644 under category F-VI; 0.157–0.225 and 0.551–0.637 under category F-VII; and 0.184–0.236 and 0.461–0.639 under category F-VIII. By comparison, the predicted 3D reconstruction images using the cropped and augmented CT-image input datasets of the left femur with large feature coefficient values (category F-IV) closely resemble the ground-truth images, as evidenced by smallest SAM and largest SSIM.
Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18 and Figure 19 depict the 3D reconstruction images of the deep-learning-based automatic left-femur segmentation scheme under various categories of CT datasets of the left femur.
In Figure 12 (under dataset category F-I), the left femur is almost entirely missing, with a small section of a non-target organ appearing in the 3D reconstruction image. In Figure 13 (under dataset category F-II), a large section of the right femur (non-target organ) appears in the 3D reconstruction image, suggesting that the CT datasets require further preprocessing. In Figure 14 (under dataset category F-III), certain sections of the right femur remain, while the head and the distal part of the lesser trochanter of the left femur are missing.
In Figure 15 (under dataset category F-IV), small sections of the right femur (non-target organ) remain, and the head of the left femur is partially missing. In Figure 16 (under dataset category F-V), one tiny section of the right femur remains, but the head and the distal part of the lesser trochanter of the left femur are missing.
In Figure 17 (under dataset category F-VI), the head of the left femur and the distal part of the lesser trochanter are missing, while large sections of the right femur (non-target organ) appear in the 3D reconstruction image. In Figure 18 (under dataset category F-VII), larger sections of the head of the left femur and the distal part of the lesser trochanter are missing, and larger sections of the right femur appear in the 3D reconstruction image.
In Figure 19 (under dataset category F-VIII), the distal part of the lesser trochanter is almost complete, while the femoral head of the left femur is missing, with some sections of the right femur appearing in the image. Essentially, the optimal CT datasets for the deep-learning-based automatic left-femur segmentation scheme are those belonging to category F-IV.

6. Conclusions

This research proposes augmenting cropped CT slices with data attributes during image preprocessing to enhance the performance of a deep-learning-based automatic left-femur segmentation scheme. The data attribute is the lying position (supine and prone posture) for the segmentation model. In the study, the deep-learning-based automatic left-femur segmentation scheme was trained, validated, and tested under eight categories of CT input datasets for the left femur (F-I–F-VIII). The segmentation performance of the left-femur segmentation scheme was evaluated by DSC and IoU, and the similarity between the predicted 3D reconstruction images and ground-truth images was measured by SAM and SSIM. The results show that the left-femur segmentation model achieved the highest DSC (88.25%) and IoU (80.85%) under category F-IV (using the cropped and augmented CT input datasets with large feature coefficients). Moreover, the SAM and SSIM of the left-femur segmentation model are 0.117–0.215 and 0.701–0.732 under category F-IV. The optimal CT dataset for the deep-learning-based automatic left-femur segmentation scheme is therefore that of category F-IV. To further improve the DSC and IoU of the femur segmentation model (currently 88.25% and 80.85%), subsequent research will experimentally modify the blocks of operation layers in the contraction and/or expansion paths of the U-Net model. The limitation of this research is the tedious (i.e., time-consuming and labor-intensive) preparation of the ground-truth images of the left femur necessary to train, validate, and test the U-Net left-femur model.

Author Contributions

Conceptualization, K.A., P.P. and P.D.; methodology, K.A. and P.P.; validation, K.A., P.P., P.D., W.S. and T.J.; formal analysis, K.A. and P.P.; investigation, K.A., P.P. and P.D.; resources, P.D.; data curation, K.A., W.S. and T.J.; writing—original draft preparation, K.A. and P.P.; writing—review and editing, K.A. and P.P.; funding acquisition, P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand [KDS 2020/002].

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of Siriraj Hospital (protocol code COA no. Si 315/2021; date of approval, 3 May 2021–2 May 2024; Protocol Title, “Automatic Multiple Organs Segmentation and 3D Image Reconstruction Using Deep Learning”).

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported by King Mongkut’s Institute of Technology Ladkrabang. The authors would like to express deep gratitude to Thailand’s Siriraj Hospital for CT data support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lee, G.; Fujita, H. Deep Learning in Medical Image Analysis: Challenges and Applications; Springer: Berlin/Heidelberg, Germany, 2020; Volume 1213. [Google Scholar]
  2. Ma, Z.; Tavares, J.M.R.; Jorge, R.N.; Mascarenhas, T. A review of algorithms for medical image segmentation and their applications to the female pelvic cavity. Comput. Methods Biomech. Biomed. Engin. 2010, 13, 235–246. [Google Scholar] [CrossRef] [Green Version]
  3. Delaney, G.P.; Barton, M.B. Evidence-based estimates of the demand for radiotherapy. Clin. Oncol. 2015, 27, 70–76. [Google Scholar] [CrossRef]
  4. Weston, A.D.; Korfiatis, P.; Philbrick, K.A.; Conte, G.M.; Kostandy, P.; Sakinis, T.; Zeinoddini, A.; Boonrod, A.; Moynagh, M.; Takahashi, N. Complete abdomen and pelvis segmentation using U-net variant architecture. Med. Phys. 2020, 47, 5609–5618. [Google Scholar] [CrossRef]
  5. Ibragimov, B.; Xing, L. Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks. Med. Phys. 2017, 44, 547–557. [Google Scholar] [CrossRef] [Green Version]
  6. Puangragsa, U.; Setakornnukul, J.; Dankulchai, P.; Phasukkit, P. 3D Kinect Camera Scheme with Time-Series Deep-Learning Algorithms for Classification and Prediction of Lung Tumor Motility. Sensors 2022, 22, 2918. [Google Scholar] [CrossRef]
  7. Voet, P.W. Automation of Contouring and Planning in Radiotherapy; Erasmus University Rotterdam: Rotterdam, The Netherlands, 2014. [Google Scholar]
  8. Kosmin, M.; Ledsam, J.; Romera-Paredes, B.; Mendes, R.; Moinuddin, S.; de Souza, D.; Gunn, L.; Kelly, C.; Hughes, C.; Karthikesalingam, A. Rapid advances in auto-segmentation of organs at risk and target volumes in head and neck cancer. Radiother. Oncol. 2019, 135, 130–140. [Google Scholar] [CrossRef] [PubMed]
  9. Kazemifar, S.; Balagopal, A.; Nguyen, D.; McGuire, S.; Hannan, R.; Jiang, S.; Owrangi, A. Segmentation of the prostate and organs at risk in male pelvic CT images using deep learning. Biomed. Phys. Eng. Express 2018, 4, 055003. [Google Scholar] [CrossRef] [Green Version]
  10. Doi, K. Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 2007, 31, 198–211. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Haque, I.R.I.; Neubert, J. Deep learning approaches to biomedical image segmentation. Inform. Med. Unlocked 2020, 18, 100297. [Google Scholar] [CrossRef]
  12. Mathur, R. Deep Learning over Conventional Image Processing for Contrast Enhancement and Auto-Segmentation of Super-Resolved Neuronal Brain Images: A Comparative Study. Magn. Reson. Med. Sci. 2020, 19, 195–206. [Google Scholar] [CrossRef] [Green Version]
  13. Paing, M.P.; Tungjitkusolmun, S.; Bui, T.H.; Visitsattapongse, S.; Pintavirooj, C. Automated segmentation of infarct lesions in T1-weighted MRI scans using variational mode decomposition and deep learning. Sensors 2021, 21, 1952. [Google Scholar] [CrossRef]
  14. Harari, P.M.; Song, S.; Tomé, W.A. Emphasizing conformal avoidance versus target definition for IMRT planning in head-and-neck cancer. Int. J. Radiat. Oncol. Biol. Phys. 2010, 77, 950–958. [Google Scholar] [CrossRef] [Green Version]
  15. Das, I.J.; Moskvin, V.; Johnstone, P.A. Analysis of treatment planning time among systems and planners for intensity-modulated radiation therapy. J. Am. Coll. Radiol. 2009, 6, 514–517. [Google Scholar] [CrossRef]
  16. Zhou, X.; Takayama, R.; Wang, S.; Hara, T.; Fujita, H. Deep learning of the sectional appearances of 3D CT images for anatomical structure segmentation based on an FCN voting method. Med. Phys. 2017, 44, 5221–5233. [Google Scholar] [CrossRef] [Green Version]
  17. Roy, S.; Meena, T.; Lim, S.-J. Demystifying supervised learning in healthcare 4.0: A new reality of transforming diagnostic medicine. Diagnostics 2022, 12, 2549. [Google Scholar] [CrossRef]
  18. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar] [CrossRef] [Green Version]
  19. Chan, H.-P.; Samala, R.K.; Hadjiiski, L.M.; Zhou, C. Deep learning in medical image analysis. Deep. Learn. Med. Image Anal. Chall. Appl. 2020, 1213, 3–21. [Google Scholar] [CrossRef]
  20. Shen, J.; Lu, S.; Qu, R.; Zhao, H.; Zhang, L.; Chang, A.; Zhang, Y.; Fu, W.; Zhang, Z. A boundary-guided transformer for measuring distance from rectal tumor to anal verge on magnetic resonance images. Patterns 2023, 4, 100711. [Google Scholar] [CrossRef] [PubMed]
  21. He, Q.; Yang, Q.; Xie, M. HCTNet: A hybrid CNN-transformer network for breast ultrasound image segmentation. Comput. Biol. Med. 2023, 155, 106629. [Google Scholar] [CrossRef] [PubMed]
  22. He, A.; Wang, K.; Li, T.; Du, C.; Xia, S.; Fu, H. H2Former: An Efficient Hierarchical Hybrid Transformer for Medical Image Segmentation. IEEE Trans. Med. Imaging, 2023; Online ahead of print. [Google Scholar] [CrossRef]
  23. Cha, K.H.; Hadjiiski, L.; Samala, R.K.; Chan, H.P.; Caoili, E.M.; Cohan, R.H. Urinary bladder segmentation in CT urography using deep-learning convolutional neural network and level sets. Med. Phys. 2016, 43, 1882–1896. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Kim, Y.J.; Lee, S.H.; Park, C.M.; Kim, K.G. Evaluation of semi-automatic segmentation methods for persistent ground glass nodules on thin-section CT scans. Healthc. Inform. Res. 2016, 22, 305–315. [Google Scholar] [CrossRef] [Green Version]
  25. Starmans, M.P.; van der Voort, S.R.; Tovar, J.M.C.; Veenland, J.F.; Klein, S.; Niessen, W.J. Radiomics: Data mining using quantitative medical image features. In Handbook of Medical Image Computing and Computer Assisted Intervention; Elsevier: Amsterdam, The Netherlands, 2020; pp. 429–456. [Google Scholar]
  26. Sakinis, T.; Milletari, F.; Roth, H.; Korfiatis, P.; Kostandy, P.; Philbrick, K.; Akkus, Z.; Xu, Z.; Xu, D.; Erickson, B.J. Interactive segmentation of medical images through fully convolutional neural networks. arXiv 2019, arXiv:1903.08205. [Google Scholar]
  27. Chicco, D. Ten quick tips for machine learning in computational biology. BioData Min. 2017, 10, 35. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Islam, M.; Khan, K.N.; Khan, M.S. Evaluation of Preprocessing Techniques for U-Net Based Automated Liver Segmentation. In Proceedings of the 2021 International Conference on Artificial Intelligence (ICAI), Islamabad, Pakistan, 5–7 April 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 187–192. [Google Scholar]
  29. Duque, P.; Cuadra, J.; Jiménez, E.; Rincón-Zamorano, M. In Data preprocessing for automatic WMH segmentation with FCNNs. In Proceedings of the From Bioinspired Systems and Biomedical Applications to Machine Learning: 8th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2019, Almería, Spain, 3–7 June 2019; Part II 8. Springer: Berlin/Heidelberg, Germany, 2019; pp. 452–460. [Google Scholar]
  30. Ross-Howe, S.; Tizhoosh, H.R. In the effects of image pre-and post-processing, wavelet decomposition, and local binary patterns on U-nets for skin lesion segmentation. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–8. [Google Scholar]
  31. De Raad, K.; van Garderen, K.A.; Smits, M.; van der Voort, S.R.; Incekara, F.; Oei, E.; Hirvasniemi, J.; Klein, S.; Starmans, M.P. The effect of preprocessing on convolutional neural networks for medical image segmentation. In Proceedings of the 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), Nice, France, 13–16 April 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 655–658. [Google Scholar]
  32. Gibson, E.; Giganti, F.; Hu, Y.; Bonmati, E.; Bandula, S.; Gurusamy, K.; Davidson, B.; Pereira, S.P.; Clarkson, M.J.; Barratt, D.C. Automatic multi-organ segmentation on abdominal CT with dense V-networks. IEEE Trans. Med. Imaging 2018, 37, 1822–1834. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Balagopal, A.; Kazemifar, S.; Nguyen, D.; Lin, M.-H.; Hannan, R.; Owrangi, A.; Jiang, S. Fully automated organ segmentation in male pelvic CT images. Phys. Med. Biol. 2018, 63, 245015. [Google Scholar] [CrossRef] [Green Version]
  34. Pal, D.; Reddy, P.B.; Roy, S. Attention UW-Net: A fully connected model for automatic segmentation and annotation of chest X-ray. Comput. Biol. Med. 2022, 150, 106083. [Google Scholar] [CrossRef] [PubMed]
  35. Yang, S.; Wang, Y.; Chen, K.; Zeng, W.; Fei, Z. Attribute-aware feature encoding for object recognition and segmentation. IEEE Trans. Multimed. 2021, 24, 3611–3623. [Google Scholar] [CrossRef]
  36. Sulistiyo, M.D.; Kawanishi, Y.; Deguchi, D.; Ide, I.; Hirayama, T.; Murase, H. Performance boost of attribute-aware semantic segmentation via data augmentation for driver assistance. In Proceedings of the 2020 8th International Conference on Information and Communication Technology (ICoICT), Yogyakarta, Indonesia, 24–26 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar]
  37. Wang, R.; Zheng, G. CyCMIS: Cycle-consistent Cross-domain Medical Image Segmentation via diverse image augmentation. Med. Image Anal. 2022, 76, 102328. [Google Scholar] [CrossRef]
  38. Jahanifar, M.; Tajeddin, N.Z.; Koohbanani, N.A.; Gooya, A.; Rajpoot, N. Segmentation of skin lesions and their attributes using multi-scale convolutional neural networks and domain specific augmentations. arXiv 2018, arXiv:1809.10243. [Google Scholar]
  39. Khan, A.R.; Khan, S.; Harouni, M.; Abbasi, R.; Iqbal, S.; Mehmood, Z. Brain tumor segmentation using K-means clustering and deep learning with synthetic data augmentation for classification. Microsc. Res. Tech. 2021, 84, 1389–1399. [Google Scholar] [CrossRef]
  40. Noguchi, S.; Nishio, M.; Yakami, M.; Nakagomi, K.; Togashi, K. Bone segmentation on whole-body CT using convolutional neural network with novel data augmentation techniques. Comput. Biol. Med. 2020, 121, 103767. [Google Scholar] [CrossRef]
  41. Moore, A.W.; Anderson, B.; Das, K.; Wong, W.-K. Combining multiple signals for biosurveillance. In Handbook of Biosurveillance; Academic Press: London, UK, 2006; pp. 235–242. [Google Scholar]
  42. Nugus, S. Regression Analysis. In Financial Planning Using Excel: Forecasting, Planning and Budgeting Techniques; CIMA Publishing: Washington, DC, USA, 2009; pp. 37–52. [Google Scholar]
  43. Alexopoulos, E.C. Introduction to multivariate regression analysis. Hippokratia 2010, 14, 23. [Google Scholar]
  44. Zatz, L.M. Basic principles of computed tomography scanning. In Technical Aspects of Computed Tomography; Mosby: St. Louis, MO, USA, 1981; pp. 3853–3876. [Google Scholar]
  45. Hoang, J.K.; Glastonbury, C.M.; Chen, L.F.; Salvatore, J.K.; Eastwood, J.D. CT mucosal window settings: A novel approach to evaluating early T-stage head and neck carcinoma. Am. J. Roentgenol. 2010, 195, 1002–1006. [Google Scholar] [CrossRef]
  46. Window Width Attribute. Available online: https://dicom.innolitics.com/ciods/us-image/voi-lut/00281051 (accessed on 12 February 2022).
  47. Christensen, D.; Nappo, K.; Wolfe, J.; Tropf, J.; Berge, M.; Wheatley, B.; Saxena, S.; Yow, B.; Tintle, S. Ten-year fracture risk predicted by proximal femur Hounsfield units. Osteoporos. Int. 2020, 31, 2123–2130. [Google Scholar] [CrossRef] [PubMed]
  48. Christensen, D.L.; Nappo, K.E.; Wolfe, J.A.; Wade, S.M.; Brooks, D.I.; Potter, B.K.; Forsberg, J.A.; Tintle, S.M. Proximal femur hounsfield units on CT colonoscopy correlate with dual-energy X-ray absorptiometry. Clin. Orthop. Relat. Res. 2019, 477, 850. [Google Scholar] [CrossRef]
  49. Meena, T.; Roy, S. Bone fracture detection using deep supervised learning from radiological images: A paradigm shift. Diagnostics 2022, 12, 2420. [Google Scholar] [CrossRef]
  50. Abu-Ain, T.; Sheikh Abdullah, S.N.H.; Bataineh, B.; Omar, K.; Abu-Ein, A. A novel baseline detection method of handwritten Arabic-script documents based on sub-words. In Proceedings of the Soft Computing Applications and Intelligent Systems: Second International Multi-Conference on Artificial Intelligence Technology, M-CAIT 2013, Shah Alam, Malaysia, 28–29 August 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 67–77. [Google Scholar]
  51. Talari, D.; Namburu, A. Indus Image Segmentation Using Watershed and Histogram Projections. Int. Robot. Autom. J. 2017, 3, 1–4. [Google Scholar] [CrossRef] [Green Version]
  52. Reddy, E.S. Character segmentation for Telugu image document using multiple histogram projections. Glob. J. Comput. Sci. Technol. 2013, 13, 11–15. [Google Scholar]
  53. Apivanichkul, K.; Phasukkit, P.; Dankulchai, P. Performance Comparison of Deep Learning Approach for Automatic CT Image Segmentation by Using Window Leveling. In Proceedings of the 2021 13th Biomedical Engineering International Conference (BMEiCON), Ayutthaya, Thailand, 19–21 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5. [Google Scholar] [CrossRef]
  54. Nijkamp, J.; de Jong, R.; Sonke, J.-J.; van Vliet, C.; Marijnen, C. Target volume shape variation during irradiation of rectal cancer patients in supine position: Comparison with prone position. Radiother. Oncol. 2009, 93, 285–292. [Google Scholar] [CrossRef] [PubMed]
  55. Uemura, K.; Takao, M.; Otake, Y.; Takashima, K.; Hamada, H.; Ando, W.; Sato, Y.; Sugano, N. The effect of patient positioning on measurements of bone mineral density of the proximal femur: A simulation study using computed tomographic images. Arch. Osteoporos. 2023, 18, 35. [Google Scholar] [CrossRef]
  56. Jiang, B.; Luo, R.; Mao, J.; Xiao, T.; Jiang, Y. Acquisition of localization confidence for accurate object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 784–799. [Google Scholar]
  57. Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 2015, 15, 1–28. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  58. Kruse, F.A.; Lefkoff, A.; Boardman, J.; Heidebrecht, K.; Shapiro, A.; Barloon, P.; Goetz, A. The spectral image processing system (SIPS)—Interactive visualization and analysis of imaging spectrometer data. Remote Sens. Environ. 1993, 44, 145–163. [Google Scholar] [CrossRef]
  59. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  60. McFadden, S.B.; Ward, P.A. Selecting the proper window for SSIM. In Image Quality and System Performance IX; SPIE: Bellingham, WA, USA, 2012; pp. 90–99. [Google Scholar]
Figure 1. The 5-layer U-Net architecture of the deep-learning-based left-femur segmentation scheme.
Figure 2. Overview of the scope of this research.
Figure 3. The CT slice in the axial plane: (a) original CT slice and (b) CT image after window leveling and normalization (in 8-bit PNG format) with 300 HU for WL and 400 HU for WW.
Figure 4. Histogram projection and xy coordinates of the femur in the axial plane: (a) horizontal histogram projection, (b) vertical histogram projection, and (c) xy coordinates.
Figure 5. The workflow of assigning xyz coordinates of the bounding box for femur cropping.
Figure 6. The CT slice in the axial plane: (a) original CT slice and (b) CT image after window leveling and normalization (in 8-bit PNG format).
Figure 7. (a) The workflow of contrast enhancement and cropping of the femur region and (b) an example of the bounding box for cropping and the cropped cube comprising the cropped CT slices in the axial, coronal, and sagittal planes.
Figure 8. The CT image of the femur in the axial plane: (a) cropped CT slice and (b) cropped CT slice with attribute augmentation.
Figure 9. CT slices of the femur in different lying postures: (a) supine and (b) prone.
Figure 10. Example of attribute augmentation applied to the CT slices.
Figure 11. Performance of the U-Net left-femur segmentation model under dataset categories: (a) F-I, (b) F-II, (c) F-III, (d) F-IV, (e) F-V, (f) F-VI, (g) F-VII, and (h) F-VIII.
Figure 12. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-I.
Figure 13. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-II.
Figure 14. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-III.
Figure 15. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-IV.
Figure 16. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-V.
Figure 17. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-VI.
Figure 18. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-VII.
Figure 19. Three-dimensional reconstruction images of the proposed deep-learning-based automatic left-femur segmentation scheme under dataset category F-VIII.
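As a concrete companion to the preprocessing illustrated in Figures 3–5, the following is a minimal Python sketch of window leveling a CT slice with WL = 300 HU and WW = 400 HU, normalizing it to an 8-bit image, and locating a bright (bone) region from horizontal and vertical histogram projections. The function names, the NumPy-only implementation, and the thresholding details are illustrative assumptions, not the authors' code.

```python
import numpy as np

def window_level(hu_slice, wl=300.0, ww=400.0):
    # Clip HU values to the window [WL - WW/2, WL + WW/2] and rescale to 0-255 (8-bit).
    lower, upper = wl - ww / 2.0, wl + ww / 2.0
    windowed = np.clip(hu_slice.astype(np.float32), lower, upper)
    return ((windowed - lower) / (upper - lower) * 255.0).astype(np.uint8)

def projection_bounds(image_8bit, threshold=128):
    # Horizontal/vertical histogram projections of the thresholded slice give the
    # xy extent of the bright region, which defines the cropping bounding box.
    mask = image_8bit >= threshold
    rows = mask.sum(axis=1)          # horizontal projection (per-row pixel counts)
    cols = mask.sum(axis=0)          # vertical projection (per-column pixel counts)
    ys, xs = np.nonzero(rows)[0], np.nonzero(cols)[0]
    if ys.size == 0 or xs.size == 0:
        return None                  # nothing above the threshold in this slice
    return ys.min(), ys.max(), xs.min(), xs.max()
```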
Table 1. The initial experimental datasets of patients with lower abdominal diseases.
Organ of interest: Left femur
Number of patients: 120
Age: 60–80 years old
Gender: 60 male and 60 female patients
Types of lower abdominal disorders: Cervical cancer and prostate cancer; colorectal cancer, rectosigmoid cancer, and rectum cancer
Source of data: Siriraj Hospital, Thailand
Table 2. Hyperparameters of the U-Net model for left-femur segmentation.
Number of layers: 5
Epochs: 5000
Learning rate: 0.001
Optimizer: Adam
Loss function: Binary cross-entropy
Input dimension (pixels): 352 × 208
Convolution kernel size: 3 × 3
Max pooling kernel size: 2 × 2
Activation function: Rectified linear unit (ReLU), sigmoid
Initial channels: 64
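For readers who want a starting point for reproducing the configuration in Table 2, the listing below is a minimal sketch of a 5-level U-Net with 64 initial channels, 3 × 3 convolutions, 2 × 2 max pooling, ReLU activations, and a sigmoid output, compiled with Adam (learning rate 0.001) and binary cross-entropy. The Keras implementation and block layout are assumptions; the authors' exact architecture follows Figure 1.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3 x 3 convolutions with ReLU, the standard U-Net encoder/decoder block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(352, 208, 1), base_channels=64, depth=5):
    inputs = layers.Input(shape=input_shape)
    skips, x = [], inputs
    for d in range(depth - 1):                            # encoder: channels double per level
        x = conv_block(x, base_channels * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)                     # 2 x 2 max pooling
    x = conv_block(x, base_channels * 2 ** (depth - 1))   # bottleneck
    for d in reversed(range(depth - 1)):                  # decoder: upsample and fuse skips
        x = layers.Conv2DTranspose(base_channels * 2 ** d, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[d]])
        x = conv_block(x, base_channels * 2 ** d)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # binary femur mask
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])
```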
Table 3. Performance of the U-Net left-femur segmentation model in terms of DSC and IoU (%).
Dataset category F-I: DSC 37.76, IoU 23.90
Dataset category F-II: DSC 67.96, IoU 52.32
Dataset category F-III: DSC 61.37, IoU 45.46
Dataset category F-IV: DSC 88.25, IoU 80.85
Dataset category F-V: DSC 72.54, IoU 57.93
Dataset category F-VI: DSC 51.37, IoU 41.25
Dataset category F-VII: DSC 48.62, IoU 35.18
Dataset category F-VIII: DSC 45.27, IoU 31.30
Note: Dataset category F-I refers to the uncropped and non-augmented CT-image input datasets of the left femur; F-II to the cropped CT-image input datasets of the left femur (without attribute augmentation); F-III, F-IV, and F-V to the cropped and augmented CT-image input datasets of the left femur with small (1 and 2 for the supine and prone postures, respectively), large (5 and 10), and excessively large (10 and 20) feature coefficients; and F-VI, F-VII, and F-VIII to the uncropped and augmented CT-image input datasets of the left femur with small (1 and 2), large (5 and 10), and excessively large (10 and 20) feature coefficients.
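The two overlap metrics reported in Table 3 can be computed from binary segmentation masks as in the short NumPy sketch below; the function names are illustrative.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    # DSC = 2|A ∩ B| / (|A| + |B|), with a small epsilon to avoid division by zero.
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def intersection_over_union(pred, target, eps=1e-7):
    # IoU = |A ∩ B| / |A ∪ B|
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)
```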
Table 4. Image-similarity metrics (SAM and SSIM) of 3D reconstruction images of the left femur. Values are listed in view order: front, left, rear, right, top, bottom. The isometric view and the ground-truth and predicted renderings appear as images in the original table.
Dataset category F-I: SAM 0.211, 0.214, 0.209, 0.223, 0.243, 0.257; SSIM 0.433, 0.454, 0.501, 0.532, 0.502, 0.544
Dataset category F-II: SAM 0.155, 0.165, 0.144, 0.135, 0.187, 0.236; SSIM 0.677, 0.658, 0.712, 0.714, 0.701, 0.689
Dataset category F-III: SAM 0.162, 0.139, 0.152, 0.120, 0.223, 0.220; SSIM 0.668, 0.669, 0.708, 0.729, 0.688, 0.708
Dataset category F-IV: SAM 0.142, 0.138, 0.129, 0.117, 0.204, 0.215; SSIM 0.702, 0.706, 0.725, 0.732, 0.701, 0.710
Dataset category F-V: SAM 0.151, 0.145, 0.134, 0.121, 0.196, 0.223; SSIM 0.683, 0.675, 0.722, 0.729, 0.700, 0.702
Dataset category F-VI: SAM 0.176, 0.170, 0.164, 0.146, 0.197, 0.225; SSIM 0.596, 0.572, 0.632, 0.634, 0.644, 0.635
Dataset category F-VII: SAM 0.183, 0.157, 0.170, 0.176, 0.197, 0.225; SSIM 0.551, 0.572, 0.631, 0.603, 0.637, 0.633
Dataset category F-VIII: SAM 0.201, 0.193, 0.184, 0.192, 0.212, 0.236; SSIM 0.461, 0.471, 0.531, 0.574, 0.536, 0.639
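The similarity metrics in Table 4 are the spectral angle mapper [58] and the structural similarity index measure [59]. The sketch below shows one common way to compute them on rendered views, assuming RGB arrays of identical shape and scikit-image 0.19 or later for the SSIM call; the per-pixel averaging convention for SAM is an assumption rather than the authors' exact configuration.

```python
import numpy as np
from skimage.metrics import structural_similarity

def spectral_angle_mapper(img_a, img_b, eps=1e-12):
    # Treat each pixel as a vector across the color channels and return the mean
    # angle (in radians) between corresponding pixel vectors of the two images.
    a = img_a.reshape(-1, img_a.shape[-1]).astype(np.float64)
    b = img_b.reshape(-1, img_b.shape[-1]).astype(np.float64)
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

# Example on two renderings of the same view (uint8 arrays of identical shape):
# sam_value = spectral_angle_mapper(ground_truth_view, predicted_view)
# ssim_value = structural_similarity(ground_truth_view, predicted_view, channel_axis=-1)
```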
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
