Article

Unsupervised Feature Extraction for Various Computer-Aided Diagnosis Using Multiple Convolutional Autoencoders and 2.5-Dimensional Local Image Analysis

1 Faculty of Biology-Oriented Science and Technology, Kindai University, Wakayama 649-6493, Japan
2 Graduate School of Biology-Oriented Science and Technology, Kindai University, Wakayama 649-6493, Japan
3 Faculty of Informatics, Kindai University, Osaka 577-8502, Japan
4 Cyber Informatics Research Institute, Kindai University, Osaka 577-8502, Japan
5 Institute of Advanced Clinical Medicine, Kindai University Hospital, Osaka 589-8511, Japan
6 Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo 113-8655, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(14), 8330; https://doi.org/10.3390/app13148330
Submission received: 24 May 2023 / Revised: 30 June 2023 / Accepted: 15 July 2023 / Published: 19 July 2023
(This article belongs to the Section Biomedical Engineering)

Featured Application

The proposed method provides effective local image features for developing computer-aided diagnosis systems without the burden of collecting disease datasets and annotating their lesion areas.

Abstract

There are growing expectations for AI-based computer-aided diagnosis: computer-aided diagnosis (CAD) systems can be used to improve the accuracy of diagnostic imaging. However, it is not easy to collect the large amounts of disease image data with lesion-area annotations needed for the supervised learning of CAD systems. This study proposes an unsupervised local image feature extraction method that runs without such disease image datasets. Local image features are one of the key determinants of system performance. The proposed method requires only a normal image dataset that does not include lesions, which can be collected more easily than a disease dataset. The unsupervised features are extracted by applying multiple convolutional autoencoders to various 2.5-dimensional images. The proposed method is evaluated on two problems: the detection of cerebral aneurysms in head MRA images and the detection of lung nodules in chest CT images. In both cases, the performance is high, with an AUC of more than 0.96. These results show that the proposed method can automatically learn features useful for lesion recognition from lesion-free normal data, regardless of the type of image or lesion.

1. Introduction

With the increasing sophistication of medical imaging technology, diagnostic imaging has become indispensable, and the number of images per examination is increasing [1]. These conditions increase the workload of diagnostic imaging physicians, and an increased workload often reduces diagnostic accuracy. For example, in a study by Li et al., 32 out of 83 lung cancers were missed in low-dose helical CT screening [2]. Therefore, there are growing expectations for AI-based computer-aided diagnosis: computer-aided diagnosis (CAD) systems can be used to improve the accuracy of diagnostic imaging.
CAD systems provide radiologists with information that supports image diagnosis, such as the detection of lesions and suspicious local areas in medical images [3]. Several studies have reported improved diagnostic accuracy with the use of CAD systems. Kozuka et al. reported that using a CAD system for the CT diagnosis of lung nodules improved the detection sensitivity of inexperienced radiologists from 20.9% to 38.0% and reduced diagnosis time by 11.3% [4]. Li et al. reported that the use of a CAD system for detecting peripheral lung cancers missed at CT improved the area under the receiver operating characteristic (ROC) curve (AUC) of radiologists' diagnostic accuracy from 0.763 to 0.854 [5]. Hirai et al. confirmed that using a CAD system to detect cerebral aneurysms on MR angiography (MRA) could improve the AUC from 0.931 to 0.983 [6]. Pacilè et al. also reported an improvement in the breast cancer detection accuracy of mammography with the concurrent use of a CAD system [7]. These reports demonstrate the clinical usefulness of CAD systems; thus, further development of CAD technology is warranted.
In CAD systems, local image features are one of the key elements of system performance [8]. The typical processing steps of a CAD system can be described using the example of a lesion detection system, one of the most common types of CAD system. A lesion detection CAD system generally consists of pre-processing the input data, extracting the organ regions where the target lesion occurs, detecting lesion candidates, and classifying the lesion candidates. High-quality local image features are essential for building sensitive lesion candidate detection and accurate lesion candidate classification. In the past, the features used in these processes were mainly hand-crafted. In recent years, many studies have applied supervised deep learning methods, which learn classification and image feature extraction simultaneously [9]. This simultaneous learning, which often yields image features optimal for classification, improves classification and detection performance.
On the other hand, it is not easy to collect large amounts of medical image data for the supervised deep learning of CAD systems [10]. When developing systems for rare diseases, it is often necessary to do so with almost no data on the target disease. In addition, machine-learning-based CAD development requires annotation, the creation of teacher labels for lesion areas on the images. This annotation requires a high level of medical knowledge and can only be conducted with the help of experienced radiologists. Many recent studies and developments have used open medical image databases, such as the LIDC-IDRI (lung image database consortium image collection), for CAD systems [11]. However, for many diseases no such database exists. Hence, machine learning methods that do not require large amounts of disease image data and teacher-label data, such as unsupervised and semi-supervised learning, have received much attention as a way of reducing the time and operating costs associated with data collection. This study focuses on unsupervised-learning-based feature extraction. We also note that, when new imaging equipment or reconstruction techniques are introduced, even normal medical image data without lesions often cannot be collected in sufficient quantity. Considering these facts, we considered it necessary to develop an unsupervised feature learning method that can be trained on only a few normal datasets.
This study proposes an unsupervised local image feature extraction method that can be applied to any CAD system for three-dimensional (3D) medical images. The proposed method requires only a normal image dataset, free of lesions, as the training dataset. Normal datasets are easier to obtain than disease datasets: most data from patients undergoing screening examinations are normal and are routinely stored in clinical institutions. The proposed method can learn the image feature extraction process from such stored normal datasets without additional annotation. Moreover, because unsupervised-learning-based methods do not use disease datasets for training, they enable stable learning independent of the number of available disease cases. The proposed method extracts multiple image features from various 2.5-dimensional (2.5D) images [12,13] transformed from 3D images, providing multi-faceted image analysis and image dimensionality reduction to avoid over-fitting. By applying multiple convolutional autoencoders (CAEs) [14] to multiple 2.5D images and extracting multiple features from the results, the method can extract the features required for detecting and recognizing small lesions on various 3D medical images. A simple CAE model is employed so that feature generation can be performed even with a small normal training dataset. This study applies the proposed method to two lesion detection tasks with different modalities, and its effectiveness is evaluated through the resulting lesion detection performance.

2. Materials and Methods

2.1. Proposed Feature Extraction Method

The proposed unsupervised feature extraction method describes the information of a 3D local image patch as a feature vector. The 3D image patches here are local regions of interest extracted from the 3D medical image under analysis, such as lesion candidates. A flowchart of the proposed method is shown in Figure 1. First, the input 3D local image patch is transformed into four types of 2.5D image patches. Next, latent variables compressing the pixel value information of each 2.5D image are extracted using multiple CAEs. Image features are then calculated from the four extracted sets of latent variables. The feature sets of the four types of 2.5D images are integrated and output as the feature set of the input 3D image patch. The content of the features obtained via the proposed method varies with the nature of the normal image data used for training; the method is therefore general and can be applied to various types of 3D images by changing the normal training dataset. The following subsections describe the 2.5D image transformation, latent variable extraction using CAEs, and feature extraction in detail.

2.1.1. The 2.5D Image Transformation

An input 3D local image patch of 32 × 32 × 32 voxels is transformed into a 2.5D image patch consisting of a three-channeled, two-dimensional (2D) image reflecting 3D information. The three channels are the axial, coronal, and sagittal slices through the center point of the 3D image patch. The 2.5D transformation greatly reduces the number of pixels input to the CAE, which is expected to mitigate overfitting to the CAE training data caused by the curse of dimensionality [15].
Four types of 2.5D images are used in this method: center slice extraction (CentSE) 2.5D images [12], maximum-intensity projection (MIP) 2.5D images [13], minimum-intensity projection (MinIP) 2.5D images, and mean-intensity projection (MeanIP) 2.5D images. The CentSE-2.5D image, first proposed by Roth et al. for lymph node detection on CT images, consists of the three axial, coronal, and sagittal center slices extracted from the 3D patch image, as shown in Figure 2a.
The MIP-2.5D image is obtained by applying a three-directional MIP process to a 3D patch image, as shown in Figure 2b. The MIP method, which observes the voxel values of a 3D image along a given axis and projects the largest value onto a 2D image, is often used when diagnostic radiologists read 3D medical images. In this study, MIP processing is applied in three directions, as in the CentSE-2.5D image, and the set of three MIP slices (axial, coronal, and sagittal) is used as the MIP-2.5D image. The MinIP-2.5D and MeanIP-2.5D images are the corresponding sets of axial, coronal, and sagittal projection slices obtained with minimum and mean voxel value projections, as shown in Figure 2c,d. Using four 2.5D images with different properties enables generalized local image feature extraction, regardless of the lesion or image type of interest.
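For illustration, the four 2.5D transformations can be written compactly in NumPy. This is a minimal sketch; the (z, y, x) axis convention and the function name are our own assumptions rather than specifications from the paper.

```python
import numpy as np

def to_25d_images(patch: np.ndarray) -> dict:
    """Transform a 32 x 32 x 32 patch into the four 3-channel 2.5D images.
    Channels correspond to the axial, coronal, and sagittal directions."""
    assert patch.shape == (32, 32, 32)
    c = patch.shape[0] // 2  # center slice index

    def project(reduce_fn):
        # Project along each of the three axes and stack the results as channels.
        return np.stack([reduce_fn(patch, axis) for axis in range(3)], axis=-1)

    return {
        "CentSE": np.stack([patch[c], patch[:, c], patch[:, :, c]], axis=-1),
        "MIP":    project(lambda p, ax: p.max(axis=ax)),   # maximum-intensity projection
        "MinIP":  project(lambda p, ax: p.min(axis=ax)),   # minimum-intensity projection
        "MeanIP": project(lambda p, ax: p.mean(axis=ax)),  # mean-intensity projection
    }
```

Each output is a 32 × 32 × 3 array, so the CAE input shrinks from 32³ = 32,768 voxels to 3 × 32² = 3072 pixels per 2.5D image.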

2.1.2. Latent Variable Extraction Using CAE

Latent variables representing the features of the input image data are extracted from each of the four types of 2.5D images using CAEs. The four CAEs, each of which learns to extract latent variables from its target 2.5D image type, have a common structure, except for the number of units in the final layer, n, which equals the dimensionality of the latent variable vector extracted by the CAE. Each CAE model is viewed as an information compression function Enc(·) that extracts an n-dimensional latent variable vector, z, from a 2.5D image, f: Enc(f) = z. The common structure of these CAEs is shown in Table 1. In the convolution layers, image filtering enhances the characteristic parts of the image.
The max pooling layer compresses the feature images extracted in the convolution layer. The full-connection layer converts the pixel values of the multi-channel feature images extracted by the convolution and max pooling layers into latent variables.
The four CAEs used in the proposed method are all trained on 2.5D local patch images from normal image data without lesions. Hence, when a CAE is applied to unlearned images such as lesion images, the resulting latent variable is expected to show properties that differ from those of latent variables encoding normal images. The loss function in CAE learning is the L2 loss between the 2.5D image reproduced from the latent variables and the input 2.5D image. The reproduced image is obtained using a convolutional decoder with a pairwise structure to the CAE. The network parameters of the CAEs are optimized with Adam. The mini-batch size is 256, and the maximum number of epochs is 500. The number of latent variables to be output, n, is set to the minimum number of dimensions at which the cumulative contribution ratio in a principal component analysis of the CAE training data exceeds 90%.
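A PyTorch sketch of one CAE and its training loop is given below. The encoder follows Table 1; the decoder layout and the learning rate are illustrative assumptions, since the paper states only that the decoder has a pairwise structure with the encoder. The latent dimensionality n is assumed to have been chosen beforehand via the PCA criterion described above.

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    """Convolutional autoencoder for 3-channel 32x32 2.5D images (Table 1)."""
    def __init__(self, n_latent: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 6, 3, padding=1), nn.BatchNorm2d(6), nn.ReLU(),    # 32x32x6
            nn.MaxPool2d(2),                                                # 16x16x6
            nn.Conv2d(6, 9, 3, padding=1), nn.BatchNorm2d(9), nn.ReLU(),    # 16x16x9
            nn.MaxPool2d(2),                                                # 8x8x9
            nn.Conv2d(9, 12, 3, padding=1), nn.BatchNorm2d(12), nn.ReLU(),  # 8x8x12
            nn.Flatten(),
            nn.Linear(8 * 8 * 12, n_latent),                                # latent z
        )
        self.decoder = nn.Sequential(   # mirror of the encoder (assumed layout)
            nn.Linear(n_latent, 8 * 8 * 12), nn.ReLU(),
            nn.Unflatten(1, (12, 8, 8)),
            nn.Upsample(scale_factor=2), nn.Conv2d(12, 9, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(9, 6, 3, padding=1), nn.ReLU(),
            nn.Conv2d(6, 3, 3, padding=1),                                  # reproduced image
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train_cae(model, loader, epochs=500, lr=1e-3):
    """L2 reconstruction loss and Adam, as in the paper; mini-batches of
    normal 2.5D patches (size 256) are supplied by the loader."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in loader:
            x_hat, _ = model(x)
            loss = mse(x_hat, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```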

2.1.3. Feature Extraction

Multiple features are calculated from the latent variable vectors extracted by the CAEs. The feature set includes:
(a) The latent variables.
(b) The Mahalanobis distance to the normal dataset in the latent variable space.
(c) Features based on the error in reconstructing the 2.5D image from the latent variables.
The Mahalanobis distance, DMlatent(f, Snor), is the distance in the CAE latent variable space between an input 2.5D image, f, and the normal 2.5D patch image dataset, Snor, used in CAE training. The distribution of the normal dataset, Snor, is taken into account when measuring the distance, as in Equation (1). Here, μnor and Σnor are the mean vector and covariance matrix of the normal dataset, Snor, in the CAE latent variable space, and Enc(f) is the latent variable vector calculated by the CAE model, Enc(·), from a 2.5D image, f.
$$\left\{ D_{M}^{\mathrm{latent}}(f, S_{\mathrm{nor}}) \right\}^{2} = \left( \mathrm{Enc}(f) - \mu_{\mathrm{nor}} \right)^{\mathrm{T}} \Sigma_{\mathrm{nor}}^{-1} \left( \mathrm{Enc}(f) - \mu_{\mathrm{nor}} \right) \tag{1}$$
The pixel value statistics of the difference 2.5D image, $d_{f\hat{f}}$, between the input 2.5D image, f, and the 2.5D image reproduced from its latent variables, $\hat{f}$, are also used as features. The definition of $d_{f\hat{f}}$ is given in Equation (2), where f(x, y, c) denotes the pixel value of channel c at 2D coordinate (x, y) of image f, and W × H × Ch is the set of pixel coordinates within the 2.5D images f and $\hat{f}$.
$$d_{f\hat{f}} = \left\{ d(x, y, c) = f(x, y, c) - \hat{f}(x, y, c) \;\middle|\; (x, y, c) \in W \times H \times Ch \right\} \tag{2}$$
The nine pixel-intensity statistics of $d_{f\hat{f}}$ are the maximum, minimum, mean, standard deviation, skewness, kurtosis, and the first, second, and third quartiles.
Image reproduction is performed using the convolutional decoder trained alongside the CAE, which has a pairwise structure with it. Latent variable extraction from lesion images is expected to be less accurate than from normal images, and the images reproduced from the latent variables of lesion data are therefore expected to differ significantly from the input images. The above features are used to assess these differences.
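Combining Equations (1) and (2), the feature set for one 2.5D image can be assembled as in the sketch below; the variable names and the use of SciPy for skewness and kurtosis are our own choices.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def candidate_features(z, f, f_hat, mu_nor, cov_inv_nor):
    """Features (a)-(c) for one 2.5D image: the latent vector z, the
    Mahalanobis distance of Equation (1), and the nine statistics of the
    difference image of Equation (2). mu_nor and cov_inv_nor are the mean
    and inverse covariance of the normal-data latent vectors."""
    diff = z - mu_nor
    mahalanobis = np.sqrt(diff @ cov_inv_nor @ diff)   # Eq. (1)
    d = (f - f_hat).ravel()                            # Eq. (2)
    q1, q2, q3 = np.percentile(d, [25, 50, 75])
    stats9 = [d.max(), d.min(), d.mean(), d.std(),
              skew(d), kurtosis(d), q1, q2, q3]
    return np.concatenate([z, [mahalanobis], stats9])
```

Running this for all four 2.5D images and concatenating the outputs yields the final feature vector of the input 3D patch.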

2.2. Proposed Method Application for Evaluation with Clinical Data

The performance of the proposed method is evaluated by applying it to two lesion detection problems on clinical medical image data: the detection of cerebral aneurysms on time-of-flight unenhanced head magnetic resonance angiography (MRA) images and the detection of lung nodules on unenhanced chest CT images. Both experiments use the clinical image datasets shown in Table 2 and Table 3. The image features obtained via the proposed method are used to classify lesion candidates detected as described below. Lesion candidate classification aims to remove false-positive (FP) candidates.

2.2.1. Detection of Cerebral Aneurysms in Head MRA Images with Proposed Feature Extraction

This cerebral aneurysm detection method is applicable to MRA images scaled to 0.6 mm isotropic voxels.
First, brain artery regions are extracted by applying adaptive thresholding, morphological opening and closing, and labeling processes. Next, aneurysm candidates are detected with a sliding-window method using a cubic window of 32 voxels per side and a 16-voxel stride. When more than 10% of the cubic window volume overlaps with the extracted brain artery regions, the cubic area is detected as a candidate 3D image patch. Finally, FP candidate patches are eliminated by classifying each candidate with an Ada-Boosted classifier ensemble [16] using the feature set extracted by the proposed unsupervised feature extraction method.
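The candidate detection step can be sketched as a brute-force sliding window over the artery mask; a vectorized implementation would be preferable in practice, and the argument names here are hypothetical.

```python
import numpy as np

def detect_candidates(volume, artery_mask, win=32, stride=16, min_overlap=0.10):
    """Return 3D patches whose overlap with the binary artery mask exceeds
    10% of the window volume, scanning with a 16-voxel stride."""
    patches, origins = [], []
    thresh = min_overlap * win ** 3
    zs, ys, xs = volume.shape
    for z in range(0, zs - win + 1, stride):
        for y in range(0, ys - win + 1, stride):
            for x in range(0, xs - win + 1, stride):
                if artery_mask[z:z+win, y:y+win, x:x+win].sum() > thresh:
                    patches.append(volume[z:z+win, y:y+win, x:x+win])
                    origins.append((z, y, x))
    return patches, origins
```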
The proposed unsupervised feature extraction with multiple CAEs is applied to exclude FP candidate patches from the true aneurysm patches. The feature extraction is pre-trained using normal vascular image patches obtained from 252 cases of normal head MRA image datasets, as shown in Table 2. These images were scanned for screening purposes at the University of Tokyo Hospital.
In contrast, the Ada-Boosted classifier ensemble for FP candidate elimination is pre-trained in a supervised fashion, using a large number of normal patches and a small number of true aneurysm patches. The output of the Ada-Boosted classifier ensemble, H(x), is shown in the following Equation (3), where x is the feature vector, hi(∙) is the i-th weak classifier, αi is the weight coefficient of hi(∙), and θ is the control bias.
$$H(x) = \sum_{i} \alpha_{i} h_{i}(x) + \theta \tag{3}$$
The weight, αi, is positively correlated with the classification performance of the corresponding hi(·). When H(x) is positive, the lesion candidate patch from which the feature vector x was extracted is classified as a true lesion; when negative, as an FP.
In the supervised learning of the classifier ensemble, weak classifiers are sequentially added to improve the classification accuracy of the ensemble, following the Ada-Boost algorithm [16]. This learning also introduces cost-sensitive learning [17], which assigns a large penalty when candidates of the minority true lesion class are misidentified during learning. The costs χ(·) for true lesion candidates and FP candidates are calculated using Equations (4) and (5). Cost-sensitive learning helps avoid a biased classifier that can only identify FP candidates.
$$\chi(\text{true lesion candidate}) = \frac{n(\text{true lesion class}) + n(\text{FP class})}{n(\text{FP class})} \tag{4}$$
$$\chi(\text{FP candidate}) = \frac{n(\text{true lesion class}) + n(\text{FP class})}{n(\text{true lesion class})} \tag{5}$$
In this study, decision stumps are employed as weak classifiers. A decision stump applies a threshold to a single feature; the boosting process can therefore be regarded as a sequential feature selection suited to the classification problem.
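A scikit-learn sketch of this classifier is given below. Decision stumps are depth-1 trees; here the class costs of Equations (4) and (5) are injected as initial sample weights, which is one simple way to approximate cost-sensitive boosting (the method of [17] modifies the boosting weight updates themselves), and the ensemble size is an assumption.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def train_candidate_classifier(X, y, n_rounds=200):
    """Ada-Boosted ensemble of decision stumps with class-dependent costs.
    y is 1 for true lesion candidates and 0 for FP candidates."""
    n_true, n_fp = np.sum(y == 1), np.sum(y == 0)
    # Per-class costs in the spirit of Eqs. (4)-(5): the minority true
    # lesion class receives the larger weight so it is not ignored.
    cost = np.where(y == 1, (n_true + n_fp) / n_true, (n_true + n_fp) / n_fp)
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # decision stump
        n_estimators=n_rounds,
    )
    clf.fit(X, y, sample_weight=cost)
    return clf
```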
The performance is evaluated via three-fold cross-validation on a dataset of 378 head MRA images, each containing at least one cerebral aneurysm, as shown in Table 2. These images were also scanned for screening purposes at the University of Tokyo Hospital. Lesion teacher labels were defined by a consensus of at least two experienced radiologists who perform MRI readings as part of their daily work, each with over 10 years of MRA-reading experience. The portion of the dataset assigned to training in this cross-validation is used to train the Ada-Boosted classifier ensemble for the lesion candidate classification process.

2.2.2. Detection of Lung Nodules in Chest CT Images with Proposed Feature Extraction

Lung nodule detection is applicable to chest CT images scaled to 1.25 mm isotropic voxels. After extracting the lung field regions through thresholding and hole filling, the vascular regions within the lung field are extracted using a combination of thresholding, morphological processing, and small-area removal [18].
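A rough sketch of this lung-field and vessel extraction using SciPy is shown below. All threshold values and the minimum component size are illustrative assumptions, not values taken from the paper or from [18].

```python
import numpy as np
from scipy import ndimage

def extract_lung_vessels(ct, lung_thresh=-400, vessel_thresh=-200, min_size=30):
    """Threshold the CT volume (HU), fill holes to obtain the lung fields,
    then keep bright, sufficiently large components inside them as vessels."""
    lungs = ndimage.binary_fill_holes(ct < lung_thresh)   # air-like region, holes filled
    vessels = (ct > vessel_thresh) & lungs                # bright structures in the lungs
    vessels = ndimage.binary_opening(vessels)             # morphological noise removal
    labels, n = ndimage.label(vessels)                    # connected components
    sizes = ndimage.sum(vessels, labels, range(1, n + 1)) # component volumes
    keep_ids = 1 + np.where(sizes >= min_size)[0]         # small-area removal
    return lungs, np.isin(labels, keep_ids)
```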
Similar to the cerebral aneurysm detection described above, a cubic window of 32 voxels per side is scanned with a stride of 16 voxels on the vascular region’s binary image to extract lesion candidate patches overlapping with the pulmonary vascular regions by more than 10% of the window area. An Ada-Boosted classifier ensemble classifies and eliminates FP candidates with the image features of the proposed unsupervised feature extraction.
Normal vascular image patches automatically detected via the above lesion candidate detection process from 300 cases of normal chest CT image data are used to learn the feature extraction process. These images were scanned for screening purposes at the University of Tokyo Hospital. A cost-sensitive method is introduced for training the classifier ensemble, as in the classification process for cerebral aneurysm candidates. The weak classifiers in the ensemble also use decision stumps.
The pulmonary nodule detection performance was evaluated via three-fold cross-validation on a dataset of 450 chest CT images, each containing at least one pulmonary nodule. These images were also scanned for screening purposes at the University of Tokyo Hospital. The nodule teacher labels were determined by a consensus of at least two radiologists who perform CT readings as part of their daily work, each with over 10 years of CT-reading experience.

2.2.3. Evaluation of Lesion Candidate Classification

The AUC, which is the area under the ROC (receiver operating characteristic) curve, and the ANODE score [19] are used as measures of classification performance. The ANODE score is the average sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per case. The better the classifier, the closer the AUC and ANODE score are to 1.0. To visually observe the change in sensitivity with an increasing number of FPs per case, the free-response receiver operating characteristic (FROC) curve is also examined; a small computational sketch of the ANODE score is given below.
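This sketch reads the ANODE score off an already computed FROC curve by linear interpolation; the FROC points are assumed to be sorted by FP rate.

```python
import numpy as np

def anode_score(fp_per_case, sensitivity):
    """Average sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs per case,
    interpolated from FROC points (fp_per_case must be increasing)."""
    targets = [1/8, 1/4, 1/2, 1, 2, 4, 8]
    return float(np.mean(np.interp(targets, fp_per_case, sensitivity)))
```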
We also observe the features frequently used in the classifier ensemble obtained via Ada-Boost learning. The sequential addition of decision stump classifiers in the Ada-Boost learning process is synonymous with the sequential selection of the features used in the stumps, and features frequently selected with a large weight coefficient, αi, can be regarded as effective for the classification problem. Therefore, the weighted feature selection frequency, ρ(x, H(·)), defined in Equation (6), is evaluated, where αi is the weight coefficient of the i-th weak classifier hi(·), and βi(x) is a Boolean function that is 1 if the feature x is used in the i-th weak classifier and 0 otherwise.
$$\rho(x, H(\cdot)) = \frac{\sum_{i} \alpha_{i} \beta_{i}(x)}{\sum_{i} \alpha_{i}} \tag{6}$$
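Equation (6) can be evaluated directly on a fitted scikit-learn ensemble of decision stumps (see the classifier sketch above); each depth-1 tree splits on exactly one feature, read from its root node.

```python
import numpy as np

def weighted_selection_frequency(clf, n_features):
    """rho(x, H) of Equation (6) for every feature index x."""
    rho = np.zeros(n_features)
    for stump, alpha in zip(clf.estimators_, clf.estimator_weights_):
        rho[stump.tree_.feature[0]] += alpha  # feature used at the stump's root
    return rho / clf.estimator_weights_.sum()
```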

3. Results

3.1. Results of Cerebral Aneurysm Detection in Head MRA Images

An average of 161.4 lesion candidate patches per case were detected in the lesion candidate detection step, of which an average of 147.0 were FP candidate patches with no overlap with the true lesions. The proposed unsupervised feature extraction was applied to eliminate these FP candidates. The ROC curves for the classification results of these lesion candidate patches are shown in Figure 3a. The average AUC and standard deviation over the three-fold cross-validation were 0.987 and 0.003, respectively.
Figure 4a shows the mean FROC curve for the three classification results from three-fold cross-validation. The FROC curve is a subtype of the ROC curve with the number of FPs per case on the horizontal axis. Error bars shown on the curve are the standard deviations of sensitivity at seven different FPs per case used to calculate the ANODE score. The mean and standard deviation of the ANODE score were 0.543 and 0.025, respectively.
The average numbers of latent variables in the cross-validation were as follows: 140 latent variables were extracted from the CentSE-2.5D images, 137 from the MIP-2.5D images, 56 from the MinIP-2.5D images, and 37 from the MeanIP-2.5D images. Together with the features calculated from these latent variables, the proposed method provided an average of 410 image features per candidate in the cross-validation. The top 10 features in the weighted selection frequencies, ρ(·), in the Ada-Boosted ensemble learning are shown in Figure 5a. The latent variable sets of all four types of 2.5D images were selected often. Latent variables from MIP-2.5D images were selected most frequently and with large weight coefficients, followed by the latent variable set from MinIP-2.5D images. The pixel value statistics of the difference image between the original patch and the patch reproduced by the MIP-2.5D CAE and its decoder were also selected relatively often.
Table 4 compares the candidate classification performance of the proposed method with related methods based on supervised machine learning with tailor-made features [20] and supervised deep learning [13]. The lesion detection performances of the methods other than the proposed one are cited from the respective articles [13,20]; thus, each evaluation value was obtained on a different dataset.

3.2. Results of Pulmonary Nodule Detection in Chest CT Images

An average of 3435.0 lesion candidate patches per case were detected in the lesion candidate detection step, of which an average of 3414.0 were FP candidates. The ROC curves for the classification results of these candidate patches are shown in Figure 3b; the mean AUC and standard deviation over the three-fold cross-validation were 0.969 and 0.008, respectively. Figure 4b shows the mean FROC curve for the three classification results from the three-fold cross-validation. The error bars on the curve are the standard deviations of sensitivity at the seven FP rates per case used to calculate the ANODE score. The mean and standard deviation of the ANODE score were 0.402 and 0.109, respectively.
An average of 21 latent variables were extracted from the CentSE-2.5D images, 62 from the MIP-2.5D images, 20 from the MinIP-2.5D images, and 6 from the MeanIP-2.5D images. Together with the features calculated from these latent variables, the proposed method provided an average of 109 image features per candidate in the cross-validation. The top 10 features in the weighted selection frequencies in ensemble learning are shown in Figure 5b. The Mahalanobis distance, DMlatent, in the MIP-2.5D latent variable space was selected most frequently, and other features extracted from MIP-2.5D images were also used. In addition, features extracted from MinIP-2.5D images were frequently selected.
Table 5 compares the candidate classification performance of the proposed method with related studies based on supervised machine learning with tailor-made features [18] and supervised deep learning [21]. The lesion detection performances of the methods other than the proposed one are cited from the respective articles [18,21]; the datasets used in each validation experiment therefore differ.

4. Discussion

Candidate classification using the features obtained by the proposed method achieved an average AUC of 0.987 on about 160 candidates per case automatically extracted from MRA images, and an average AUC of 0.969 on about 3400 candidates per case automatically extracted from CT images. These are acceptable performances for clinical purposes. In addition, these results were higher than those of unsupervised feature extraction applying a CAE [14] directly to the 3D image patch, in which only the latent variables extracted from the 3D patch are used as features. This means that the proposed method can automatically learn features useful for lesion recognition from lesion-free normal data, regardless of the type of image or lesion.
In this evaluation experiment, Ada-Boost, which builds an ensemble via the sequential selection of features, was used as the training method for the classifier. As the proposed method is an unsupervised feature extraction method, there is no guarantee that all extracted features are helpful for the target recognition problem; using a classifier learning method with built-in feature selection was therefore considered effective for maximizing the performance of the features obtained by the proposed method. The performances of both Ada-Boosted classifier ensembles were high, showing an AUC of more than 0.96. Although Ada-Boost is a simple boosting method, using gradient boosting, which can learn classifier ensembles with high discriminative performance, or other feature selection methods would likely lead to even more effective use of the features obtained via the proposed method. Differences in feature selection tendencies were found when comparing the Ada-Boost learning of the aneurysm candidate classifier ensemble with that of the lung nodule candidate classifier. MIP-2.5D latent variables were selected most frequently for classifying cerebral aneurysm candidates, followed by the MinIP-2.5D and CentSE-2.5D latent variable sets. In contrast, the Mahalanobis distance in the MIP-2.5D latent variable space was selected most frequently for classifying lung nodule candidates, with the MinIP-2.5D latent variable set next. This difference in selection trends indicates that the 2.5D images best suited to image analysis depend on the type of image and lesion under consideration. At the same time, no 2.5D image type was excluded from the selected features to an extreme degree, which demonstrates the effectiveness of using multiple 2.5D types. In addition to the latent variables, the Mahalanobis distance in the latent variable space and the pixel value statistics of the difference 2.5D image, $d_{f\hat{f}}$, also contributed to the classification of lesion candidates. This shows the validity of not only using the latent variables obtained by the CAEs as features but also deriving additional features from them.
The four 2.5D images used in the proposed method were all composed of three cross-sectional slices (axial, coronal, and sagittal) at fixed angles, and latent variable analysis and feature extraction were performed separately on each 2.5D image. However, additional cross-sectional angles may yield higher performance. Nakao et al. achieved highly accurate aneurysm detection using nine-channel MIP-2.5D image processing, which included the three cross sections used in the proposed method plus oblique slices [13]. On the other hand, when the twelve channels comprising the four 2.5D images used in this study were analyzed simultaneously as a single 2.5D image, only a lower performance than that shown here was achieved. This suggests the need to find the optimal combination of cross-sectional images; we therefore plan to address the problem of experimentally optimizing the set of cross-sectional slices to be processed.
The proposed method employs a simple CAE model to enable feature generation with a small training dataset. Experiments with reduced numbers of training cases were conducted to verify robustness in such situations. In the cerebral aneurysm candidate classification experiment, when the number of normal cases used for feature training was 30 or more, there was no statistically significant difference from the candidate classification performance obtained by training on 252 normal cases (U-test on AUC). These results show the robustness of the proposed method when only a small number of training cases are available.
A limitation of the proposed method is its lower classification performance compared with handcrafted features and features obtained via supervised learning, as shown in Table 4 and Table 5. The brain aneurysm detection method by Nomura et al. [20], using handcrafted features, was a two-stage lesion detection process in which discriminative processing using multiple features was applied to lesion candidates detected using luminance curvature features. The method achieved a sensitivity of 93.5% at five FPs per case through the use of image features appropriate for both lesion candidate detection and candidate classification. This sensitivity is about 10% higher than the present experimental results. However, in the present experiment, the features of the proposed method were used only for the classification of lesion candidates; the fact that they were not used to detect lesion candidates is the main reason for the difference in performance. Further comparisons should be made after applying the features of the proposed method to the lesion candidate detection process as well.
Nakao et al. also proposed a method for discriminating all local image patches containing automatically extracted vascular regions using a CNN trained via supervised deep learning [13]. Its final detection performance was 94.2% sensitivity at 2.9 FPs per case. Xie et al. proposed pulmonary nodule detection in CT images using deep convolutional neural networks and achieved high detection performance with an ANODE score of 0.790 [21]. Although simple comparisons are difficult because the evaluation datasets differ, this score is about 0.39 higher than the result of the current experiment. Supervised deep learning with both normal and lesion class data is a powerful learning method that can derive feature spaces and discriminative boundaries that effectively separate the two classes; this characteristic led to significantly higher sensitivity than in our experiments. On the other hand, the methods proposed by Nakao et al. [13] and Xie et al. [21] are likely to suffer significant performance degradation if only a small amount of lesion data is available for training. The features obtained via the proposed method do not depend on the size of the lesion dataset collected for system development. This characteristic is a significant advantage of the proposed method using unsupervised learning.

5. Conclusions

We proposed an unsupervised local image feature extraction method that can be applied to any CAD system for 3D medical images. The proposed method requires only a normal image dataset, which does not include lesions, for training; such a dataset is easier to obtain than a disease dataset. The image features are extracted based on latent variables obtained by applying multiple CAEs to 2.5D images. The proposed method was evaluated on two lesion detection problems: the detection of cerebral aneurysms in head MRA images and the detection of lung nodules in chest CT images. In both cases, the performance was high, showing an AUC of more than 0.96. These results show that the proposed method can automatically learn features useful for lesion recognition from lesion-free normal data, regardless of the type of image or lesion. Future work will aim at performance comparable to features obtained via supervised deep learning; for example, a highly accurate anomaly detection method using heterogeneous class data pseudo-generated from the normal training data will be investigated. Contrastive learning [22], as efficient pre-training for supervised fine-tuning, is another promising direction for improving our method. We also plan to address the problem of finding adaptive sets of 2.5D cross-sectional slices for feature extraction, investigating the contribution to lesion analysis of the axial, coronal, and sagittal cross sections used in this study as well as oblique slices at multiple angles.

Author Contributions

M.N. and K.U. were responsible for all procedures, feature extraction system construction, analysis, and writing of this manuscript. K.U., Y.K., T.N. and T.Y. (Takahiro Yamada) contributed to methodology creation and refinement. T.Y. (Takeharu Yoshikawa) contributed to project administration, clinical data collection, and lesion label annotation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Grants-in-Aid for Scientific Research by Japan Society for the Promotion of Science (grant numbers: JP17H05284 and JP20K11944).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

The study described in this manuscript was approved by the Research Ethics Board of the University of Tokyo Hospital and the Ethics Board of Kindai University BOST. Informed consent was obtained from all individual participants.

Data Availability Statement

The datasets used and/or analyzed during this study are available from the corresponding author on reasonable request.

Acknowledgments

We would like to thank the Department of Computational Diagnostic Radiology and Preventive Medicine at the University of Tokyo Hospital.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Rubin, G.D. Data Explosion: The Challenge of Multidetector-Row CT. Eur. J. Radiol. 2000, 36, 74–80.
2. Li, F.; Sone, S.; Abe, H.; MacMahon, H.; Armato, S.G.; Doi, K. Lung Cancers Missed at Low-Dose Helical CT Screening in a General Population: Comparison of Clinical, Histopathologic, and Imaging Findings. Radiology 2002, 225, 673–683.
3. Doi, K. Current Status and Future Potential of Computer-Aided Diagnosis in Medical Imaging. Br. J. Radiol. 2005, 78, s3–s19.
4. Kozuka, T.; Matsukubo, Y.; Kadoba, T.; Oda, T.; Suzuki, A.; Hyodo, T.; Im, S.; Kaida, H.; Yagyu, Y.; Tsurusaki, M.; et al. Efficiency of a Computer-Aided Diagnosis (CAD) System with Deep Learning in Detection of Pulmonary Nodules on 1-mm-Thick Images of Computed Tomography. Jpn. J. Radiol. 2020, 38, 1052–1061.
5. Li, F.; Arimura, H.; Suzuki, K.; Shiraishi, J.; Li, Q.; Abe, H.; Engelmann, R.; Sone, S.; MacMahon, H.; Doi, K. Computer-Aided Detection of Peripheral Lung Cancers Missed at CT: ROC Analyses without and with Localization. Radiology 2005, 237, 684–690.
6. Hirai, T.; Korogi, Y.; Arimura, H.; Katsuragawa, S.; Kitajima, M.; Yamura, M.; Yamashita, Y.; Doi, K. Intracranial Aneurysms at MR Angiography: Effect of Computer-Aided Diagnosis on Radiologists' Detection Performance. Radiology 2005, 237, 605–610.
7. Pacilè, S.; Lopez, J.; Chone, P.; Bertinotti, T.; Grouin, J.M.; Fillard, P. Improving Breast Cancer Detection Accuracy of Mammography with the Concurrent Use of an Artificial Intelligence Tool. Radiol. Artif. Intell. 2020, 2, e190208.
8. van Ginneken, B.; Schaefer-Prokop, C.M.; Prokop, M. Computer-Aided Diagnosis: How to Move from the Laboratory to the Clinic. Radiology 2011, 261, 719–732.
9. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.M.; van Ginneken, B.; Sánchez, C.I. A Survey on Deep Learning in Medical Image Analysis. Med. Image Anal. 2017, 42, 60–88.
10. Cheplygina, V.; de Bruijne, M.; Pluim, J.P.W. Not-so-Supervised: A Survey of Semi-Supervised, Multi-Instance, and Transfer Learning in Medical Image Analysis. Med. Image Anal. 2019, 54, 280–296.
11. Armato, S.G., III; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Zhao, B.; Aberle, D.R.; Henschke, C.I.; Hoffman, E.A.; et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans. Med. Phys. 2011, 38, 915–931.
12. Roth, H.R.; Lu, L.; Seff, A.; Cherry, K.M.; Hoffman, J.; Wang, S.; Liu, J.; Turkbey, E.; Summers, R.M. A New 2.5D Representation for Lymph Node Detection Using Random Sets of Deep Convolutional Neural Network Observations. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2014, Boston, MA, USA, 14–18 September 2014; Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 520–527.
13. Nakao, T.; Hanaoka, S.; Nomura, Y.; Sato, I.; Nemoto, M.; Miki, S.; Maeda, E.; Yoshikawa, T.; Hayashi, N.; Abe, O. Deep Neural Network-Based Computer-Assisted Detection of Cerebral Aneurysms in MR Angiography. J. Magn. Reson. Imaging 2018, 47, 948–953.
14. Masci, J.; Meier, U.; Cireşan, D.; Schmidhuber, J. Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2011, Espoo, Finland, 14–17 June 2011; Honkela, T., Duch, W., Girolami, M., Kaski, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 52–59.
15. Keogh, E.; Mueen, A. Curse of Dimensionality. In Encyclopedia of Machine Learning and Data Mining; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2017; pp. 314–315.
16. Viola, P.; Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1, p. I.
17. Sun, Y.; Kamel, M.S.; Wong, A.K.C.; Wang, Y. Cost-Sensitive Boosting for Classification of Imbalanced Data. Pattern Recognit. 2007, 40, 3358–3378.
18. Nomura, Y.; Nemoto, M.; Masutani, Y.; Hanaoka, S.; Yoshikawa, T.; Miki, S.; Maeda, E.; Hayashi, N.; Yoshioka, N.; Ohtomo, K. Reduction of False Positives at Vessel Bifurcations in Computerized Detection of Lung Nodules. J. Biomed. Graph. Comput. 2014, 4, p36.
19. van Ginneken, B.; Armato, S.G.; de Hoop, B.; van Amelsvoort-van de Vorst, S.; Duindam, T.; Niemeijer, M.; Murphy, K.; Schilham, A.; Retico, A.; Fantacci, M.E.; et al. Comparing and Combining Algorithms for Computer-Aided Detection of Pulmonary Nodules in Computed Tomography Scans: The ANODE09 Study. Med. Image Anal. 2010, 14, 707–722.
20. Nomura, Y.; Masutani, Y.; Miki, S.; Nemoto, M.; Hanaoka, S.; Yoshikawa, T.; Hayashi, N.; Ohtomo, K. Performance Improvement in Computerized Detection of Cerebral Aneurysms by Retraining Classifier Using Feedback Data Collected in Routine Reading Environment. J. Biomed. Graph. Comput. 2014, 4, 12.
21. Xie, H.; Yang, D.; Sun, N.; Chen, Z.; Zhang, Y. Automated Pulmonary Nodule Detection in CT Images Using Deep Convolutional Neural Networks. Pattern Recognit. 2019, 85, 109–119.
22. Jaiswal, A.; Babu, A.R.; Zadeh, M.Z.; Banerjee, D.; Makedon, F. A Survey on Contrastive Self-Supervised Learning. Technologies 2021, 9, 2.
Figure 1. Flowchart of the proposed feature extraction method.
Figure 2. Examples of 2.5D image extractions. (a) CentSE-2.5D image, (b) MIP-2.5D image, (c) MinIP-2.5D image, and (d) MeanIP-2.5D image.
Figure 3. ROC curves of two lesion detection experiments with three-fold cross-validation: (a) for aneurysm candidate classification and (b) for lung nodule candidate classification.
Figure 4. Mean FROC curves of two lesion detection experiments with three-fold cross-validation: (a) for aneurysm candidate classification and (b) for lung nodule candidate classification.
Figure 5. The top ten features in the weighted selection frequencies, ρ(·), in training of Ada-Boosted classifier ensembles: (a) for aneurysm candidate classification, and (b) for lung nodule candidate classification.
Table 1. The common structure of the CAEs.

| Layer | Kernel Size | Stride | Output Size |
|---|---|---|---|
| 2.5D Input | – | – | 32 × 32 × 3 ch |
| Conv + BN + ReLU | 3 × 3 | 1 | 32 × 32 × 6 ch |
| Max Pooling | 2 × 2 | 2 | 16 × 16 × 6 ch |
| Conv + BN + ReLU | 3 × 3 | 1 | 16 × 16 × 9 ch |
| Max Pooling | 2 × 2 | 2 | 8 × 8 × 9 ch |
| Conv + BN + ReLU | 3 × 3 | 1 | 8 × 8 × 12 ch |
| Full Connection | – | – | n |

(Conv: convolution, BN: batch normalization).
Table 2. Brain MRA image dataset for training the proposed feature generation and for evaluation.

| Dataset | Diseased (for Evaluation) | Normal (for Training) |
|---|---|---|
| N cases (male:female) | 378 (208:170) | 252 (131:120) |
| Ages, average ± deviation (min, max) | 61.9 ± 11.0 (34, 85) | 55.6 ± 11.6 (34, 90) |
| N lesions | 434 | 0 |
| Lesion diameter (mm), average ± deviation (min, max) | 3.08 ± 1.28 (2, 9) | – |
| Scanners | GE Signa HDxt 3.0T or GE Discovery MR750 3.0T | |
| Original pixel size (mm) | 0.468 × 0.468 | |
| Original slice thickness (mm) | 0.6 | |
| Scanning date | Nov. 2006–May 2013 | |
| Institute | The University of Tokyo Hospital | |
Table 3. Chest CT image dataset for training the proposed feature generation and for evaluation.

| Dataset | Diseased (for Evaluation) | Normal (for Training) |
|---|---|---|
| N cases (male:female) | 450 (281:169) | 300 (194:106) |
| Ages, average ± deviation (min, max) | 60.2 ± 11.3 (40, 90) | 51.5 ± 9.09 (40, 81) |
| N lesions | 582 | 0 |
| Lesion diameter (mm), average ± deviation (min, max) | 7.61 ± 3.37 (5, 31) | – |
| Scanners | GE LightSpeed CT scanner | |
| Original pixel size (mm) | 0.781 × 0.781 | |
| Original slice thickness (mm) | 1.25 | |
| Scanning date | January 2007–March 2016 | |
| Institute | The University of Tokyo Hospital | |
Table 4. Comparison of aneurysm candidate classification sensitivities between classification with the proposed unsupervised features and classification by other supervised learning methods.

| Num. FPs/Case | Proposed | Nomura et al. [20] | Nakao et al. [13] |
|---|---|---|---|
| 2.9 | 70.0% | – | 94.2% |
| 5.0 | 83.3% | 93.5% | – |
| 9.0 | 91.4% | 95.2% | – |
Table 5. Comparison of sensitivity and ANODE scores for lung nodule candidate classification between classification with the proposed unsupervised features and classification by other supervised learning methods.

| Num. FPs/Case | Proposed | Nomura et al. [18] | Xie et al. [21] |
|---|---|---|---|
| 4.8 | 46.3% | 80.0% | – |
| 14.1 | 57.2% | 90.0% | – |
| ANODE score | 0.218 | – | 0.790 |
