*Article* **Deep Learning Supplants Visual Analysis by Experienced Operators for the Diagnosis of Cardiac Amyloidosis by Cine-CMR**

**Philippe Germain 1,\*, Armine Vardazaryan 2,3, Nicolas Padoy 2,3, Aissam Labani 1, Catherine Roy 1, Thomas Hellmut Schindler 4 and Soraya El Ghannudi 1,5**


**Abstract:** Background: Diagnosing cardiac amyloidosis (CA) from cine-CMR (cardiac magnetic resonance) alone is not reliable. In this study, we tested whether a convolutional neural network (CNN) could outperform the visual diagnosis of experienced operators. Method: 119 patients with cardiac amyloidosis and 122 patients with left ventricular hypertrophy (LVH) of other origins were retrospectively selected. Diastolic and systolic cine-CMR images were preprocessed and labeled. A dual-input visual geometry group (VGG) model was used for binary image classification. All images belonging to the same patient were assigned to the same set. Accuracy and area under the curve (AUC) were calculated per frame and per patient from a 40% held-out test set. Results were compared to a visual analysis by three experienced operators. Results: Frame-based comparisons between humans and the CNN provided an accuracy of 0.605 vs. 0.746 (*p* < 0.0008) and an AUC of 0.630 vs. 0.824 (*p* < 0.0001). Patient-based comparisons provided an accuracy of 0.660 vs. 0.825 (*p* < 0.008) and an AUC of 0.727 vs. 0.895 (*p* < 0.002). Conclusion: Based on cine-CMR images alone, a CNN is able to discriminate cardiac amyloidosis from LVH of other origins better than experienced human operators (15 to 20 points more in absolute value for accuracy and AUC), demonstrating a unique capability to identify what the eyes cannot see through classical radiological analysis.

**Keywords:** cardiac amyloidosis; AL/TTR amyloidosis; hypertrophic cardiomyopathy; left ventricular hypertrophy; deep learning; convolutional neural network

**Citation:** Germain, P.; Vardazaryan, A.; Padoy, N.; Labani, A.; Roy, C.; Schindler, T.H.; El Ghannudi, S. Deep Learning Supplants Visual Analysis by Experienced Operators for the Diagnosis of Cardiac Amyloidosis by Cine-CMR. *Diagnostics* **2022**, *12*, 69. https://doi.org/10.3390/diagnostics12010069

Academic Editors: Sameer Antani and Sivaramakrishnan Rajaraman

Received: 9 December 2021; Accepted: 27 December 2021; Published: 29 December 2021

#### **1. Introduction**

Cardiac amyloidosis (CA) is a specific cardiomyopathy caused by the deposition of misfolded amyloid fibrils in the extracellular myocardial space. Light-chain (AL) and transthyretin (TTR) amyloidosis are the most common subtypes. Cardiac amyloidosis is a fatal disease requiring rapid diagnosis for patients to benefit from recently released medications [1–3]. Its diagnosis has improved significantly in recent years, in particular with the recognition of diphosphonate SPECT imaging for the identification of the TTR form of the disease [4].

MRI plays an important role in this field thanks to gadolinium injection, which provides a quite specific pattern of myocardial late-enhancement [5] and demonstrates a highly relevant extracellular volume (ECV) increase [6]. Despite the recent relaxation of restrictions on the use of gadolinium chelates [7], caution still needs to be exercised in cases of renal impairment, and a diagnostic approach without injection would be beneficial. Steady-state free precession (SSFP) cine-CMR is a basic method in cardiac MRI, offering a good-quality morphological and functional depiction of important cardiac features [8]. Myocardial wall thickening, atrial enlargement and pericardial or pleural effusion constitute the hallmarks of amyloid cardiac involvement [9]. However, these signs are very nonspecific, since they are also seen in many other etiologies of left ventricular hypertrophy, such as advanced hypertensive disease, aortic stenosis and other overload diseases such as Fabry disease and sarcomeric hypertrophic cardiomyopathies, which is why cine-CMR alone is not recognized as effective for diagnosing cardiac amyloidosis.

Machine learning and, particularly, deep learning applied to imaging have quickly established themselves in most pathological areas, and these methods are now recognized as having diagnostic capacities similar to those of experienced radiologists, particularly in cardiomyopathies [10] and cardiac amyloidosis [11]. An even more interesting fact concerns the superior diagnostic capacities of deep learning over human readers in some fields, such as breast cancer [12], especially its ability to identify pathologies invisible to the naked eye, such as abnormalities discernible only in immunohistochemistry or through genetic analysis. For example, deep learning was reported to be efficient in improving mutation prediction in hypertrophic cardiomyopathy using MR-cine images [13].

This innovative concept led us to initiate the present study in which we compared the performance of commonly available deep learning methods to experienced radiologists to discriminate cardiac amyloidosis from other myocardial hypertrophies based on cine-CMR alone. Moreover, we explored the capacity of deep learning to differentiate AL from TTR amyloidosis, which is not reliably achievable visually with cine-CMR.

#### **2. Materials and Methods**

#### *2.1. Study Population*

We retrospectively analyzed the cine-CMR sequences of patients examined between 2010 and 2020 in the radiology department of our hospital. This study was registered and approved by the Institutional Review Board of our university hospital, and all datasets were obtained and de-identified, with waived consent in compliance with the rules of our institution. The cine-CMR exams of 241 patients were studied, including 119 with histologically proven amyloidosis and 122 with left ventricular hypertrophy without amyloidosis (LVH). The patients' characteristics are listed in Table 1.

The left ventricular hypertrophy without amyloidosis group (*n* = 122) consisted of patients referred to CMR for suspected cardiac amyloidosis due to several suggestive features, such as a heart failure episode, thickening of the myocardial walls on ultrasound examination, a restrictive transmitral Doppler filling pattern, reduced longitudinal strain with apical sparing, monoclonal gammopathy or dubious Perugini grade 1 bone scintigraphy. Other cases presented with concentric left ventricular hypertrophy on CMR (left ventricular wall thickness ≥13 mm in diastole). The clinical context was consistent with hypertension, aortic stenosis or non-obstructive hypertrophic cardiomyopathy. Late-enhancement imaging, obtained in all cases, never demonstrated circumferential subendocardial or diffuse late-enhancement patterns suggestive of amyloid involvement.

For the amyloidosis group (*n* = 119), the selection criteria for amyloidosis diagnosis were based on typical CMR features confirmed by clinical, biological, bone scintigraphic and anatomo-histological findings. Left ventricular wall thickening (≥13 mm in diastole), left ± right atrial dilatation, increased native myocardial T1 relaxation time and/or extracellular volume (ECV), pericardial or pleural effusion and typical subendocardial late-enhancement pattern (circumferential, diffuse or not related to a coronary territory) were the main diagnostic clues for amyloidosis.


**Table 1.** Clinical and CMR characteristics of the study population.

Characteristics of the patients with amyloidosis and left ventricular hypertrophy included in this study. Values are numbers of observations (integers) or averages ± standard deviation; percentages are given in parentheses. BSA: body surface area; IVS: interventricular septum thickness; LVMI: left ventricular mass index; LVDVI: left ventricular diastolic volume index; LVEF: left ventricular ejection fraction; LA: left atrial; systolic time: the time of the systolic image; ECV: extracellular volume. Pericard.: pericardial effusion; pleural: pleural effusion; both: pericardial + pleural effusions.

The characteristics of AL and TTR patients can be found in the supplemental material (Table S1). TTR amyloidosis was defined in 38 patients without monoclonal gammopathy and with a 99mTc-diphosphonate SPECT Perugini score of >1 or with amyloid deposits on an extracardiac and/or endomyocardial biopsy. AL amyloidosis was reported in 59 cases, based on the detection of a kappa/lambda free light-chain with monoclonal gammopathy and an extracardiac and/or endomyocardial biopsy. Among the 22 patients who were not categorized as AL or TTR, three were AA type, three had uncertain immunostaining, one had Perugini 1 and no gammopathy, four elderly patients died and 11 were lost to follow-up.

For the cine-CMR acquisitions, all images were obtained at 1.5 Tesla using three Siemens (Erlangen, Germany) scanners and one Philips (Eindhoven, The Netherlands) scanner. Steady-state free precession (SSFP) cine sequences were obtained with TE/TR 1.6/3.5 ms, an 8- to 32-element cardiac coil and 6–8 mm slice thickness. End-systole (the frame with the smallest left ventricular dimension) was selected visually (systolic time in Table 1). Orientation planes were long-axis (4-chamber and vertical 2-chamber) and short-axis views. Table 1 summarizes the acquisition parameters.

#### *2.2. Image Preparation*

The image preparation of cine studies exported from the PACS of our hospital was carried out with dedicated software written in Visual C. All images were first de-identified and resampled (bilinear interpolation) to obtain a normalized, homogeneous pixel size of 1.5 mm. The image intensity windowing was manually focused on the central cardiac region of interest. Diastolic and systolic frames were selected, and epicardial contours (ROI\_epi) and myocardial contours (ROI\_myo) were manually drawn.

Finally, five pairs of images (cropped to 128 and 160 pixels, full view 256 pixels, ROI\_epi and ROI\_myo), as illustrated in Figure 1, were stored. The purpose of these tests (especially for the ROIs) was to determine whether a focused analysis led to better classification performance. Labeling (orientation plane, pathology, presence of effusion and gadolinium injection) was carried out simultaneously and saved in the label file.
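
For illustration only (the authors' preprocessing was implemented in dedicated Visual C software), the resampling and cropping steps could be sketched in Python as follows; SciPy is assumed, and the helper names and the center-crop simplification are hypothetical:

```python
import numpy as np
from scipy import ndimage

def resample_to_pixel_size(img, spacing_mm, target_mm=1.5):
    # Bilinear resampling (order=1) to a homogeneous 1.5 mm pixel size,
    # as in the preprocessing described above.
    zoom_factors = (spacing_mm[0] / target_mm, spacing_mm[1] / target_mm)
    return ndimage.zoom(img.astype(np.float32), zoom_factors, order=1)

def center_crop(img, size=160):
    # Hypothetical center crop; the paper crops around the cardiac
    # region of interest rather than the geometric image center.
    r0 = (img.shape[0] - size) // 2
    c0 = (img.shape[1] - size) // 2
    return img[r0:r0 + size, c0:c0 + size]
```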

**Figure 1.** Example of input shapes submitted to the CNN, with native 256 × 256 full image format (**A**), 224 × 224 cropped image (**B**), 160 × 160 cropped image (**C**), 128 × 128 cropped image (**D**), epicardial region of interest (ROI) image (**E**) and myocardial ROI (**F**).

#### *2.3. Deep Learning Process*

CNN implementation was performed in Python 3.7.6 with the Keras library and TensorFlow backend. Following the CLAIM recommendations [14], the data were distributed to ensure that all images of a given patient lay in only one of the training, validation or test sets (no mixing between these sets).

For hyperparameter tuning, data processing was performed according to the diagram shown in Figure 2. A 40% test set (538 pairs of frames; 96 patients) was isolated and stored as a held-out test set. With the remaining 60% of the data, three-fold cross-validation was performed to tune the hyperparameters (batch size, optimizer, learning rate, decay, number of trainable layers, dropout rate and image data generator parameters). This was done to avoid the influence of individual training and validation examples on the choice of hyperparameters.
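
A patient-level split of this kind can be sketched with scikit-learn's group-aware splitters (an illustrative sketch with placeholder data, not the authors' code):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, GroupKFold

rng = np.random.default_rng(0)
X = rng.random((1345, 160, 160, 1))        # placeholder frames
y = rng.integers(0, 2, 1345)               # 0 = LVH, 1 = amyloidosis
patient_ids = rng.integers(0, 241, 1345)   # one identifier per frame

# Hold out 40% of patients; all frames of a patient stay on one side.
gss = GroupShuffleSplit(n_splits=1, test_size=0.40, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=patient_ids))

# Three-fold cross-validation on the remaining 60% for hyperparameter tuning.
gkf = GroupKFold(n_splits=3)
for fit_idx, val_idx in gkf.split(X[train_idx], y[train_idx],
                                  groups=patient_ids[train_idx]):
    pass  # train a candidate configuration on fit_idx, score it on val_idx
```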

**Figure 2.** Schematic view of the processing method used in order to strictly separate training/validation data and test data.

With optimal hyperparameters, a final model was built with all training data and evaluated on the test set. Patient-based metrics were calculated from the average of the predicted probability corresponding to all frames of a unique patient.
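
A minimal sketch of this patient-level averaging, assuming one predicted probability per frame and a parallel array of patient identifiers:

```python
import numpy as np

def patient_probabilities(frame_probs, patient_ids):
    # Average the per-frame predicted probabilities over each patient.
    patients = np.unique(patient_ids)
    means = np.array([frame_probs[patient_ids == p].mean() for p in patients])
    return patients, means
```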

A VGG16 [15] base model was used and trained from scratch for the diastolic and systolic frames. The two outputs (diastole and systole) were concatenated and followed by these layers, with ReLU non-linearity after each Dense layer except the last one: Flatten, Dense 256, Dropout 0.40, Dense 128, Dropout 0.45, Dense 64, Dropout 0.50 and a Dense 1 output with sigmoid activation. In the final model, training was done with batch size 32, 150 epochs, SGD optimizer, learning rate 4 × 10−5 and decay 10−6. Binary cross-entropy was used as the loss function. The data augmentation applied during training comprised a zoom range <0.15, 15% height and width shift ranges and up to 20° rotation.
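
A minimal Keras sketch of a dual-input architecture of this kind is given below. It follows the layer sizes and training settings reported above, but details such as grayscale 160 × 160 inputs and the renaming of the two backbones are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_dual_vgg16(input_shape=(160, 160, 1)):
    # Two VGG16 towers trained from scratch (weights=None), one per cardiac phase.
    in_d = layers.Input(shape=input_shape, name="diastole")
    in_s = layers.Input(shape=input_shape, name="systole")
    vgg_d = tf.keras.applications.VGG16(include_top=False, weights=None,
                                        input_shape=input_shape)
    vgg_s = tf.keras.applications.VGG16(include_top=False, weights=None,
                                        input_shape=input_shape)
    # Rename the submodels so the outer model has no duplicate layer names.
    vgg_d._name, vgg_s._name = "vgg16_diastole", "vgg16_systole"
    x = layers.Concatenate()([vgg_d(in_d), vgg_s(in_s)])
    x = layers.Flatten()(x)
    for units, rate in [(256, 0.40), (128, 0.45), (64, 0.50)]:
        x = layers.Dense(units, activation="relu")(x)
        x = layers.Dropout(rate)(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model([in_d, in_s], out)
    # The decay kwarg matches period Keras optimizers; newer releases
    # express decay through learning-rate schedules instead.
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=4e-5, decay=1e-6),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

The reported augmentation (zoom range <0.15, 15% shifts, up to 20° rotation) would then be supplied at fit time, for example through Keras's ImageDataGenerator.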

The Grad-CAM algorithm [16] was used to visualize class activation maps. With this algorithm, the identification of the most contributive pixels involved for each class is related to the gradient information flowing into the final convolutional layer of the network.
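
A compact Grad-CAM sketch for a single-input Keras binary classifier is shown below; with nested backbones (as in the dual-input model above), the target convolutional layer must be fetched from the corresponding submodel. The layer name `block5_conv3` refers to the last VGG16 convolution:

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image_batch, conv_layer_name="block5_conv3"):
    # Map inputs to (last conv activations, prediction).
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(conv_layer_name).output,
                                 model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image_batch)
        score = preds[:, 0]                       # sigmoid output for the positive class
    grads = tape.gradient(score, conv_out)        # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average-pool the gradients
    cam = tf.einsum("bhwc,bc->bhw", conv_out, weights)
    cam = tf.nn.relu(cam)                         # keep only positive influence
    cam /= tf.reduce_max(cam) + 1e-8              # normalize to [0, 1]
    return cam.numpy()
```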

#### *2.4. Experienced Radiologists/Cardiologists Blind Reading*

The blind reading of diastolic and systolic images was performed by one radiologist and two cardiologists (each with >10 years' experience of CMR analysis and reporting). Frame-based reading was obtained from the pairs of images corresponding to the test set. Patient-based reading was obtained from the whole dataset (241 patients), and paired comparisons were made with the 40% held-out test set (average number of frame pairs, 5.5 per patient).

#### *2.5. Evaluation and Statistical Analysis*

The performance metrics—computed on a frame basis and a patient basis—were test accuracy, sensitivity, specificity, confusion matrices, receiver operating characteristic (ROC) curves and precision-recall curves with the corresponding area under the curve (AUC) values. The relationship between categorical variables (e.g., accuracy comparisons) was tested with a Chi-square test. Quantitative values were compared with Student's *t*-test, and the AUCs of ROC curves were compared with the Delong test. MedCalc 12.1.4 (MedCalc Software, Ostend, Belgium) was used for statistical analyses.
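
For reference, the frame- or patient-level metrics can be computed along these lines (an illustrative scikit-learn sketch; the authors used MedCalc):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, auc, confusion_matrix,
                             precision_recall_curve, roc_auc_score)

def classification_metrics(y_true, y_prob, threshold=0.5):
    # Binarize probabilities, then derive the metrics reported in the paper.
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "roc_auc": roc_auc_score(y_true, y_prob),
            "pr_auc": auc(recall, precision)}
```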

#### **3. Results**

*3.1. Amyloidosis vs. LVH Classification Obtained with the Held-Out Test Set According to the Input Shape*

Table 2 lists the results obtained with the various input shapes illustrated in Figure 1. Patient-based results were always better than frame-based results.

**Table 2.** Accuracy and AUC of the ROC curve for classification of amyloidosis vs. LVH in the 40% held-out test group, according to the input shape.


Results obtained with the 40% held-out test set after hyperparameter tuning. 160 × 160 indicates the cropping size of the input frames. D and S indicate diastole and systole. AUC confidence intervals are given in brackets. Values in parentheses indicate the significance level of the difference compared to the 160 × 160 D + S result (Chi-square test on the number of observations for accuracy; Delong test for AUC comparisons).

Optimal performance was obtained with 160 × 160 cropped diastolic and systolic images in which per frame analysis provided a test accuracy of 0.759 and an AUC of 0.836, whereas per patient analysis provided a test accuracy of 0.812 and an AUC of 0.937.

Combining diastole and systole did not improve the results. Full field 256 × 256 frames and focused myocardial ROI images provided significantly weaker results.

*3.2. Amyloidosis vs. LVH Classification Obtained with the Held-Out Test Set by Human Readers and by CNN*

The comparison between classification by experienced radiologists/cardiologists and the CNN is given in Table 3. The CNN provided a largely superior performance when compared to human readers.

Frame-based comparisons of human vs. CNN classification led to an accuracy of 0.605 vs. 0.746 (*p* < 0.0008) and an AUC of 0.630 vs. 0.824 (*p* < 0.0001).

Patient-based comparisons provided an accuracy of 0.660 vs. 0.825 (*p* < 0.008) and an AUC of 0.727 vs. 0.895 (*p* < 0.002). The ROC curves of these comparisons are plotted in Figure 3.


**Table 3.** Accuracy and AUC of the ROC curve for classification of amyloidosis vs. LVH in the held-out test group for human readers vs. CNN.

Frame-based and patient-based results obtained with the held-out test set by human readers and by the CNN. Accur. stands for accuracy; Sensitiv. and Specific. stand for sensitivity and specificity. Values in parentheses indicate the significance level of the difference between human readers and the CNN (Chi-square test on the number of observations for accuracy; Delong test for AUC comparisons).

**Figure 3.** ROC curves and AUC for frame-based (**A**) and patient-based (**B**) classification of amyloidosis vs. LVH by CNN and by three human readers (Read. 1 to 3).

#### *3.3. CNN Classification of AL vs. TTR Amyloidosis*

The frame-based accuracy and AUC obtained by the CNN classification of AL vs. TTR cardiac amyloidosis were 0.662 and 0.703 [0.664–0.741]. The corresponding patient-based values were 0.711 and 0.752 [0.654–0.834]. No comparison was performed here with human classification, but the comparison between the AUC values of the CNN and the simple left ventricular septal wall thickness measurement (per-patient AUC 0.735) did not show a statistically significant difference.

#### *3.4. Analysis of the Saliency Maps*

Saliency maps, which reveal the pixel areas responsible for classification, show that cardiac regions contribute to CNN decisions in only 25% of cases (Figure 4). Among the extracardiac targeted regions, the lungs are the most frequent, followed by the subcutaneous fat and liver. Distribution is quite similar for correct classification (concordant) and erroneous classification (discordant).

**Figure 4.** Saliency maps targeting cardiac region (**A**) but also frequently subcutaneous fat (**B**), lung (**C**) or liver (**D**). Diastolic frames are shown in the upper row and systolic frames in the lower row.

#### **4. Discussion**

The most important result of this study is that cardiac amyloidosis can be discriminated from LVH of other origins on simple cine-CMR images significantly better by the CNN than by the physicians' visual analysis. The comparison, carried out on slightly more than 100 patients per group, shows that, for frame-based and patient-based analysis, binary classification accuracy is approximately 15 absolute points higher with the CNN than with experienced radiologists/cardiologists. The same significant difference is also found for the AUC of the ROC curve, with a little less than 20 absolute points of improvement with the CNN compared with experienced human readers.

#### *4.1. Methodological Considerations*

Patient-based analysis constitutes a much more relevant assessment because this is how the clinical diagnosis is carried out. It should be noted that transposing the results from the image level to the patient level (by taking the average of the elementary predictions per frame) improves accuracy by around 5 points (absolute value) and the AUC by around 10 points. This phenomenon, observed for both the human readers and the CNN, may be explained by the "averaging process" in the mind of the physician who examines the whole set of a patient's pictures.

The influence of methodological choices should be stressed: (1) Distributing each patient's images into distinct training and validation/test sets is mandatory; otherwise, the results would be clearly biased, because the model would have been trained on images that are—for some features—similar to the test images. Processing our dataset without this per-patient frame separation led to a misleading "improvement" of almost 10 points (absolute value) in accuracy and AUC (data not listed here). (2) The training and validation sets used for hyperparameter tuning were strictly separated from the test set. This method, based on a separate held-out test set and schematized in Figure 2, is required to avoid information leakage related to hyperparameter tuning.

#### *4.2. Superiority of CNN Capacities over Human Diagnosis*

The aim of this study was not to propose making the diagnosis of cardiac amyloidosis solely on the cine-CMR data because much more relevant CMR indices are available thanks to gadolinium injection. Actually, late-enhancement and ECV allow the diagnosis of the presence of CA with a high sensitivity of 95% and an even higher specificity of 98% [5], and deep learning was demonstrated to be efficient in this field [11]. Our goal was to show that deep learning is able to extract diagnostic clues clearly surpassing visual analysis (15 to 20 points in the present study).

Excellent CNN performance is often reported in the literature, but its interest is limited if it is not compared to human performance. Among human–machine comparisons, many studies have reported that CNN diagnosis is on par with human visual assessment in multiple areas [17]. For example, for malignancy risk estimation of pulmonary nodules using thoracic CT, Venkadesh et al. [18] reported that the DL algorithm had an AUC of 0.96, which was significantly better than the average AUC of the clinicians (0.90) but comparable to that of thoracic radiologists. Our model was able to discriminate between AL and TTR CA with interesting patient-based accuracy (0.711) and AUC (0.752); however, this was no better than the classification obtained with a simple measurement of septal thickness, as already reported in previous publications [19–21], reflecting the known increased amyloid burden in this subtype.

Of more interest is to show significant machine-over-human superiority in routine areas, where "clinical" visual analysis is the classic benchmark. Our study provides an interesting demonstration in this direction for diagnosing cardiac amyloidosis from cine-CMR. A small number of other publications have demonstrated that AI systems are capable of surpassing human experts in disease prediction. Such is the case for the distinction between low-grade and high-grade glioma by radiologists, which lacks accuracy (40–45% of non-enhancing MR lesions are found subsequently to be malignant glioma), whereas, in contrast, CNN-based grading provides >90% accuracy [22]. A Resnet-50 CNN outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task [23]. For the diagnosis of breast cancer, in a large multicenter study, McKinney et al. [12] found that the AI system exhibited specificity and sensitivity superior to those of radiologists practicing in an academic medical center and exceeded the average performance of radiologists by a significant improvement in the area under the ROC curve (ΔAUC = +0.115). Similarly, in differentiating benign from malignant renal tumors, Xu et al. [24] reported a higher AUC with the CNN model (0.906, based on T2-weighted images) than the AUC obtained by two radiologists (0.724).

#### *4.3. Unveiling the Invisible*

One more step in this diagnostic quest is the possibility of discriminating pathological conditions that clinicians cannot predict at all with the naked eye. Molecular-marker, histological or immunohistochemical and genetic subtypes cannot be ascertained from radiologic data. These identifications were initially proposed from radiomic signatures, for instance, to discriminate between hypertensive heart disease and hypertrophic cardiomyopathy [25] or between recent and old infarction [26]. However, several comparative studies have demonstrated that deep learning based on radiologic data is superior to radiomics. This has been demonstrated for renal cancer [24], subtyping different types of cerebral glioma [27], diagnosis of breast cancer [28] and predicting axillary lymph node metastasis of breast cancer [29].

This may be explained by the fact that radiomics features are handcrafted in advance and, thus, may not always fit a particular discrimination task. In contrast, the CNN is more flexible, adaptive and dynamic. As a data-driven tool, it can automatically learn to extract and select task-specific features if the amount of training data is large enough. Further evidence of the power of deep learning to make a histological diagnosis from radiological data has been provided by Zhao et al. for renal cell carcinoma Fuhrman grading [30] and by Yuan et al. for prostate cancer Gleason score staging (accuracy 0.87) [31].

#### *4.4. Explanation of Classification Remains Unsatisfactory*

Deep neural networks operate through a multilayer nonlinear structure, making their predictions difficult to interpret. They are able to pick up a number of features that cannot be interpreted by humans but that are relevant for making a diagnosis. Unfortunately, these automatically learned discriminative features are presently not clearly identifiable.

Grad-CAM helps identify the areas of pixels that are most responsible for class prediction [16]. This should provide valuable clues to understand the algorithm's decision. In principle, the salient areas should be located in the cardiac region, which only appeared in a quarter of the cases in our study. Two explanations may be advanced for this anomaly.

(1) Technically, our network uses only fully connected layers in the last phase, which is where the classification happens, but saliency cannot be obtained from fully connected layers. As a solution, some of the fully connected layers that come right after the VGG backbone could be replaced with convolutions. This way, spatial information would be preserved longer in the network, and the saliency maps might become more meaningful.

(2) Amyloidosis is not a disease confined to the heart since the involvement of the lungs, fatty tissues and other organs is also common. Liver and, moreover, spleen amyloid deposits have been reported in 41% of patients with systemic amyloidosis (almost only in AL type), and CMR-derived ECV measurement showed good diagnostic capability in this field [32]. This is why the diagnosis is also based on extracardiac biopsies, and it is interesting to note that the texture analysis was able to show specificities in the architecture of ultrasound images within abdominal fat [33], resulting in increased echogenicity and a loss of the normal structure of the fat layer, consistent with histopathological amyloid deposition in the fat.

This ubiquitous aspect of the disease may also explain why the input shape submitted to the CNN (from the full field image to the small region of interest focused on the sole myocardium illustrated in Figure 1) hardly modifies the performance of our model as shown in Table 2. It can also be noted in Table 2 that the combination of diastole and systole does not provide any diagnostic benefit, unlike for other cardiomyopathies [10], because the global LV systolic function is generally preserved in the early stage of amyloidosis.

#### *4.5. Study Limitations*

Two types of confounding factors must be mentioned. First, plane orientation and the presence of gadolinium in the sets of images could have influenced the results, but Table 1 shows a perfect equivalence between the two groups. Second, the presence of pericardial or pleural effusion constitutes a more important bias because the prevalence (slightly higher than in the study of Binder et al. [9]) is very different in the two groups. Pericardial effusion is observed in almost 50% of CA, i.e., two times more often than in hypertrophies unrelated to amyloidosis. Pleural effusions are observed in just over a third of CA, i.e., four times more than in other hypertrophies, and mixed effusions are 10 times more frequent in the amyloidosis group than in the LVH group. This disparity probably contributes to the classification made by CNNs (although heat maps rarely focus on areas of effusion) but also influences clinical judgment, so that the bias is the same for the machine and for the human, which, therefore, does not explain the diagnostic superiority of the algorithm.

A multiparametric approach is needed. Only cine-CMR data have been used here, and it is likely that performance could be significantly improved by combining the analysis with other CMR sequences such as T1 mapping, ECV assessment and late gadolinium-enhancement imaging. Based on gadolinium-enhanced images—and not on cine-MR images—Martini et al. obtained an accuracy of 0.88 and an AUC of 0.98 [11]; however, our aim was not to develop the best model to optimize cardiac amyloidosis diagnosis but to compare CNN and human reader performance. For the distinction between AL and TTR CA, it has been reported that transmural patterns of late gadolinium enhancement may differentiate these two types of the disease [21], but with relatively low performance. Recently, a logistic regression model integrating T2 mapping (slightly increased in the AL subtype) and right ventricular ejection fraction combined with age was reported to discriminate between these two subtypes with an AUC of 0.92 [34]. The performance of AI integrating such multiparametric CMR features, especially for the distinction between AL and TTR cardiac amyloidosis, should be explored in the future.

Technical improvements should be implemented. Leveraging more sophisticated CNN models (not limited to the classical VGG model used here) and, moreover, combining (concatenating) several multiparametric inputs, possibly with additional categorical clinical input variables (e.g., [30]), should improve performance. Orientation-plane-specific models [11] should also be tested, since images of different views were classified here by the same network, which makes learning relevant features potentially much harder, as it increases variability unrelated to any disease. Significant work also remains to be done to improve the explainability of the results. Finally, the relatively limited number of observations and the monocentric nature of this study constitute another limitation. Multicenter studies could be of interest for the further validation and generalization of our findings.

#### **5. Conclusions**

In this study, based on cine-CMR images alone, we could demonstrate the ability of CNNs to discriminate cardiac amyloidosis from LVH of other origins significantly better than experienced human operators. The diagnostic accuracy and AUC were 15 to 20 points higher (in absolute value) for the VGG convolutional network used here than for human readers. This diagnostic superiority of the CNN results from the unique capability of the algorithm to identify features invisible to the naked eye, indiscernible through the classical radiological analysis. This scientific novelty, already reported in a few recent articles concerning other pathological fields, opens up promising prospects for improving diagnostic capacities in routine clinical practice. The astonishing potential of CNNs to improve the recognition of pathologies that are imperfectly detectable in radiology and to reveal invisible clues such as the histological type of lesions will certainly constitute a large field of future research.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics12010069/s1; Table S1: Clinical and CMR characteristics of AL and TTR cardiac amyloidosis.

**Author Contributions:** Conceptualization, P.G. and S.E.G.; methodology, N.P. and P.G.; software, A.V.; validation, P.G., N.P. and S.E.G.; formal analysis, A.L., A.V. and T.H.S.; investigation, S.E.G.; data curation, P.G. and S.E.G.; writing—original draft preparation, P.G.; writing—review and editing, S.E.G., A.L., N.P., C.R. and T.H.S.; supervision, N.P. and S.E.G.; project administration, C.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by French state funds managed by the ANR under reference ANR-10-IAHU-02, without any involvement in the study design, data gathering, analysis/interpretation of data or writing of the report.

**Institutional Review Board Statement:** This retrospective study was registered and approved by the Institutional Review Board of the university hospital of Strasbourg (ref 20–072/sept 2020). All datasets were obtained and de-identified, with waived consent in compliance with the Institutional Review Board of our institution.

**Informed Consent Statement:** All datasets were obtained and de-identified, with waived consent in compliance with the Institutional Review Board of our institution. No protected health information for any subject is given in this manuscript.

**Data Availability Statement:** The database and code can be made available by reasonable request after the agreement of the Clinical Research Department of our hospital.

**Conflicts of Interest:** Nicolas Padoy serves as a consultant for Caresyntax and has received research support from Intuitive Surgical, unrelated to this work. Soraya El Ghannudi serves as a consultant for Pfizer.

#### **References**


## *Article* **VGG19 Network Assisted Joint Segmentation and Classification of Lung Nodules in CT Images**

**Muhammad Attique Khan 1, Venkatesan Rajinikanth 2, Suresh Chandra Satapathy 3, David Taniar 4, Jnyana Ranjan Mohanty 5, Usman Tariq 6 and Robertas Damaševičius 7,\***


**Abstract:** The pulmonary nodule is one of the common lung diseases, and its early diagnosis and treatment are essential to cure the patient. This paper introduces a deep learning framework to support the automated detection of lung nodules in computed tomography (CT) images. The proposed framework employs VGG-SegNet-supported nodule mining and pre-trained DL-based classification to support automated lung nodule detection. The classification of lung CT images is implemented using the attained deep features, and these features are then serially concatenated with handcrafted features, such as the Grey Level Co-Occurrence Matrix (GLCM), Local Binary Pattern (LBP) and Pyramid Histogram of Oriented Gradients (PHOG), to enhance the disease detection accuracy. The images used for the experiments are collected from the LIDC-IDRI and Lung-PET-CT-Dx datasets. The experimental results show that the VGG19 architecture with concatenated deep and handcrafted features can achieve an accuracy of 97.83% with the SVM-RBF classifier.

**Keywords:** lung CT images; nodule detection; VGG-SegNet; pre-trained VGG19; deep learning

**Citation:** Khan, M.A.; Rajinikanth, V.; Satapathy, S.C.; Taniar, D.; Mohanty, J.R.; Tariq, U.; Damaševičius, R. VGG19 Network Assisted Joint Segmentation and Classification of Lung Nodules in CT Images. *Diagnostics* **2021**, *11*, 2208. https://doi.org/10.3390/diagnostics11122208

Academic Editor: Sameer Antani

Received: 28 October 2021; Accepted: 24 November 2021; Published: 26 November 2021

#### **1. Introduction**

Lung cancer is one of the severe abnormalities of the lung, and a World Health Organization (WHO) report indicated that around 1.76 million deaths occurred globally in 2018 due to lung cancer [1]. A lung nodule is due to abnormal cell growth in the lung and, in most cases, may be cancerous or non-cancerous. The Olson report [2] confirmed that lung nodules can be categorized as benign or malignant based on their dimension (5 to 30 mm falls into the benign class and >30 mm is malignant). When a lung nodule is diagnosed using the radiological approach, continuous follow-up is recommended to check its growth rate. The follow-up procedure can continue for up to two years and, along with non-invasive radiographic imaging procedures, other invasive methodologies, such as bronchoscopy and/or tissue biopsy, can also be suggested to confirm the condition and severity of the lung nodules in a patient [3].

Noninvasive radiological techniques are commonly adopted for initial-level lung nodule detection using CT images and, therefore, several lung nodule detection works have already been proposed in the literature [4–6], involving the use of traditional signal processing and texture analysis techniques combined with machine learning classification [7], deep learning models [8,9], neural networks combined with nature-inspired optimization techniques [10,11] and ensemble learning [12]. The aims of this research are to construct a Deep Learning (DL)-supported scheme to segment the lung nodule from the CT image slice with better accuracy and to classify the considered CT scan images into the normal/nodule class with improved accuracy using precisely selected deep and handcrafted features.

The recent article by Rajinikanth and Kadry [13] proposed a framework with a VGG16 neural network model for the automated segmentation and classification of lung nodules from CT images. In their paper, a threshold filter technique is implemented to remove artifacts from the CT images, and the artifact-free images are then used to test the proposed disease detection framework. The scheme is tested using the LIDC-IDRI database [14–16], and the classification task, implemented with combined deep and handcrafted features, achieved a classification accuracy of 97.67% with a Random Forest (RF) classifier.

In this paper, we suggest a framework to support the automated segmentation and classification of lung nodules with improved accuracy. The proposed scheme includes the following stages: (i) image collection and resizing; (ii) pre-trained VGG-supported segmentation; (iii) deep feature-based classification; (iv) extraction of the essential handcrafted features, such as the Grey Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP) and Pyramid Histogram of Oriented Gradients (PHOG); (v) serial feature concatenation to unite the deep and handcrafted features; and (vi) implementation and validation of the classifiers using 10-fold cross-validation.

The images used for the experiments are collected from the LIDC-IDRI [15] and Lung-PET-CT-Dx [17] datasets. All these works are realized using the MATLAB® (MathWorks, Inc., Natick, MA, USA), and the attained result is then compared and validated with the earlier results presented in the literature.

The major contribution of the proposed work is as follows:


The proposed work is organized as follows. Section 2 presents and discusses earlier related research. Section 3 presents the implemented methodology. Section 4 shows the experimental results and discussions and, finally, the conclusions of the present research study are given in Section 5.

#### **2. Related Work**

Due to its impact, a significant number of lung nodule detection methods using CT images have been proposed with a variety of image databases, and summarizing these schemes helps to obtain an idea of the advantages and limitations of existing lung nodule detection procedures. Traditional machine learning (ML) and deep learning (DL) methods have been proposed to examine lung nodules using CT image slices, and a summary of selected DL-based lung nodule detection systems is presented in Table 1; all the works considered in this table discuss a lung nodule detection technique using a chosen methodology, and all of them considered the LIDC-IDRI database for examination.



The summary (see Table 1) presents a few similar methods implemented using CT images of the LIDC-IDRI database, and the highest categorization accuracy achieved is 97.67% [13].

In addition, a detailed evaluation of various lung nodule recognition practices existing in the literature is available in the following references [25–27]. Some of the works discussed in Table 1 recommended the need for a competent lung nodule detection system that can support both segmentation of the nodule section and classification of lung nodules from normal (healthy) CT images. The works discussed in Table 1 implemented either a segmentation or classification technique using deep features only. Obtaining better detection accuracy is difficult with existing techniques and, hence, the combination of deep features (extracted by a trained neural network model) and handcrafted features is necessary.

In this paper, the pre-trained VGG-supported segmentation (VGG-SegNet) is initially executed to extract the lung nodule section from CT images, and then CT image classification is executed using deep features as well as combined deep and handcrafted features. A detailed assessment of various two-class classifiers, such as SoftMax, Decision Tree (DT), RF, K-Nearest Neighbor (KNN) and SVM-RBF, is also presented using 10-fold cross-validation to validate the proposed scheme.

#### **3. Methodology**

In the literature, several DL-based lung abnormality detection systems have been proposed and implemented using clinical-level two-dimensional (2D) CT images as well as benchmark images. Figure 1 shows the proposed system to segment and classify the lung nodule section of CT images. Initially, the CT images are collected from the benchmark datasets, and the conversion from 3D to 2D is implemented using ITK-Snap [28]. ITK-Snap converts the 3D images into 2D slices in the axial, coronal and sagittal planes; in this work, only the axial plane is considered for the assessment. All test images are then resized to 224 × 224 × 3 and used for the segmentation and classification tasks. The resized 2D CT images are first considered for the segmentation task, in which the lung nodule segment is extracted using the VGG-SegNet scheme implemented with the VGG19 architecture. Later, the essential features are extracted with GLCM, LBP and PHOG, and these features are combined with the learned features of the pre-trained DL scheme. Finally, the serially concatenated deep features (DF) and handcrafted features (HCF) are used to train, test and confirm the classifier. Based on the attained performance values, the proposed system is validated.

**Figure 1.** Structure of the proposed lung-nodule segmentation and classification system.

#### *3.1. Image Database Preparation*

The CT images are collected from the LIDC-IDRI [15] and Lung-PET-CT-Dx [17] databases. These datasets contain clinically collected three-dimensional (3D) lung CT images with a chosen number of slices.

The assessment of 3D CT images is quite complex; hence, 3D-to-2D conversion is performed to extract the initial image with a dimension of 512 × 512 × 3 pixels, and these images are then resized to 224 × 224 × 3 pixels to decrease the assessment complexity. In this work, only the axial view of the 2D slices is used for the estimation; sample test images from the considered image dataset are depicted in Figure 2, and the total number of images used for the investigation is given in Table 2.

**Figure 2.** Sample test images considered in this study.

**Table 2.** The lung CT images analyzed in the experiments.


#### *3.2. Nodule Segmentation*

Evaluation of the shape and dimension of an abnormality in medical images is widely preferred during the image-supported disease diagnosis and treatment process [29,30]. Automated segmentation is widely used to extract the affected section from the test image, and the extracted fragment is further inspected to verify the disease and its severity level. In the assessment of lung nodules with CT images, the dimension of the lung nodule plays a vital role and, therefore, extraction of the nodule is essential. In this work, the VGG-SegNet scheme is implemented with the VGG19 architecture to extract the nodule from the CT image. Information on the traditional VGG-SegNet model can be found in [29].

The proposed VGG-SegNet model has the following specification: the traditional VGG19 scheme is considered as the encoder section, and its associated structure forms the decoder unit. Figure 3 illustrates the construction of the VGG19-based segmentation and classification scheme, in which the traditional VGG19 scheme (first 5 layers) works as the encoder region and the inverted VGG19 with an up-sampling facility is considered as the decoder region. The pre-tuning of this scheme for the CT images is performed using the test images considered for training, along with the essential image enhancement process [31]. The preliminary constraints for training the VGG-SegNet are allocated as follows: the batch size is equal for the encoder and decoder sections, initialization uses normal weights, the learning rate is fixed at 1 × 10−5, a Linear Dropout Rate (LDR) is assigned and Stochastic Gradient Descent (SGD) optimization is selected. The final SoftMax layer uses a sigmoid activation function.
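
As the paper's experiments are realized in MATLAB, the following is only a simplified Keras sketch of a VGG19-style encoder with a mirrored upsampling decoder and a sigmoid mask output; the decoder filter widths are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def vgg_segnet_sketch(input_shape=(224, 224, 3)):
    # VGG19 convolutional blocks as encoder (trained from scratch here);
    # mirrored upsampling decoder; per-pixel sigmoid nodule probability.
    encoder = tf.keras.applications.VGG19(include_top=False, weights=None,
                                          input_shape=input_shape)
    x = encoder.output                                   # 7 x 7 x 512 bottleneck
    for filters in [512, 256, 128, 64, 32]:              # assumed decoder widths
        x = layers.UpSampling2D()(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    mask = layers.Conv2D(1, 1, activation="sigmoid")(x)  # 224 x 224 x 1 mask
    return tf.keras.Model(encoder.input, mask)
```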

**Figure 3.** Structure of VGG19 supported segmentation (VGG-SegNet) and classification scheme.

#### *3.3. Nodule Classification*

In the medical domain, automated disease classification plays an important role during mass data assessment, and a perfectly tuned disease classification system further reduces the diagnostic burden of physicians and acts as an assisting system during the decision-making process [32–35]. Therefore, a considerable number of DL-assisted disease detection systems have been proposed and implemented in the literature [36–40]. Recent DL schemes implemented on the LIDC-IDRI database with fused deep features and HCF helped achieve a classification accuracy of >97% [13].

Figure 3 presents the VGG19-assisted classification of lung CT images (dimension 224 × 224 × 3 pixels) using the DF with the SoftMax classifier. The performance of VGG19 is then compared and validated against VGG16, ResNet18, ResNet50 and AlexNet (the latter with images of dimension 227 × 227 × 3 pixels) [41–46]. The performance of the implemented VGG19 is validated using DF, concatenated DF + HCF and well-established binary classifiers from the literature [47–50].

#### 3.3.1. Deep Features

Initially, the proposed scheme is implemented by considering the DF attained at fully connected layer 3 (FC3). After dropout, FC3 provides a feature vector of dimension 1 × 1024, which is represented mathematically in Equation (1).

$$FV_{VGG19}(1 \times 1024) = VGG19_{(1,1)}, VGG19_{(1,2)}, \dots, VGG19_{(1,1024)} \tag{1}$$

Other essential information on VGG19 and the related issues can be found in [41].
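
An illustrative Keras sketch of extracting such a 1 × 1024 deep-feature vector from a fully connected layer of a VGG19-based classifier is shown below; the layer name "fc3" and the head widths are assumptions, and the paper's actual implementation is in MATLAB:

```python
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.VGG19(include_top=False, weights=None,
                                   input_shape=(224, 224, 3))
x = layers.Flatten()(base.output)
x = layers.Dense(4096, activation="relu")(x)              # FC1 (assumed width)
x = layers.Dense(4096, activation="relu")(x)              # FC2 (assumed width)
x = layers.Dropout(0.5)(x)
x = layers.Dense(1024, activation="relu", name="fc3")(x)  # FC3 feature layer
out = layers.Dense(2, activation="softmax")(x)
clf = tf.keras.Model(base.input, out)

# After training, reuse the network up to FC3 as a deep-feature extractor.
feature_extractor = tf.keras.Model(clf.input, clf.get_layer("fc3").output)
# deep_features = feature_extractor.predict(images)  # -> shape (n, 1024)
```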

#### 3.3.2. Handcrafted Features

The features extracted from a test image using a chosen image processing methodology are known as Machine Learning Features (MLF) or handcrafted features (HCF). Previous research has already confirmed that precise HCF are needed to improve the categorization accuracy in a class of ML- and DL-based disease detection systems [46,50,51]. In the proposed work, the essential HCF are extracted from the considered test images using the well-known GLCM [13,36,42], LBP [13,46] and PHOG [48] methods.

The GLCM features are commonly used due to their high performance and, in this paper, they are extracted from the lung nodule section segmented with the VGG-SegNet. The full GLCM feature vector used in this work is given in Equation (2).

$$FV1_{GLCM}(1 \times 25) = GLCM_{(1,1)}, GLCM_{(1,2)}, \dots, GLCM_{(1,25)} \tag{2}$$

In this work, the LBP with varied weights (W = 1, 2, 3 and 4) is considered to mine the important features from the considered test images; this LBP variant has already been implemented in the works of Gudigar et al. [52] and Rajinikanth and Kadry [13]. The LBP features for the varied weights are depicted in Equations (3)–(6), and Equation (7) depicts the overall LBP feature vector.

$$FV_{LBP1}(1 \times 59) = LBP1_{(1,1)}, LBP1_{(1,2)}, \dots, LBP1_{(1,59)} \tag{3}$$

$$FV_{LBP2}(1 \times 59) = LBP2_{(1,1)}, LBP2_{(1,2)}, \dots, LBP2_{(1,59)} \tag{4}$$

$$FV_{LBP3}(1 \times 59) = LBP3_{(1,1)}, LBP3_{(1,2)}, \dots, LBP3_{(1,59)} \tag{5}$$

$$FV_{LBP4}(1 \times 59) = LBP4_{(1,1)}, LBP4_{(1,2)}, \dots, LBP4_{(1,59)} \tag{6}$$

$$FV2_{LBP}(1 \times 236) = FV_{LBP1}(1 \times 59) + FV_{LBP2}(1 \times 59) + FV_{LBP3}(1 \times 59) + FV_{LBP4}(1 \times 59) \tag{7}$$

In addition to the above features, the PHOG features are also extracted and considered along with GLCM and LBP. Full details of the PHOG can be found in the article by Murtza et al. [48]. In this work, 255 features are extracted by assigning the number of bins = 3 and levels (L) = 3. The PHOG features of the proposed work are depicted in Equation (8).

$$FV3_{PHOG}(1 \times 255) = PHOG_{(1,1)}, PHOG_{(1,2)}, \dots, PHOG_{(1,255)} \tag{8}$$
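
For illustration, GLCM and LBP features of this kind can be computed with scikit-image as sketched below (not the authors' MATLAB code; the GLCM property subset shown yields 20 values rather than the paper's 25, and PHOG is omitted because it is not part of scikit-image):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def glcm_features(img_u8):
    # Illustrative subset of GLCM properties (5 props x 4 angles = 20 values;
    # the paper's exact 25-feature set is not detailed here).
    glcm = graycomatrix(img_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def lbp_histogram(img_u8, P=8, R=1):
    # Non-rotation-invariant uniform LBP with P=8 yields 59 labels, matching
    # the 1 x 59 vectors of Equations (3)-(6); weights are applied upstream.
    lbp = local_binary_pattern(img_u8, P, R, method="nri_uniform")
    hist, _ = np.histogram(lbp, bins=59, range=(0, 59))
    return hist / (hist.sum() + 1e-8)
```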

#### 3.3.3. Features Concatenation

In this work, serial feature concatenation is realized to unite the DF and HCF, which increases the feature dimension. The serial feature concatenation implemented in this work is depicted in Equation (9), and the Final Feature Vector (FFV) is presented in Equation (10).

$$Concatenated\ features = DF_{(1 \times 1024)} + HCF_{(1 \times 516)} \tag{9}$$

$$FFV_{(1 \times 1540)} = FV_{VGG19(1 \times 1024)} + FV1_{GLCM(1 \times 25)} + FV2_{LBP(1 \times 236)} + FV3_{PHOG(1 \times 255)} \tag{10}$$

The FFV is then used to train, test and validate the classifier considered in the proposed methodology for the automated classification of lung nodules using CT images.
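
The serial concatenation of Equations (9) and (10) amounts to stacking the four vectors end to end; a minimal sketch with placeholder vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
deep_1024 = rng.random(1024)   # placeholder VGG19 deep features
glcm_25 = rng.random(25)       # placeholder GLCM features
lbp_236 = rng.random(236)      # placeholder LBP features (4 weights x 59)
phog_255 = rng.random(255)     # placeholder PHOG features

ffv = np.concatenate([deep_1024, glcm_25, lbp_236, phog_255])
assert ffv.shape == (1540,)    # 1024 + 25 + 236 + 255 = 1540
```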

#### 3.3.4. Classifier Implementation

The performance of a DL-based automated disease detection arrangement depends chiefly on the performance of the classifier implemented to categorize the considered test images. In this paper, binary classification is initially implemented using the SoftMax classifier; later, the well-known classifiers Decision Tree (DT), RF, KNN and Support Vector Machine with Radial Basis Function (SVM-RBF) [13,53–56] are also considered to improve the classification task. A 10-fold cross-validation process is implemented, and the finest result attained is considered as the final classification result. The performance of the classifier is then authenticated and confirmed based on the Image Performance Values (IPV) [57–59].
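
An equivalent 10-fold SVM-RBF validation can be sketched with scikit-learn (placeholder data; the paper's experiments are run in MATLAB):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 1540))          # placeholder FFV matrix
y = rng.integers(0, 2, 200)          # 0 = normal, 1 = nodule

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y,
                         cv=StratifiedKFold(n_splits=10, shuffle=True,
                                            random_state=0))
print(scores.mean(), scores.max())   # mean and best fold accuracy
```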

#### *3.4. Performance Computation and Validation*

The overall quality of the proposed method is validated by computing the essential IPV measures, such as True-Positive (TP), False-Negative (FN), True-Negative (TN), False-Positive (FP), Accuracy (ACC), Precision (PRE), Sensitivity (SEN), Specificity (SPE), Negative-Predicted-Value (NPV), F1-Score (F1S), Jaccard index and Dice coefficient, which are calculated as percentages and presented in Equations (11)–(18). The necessary information regarding these values can be found in [45–47].

$$Accuracy = ACC = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{11}$$

$$Precision = PRE = \frac{TP}{TP + FP} \times 100\% \tag{12}$$

$$Sensitivity = SEN = \frac{TP}{TP + FN} \times 100\% \tag{13}$$

$$Specificity = SPE = \frac{TN}{TN + FP} \times 100\% \tag{14}$$

$$Negative\ Predictive\ Value = NPV = \frac{TN}{TN + FN} \times 100\% \tag{15}$$

$$F1 - Score = F1S = \frac{2TP}{2TP + FN + FP} \times 100\% \tag{16}$$

$$Jaccard = \frac{TP}{TP + FN + FP} \times 100\% \tag{17}$$

$$\text{Dice} = \frac{2TP}{2TP + FN + FP} \times 100\% \tag{18}$$
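
Equations (11)–(18) translate directly into code; a small Python helper is given here for reference:

```python
def ipv_measures(tp, fn, tn, fp):
    # Image Performance Values of Equations (11)-(18), as percentages.
    return {
        "ACC": 100 * (tp + tn) / (tp + tn + fp + fn),
        "PRE": 100 * tp / (tp + fp),
        "SEN": 100 * tp / (tp + fn),
        "SPE": 100 * tn / (tn + fp),
        "NPV": 100 * tn / (tn + fn),
        "F1S": 100 * 2 * tp / (2 * tp + fn + fp),
        "Jaccard": 100 * tp / (tp + fn + fp),
        "Dice": 100 * 2 * tp / (2 * tp + fn + fp),
    }
```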

#### **4. Results and Discussions**

This section presents the results and discussion; experiments were run on a workstation with an Intel i5 2.5 GHz processor, 16 GB RAM and 2 GB VRAM, equipped with MATLAB® (version R2018a). Primarily, the lung CT images presented in Table 2 are used, and each image is resized to 224 × 224 × 3 pixels to perform the VGG19-supported segmentation and classification tasks. Initially, the VGG-SegNet-based lung nodule extraction process is executed on the considered test images, and the sample result obtained for the normal/nodule class images is represented in Figure 4. Figure 4 presents the experimental result of the trained VGG-SegNet on CT images: Figure 4a shows the sample images of the normal/nodule class considered for the assessment; Figure 4b depicts the outcome attained with the final layer of the encoder unit; and Figure 4c,d depict the results of the decoder and the SoftMax classifier, respectively. For a normal (healthy) class image, the decoder will not provide a positive localization and segmentation outcome; this section provides the essential information only for the nodule class.

In this paper, the lung-nodule section extracted with the proposed VGG-SegNet is compared to the ground truth (GT) image generated using ITK-Snap [28], and the essential image measures are calculated as described in previous works [4,13]. The performance of VGG-SegNet is also validated against the existing SegNet and UNet schemes in the literature [24,25,48,49]. The results achieved for the trial image are depicted in Figure 5 and Table 3. Note that the performance measures [50,51] achieved with VGG-SegNet are superior to those of the other approaches.

**Figure 4.** Results obtained with proposed VGG-SegNet scheme: (**a**) test image, (**b**) lung section enhanced by encoder, (**c**) localization of nodule by decoder and (**d**) extracted nodule by SoftMax unit.

**Figure 5.** Segmentation results attained with considered CNN models.

**Table 3.** Performance evaluation of CNN models on sample lung CT image. Best values are shown in bold.


The segmentation performance of the proposed scheme is then tested on lung nodules of various dimensions (small, medium and large), and the attained results are depicted in Figure 6. This figure confirms that VGG-SegNet provides better segmentation for medium and large nodules and reduced segmentation accuracy for images with smaller lung nodules.

After collecting the essential DF with VGG19, the other HCFs, such as GLCM, LBP and PHOG, are collected. The GLCM features for the normal (healthy) class images are collected from the whole CT image; for the abnormal class images, they are collected from the binary image of the extracted nodule segment. Figure 7 shows the LBP patterns generated for the normal/nodule class test images with various weight values. During LBP feature collection, each image is treated with the LBP algorithm with various weights (i.e., W = 1 to 4), and the 1D features obtained from each image are combined to obtain a 1D feature vector of dimension 1 × 236.

The PHOG features for the CT images are then extracted by assigning a bin size (L) of 3, and this process yields a 1 × 255 feature vector. Sample PHOG features collected for a CT image are shown in Figure 8. All these features (GLCM + LBP + PHOG) are then combined to form an HCF vector with a dimension of 1 × 516 features, after which they are combined with the DF to improve the lung nodule detection accuracy. After collecting the essential features, the image classification task is implemented using DF and DF + HCF separately.

**Figure 6.** Segmentation of nodule from chosen images of Lung-PET-CT-Dx and LIDC-IDRI dataset.

**Figure 7.** LBP patterns generated from the sample image with various LBP weights.

Initially, the DF-based classification is executed with the considered CNN schemes, and the classification performance obtained with SoftMax is depicted in Table 4. Figure 9 presents the spider plot for the considered features, and the results of Table 4 and the dimensions of the spider plot confirm that VGG19 achieves enhanced IPV compared to the other CNN schemes. VGG19 is therefore chosen as the suitable scheme to examine the considered CT images, and an attempt is then made to enhance its performance using DF + HCF.

**Figure 8.** PHOG features obtained with the sample test images of Normal/Nodule class.

**Figure 9.** Spider plot to compare the CT image classification performance of CNN models.

**Table 4.** Classification performance attained with pre-trained DL scheme with DF and SoftMax classifier. Here TP—true positives, FN—false negatives, TN—true negatives, FP—false positives, ACC—accuracy, PRE—precision, SEN—sensitivity, SPE—specificity, NPV—negative predictive value and F1S—F1-score.


The experiment is then repeated using the VGG19 scheme with DF + HCF (1 × 1540 features) and classifiers such as SoftMax, DT, RF, KNN and SVM-RBF; the outcomes are depicted in Table 5. Figure 10 shows the performance of VGG19 with SVM-RBF, in which 10-fold cross-validation is implemented and the best result attained among the 10 folds is demonstrated. The results in Table 5 confirm that the SVM-RBF classifier offers a superior outcome compared to the other classifiers, and the graphical illustration in Figure 11 (glyph plot) also confirms the performance of SVM-RBF. The Receiver Operating Characteristic (ROC) curve presented in Figure 12 also confirms the merit of the proposed technique.

**Table 5.** Disease detection performance of VGG19 with DF + HCF with different classifiers. Best values are shown in bold.


**Figure 10.** Training performance of the VGG19 with SVM-RBF for lung CT image slices.

**Figure 11.** Overall performance of VGG19 with various classifiers summarized as glyph-plots.

**Figure 12.** ROC curve attained for VGG19 with DF + HCF.

The above results confirm that the disease detection performance of VGG19 can be enhanced by combining the DF with the HCF. The quality of the proposed lung nodule detection system is then compared with other methods in the literature. Figure 13 compares the classification accuracy reported in the literature; the accuracy obtained with the proposed approach (97.83%) is superior to that of the other works considered in this study, confirming the superiority of the proposed approach over existing works.

**Figure 13.** Validation of the disease detection accuracy of the proposed system with existing approaches.

The major improvements of the proposed technique compared to other works, such as Bhandary et al. [4] and Rajinikanth and Kadry [13], are as follows: this paper performs the detection of lung nodules using CT images without artifact removal, and the number of stages in the proposed approach is lower compared to existing methods [4,10].

Future work includes: (i) considering other handcrafted features, such as HOG [48] and GLDM [43], to improve disease detection accuracy; (ii) considering other variants of the SVM classifier [43] to achieve better image classification accuracy; and (iii) implementing a selected procedure to enhance the segmentation accuracy for lung CT with smaller nodule sizes.

#### **5. Conclusions**

Due to its clinical significance, several automated disease detection systems have been proposed in the literature to detect lung nodules from CT images. This paper proposes a pre-trained VGG19-based automated segmentation and classification scheme to examine lung CT images. The scheme is implemented in two stages: (i) VGG-SegNet-supported extraction of lung nodules from CT images and (ii) classification of lung CT images using deep learning schemes with DF and DF + HCF. The initial part of this work implemented the VGG-SegNet architecture with a VGG19-based encoder-decoder assembly and extracted the lung nodule section using the SoftMax classifier. Handcrafted features from the test images are extracted using GLCM (1 × 25 features), LBP with varied weights (1 × 236 features) and PHOG with an assigned bin = L = 3 (1 × 255 features); this combination yields the chosen HCF with a dimension of 1 × 516 features. The classification task is initially implemented with the DF and SoftMax, and the results confirmed that VGG19 provided better results than the VGG16, ResNet18, ResNet50 and AlexNet models. The CT image classification performance of VGG19 is then verified using DF + HCF, and the obtained results confirmed that the SVM-RBF classifier helped to obtain better classification accuracy (97.83%).

The limitation of the proposed approach is the dimension of concatenated features (1 × 1540) which is rather large. In the future, a feature reduction scheme can be considered to reduce this set of features. Also, the performance of the proposed system can be improved by considering other HCFs that are known from the literature.

**Author Contributions:** All authors have contributed equally to this manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** This article does not contain any studies with human participants or animals performed by any of the authors.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The image dataset of this study can be accessed from: https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI.

**Acknowledgments:** The authors of this paper would like to thank The Cancer Imaging Archive for sharing the clinical grade lung CT images for research purpose.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

