Article

A Development and Validation of an AI Model for Cardiomegaly Detection in Chest X-rays

Kang-Hee Lee, Jun-Woo Choi, Chun-Oh Park, Dong-Hun Han and Min-Soo Kang

1 Department of Medical Artificial Intelligence, Eulji University, Seongnam 13135, Republic of Korea
2 Department of Medical IT, Eulji University, Seongnam 13135, Republic of Korea
3 Pnp Secure, Seoul 07594, Republic of Korea
4 Department of Bigdata Medical Convergence, Eulji University, Seongnam 13135, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2024, 14(17), 7465; https://doi.org/10.3390/app14177465
Submission received: 17 July 2024 / Revised: 13 August 2024 / Accepted: 21 August 2024 / Published: 23 August 2024
(This article belongs to the Special Issue Integrating Artificial Intelligence in Renewable Energy Systems)

Abstract

In this study, the development of a deep learning approach for distinguishing cardiomegaly in chest X-ray images and its validation process are presented. Typically, radiologists diagnose cardiomegaly by examining X-ray images. However, their interpretations can vary owing to subjective judgment, and mild cardiomegaly can be missed. For this reason, there is ongoing research into the use of AI-based deep learning algorithms as an adjunct to X-ray interpretation. In this study, radiologists collected 10,000 public images, from which 718 useful images were selected to create a dataset. A DenseNet121 algorithm was then used to develop an AI model for cardiomegaly detection. The results demonstrate an accuracy of 0.95, a recall of 0.91, and an F1 score of 0.94. Additional validation was performed to ensure the reliability of the cardiomegaly detection model: saliency maps and guided backpropagation were used to confirm that the model bases its decisions on the heart region. In conclusion, this study demonstrates the potential to develop a high-quality AI model with fewer data than previous studies, suggesting its applicability in the medical field.

1. Introduction

Chest X-rays (CXRs) are routinely performed during hospital admissions and checkups owing to their low cost and simplicity. They are essential for screening and evaluating thoracic diseases, pulmonary and pleural effusions, and cardiomegaly [1]. Cardiomegaly is an enlargement of the heart, often caused by conditions such as hypertension and aging, which lead to an increase in the thickness of the posterior wall of the left ventricle [2]. According to data from the Health Insurance Review & Assessment Service, the number of patients with cardiomegaly increased from 19,590 in 2015 to 27,321 in 2019, an increase of approximately 39%. The number of patients with cardiomegaly is thus on the rise, and because the condition is often asymptomatic in its early stages, early detection is difficult. Common clinical symptoms include dyspnea and chest pain even during simple activities. In patients with acute myocardial infarction, left ventricular hypertrophy is associated with a poor prognosis and requires aggressive treatment [3]. When associated with arrhythmias, cardiomegaly can lead to serious outcomes such as syncope and sudden death; therefore, early detection is critical.
Cardiomegaly can be readily detected by X-ray imaging. However, subjective judgment can affect interpretation, leading to variability among radiologists and potential oversight of mild cardiomegaly. Although the two-dimensional cardiothoracic ratio (2D-CTR) has proven useful in the diagnosis of cardiomegaly, manually measuring and calculating the heart and chest areas with a mouse is cumbersome and tiring. In addition, the increasing number of chest X-rays to be interpreted is exacerbating the shortage of radiologists, resulting in longer interpretation times [4]. To address these issues, research into AI-based interpretation systems is actively underway. Previous studies have shown that AI models trained on large datasets of X-ray images can achieve high accuracies (e.g., 0.96 and 0.93). These AI systems have the potential to improve diagnostic accuracy and reduce interpretation time through automated image classification. Although medical data are necessary for such research, they are challenging to obtain owing to the need for IRB approval. Therefore, this study used publicly available medical data, which exclude personal information (name, gender, age), for the benefit of the public [5]. This study also highlights the high accuracy achieved with a relatively small dataset compared with previous studies and describes the validation process used to ensure that the model learned properly.

2. Related Research

2.1. Medical Data and Artificial Intelligence

In 2018, Ballantyne A. and Schaefer G.O. published “Consent and the ethical duty to participate in health data research”, noting that research using medical data is generally considered observational research and requires the consent of each individual participating in the study. However, the authors argue that the traditional approach of requiring individual consent for every study should be relaxed when the public interest and the transparency of the research are prioritized [5]. This argument rests on the notion that citizens have an ethical duty to share their health information for research purposes. Thus, studies utilizing medical data can be conducted without IRB approval, provided that the research’s purpose and transparency are clearly defined.
In the 2021 paper by Nam J. G., “Development and validation of a deep learning algorithm detecting 10 common abnormalities on chest radiographs”, chest radiograph image data from Seoul National University Hospital collected between March 2004 and December 2017 were used. The dataset comprised a total of 146,717 images, and external validation was performed using the publicly available PadChest dataset. The deep learning model trained on these chest X-rays helped reduce interpretation time by 33%, improved radiologists’ performance, and enabled quicker interpretation in urgent situations [6]. This study confirmed the value of deep learning for chest X-ray interpretation.
In 2022, Sarpotdar S. S. used a dataset of 108,948 outpatient chest X-rays from the public “ChestX-ray8” source. The study involved data preprocessing, image enhancement, image compression, and classification using the U-Net algorithm. The results showed a diagnostic accuracy of 0.94, a sensitivity of 0.96, and a specificity of 0.92 [7]. In addition, various deep learning models such as ResNet, DenseNet, and Inception v3 are widely used in the field of medical imaging [8].
In 2023, Kateb Y. and his research team used a total of 400 publicly available images: 100 COVID-19, 100 pneumonia, and 100 normal images for training, with the remaining 100 images used for testing. Using the DenseNet-121 model and transfer learning from weights pre-trained on the ImageNet dataset, the study achieved high performance, with the F1 score, recall, and accuracy all reaching 0.98 [9]. This demonstrates the effectiveness and accuracy of the DenseNet-121 algorithm in medical image diagnosis.

2.2. Application to Cardiomegaly

The standard diagnostic indicator of cardiomegaly on chest radiographs is the cardiothoracic ratio (CTR), which is the ratio of the maximum transverse diameter of the heart to the maximum width of the chest measured at the rib angle. A CTR of 0.5 or greater is considered indicative of cardiomegaly.
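As a worked example (the measurements below are hypothetical and chosen only to illustrate the rule):

$$\mathrm{CTR} = \frac{\text{maximum transverse cardiac diameter}}{\text{maximum internal thoracic diameter}} = \frac{15.2\ \text{cm}}{28.0\ \text{cm}} \approx 0.54 > 0.5,$$

so such a radiograph would be read as showing cardiomegaly.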
In 2018, Que Q. published “CardioXNet: Automated Detection for Cardiomegaly Based on Deep Learning”, which detailed the development of CardioXNet using the DenseNet neural network to determine the presence of cardiomegaly in chest X-ray images [10]. The study utilized 103 standard PA radiographs. Due to the limited number of images, data augmentation techniques were used to increase the total number of images to 2630. The trained model achieved an accuracy of 0.93, an F1 score of 0.94, and a precision of 1.0 [10]. This case illustrates the use of data augmentation techniques when data are insufficient.
In another instance, the Inception V3 model was used to bring further deep learning algorithms into the medical field. Trained on 1026 chest X-rays from Kyungpook National University Hospital, the model achieved an accuracy of 0.96 and a loss of 0.22. Future research is expected to achieve excellent results by utilizing more diverse medical imaging data [11].
There is no clear standard for the amount of data required to achieve high performance when training deep learning algorithms. However, previous studies indicate a trend towards datasets of at least 1000 images. In 2021, Kim M.J. used a total of 1026 chest X-ray images (526 normal and 500 with cardiomegaly) [12]. The accuracy for both the training and testing phases was 0.92. Furthermore, to evaluate the model’s performance with varying data quantities, 30-fold and 60-fold data augmentation was conducted. With 60-fold augmentation, the training and testing accuracies reached 0.99 and 0.95, respectively. This indicates that data availability is positively correlated with training accuracy [12].
In 2023, Ribeiro E. employed a chest X-ray dataset retrospectively obtained from two major hospitals in Vietnam. A total of 15,000 images were used for training, while the remaining 3000 constituted the test set. Of these, 230 images were identified as exhibiting cardiomegaly and 1003 were classified as normal. The algorithm used was ResNet50-v2, a model frequently used for cardiomegaly classification; it started from ImageNet weights and was fine-tuned on the individual dataset. The model achieved an accuracy of 0.91, a precision of 0.74, and an F1 score of 0.79 [13].
In 2020, Sogancioglu et al. published “Cardiomegaly Detection on Chest Radiographs: Segmentation Versus Classification”, highlighting the significant research focused on predicting specific abnormalities in chest X-rays [14]. However, the study points out that image-level labels, which refer to the entire image, do not provide detailed information about the shape or location of abnormalities. Such models, often described as “black-box” systems, are challenging to interpret and may not be suitable for clinical settings. The study therefore emphasizes the need for more intuitive and interpretable models that provide insight into the reasoning behind each inference [14].

2.3. Model Evaluation

Among the methods for evaluating deep learning models, visualization techniques are intuitive and straightforward to interpret. In 2017, Selvaraju R. R. introduced Grad-CAM (Gradient-weighted Class Activation Mapping) in the paper “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization” [15]. Grad-CAM provides visual explanations for convolutional neural network (CNN) models by creating localization maps that highlight the regions of the image that are important for a prediction [15]. This facilitates understanding of why a model succeeds or fails. In [13], visualization techniques such as Grad-CAM, LIME, and SHAP were applied to a model with an accuracy of 0.91 to generate maps that highlight the most important areas in red. In images labeled as non-cardiomegaly, both Grad-CAM and LIME still highlighted the heart region; the SHAP method, however, did not designate the heart as the most relevant area. This indicates that incorporating quantitative metrics for explainable AI will be vital for model evaluation in future research [13].
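To make the technique concrete, the following is a minimal Grad-CAM sketch for a Keras CNN; the function name, the conv_layer_name argument, and the single-logit output are illustrative assumptions, not the implementation used in [13] or [15]:

import tensorflow as tf

def grad_cam(model, img_tensor, conv_layer_name, class_index=0):
    # Model that maps the input to the target conv layer's output and the prediction
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img_tensor)
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)        # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average-pooled gradients
    # Weighted sum of the feature maps, keeping only positive influence
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)[0]
    cam = tf.nn.relu(cam).numpy()
    return cam / (cam.max() + 1e-8)               # normalize to [0, 1]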
In this study, a highly efficient and accurate model for medical image diagnosis was developed using a relatively small dataset in comparison to previous studies. Visualization techniques were used to validate the reliability of this model. This demonstrates that a high-performing AI model can be developed using high-quality data, even with a smaller dataset.

3. Experiments

3.1. Preprocessing Data

This study aims to show that a high-performance model can be created using high-quality data, even in smaller quantities than in previous studies. Since the study did not receive approval from an Institutional Review Board (IRB), the data were collected from publicly available sources. Clinical radiographers are not legally permitted to diagnose diseases. However, through their experience of performing numerous X-ray examinations, they can intuitively recognize the differences between the lungs of a healthy individual and those of a patient and can identify diseases. The significance of an AI model lies in its potential to reduce the interpretation time for cardiomegaly by achieving high accuracy amidst a large volume of X-ray images and a limited number of specialists. Furthermore, since cardiomegaly is identified when the size of the heart on an X-ray image exceeds 50% of the outer chest diameter, it was considered appropriate for radiographers to collect the data without the assistance of a physician. Therefore, a radiographer curated the data to secure the target dataset.
The National Institutes of Health (NIH) chest X-rays dataset available on Kaggle, an online community for data scientists and machine learning practitioners, was used. This subset consists of a total of 10,000 chest X-ray images. However, many of them contained elements unnecessary for detecting cardiomegaly. Therefore, our radiographer excluded images with anteroposterior (AP) views, support devices, and visible artifacts (e.g., pacemakers, heart surgery scars, spinal surgery artifacts, IV lines obscuring the lung and heart fields).
Generally, chest X-rays are performed using the PA view. In a chest PA examination, the patient stands with the chest facing the cassette, and the distance between the X-ray tube and the patient is set at 6 feet to reduce beam divergence and magnification; the patient is instructed to take a deep breath and hold it. For emergency or critically ill patients who cannot stand, a chest AP examination is conducted instead, with the patient’s back against the cassette. Because the tube-to-patient distance is shorter in an AP examination than in a PA examination, the lateral width of the heart may appear larger. Additionally, since patients undergoing an AP examination are often in poor condition, they may not be able to take a deep breath, which can make the lungs appear narrower. In the images we obtained, AP views were marked accordingly and, for the reasons above, were excluded.
The selected chest X-rays then underwent a refinement process to remove the unnecessary left-side markers and crop the images to only the areas relevant for training, thereby securing a high-quality dataset. As a result, 718 images were retained from the original 10,000, yielding a dataset of 368 normal images and 350 cardiomegaly images. This dataset was divided into a 7:3 ratio for training and validation data, with the validation data further split 1:1 to form the test data. An additional 41 images were selected from the previously excluded images to evaluate the model’s performance. Finally, the images were resized to 224 × 224 pixels. Figure 1a shows a typical image from the dataset, whereas Figure 1b illustrates an example of a cardiomegaly image from the same dataset.
Additionally, Figure 1c shows an example of an image containing elements unnecessary for determining cardiomegaly.
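As an illustration of this refinement step, the following is a minimal preprocessing sketch using OpenCV; the file paths and the fixed crop margins are hypothetical, as the actual crop regions were chosen per image by the radiographer:

import cv2

def preprocess_xray(src_path, dst_path, crop_box=(50, 50, 950, 950)):
    # Read the chest X-ray and crop away borders and side markers
    img = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
    x0, y0, x1, y1 = crop_box
    img = img[y0:y1, x0:x1]
    # Resize to the 224 x 224 input size expected by the models
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_AREA)
    cv2.imwrite(dst_path, img)

preprocess_xray("raw/00001234_000.png", "dataset/cardiomegaly/00001234_000.png")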

3.2. Building the Model

In this study, a range of machine learning techniques were applied to compare and analyze the accuracy of cardiomegaly detection. Recently, transfer learning, a method of applying algorithms developed in one setting to other similar fields, has demonstrated high efficiency and excellent performance and is thus used in many areas [16]. Among various deep learning algorithms, the structural characteristics of the DenseNet algorithm are frequently exploited for detecting conditions in medical image analysis. DenseNet also performs well in transfer learning and is advantageous for detecting specific diseases such as cardiomegaly; for these reasons, previous studies have used it as a backbone for classification [17,18]. This study aims to show that high-quality data, even in relatively small quantities, can improve model performance. The most recently released algorithms have not yet been sufficiently validated and were therefore judged unsuitable for this purpose. Accordingly, the algorithms selected for this study were VGG-16, Inception-V3, and DenseNet121. Each algorithm was used to construct a cardiomegaly detection model on the randomly split dataset, and transfer learning was applied using pretrained weights from the large-scale ImageNet dataset. We implemented the deep learning algorithms using the TensorFlow framework in a Python environment.

3.2.1. VGG-16

The accuracy of the VGG-16 model in detecting cardiomegaly was evaluated as follows.
$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$

Here, $\hat{y}$ represents the predicted probability of cardiomegaly, indicating the likelihood of belonging to the cardiomegaly class, and $\sigma(z)$ is the output of the sigmoid function, a nonlinear function. Its input $z$ is defined as follows:

$$z = W_{FC_3} \cdot FC_2 + b_{FC_3}$$

$W$ represents the weights, $FC_n$ the output of the $n$-th fully connected layer, and $b$ the bias vector; $z$ is thus the result of applying the weights and bias of the third fully connected layer to the output of the second fully connected layer [19]. The code implementation applying this is shown in Algorithm 1.
Algorithm 1 VCD (VGG-16 for Cardiomegaly Detection) Modeling

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model

def vcd_model():
    # VGG-16 base pretrained on ImageNet, without its classification head
    base_model = VGG16(weights="imagenet", include_top=False,
                       input_shape=(224, 224, 3))
    # Freeze the convolutional base so only the new head is trained
    for layer in base_model.layers:
        layer.trainable = False
    x = Flatten()(base_model.output)
    x = Dense(256, activation="relu")(x)
    x = Dropout(0.5)(x)
    predictions = Dense(1, activation="sigmoid")(x)  # cardiomegaly probability
    model = Model(inputs=base_model.input, outputs=predictions)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

3.2.2. Inception-V3

The cardiomegaly detection output of Inception-V3 is evaluated with the same sigmoid formulation given above for VGG-16. Here, $z$ is defined as follows:

$$z = W_{FC} \cdot Y_{GAP} + b_{FC}$$

$W$ and $b$ represent the weight and bias vector values, respectively, and have the same meaning as in the VGG-16 equations. $Y_{GAP}$ is the spatial average of each feature map compressed into a single value, which is referred to as global average pooling. It is calculated as follows:

$$Y_{GAP} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{i,j}$$

$H$ and $W$ are the height and width of the feature map, and $X$ is the input feature map [20]. The code implementation applying this is shown in Algorithm 2.
Algorithm 2 IVCD (InceptionV3 for Cardiomegaly Detection) Modeling

from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def ivcd_model():
    # Inception-V3 base pretrained on ImageNet, without its classification head
    base_model = InceptionV3(include_top=False, weights="imagenet",
                             input_shape=(224, 224, 3))
    x = GlobalAveragePooling2D()(base_model.output)
    predictions = Dense(1, activation="sigmoid")(x)  # cardiomegaly probability
    model = Model(inputs=base_model.input, outputs=predictions)
    model.compile(optimizer=Adam(learning_rate=0.0001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

3.2.3. DenseNet121

DenseNet concatenates the feature maps of all preceding layers, enabling each layer to access a diverse range of features from previous layers. This approach allows the network to learn more complex patterns and improves overall performance [21]. Due to the dense connections, DenseNet requires fewer parameters than traditional convolutional neural networks (CNNs), which helps reduce the risk of overfitting and computational complexity. The activation function performs a nonlinear transformation of the output values of the neurons in each layer. The use of ReLU in fully connected layers can enhance the model’s performance [22]. Based on this, the structure of DenseNet includes basic convolution and pooling layers, Dense Block layers, transition layers, another Dense Block layer, and an average pooling layer prior to the classification layer. Figure 2 shows the structure of the AI model developed in this study. C1 represents the convolution layer; D1, D2, D3, and D4 represent the four Dense Block layers; and T1, T2, and T3 represent the three transition layers. The model concludes with a binary classification layer, Dense(Sigmoid).
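To make the dense connectivity concrete, the following is a minimal Keras sketch of a single Dense Block; the growth rate of 32 and the choice of four layers are illustrative values only (DenseNet121’s actual blocks use bottleneck 1 × 1 convolutions and contain 6, 12, 24, and 16 layers):

from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    # Each new layer consumes the concatenation of ALL preceding feature maps
    for _ in range(num_layers):
        h = layers.BatchNormalization()(x)
        h = layers.Activation("relu")(h)
        h = layers.Conv2D(growth_rate, 3, padding="same")(h)
        x = layers.Concatenate()([x, h])  # feature reuse via concatenation
    return x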
The accuracy of the DenseNet-based model in detecting cardiomegaly was evaluated as follows:
$$y = \sigma(Wx + b)$$

$y$ represents the predicted probability of cardiomegaly, indicating the likelihood of belonging to the cardiomegaly class. $W$ represents the weight vector, $x$ the input vector, and $b$ the bias. $\sigma$ is the sigmoid function, a nonlinear function defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The input $x$ is defined as:

$$x = W_{Dense} \cdot GAP + b_{Dense}$$

$W_{Dense}$ represents the weight vector of the final dense layer, $GAP$ is the output of the global average pooling layer that aggregates the features extracted by the DenseNet backbone, and $b_{Dense}$ is the bias of the dense layer. A Dense Block layer consists of multiple convolutional layers, where each layer uses the outputs of all preceding layers as inputs. This allows the feature maps of each layer to be combined efficiently. The transition layer is composed of 1 × 1 convolutions and pooling layers [23]. The layers within a Dense Block are calculated as follows:

$$x_{i+k} = H_k\left(\{x_i, x_{i+1}, \ldots, x_{i+k-1}\}\right)$$

Here, $H_k$ denotes a composite of operations such as convolution, batch normalization, and ReLU. The final layer of a typical DenseNet uses the Softmax function for multi-class classification, which converts the input vector into a probability distribution whose elements lie between 0 and 1. In this study, however, the output must be mapped to a binary decision of cardiomegaly or not, so the sigmoid function was used instead of Softmax, with the resulting probability thresholded to 0 or 1. The DenseNet121 model pretrained on ImageNet was selected as the backbone for classifying cardiomegaly in chest X-rays; the backbone refers to the base network used in a CNN to extract features from an image. The code implementation of this analysis is provided in Algorithm 3.
Algorithm 3 DNCD (DenseNet for Cardiomegaly Detection)

from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Pixel values are rescaled to [0, 1]; zoom augmentation is applied to training only
dirs = ["train", "validation", "test"]
train_datagen = ImageDataGenerator(rescale=1.0 / 255, zoom_range=0.2)
validation_test_datagen = ImageDataGenerator(rescale=1.0 / 255)

def densenet_modeling():
    # DenseNet121 base pretrained on ImageNet, without its classification head
    base_model = DenseNet121(include_top=False, weights="imagenet",
                             input_shape=(224, 224, 3))
    x = GlobalAveragePooling2D()(base_model.output)
    predictions = Dense(1, activation="sigmoid")(x)  # cardiomegaly probability
    model = Model(inputs=base_model.input, outputs=predictions)
    return model

3.3. Model Inspection

3.3.1. Saliency Map

A saliency map is a technique that visually represents the pixels that play a crucial role in classifying a particular image, thereby enhancing the explainability of deep learning models and contributing to the reliability of models in medical image analysis [24]. The goal was to verify whether the model recognizes the contours of the heart to determine cardiomegaly. Therefore, to indicate the importance of each pixel of the image to the given class, the gradient of the input image relative to the classification score was calculated. The gradient is calculated as follows:
$$w = \left.\frac{\partial S_c(I)}{\partial I}\right|_{I_0}$$

$S_c(I)$ is the score function of class $c$ for input image $I$, $I_0$ is the actual image at which the derivative is evaluated, and $w$ is the gradient of the class score $S_c$ with respect to the image $I$. By taking this partial derivative, we can determine how changes in each pixel of the input image affect the class score [24]. By rearranging the elements of $w$, we obtain the saliency map. In the case of grayscale images such as X-rays, the number of elements in $w$ equals the number of pixels in $I_0$; thus, the calculation in this study is as follows:

$$M_{ij} = \left| w_{h(i,j)} \right|$$

$M_{ij}$ is the element of the saliency map in row $i$ and column $j$, and $w_{h(i,j)}$ is the gradient value corresponding to the pixel in row $i$ and column $j$ of image $I$. Algorithm 4 provides an implementation to verify whether the model recognized and identified the heart in an actual X-ray image.
Algorithm 4 Compute Saliency Map

import tensorflow as tf

def compute_saliency_map(model, img_tensor, class_index=0):
    img_tensor = tf.convert_to_tensor(img_tensor)
    # Record operations on the input for automatic differentiation
    with tf.GradientTape() as tape:
        tape.watch(img_tensor)
        preds = model(img_tensor)
        loss = preds[:, class_index]
    # Gradient of the class score with respect to each input pixel
    grads = tape.gradient(loss, img_tensor)[0]
    # Maximum absolute gradient across the channel axis
    saliency = tf.reduce_max(tf.abs(grads), axis=-1).numpy()
    # Normalize the map to [0, 1] for visualization
    saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min())
    return saliency
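For reference, a minimal usage sketch follows; the variable names model (the trained DNCD model) and img (a preprocessed 224 × 224 × 3 image array scaled to [0, 1]) are illustrative assumptions:

import matplotlib.pyplot as plt
import numpy as np

img_tensor = np.expand_dims(img, axis=0)  # add a batch dimension: (1, 224, 224, 3)
saliency = compute_saliency_map(model, img_tensor, class_index=0)

plt.imshow(img)
plt.imshow(saliency, cmap="hot", alpha=0.5)  # overlay the saliency heat map
plt.axis("off")
plt.show()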

3.3.2. Guided Backpropagation

Guided backpropagation is derived from saliency maps and deconvolutional networks, combining elements of vanilla backpropagation and deconvolutional networks in its handling of the ReLU nonlinearity. It backpropagates only positive gradients through positive activations, allowing a cleaner and more detailed visualization of important features. This method can help explain the logic behind individual decisions made by AI algorithms [25]. Mapping the decision paths of neural networks in this way is widely used to explain classification tasks. The three conditions required to backpropagate the importance of each pixel in the image are summarized in Table 1.
Guided backpropagation highlights key evidence regions in classification tasks to extract features and uses the following equation to indicate the most activated parts of the image as it passes through ReLU.
$$R_i^l = \left(f_i^l > 0\right) \cdot \left(R_i^{l+1} > 0\right) \cdot R_i^{l+1}$$

$R_i^l$ and $f_i^l$ are the relevance and activation values of neuron $i$ in layer $l$, respectively. In this study, not only saliency maps but also guided backpropagation was employed to determine whether the model accurately recognized the heart contours. Backpropagation was performed only through neurons with positive activation values, allowing an intuitive representation of each neuron’s influence on the input image. The implementation of this analysis is given in Algorithm 5.
Algorithm 5 Compute Guided Backpropagation

import tensorflow as tf

@tf.custom_gradient
def guided_relu(x):
    # Forward pass behaves like a standard ReLU
    y = tf.nn.relu(x)
    def grad(dy):
        # Backpropagate only where the activation (x > 0) AND the
        # incoming gradient (dy > 0) are both positive
        return tf.cast(x > 0, dy.dtype) * tf.cast(dy > 0, dy.dtype) * dy
    return y, grad

def guided_backpropagation(model, img_tensor, class_index=0):
    # Assumes the model's ReLU activations have been replaced by guided_relu
    img_tensor = tf.convert_to_tensor(img_tensor)
    with tf.GradientTape() as tape:
        tape.watch(img_tensor)
        preds = model(img_tensor)
        loss = preds[:, class_index]
    guided_grads = tape.gradient(loss, img_tensor).numpy()
    return guided_grads

4. Results

To show that AI models with good performance can be trained on high-quality data, albeit with fewer images than in previous studies, the performance of the VCD, IVCD, and DNCD models was compared. We secured a dataset of 718 high-quality images (368 normal and 350 cardiomegaly) from the 10,000 source images through a radiographer’s selection process and then preprocessed the images using OpenCV functions. The data were randomly split into training, validation, and test sets at a 7:2:1 ratio (cardiomegaly = 1, non-cardiomegaly = 0). To compare the performance of each algorithm, the batch size was set to 16 and the number of epochs to 20 for model training. The results of the VCD, IVCD, and DNCD models on the randomly selected validation data from the 718 images are presented in Table 2.
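A minimal training sketch consistent with these settings is shown below; the directory layout under data/ and the choice of the Adam optimizer (by analogy with Algorithms 1 and 2) are assumptions, as the compilation step for the DNCD model is not detailed above:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1.0 / 255, zoom_range=0.2)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

# class_mode="binary" assigns 0/1 labels from the two class subfolders
train_gen = train_datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=16, class_mode="binary")
val_gen = val_datagen.flow_from_directory(
    "data/validation", target_size=(224, 224), batch_size=16, class_mode="binary")

model = densenet_modeling()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(train_gen, validation_data=val_gen, epochs=20)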
Comparing the performance of the three models, the VGG-16 algorithm demonstrated an accuracy of 0.91, a recall of 1.0, and an AUC value of 0.99, indicating high accuracy. However, its precision and loss values were not as good as those of the other two algorithms. For Inception-V3, the recall was 0.83 and the F1 score was 0.91, indicating relatively lower values. On the other hand, DenseNet121 demonstrated an accuracy of 0.95, a precision of 0.98, a recall of 0.91, a loss value of 0.11, an F1 score of 0.94, and an AUC value of 0.99, indicating the best performance among the three models. The best-performing model was tested with 41 excluded images, and as a result, a high accuracy of 0.95 was confirmed. In clinical settings, accuracy, recall, and AUC are particularly important for model evaluation. The ROC curve of the DNCD model performance, which visually represents sensitivity and specificity, is shown in Figure 3.
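For reference, the reported metrics can be computed from the sigmoid outputs as in the sketch below; scikit-learn is assumed, and test_gen must be created with shuffle=False so that predictions align with test_gen.classes:

from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_prob = model.predict(test_gen).ravel()   # sigmoid probabilities
y_pred = (y_prob >= 0.5).astype(int)       # threshold at 0.5
y_true = test_gen.classes                  # 0 = non-cardiomegaly, 1 = cardiomegaly

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))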
To verify whether the DNCD deep learning model classifies images by assessing heart size in the way a radiologist would, saliency maps and guided backpropagation techniques were used. The results are shown in Figure 4.
Figure 4a shows the original chest X-ray image, Figure 4b the results using the saliency map, and Figure 4c the results using guided backpropagation. The saliency map revealed that the model focused on the outer edges of the heart, suggesting that the model can accurately locate the heart even in poorly refined chest X-ray images. The results for the excluded images are shown in Figure 5.
Figure 5a–c show the original images, the saliency map results, and the guided backpropagation results, respectively, for two normal images and two cardiomegaly images. When elements unnecessary for determining cardiomegaly were included, the model was interpreted as basing its decision on the entire heart, including its outer edges, as well as on the outer edges of the chest. The guided backpropagation results therefore indicate that not only the outer edges of the heart but also the outer edges of the chest are important for determining cardiomegaly. This supports the reliability of the proposed DNCD model.

5. Conclusions

This study represents a significant step toward the early detection and diagnosis of cardiomegaly, whose patient numbers continue to increase. Chest X-rays are essential for the diagnosis of cardiomegaly, and as the number of acquired chest X-ray images grows, so does the need for efficient diagnostic tools. While research into using deep learning AI as an X-ray support tool is actively being conducted, access to medical data remains limited. To overcome this problem, we presented an AI model built on high-quality cardiomegaly X-ray data selected by experts from public datasets, which demonstrated high accuracy and recall. In addition, we showed that the designed model can analyze heart images and make decisions similar to those of a real expert. These results demonstrate that carefully selecting high-quality data can lead to high-performance AI models even on relatively small datasets. A limitation was the time-consuming and cumbersome process of manually reviewing each of the 10,000 images to assess the ratio of heart size to chest boundary and select high-quality data for the dataset. According to Kaggle’s description, the label inaccuracy rate of this public dataset can reach up to 10%. Therefore, future research should explore methods to reduce data selection time and improve reliability by integrating physician judgment into the selection process and by using the YOLO model to detect and exclude unnecessary artifacts in the images when identifying cardiomegaly. Additionally, building high-quality datasets through various methods and comparing the performance of the resulting models is necessary. This approach could potentially achieve performance exceeding the 0.95 accuracy of the DNCD model presented in this study.
In conclusion, this study showed that early diagnosis of cardiomegaly and reduced interpretation time can improve the quality of life, reduce medical costs, and enhance the quality of healthcare services. This study also serves as a positive signal to researchers seeking to develop effective AI using public data. The proposed AI model, DNCD, for cardiomegaly detection opens new possibilities for data use in the medical field and is expected to change the paradigm of AI applications for various diseases.

Author Contributions

K.-H.L. and J.-W.C.: writing—original draft, data curation, and software. D.-H.H.: visualization. C.-O.P. and M.-S.K.: conceptualization, validation, writing—review and editing, and project administration. M.-S.K.: formal analysis and funding acquisition. K.-H.L. is the first author. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the University Innovation Support Project of Eulji University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study, accessed on 21 May 2024, are available from the NIH Chest X-rays dataset on Kaggle at https://www.kaggle.com/datasets/nih-chest-xrays/data. The high-quality dataset of 718 images was selected by radiologists at https://drive.google.com/drive/folders/1ejIP3YTIUBh9ce8oiwd2Z2BzuzCWXrBF?usp=sharing.

Conflicts of Interest

Author Chun-Oh Park is the president of the company Pnp Secure. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Geric, C.; Qin, Z.Z.; Denkinger, C.M.; Kik, S.V.; Marais, B.; Anjos, A.; Trajman, A. The rise of artificial intelligence reading of chest X-rays for enhanced TB diagnosis and elimination. Int. J. Tuberc. Lung Dis. 2023, 27, 367–372. [Google Scholar] [CrossRef]
  2. Messerli, F.H.; Schmieder, R. Left ventricular hypertrophy: A cardiovascular risk factor in essential hypertension. Drugs 1986, 31, 192–201. [Google Scholar] [CrossRef]
  3. Kim, H.J.; Jeong, M.H.; Yoon, H.J.; Kim, Y.C.; Sohn, S.J.; Kim, M.C.; Park, J.C. Difference of clinical outcomes according to left ventricular hypertrophy and its subtype in Korean patients with acute myocardial infarction. Korean J. Med. 2020, 95, 387–397. [Google Scholar] [CrossRef]
  4. Bennani, S.; Regnard, N.E.; Ventre, J.; Lassalle, L.; Nguyen, T.; Ducarouge, A.; Chassagnon, G. Using AI to improve radiologist performance in detection of abnormalities on chest radiographs. Radiology 2023, 309, e230860. [Google Scholar] [CrossRef]
  5. Ballantyne, A.; Schaefer, G.O. Consent and the ethical duty to participate in health data research. J. Med. Ethics 2018, 44, 392–396. [Google Scholar] [CrossRef] [PubMed]
  6. Nam, J.G.; Kim, M.; Park, J.; Hwang, E.J.; Lee, J.H.; Hong, J.H.; Park, C.M. Development and validation of a deep learning algorithm detecting 10 common abnormalities on chest radiographs. Eur. Respir. J. 2021, 57, 200399. [Google Scholar] [CrossRef]
  7. Sarpotdar, S.S. Cardiomegaly detection using deep convolutional neural network with U-net. arXiv 2022, arXiv:2205.11515. [Google Scholar]
  8. Song, K.D.; Kim, M.; Do, S. The Latest Trends in the Use of Deep Learning in Radiology Illustrated through the Stages of Deep Learning Algorithm Development. J. Korean Soc. Radiol. 2019, 80, 202–212. [Google Scholar] [CrossRef]
  9. Kateb, Y.; Meglouli, H.; Khebli, A. Coronavirus diagnosis based on chest X-ray images and pre-trained DenseNet-121. Rev. Intell. Artif. 2023, 37, 23. [Google Scholar] [CrossRef]
  10. Que, Q.; Tang, Z.; Wang, R.; Zeng, Z.; Wang, J.; Chua, M.; Veeravalli, B. CardioXNet: Automated detection for cardiomegaly based on deep learning. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 612–615. [Google Scholar]
  11. Jung, W.Y.; Kim, J.H.; Park, J.E.; Kim, M.J.; Lee, J.M. Evaluation of classification performance of Inception V3 algorithm for chest X-ray images of patients with cardiomegaly. J. Korean Soc. Radiol. 2021, 15, 455–461. [Google Scholar]
  12. Kim, M.J.; Kim, J.H. Proposal of a Convolutional Neural Network Model for the Classification of Cardiomegaly in Chest X-ray Images. J. Korean Soc. Radiol. 2021, 15, 613–620. [Google Scholar] [CrossRef]
  13. Ribeiro, E.; Cardenas, D.A.; Krieger, J.E.; Gutierrez, M.A. Interpretable deep learning model for cardiomegaly detection with chest X-ray images. In Proceedings of the XXIII Simpósio Brasileiro de Computação Aplicada à Saúde, Florianópolis, Brazil, 7–10 June 2023; SBC: Florianópolis, Brazil, 2023; pp. 340–347. [Google Scholar]
  14. Sogancioglu, E.; Murphy, K.; Calli, E.; Scholten, E.T.; Schalekamp, S.; Van Ginneken, B. Cardiomegaly detection on chest radiographs: Segmentation versus classification. IEEE Access 2020, 8, 94631–94642. [Google Scholar] [CrossRef]
  15. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  16. Lee, H.S.; Kim, J.G.; Yoo, J.W.; Jung, Y.S.; Kim, S.S. A study on multi-class classification using convolutional neural networks based on transfer learning. J. Korean Inst. Intell. Syst. 2018, 28, 531–537. [Google Scholar]
  17. Innat, M.; Hossain, M.F.; Mader, K.; Kouzani, A.Z. A convolutional attention mapping deep neural network for classification and localization of cardiomegaly on chest X-rays. Sci. Rep. 2023, 13, 6247. [Google Scholar] [CrossRef] [PubMed]
  18. Decoodt, P.; Liang, T.J.; Bopardikar, S.; Santhanam, H.; Eyembe, A.; Garcia-Zapirain, B.; Sierra-Sosa, D. Hybrid classical–quantum transfer learning for cardiomegaly detection in chest X-rays. J. Imaging 2023, 9, 128. [Google Scholar] [CrossRef]
  19. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  20. Lin, C.; Li, L.; Luo, W.; Wang, K.C.; Guo, J. Transfer learning based traffic sign recognition using inception-v3 model. Period. Polytech. Transp. Eng. 2019, 47, 242–250. [Google Scholar] [CrossRef]
  21. Potsangbam, J.; Devi, S.S. Classification of Breast Cancer Histopathological Images Using Transfer Learning with DenseNet121. Procedia Comput. Sci. 2024, 235, 1990–1997. [Google Scholar] [CrossRef]
  22. O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458. [Google Scholar]
  23. Iandola, F.; Moskewicz, M.; Karayev, S.; Girshick, R.; Darrell, T.; Keutzer, K. Densenet: Implementing efficient convnet descriptor pyramids. arXiv 2014, arXiv:1404.1869. [Google Scholar]
  24. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar]
  25. Dong, G.; Ma, Y.; Basu, A. Feature-guided CNN for denoising images from portable ultrasound devices. IEEE Access 2021, 9, 28272–28283. [Google Scholar] [CrossRef]
Figure 1. Chest X-ray sample images.
Figure 2. DenseNet-based model structure.
Figure 3. ROC curve of the DNCD model.
Figure 4. Results of saliency maps and guided backpropagation using the DNCD model.
Figure 5. Results of the saliency map and guided backpropagation for the excluded images using the DNCD model.
Table 1. Guided backpropagation conditions.
1. The activation value must be positive.
2. The relevance in the previous layer must be positive.
3. The relevance in the next layer must be positive.
Table 2. Evaluation of the cardiomegaly model for each classification algorithm.

            VCD     IVCD    DNCD
Accuracy    0.91    0.92    0.95
Precision   0.87    1.00    0.98
Recall      1.00    0.83    0.91
Loss        0.21    0.02    0.11
F1 Score    0.93    0.91    0.94
AUC         0.99    0.99    0.99
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
