Article

Integrating Principal Component Analysis and Multi-Input Convolutional Neural Networks for Advanced Skin Lesion Cancer Classification

by Rakhmonova Madinakhon 1, Doniyorjon Mukhtorov 1 and Young-Im Cho 2,*

1 Department of IT Convergence Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Republic of Korea
2 Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5233; https://doi.org/10.3390/app14125233
Submission received: 11 May 2024 / Revised: 12 June 2024 / Accepted: 12 June 2024 / Published: 17 June 2024

Abstract

The importance of early detection in the management of skin lesions, such as skin cancer, cannot be overstated due to its critical role in enhancing treatment outcomes. This study presents an innovative multi-input model that fuses image and tabular data to improve the accuracy of diagnoses. The model incorporates a dual-input architecture, combining a ResNet-152 for image processing with a multilayer perceptron (MLP) for tabular data analysis. To optimize the handling of tabular data, Principal Component Analysis (PCA) is employed to reduce dimensionality, facilitating more focused and efficient model training. The model’s effectiveness is confirmed through rigorous testing, yielding impressive metrics with an F1 score of 98.91%, a recall of 99.19%, and a precision of 98.76%. These results underscore the potential of combining multiple data inputs to provide a nuanced analysis that outperforms single-modality approaches in skin lesion diagnostics.

1. Introduction

1.1. Problem Description and Motivation

Skin cancer represents a prevalent health issue globally, and its early detection is crucial for enhancing treatment efficacy and reducing mortality rates. The challenge in diagnosing skin cancer lies in the similarity of its manifestations to other benign and malignant dermatological conditions. According to the World Health Organization (WHO), skin cancer is one of the most common cancers worldwide: every year, 2 to 3 million non-melanoma skin cancers and 132,000 melanoma skin cancers are diagnosed globally [1]. This diagnostic complexity is heightened by the appearance of lesions such as skin-colored nodules or pearly rolled edges, which may resemble several types of skin neoplasms including basal cell carcinoma, squamous cell carcinoma, and other non-melanoma skin conditions [2]. Misdiagnosis is a significant risk due to these visual ambiguities, necessitating more reliable diagnostic tools. The early detection of skin cancer, particularly melanoma, is vital because it drastically increases survival rates. For example, the five-year survival rate for melanoma when detected early is about 99%, compared to only 25% when diagnosed at an advanced stage. This stark contrast highlights the critical role of timely and accurate diagnosis in improving patient outcomes. Additionally, understanding the different types of skin cancer (basal cell carcinoma, squamous cell carcinoma, and melanoma), each with distinct appearances and progression rates, is essential for developing targeted diagnostic techniques. These distinctions help refine the focus of AI tools, enabling them to adapt and respond to the specific challenges presented by each cancer type, further enhancing diagnostic accuracy and treatment planning.
These challenges underscore the need for innovative approaches to skin cancer detection, such as telemedicine and artificial intelligence (AI)-assisted diagnostic systems [3,4]. Such technologies offer the potential to decentralize and expedite the diagnosis process, reducing hospital congestion and providing timely, accurate screenings. By integrating AI with dermatological diagnostics, non-specialist healthcare providers, especially in remote areas, can perform preliminary assessments and facilitate early interventions. This approach not only enhances patient access to care, but also improves the overall efficiency and effectiveness of medical services in detecting and treating skin cancer at its earliest stages.

1.2. Recent Advances in Medical Imaging

The field of medical imaging has undergone significant advancements in recent years, largely driven by breakthroughs in machine learning and artificial intelligence. These advancements have profoundly impacted the diagnostic capabilities in healthcare, particularly in the detection and analysis of various medical conditions, including skin cancer.
Convolutional neural networks (CNNs) have revolutionized image recognition tasks due to their ability to learn complex hierarchical features from images. In medical imaging, CNN architectures like EfficientNet [5], VGG-16 [6], ResNet [6], and GoogleNet [7] have been particularly effective. For example, EfficientNet has been utilized for its efficiency and scalability in processing high-resolution medical images, while ResNet’s deep residual learning framework helps in learning from the enormous datasets commonly used in medical diagnostics. In addition, segmentation-based models, which divide an image into segments to simplify its representation into something more meaningful and easier to analyze, are also popular. YOLO (You Only Look Once) [8] has been adapted for real-time object detection, useful in identifying specific structures in medical scans. Meanwhile, Mask R-CNN [9] has been pivotal in performing pixel-level segmentation, assisting in precisely delineating the boundaries of tumors or other pathological lesions in medical images.
Traditional machine learning models, although somewhat overshadowed by deep learning, still play a critical role in areas where labeled data are scarce or when interpretability is a key requirement. Techniques like Support Vector Machine (SVM) and Random Forest [10] are used to classify images based on extracted features, providing robust baseline models for medical imaging tasks.
Multi-input CNN-based models [11] are used to integrate multiple data types (e.g., clinical data, image data) into a unified framework, offering a more holistic approach to diagnostics. An example includes a multi-input model that combines CNN-extracted features from dermatological images with patient metadata to improve the accuracy of skin cancer predictions. This approach leverages the strengths of both data types, enhancing the model’s predictive performance and reliability.

1.3. Contribution

The main contributions of the paper are as follows:
(1)
Advanced preprocessing of the dataset, including data cleaning, encoding, and application of Principal Component Analysis (PCA) [12]. Although PCA is a well-known method, we have specifically optimized it for efficient tabular data integration into our deep learning model, focusing on the unique demands of skin lesion classification.
(2)
Implementation of medical-specific image augmentation techniques, such as precisely controlled adjustments to brightness and color. While data augmentation is a common practice, our approach uses a low ratio of alterations to maintain the diagnostic integrity of medical images, thus enhancing the robustness of the model without compromising the diagnostic relevance of the images.
(3)
Development of a custom multi-input CNN model, complete with detailed specification of hyperparameters, which integrates both medical imaging and patient metadata to improve diagnostic accuracy.
(4)
Comprehensive analysis of the efficiency of different model configurations, providing insights into which non-default parameters significantly enhance classification performance, thereby informing future optimizations and applications.
The remainder of this paper is organized as follows. Section 2 reviews related works, providing a comprehensive background and situating our study within the existing scholarly landscape. Section 3, Methods and Materials, details our data preprocessing techniques and the materials employed in our study. Section 4 discusses the implementation, delving into the technical specifics of our custom multi-input CNN model, including hyperparameters and the computational resources used. The paper concludes with Section 5, where we summarize our findings, discuss the implications of our research, and suggest potential avenues for future work.

2. Related Work

The development and application of machine learning and convolutional neural network (CNN) models for skin lesion classification have seen significant advancements in recent years. Initially, traditional machine learning techniques were widely employed in skin lesion classification, utilizing algorithms like Multi-class Support Vector Machines (SVMs) [13] and K-Means clustering [14]. These methods focused on statistical analysis of features extracted manually from images, providing a foundational approach for early classification efforts. As the field progressed, the advent of convolutional neural networks (CNNs) marked a significant shift, with architectures like MobileNetv2 [15] and ResNet-152 [16] offering more sophisticated image analysis capabilities through deep learning, enabling automatic feature extraction directly from raw images.
Further advancements led to the integration of segmentation techniques alongside CNNs, enhancing the precision of classifications. Methods such as U-Net began to be employed for isolating lesions from surrounding skin before classification, improving the accuracy by focusing the CNNs on relevant image segments only. The culmination of these developments has been the adoption of multi-input CNN models that integrate segmented image data with patient metadata, using diverse architectures to capture a broad spectrum of dermatological features effectively. In the advanced stages, researchers have explored various strategies to optimize the accuracy and efficiency of these models, particularly through the use of multi-input systems that combine different data types and processing techniques. Recent studies have heavily employed CNN architectures due to their efficacy in handling complex image data. For instance, Milanteva et al. [17] utilized a multi-faceted approach involving segmentation with R2U-Net followed by classification using a range of CNN models including EfficientNetB0-B7, SENet-154, ResNeXt-101 32x4d, and Inception-ResNet-v2. This methodology emphasizes the critical role of precise lesion isolation and the subsequent application of diverse CNN architectures to capture and classify various dermatological features effectively, yielding an accuracy of 96.86%, a recall of 85%, a precision of 82%, and an F1 score of 83%. Many existing models, while achieving high accuracy on specific datasets, struggle to maintain performance across diverse datasets due to overfitting and the highly specialized nature of the training data. This limitation often results in reduced effectiveness when applied to real-world clinical settings where data variability is high.
The incorporation of multi-input CNN models marks a pivotal advancement in dermatological imaging. These models integrate multiple data streams, such as high-resolution images, extracted regions of interest, and additional contextual information, to enhance diagnostic precision. J. Wu et al. [18] proposed an end-to-end model that maximizes the use of high-resolution images through a multi-input strategy, including cropping, original image downsampling, and region of interest extraction using CAM, supplemented by an attention mechanism to enhance classification accuracy, achieving an accuracy of 88.4%, recall of 76.7%, and precision of 96.3%. M. A. Kassem et al. [19] enhanced model performance through transfer learning with a pretrained GoogleNet, fine-tuning the model parameters during training, which resulted in an accuracy of 94.92%, recall of 79.8%, precision of 97%, and an F1 score of 80.36%. Gessert et al. [20] tackled skin lesion classification by employing an ensemble of deep learning models, including EfficientNet, SENet, and ResNeXt WSL, utilizing multiple input resolutions and dual cropping strategies during training.

3. Methods and Materials

3.1. Dataset

For our study, we utilized the ISIC 2019 dataset [21], a dermatology resource created to advance automated diagnostic models for skin lesions. This dataset includes 24,894 high-quality dermoscopic images, each annotated with one of eight distinct skin lesion classes: Melanoma (MEL), Melanocytic Nevus (NV), Basal Cell Carcinoma (BCC), Actinic Keratosis (AK), Benign Keratosis (BKL), Dermatofibroma (DF), Vascular Lesion (VASC), and Squamous Cell Carcinoma (SCC). The accompanying metadata provides additional context with fields such as the approximate age of the patient, the general anatomical site of the lesion, and the patient’s sex. Figure 1 shows examples of the eight classes from the ISIC 2019 dataset.
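For illustration, a minimal sketch of loading the dataset with pandas is shown below. The CSV file names follow the Kaggle distribution linked in the Data Availability Statement, and the merge key is an assumption based on the standard ISIC 2019 release.

```python
import pandas as pd

# Load the ISIC 2019 metadata and ground-truth labels. File names follow
# the Kaggle distribution cited in the Data Availability Statement;
# adjust the paths to your local copy.
meta = pd.read_csv("ISIC_2019_Training_Metadata.csv")       # image, age_approx, anatom_site_general, sex
labels = pd.read_csv("ISIC_2019_Training_GroundTruth.csv")  # image plus one column per lesion class

df = meta.merge(labels, on="image")  # link each image's metadata to its label
print(df.shape)
```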

3.2. Preprocessing

In the development of our skin lesion classification model, the preprocessing of the ISIC 2019 dataset was critical to ensure the integrity and usability of the data. The preprocessing steps undertaken are outlined in the following subsections.

3.2.1. Cleaning of Metadata

The first step involved the removal of any NaN values present in the metadata to maintain consistency and reliability in the dataset. This cleaning process ensured that each image in the dataset had complete and accurate metadata for analysis. Specifically, the removal of NaN values is imperative for the effective application of Principal Component Analysis (PCA), which requires complete datasets to function correctly. Moreover, ensuring that each image’s metadata matches perfectly is essential for the integrity of the data linkage, enabling accurate correlation between the image features and their corresponding clinical information.

3.2.2. One-Hot Encoding of Metadata

To facilitate the integration of categorical metadata into our model, we applied one-hot encoding to the metadata fields. This transformation converted categorical variables, such as anatomical site and sex, into a binary matrix representation, making them suitable for input into machine learning models.
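A minimal sketch of the cleaning (Section 3.2.1) and encoding (Section 3.2.2) steps is given below, assuming the standard ISIC 2019 metadata column names (`age_approx`, `anatom_site_general`, `sex`); `df` is the merged frame from the loading sketch above.

```python
import pandas as pd

# Section 3.2.1: drop rows with incomplete metadata. PCA requires a
# complete matrix, and every image must keep matched clinical fields.
df = df.dropna(subset=["age_approx", "anatom_site_general", "sex"]).reset_index(drop=True)

# Section 3.2.2: one-hot encode the categorical fields into a binary
# matrix; the numeric age column is kept as-is.
tabular = pd.get_dummies(
    df[["age_approx", "anatom_site_general", "sex"]],
    columns=["anatom_site_general", "sex"],
    dtype=float,
)
print(tabular.columns.tolist())  # roughly the C1-C11 columns of Figure 2a
```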

3.2.3. Principal Component Analysis (PCA)

PCA was employed on the training dataset to reduce the dimensionality of the tabular data. This step was crucial to avoid data leakage and ensure that the model was trained only on the information available in the training set, without any influence from the test set. By focusing on the most informative features, PCA helped in enhancing the training process and the model’s performance. Figure 2 shows the tabular dataset before and after PCA.
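The leakage-safe use of PCA described here can be sketched as follows; the split variable names are illustrative.

```python
from sklearn.decomposition import PCA

# Fit PCA on the training tabular matrix only, then apply the fitted
# transform to validation and test data -- never refit on them. This is
# what prevents information from the test set leaking into training.
pca = PCA(n_components=3)
X_train_pca = pca.fit_transform(X_train_tab)
X_val_pca = pca.transform(X_val_tab)
X_test_pca = pca.transform(X_test_tab)
```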

3.2.4. Medical Image Augmentation

In our approach, we employ medical-specific image augmentation techniques to enhance the robustness of our model without significantly altering the diagnostic features of the images. Given the sensitivity of dermatological images, where subtle variations can be critical, our augmentations are carefully calibrated to maintain the integrity of the visual information. We utilize random rotations (under 20 degrees) and random flips to introduce geometric variability, while ensuring that changes to brightness and color are minimal (restricted to less than 3 out of 10 on a standardized scale) to preserve the essential characteristics of the skin lesions. These augmentations are implemented using PyTorch’s (v2.3.0) transformation methods, which allow for seamless integration and reproducibility of the enhancement process in our training pipeline. Figure 3 shows the results of the augmentation methods.
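As an illustration, the following torchvision pipeline realizes these constraints; the exact jitter magnitudes are an indicative mapping of the "less than 3 out of 10" rule, not the precise values used in our experiments.

```python
from torchvision import transforms

# Conservative, medical-specific augmentation: geometry may vary, but
# photometric changes stay small so diagnostic cues are preserved.
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=20),    # rotations kept under 20 degrees
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3,    # mild brightness/color shifts
                           saturation=0.3,
                           hue=0.02),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```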

3.3. Methodology

Our model integrates image data and tabular metadata to enhance skin lesion classification accuracy (Figure 4). We use four CNN models—EfficientNetV2 [22], ResNet-152, MobileNetV2 [23], and ResNet-50 [24]—for robust image feature extraction. Custom data augmentation ensures data integrity and variability. For metadata, we apply one-hot encoding and PCA to streamline the data, which are processed by a multilayer perceptron (MLP) [25]. The outputs from the CNNs and MLP are combined in a fully connected layer, leveraging both visual and contextual information. This hybrid approach, along with tailored data augmentation, significantly improves the model’s diagnostic performance in dermatology.

3.3.1. CNN Models

ResNet-152 is a deep residual network that is part of the ResNet family, which introduced the revolutionary concept of residual learning to ease the training of very deep neural networks. With 152 layers, it utilizes skip connections or shortcuts to jump over some layers, allowing it to combat the vanishing gradient problem effectively. This architecture is highly favored for its ability to scale up to hundreds of layers while maintaining performance, making it exceptionally powerful for image classification tasks.
MobileNetV2 is designed for mobile and resource-constrained environments, focusing on optimizing the trade-off between latency and accuracy. It introduces the inverted residual structure where the residual connections are between the bottleneck layers, and the use of lightweight depthwise convolutions to filter features in a computationally efficient manner. This model is particularly noted for its small size and lower computational requirements, without a substantial compromise on accuracy.
ResNet-50 is another variant from the ResNet family, which is shallower than the ResNet-152 model but still significantly deep with 50 layers. It employs the same residual learning principles to facilitate the training of deep networks by enabling feature reuse and preventing the training process from degrading. ResNet-50 is widely used across many domains due to its excellent balance between depth, performance, and computational efficiency, making it a robust choice for both academic and industrial applications.
EfficientNetV2 is part of the EfficientNet family, known for balancing model scaling in terms of depth, width, and resolution. This particular variant, “V2”, is optimized for real-world applications with an emphasis on achieving higher accuracy and efficiency on various computing devices. The architecture leverages compound scaling and a revised version of model scaling to enhance training speed and parameter efficiency, making it well suited for tasks requiring high computational performance and accuracy. The key features are compared in Table 1.

3.3.2. Tabular Data Processing with MLP and PCA

The tabular data in our model are processed using a multilayer perceptron (MLP), which is particularly effective for handling structured data. An MLP is a feedforward artificial neural network consisting of at least three layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node, except for the input nodes, is a neuron that uses a nonlinear activation function. An MLP is trained with the supervised backpropagation algorithm, capturing complex relationships in data by adjusting weights and biases through gradient descent. In our model, the MLP is tasked with extracting and synthesizing features from the tabular metadata, providing a rich set of predictive signals that complement the visual cues from the images.
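A minimal PyTorch sketch of such a tabular branch is shown below; the hidden width and dropout rate are illustrative choices, not our exact configuration.

```python
import torch.nn as nn

# Minimal tabular branch: an MLP over the PCA-reduced metadata.
class TabularMLP(nn.Module):
    def __init__(self, in_features=3, out_features=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, out_features),
        )

    def forward(self, x):
        return self.net(x)
```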

3.3.3. Prediction Mechanisms of Multi-Input CNN Model

In our model, after extracting features from both image and tabular data, we combine these features using element-wise addition rather than the more typical concatenation. This process is detailed in the forward function of the model’s architecture. Specifically, image features are obtained using a pretrained ResNet-152 model, tailored to our task with a new fully connected layer that adjusts the output to match the number of classes. Simultaneously, tabular data are processed through a multilayer perceptron (MLP) that maps the input features to the same dimensional space as the image features. The final step is an element-wise addition of the image and tabular features, and the combined features are passed directly to classification. This method of feature integration leverages the strengths of both data types, ensuring that both visual and non-visual information contribute equally to the final prediction.
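The fusion logic can be summarized in the following sketch, which reuses the TabularMLP from the previous subsection; layer sizes other than the class dimension are illustrative.

```python
import torch.nn as nn
from torchvision import models

class MultiInputNet(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        # Pretrained ResNet-152 with a new fully connected head sized
        # to the number of classes.
        self.cnn = models.resnet152(weights="IMAGENET1K_V1")
        self.cnn.fc = nn.Linear(self.cnn.fc.in_features, num_classes)
        # Tabular branch mapped to the same dimensional space.
        self.mlp = TabularMLP(in_features=3, out_features=num_classes)

    def forward(self, image, tabular):
        img_feat = self.cnn(image)    # (batch, num_classes)
        tab_feat = self.mlp(tabular)  # (batch, num_classes)
        return img_feat + tab_feat    # element-wise addition, not concatenation
```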

3.3.4. Impact of PCA to Tabular Data

We applied Principal Component Analysis (PCA) with three components to the original metadata, which included numerous variables like age, anatomical site, and sex, to effectively reduce the dimensionality of our dataset. This reduction helps to highlight the most significant features, enhancing the model’s efficiency and accuracy by focusing on the most informative aspects of the data. The impact of using PCA in this way is a more streamlined model that can process complex datasets more quickly while maintaining robustness in predictive performance.
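A quick way to sanity-check this choice is to inspect the explained variance of the leading components; `X_train_tab` again denotes the encoded training tabular matrix from the preprocessing sketches.

```python
from sklearn.decomposition import PCA

# Sanity-check the choice of three components by inspecting how much
# variance of the encoded tabular matrix the leading components retain.
pca_full = PCA().fit(X_train_tab)
print(pca_full.explained_variance_ratio_[:3])
print(pca_full.explained_variance_ratio_[:3].sum())  # variance kept by 3 components
```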

4. Implementation

4.1. Hyperparameters

In developing our multi-input CNN model, we carefully configured the hyperparameters to optimize performance. We compared two optimizers, Adam and Stochastic Gradient Descent (SGD) [26], at learning rates of 0.01 and 0.001, to assess their impact on model convergence and accuracy. Additional hyperparameters included the batch size, number of epochs, dropout, and L2 regularization to prevent overfitting, and batch normalization was employed to stabilize and accelerate training by normalizing layer inputs. The batch size of 256 and eight data-loading workers were set according to our GPU capacity, allowing optimal utilization of the hardware without causing memory bottlenecks. The number of epochs was set to 50, as this was the point at which the loss plateaued, indicating that further training did not significantly improve the model’s performance. Adam proved more effective in our context due to its adaptive learning rate capabilities, which yielded faster convergence, especially with the complex inputs of our multi-input model. We chose a learning rate of 0.001 after a series of preliminary tests to identify the rate that balances fast convergence with training stability; a higher learning rate, such as 0.01, often overshot the minimum of the loss function, resulting in unstable training dynamics and poorer model performance. Table 2 lists the hyperparameters.
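The following sketch assembles these settings. The weight decay (standing in for the L2 regularization mentioned above) and momentum values are illustrative, and `train_dataset` denotes a hypothetical Dataset pairing each image with its PCA-reduced metadata.

```python
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

model = MultiInputNet(num_classes=8)
criterion = nn.CrossEntropyLoss()

# The two optimizers compared in our experiments; Adam at lr=0.001
# proved the more effective configuration.
adam = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
sgd = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Batch size and worker count from Table 2, matched to GPU capacity.
train_loader = DataLoader(train_dataset, batch_size=256,
                          shuffle=True, num_workers=8)
```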

4.2. Computational Resource

The computational experiments were conducted on a high-performance GPU server provided by Gachon University’s Smart-city Lab. The server was equipped with an Nvidia A6000 GPU, which offers substantial computational power to handle complex model training involving large datasets and deep neural networks. The system was also supported by 64 GB of RAM, ensuring efficient handling of large-scale data operations and simultaneous processes without bottleneck issues.

4.3. Metrics and Results

In evaluating the performance of our multi-input CNN model for skin lesion classification, we employed a comprehensive suite of metrics that are standard in the field of machine learning for classification tasks. These metrics [27] provide a robust framework for assessing the accuracy and reliability of our predictive model, ensuring that our results are both scientifically valid and practically applicable.
Accuracy (1) is the simplest and most intuitive performance measure. It is defined as the ratio of correctly predicted observations to the total observations.
$$\text{Accuracy} = \frac{\text{Total Correct Predictions}}{\text{Total Number of Predictions}} \tag{1}$$
Recall (2) is a measure of a model’s ability to correctly identify all relevant instances (true positives). In the context of medical diagnostics, a high recall rate is crucial as it reflects the model’s effectiveness in identifying all positive cases, minimizing the number of false negatives.
$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{2}$$
Precision (3) is the ratio of correctly predicted positive observations to the total predicted positives. High precision relates to a low false positive rate, essential in medical diagnostics where falsely identifying a condition could lead to unnecessary treatments.
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{3}$$
The F1 score (4) is the harmonic mean of precision and recall, so it takes both false positives and false negatives into account. It is particularly useful when the class distribution is uneven and when a balance between precision and recall is needed.
$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
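For reference, Equations (1)-(4) map onto scikit-learn as follows; macro averaging across the eight classes is an assumption, since the averaging mode is not specified above.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Equations (1)-(4) via scikit-learn, on true and predicted class labels.
acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, average="macro")
prec = precision_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
print(f"accuracy={acc:.4f} recall={rec:.4f} precision={prec:.4f} f1={f1:.4f}")
```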

4.4. Training and Validation

In evaluating the performance of our multi-input CNN model in classifying skin lesions, we carefully structured the training and validation process by splitting the dataset in a 7:2:1 ratio. This distribution allowed us to train the model on 70% of the data, validate its performance on 20%, and test it on the remaining 10%. Table 3 shows the number of images per class for the training, validation, and test sets.
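A minimal sketch of such a stratified 7:2:1 split with scikit-learn is shown below; the `label` column name and random seed are illustrative.

```python
from sklearn.model_selection import train_test_split

# 7:2:1 split, stratified by lesion class so each subset keeps the
# class balance of Table 3.
train_df, rest_df = train_test_split(df, test_size=0.3,
                                     stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=1/3,
                                   stratify=rest_df["label"], random_state=42)
# Of the remaining 30%, a 2:1 split yields 20% validation and 10% test.
```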
Each trial consisted of 50 epochs, providing a balanced compromise between sufficient training to reach convergence and computational efficiency. The long training duration per trial ensured that each model had ample opportunity to learn from the rich dataset, adapt to the nuances of the medical images, and stabilize its performance over time. Figure 5 shows the comprehensive training and validation results.
The models’ performance was continuously monitored through metrics such as accuracy, F1 score, precision, and recall, calculated after each epoch during validation. As Figure 5 shows for the four alternative CNN models combined with the MLP, the ResNet-152-based multi-input CNN model achieved the highest training and validation accuracy, at 99.016% and 98.9%, respectively. The lowest training and validation losses were also found for ResNet-152, at 0.065 and 0.004. Table 4 shows the detailed results for the CNN models.
As we used PCA for the tabular dataset, we compared its impact on the test results. Table 5 displays the performance metrics of our model with and without the implementation of Principal Component Analysis (PCA). Using PCA with three components improved the model’s accuracy, recall, precision, and F1 score, highlighting PCA’s effectiveness in enhancing the model’s overall diagnostic capabilities.
The results of our classification model are presented in Figure 6 using a confusion matrix, which quantitatively illustrates the model’s performance across the various skin lesion types. The matrix shows a high degree of accuracy for the majority of the classes, with perfect classification observed for melanoma, melanocytic nevus, basal cell carcinoma, and benign keratosis, where no misclassifications occurred. However, the model struggled with the accurate classification of vascular lesions, as evidenced by misclassifications into dermatofibroma and squamous cell carcinoma. This highlights specific areas where further model refinement and training data enhancement could be beneficial. The overall high diagonal values confirm the model’s robustness in distinguishing between different lesion types, emphasizing its potential utility in clinical settings.

4.5. Comparison with Related Methods

In our study, we conducted a detailed comparison between our multi-input CNN model and other related methods, primarily focusing on key performance metrics such as accuracy, recall, precision, and F1 scores. This comparative analysis was essential to evaluate the efficacy of our approach in the context of existing methodologies and to highlight the advancements our model introduces in the field of skin lesion classification. Our multi-input CNN model demonstrated higher performance across several metrics when compared to the benchmark models. Notably, the integration of image and tabular data in our model allowed for a more nuanced analysis, leading to higher accuracy and F1 scores. The specific adjustments and optimizations in our model architecture, such as the strategic use of different CNN backbones and the tailored handling of tabular data through PCA and MLP, were key factors in achieving these results. The success of our approach suggests promising directions for further research and potential implementation in clinical environments, where accurate and reliable diagnoses can significantly impact patient outcomes. Table 6 shows the results from several existing research papers and our proposed method in terms of accuracy, recall, precision, and F1 score.

5. Conclusions

In this research, we presented a novel multi-input CNN model tailored for the classification of skin lesions, combining the strengths of advanced image processing and metadata analysis. The architecture leverages medical-specific augmentation techniques, such as controlled adjustments to image brightness and color, ensuring the preservation of critical diagnostic information while enhancing model robustness. Through the integration of multiple CNN architectures, including EfficientNetV2, ResNet-152, MobileNetV2, and ResNet-50, our approach captures a comprehensive range of features from dermatoscopic images. Additionally, the incorporation of Principal Component Analysis (PCA) for tabular data effectively reduces dimensionality, focusing the model’s learning on the most salient features and thereby improving efficiency.
Our method demonstrated superior performance on the test set, outperforming existing models in terms of accuracy, recall, precision, and F1 scores. These results highlight the efficacy of our integrated approach in handling the complexity of skin lesion diagnostics, providing an effective tool for improving diagnostic accuracy in clinical settings. The use of multiple inputs allows our model to leverage both visual and contextual data, offering a more holistic analysis and reducing the likelihood of misdiagnosis. While our model demonstrates substantial improvements in skin lesion classification, it does require significant computational resources, which could limit its deployment in settings with restricted technological infrastructure. Addressing these limitations, future work will explore the potential for simplifying the model’s architecture without compromising its accuracy, aiming to make it more accessible and practical for use in varied clinical environments.
Future work with a focus on implementing XAI techniques would allow us to provide clear insights into the decision-making processes of our model, increasing trust and understanding among clinicians. This transparency is crucial for clinical acceptance and could help in identifying potential biases or errors in AI’s reasoning. Further research into advanced image processing techniques, such as super-resolution and noise reduction, could help in improving the quality of the input images. Enhanced image clarity might lead to better model performance, particularly in identifying subtle features of early-stage lesions. Also, developing more lightweight models that maintain high accuracy while being deployable on mobile devices or in remote areas could significantly expand the accessibility of advanced diagnostic tools. This is especially important in regions with limited medical infrastructure, where such tools can provide critical support to healthcare providers.

Author Contributions

This research was designed and written by R.M.; conceptualization, R.M. and D.M.; methodology, D.M. and R.M.; software, D.M.; writing—original draft preparation, D.M.; supervision and other contributions, Y.-I.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korean Agency for Technology and Standards under the Ministry of Trade, Industry and Energy in 2023 (project number 1415181638, “Establishment of standardization basis for BCI and AI Interoperability”; project number 1415180835, “Development of International Standard Technologies based on AI Learning and Inference Technologies”) and by the Gachon University Research Fund 2023 (GCU-202300770001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.kaggle.com/datasets/andrewmvd/isic-2019 (accessed on 7 May 2024).

Acknowledgments

Madinakhon Rakhmonova, the first author, extends heartfelt thanks and deep appreciation to her supervisor, Young Im Cho of Gachon University, for her invaluable support, insights, and active involvement throughout this research. Additionally, the authors are grateful to the Academic editor and reviewers whose constructive feedback significantly enhanced the quality and clarity of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization. Skin Cancer. International Agency for Research on Cancer (IARC). Available online: https://www.iarc.who.int/cancer-type/skin-cancer/ (accessed on 8 May 2024).
  2. Vlachos, C.; Tziortzioti, C.; Bassukas, I.D. Paraneoplastic Syndromes in Patients with Keratinocyte Skin Cancer. Cancers 2022, 14, 249. [Google Scholar] [CrossRef] [PubMed]
  3. Tang, H.; Huang, H.; Liu, J.; Zhu, J.; Gou, F.; Wu, J. AI-Assisted Diagnosis and Decision-Making Method in Developing Countries for Osteosarcoma. Healthcare 2022, 10, 2313. [Google Scholar] [CrossRef] [PubMed]
  4. Ravi, V. Attention Cost-Sensitive Deep Learning-Based Approach for Skin Cancer Detection and Classification. Cancers 2022, 14, 5872. [Google Scholar] [CrossRef] [PubMed]
  5. Bechelli, S.; Delhommelle, J. Machine Learning and Deep Learning Algorithms for Skin Cancer Classification from Dermoscopic Images. Bioengineering 2022, 9, 97. [Google Scholar] [CrossRef] [PubMed]
  6. Mukhtorov, D.; Rakhmonova, M.; Muksimova, S.; Cho, Y.-I. Endoscopic Image Classification Based on Explainable Deep Learning. Sensors 2023, 23, 3176. [Google Scholar] [CrossRef] [PubMed]
  7. Raza, A.; Ayub, H.; Khan, J.A.; Ahmad, I.; Salama, A.S.; Daradkeh, Y.I.; Javeed, D.; Ur Rehman, A.; Hamam, H. A Hybrid Deep Learning-Based Approach for Brain Tumor Classification. Electronics 2022, 11, 1146. [Google Scholar] [CrossRef]
  8. Doniyorjon, M.; Madinakhon, R.; Shakhnoza, M.; Cho, Y.-I. An Improved Method of Polyp Detection Using Custom YOLOv4-Tiny. Appl. Sci. 2022, 12, 10856. [Google Scholar] [CrossRef]
  9. Fang, S.; Zhang, B.; Hu, J. Improved Mask R-CNN Multi-Target Detection and Segmentation for Autonomous Driving in Complex Scenes. Sensors 2023, 23, 3853. [Google Scholar] [CrossRef]
  10. Dabija, A.; Kluczek, M.; Zagajewski, B.; Raczko, E.; Kycko, M.; Al-Sulttani, A.H.; Tardà, A.; Pineda, L.; Corbera, J. Comparison of Support Vector Machines and Random Forests for Corine Land Cover Mapping. Remote Sens. 2021, 13, 777. [Google Scholar] [CrossRef]
  11. Uyanık, H.; Şentürk, E.; Akpınar, M.H.; Ozcelik, S.T.A.; Kokum, M.; Freeshah, M.; Sengur, A. A Multi-Input Convolutional Neural Networks Model for Earthquake Precursor Detection Based on Ionospheric Total Electron Content. Remote Sens. 2023, 15, 5690. [Google Scholar] [CrossRef]
  12. Dweekat, O.Y.; Lam, S.S. Cervical Cancer Diagnosis Using an Integrated System of Principal Component Analysis, Genetic Algorithm, and Multilayer Perceptron. Healthcare 2022, 10, 2002. [Google Scholar] [CrossRef] [PubMed]
  13. Popescu, D.; El-khatib, M.; Ichim, L. Skin Lesion Classification Using Collective Intelligence of Multiple Neural Networks. Sensors 2022, 22, 4399. [Google Scholar] [CrossRef] [PubMed]
  14. Maurya, R.; Singh, S.K.; Maurya, A.K.; Kumar, A. GLCM and Multi Class Support vector machine based automated skin cancer classification. In Proceedings of the 2014 International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 5–7 March 2014; pp. 444–447. [Google Scholar] [CrossRef]
  15. Batista, L.G.; Bugatti, P.H.; Saito, P.T.M. Classification of Skin Lesion through Active Learning Strategies. Comput. Methods Programs Biomed. 2022, 226, 107122. [Google Scholar] [CrossRef] [PubMed]
  16. Srinivasu, P.N.; SivaSai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852. [Google Scholar] [CrossRef]
  17. Milanteva, S.; Olyunina, V.; Milantevaa, N.; Bykova, I.; Bessmertnya, I. Skin Lesion Analysis Using Ensemble of CNN with Dermoscopic Images and Metadata. In Proceedings of the 12th Majorov International Conference on Software Engineering and Computer Systems, Online, Saint Petersburg, Russia, 10–11 December 2020. [Google Scholar]
  18. Wu, J.; Hu, W.; Wang, Y.; Wen, Y. A Multi-Input CNNs with Attention for Skin Lesion Classification. In Proceedings of the 2020 IEEE International Conference on Smart Cloud (SmartCloud), Washington, DC, USA, 6–8 November 2020; pp. 78–83. [Google Scholar] [CrossRef]
  19. Kassem, M.A.; Hosny, K.M.; Fouad, M.M. Skin Lesions Classification into Eight Classes for ISIC 2019 Using Deep Convolutional Neural Network and Transfer Learning. IEEE Access 2020, 8, 114822–114832. [Google Scholar] [CrossRef]
  20. Gessert, N.; Nielsen, M.; Shaikh, M.; Werner, R.; Schlaefer, A. Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data. MethodsX 2020, 7, 100864. [Google Scholar] [CrossRef]
  21. Kaur, R.; GholamHosseini, H.; Sinha, R.; Lindén, M. Melanoma Classification Using a Novel Deep Convolutional Neural Network with Dermoscopic Images. Sensors 2022, 22, 1134. [Google Scholar] [CrossRef] [PubMed]
  22. Tummala, S.; Thadikemalla, V.S.G.; Kadry, S.; Sharaf, M.; Rauf, H.T. EfficientNetV2 Based Ensemble Model for Quality Estimation of Diabetic Retinopathy Images from DeepDRiD. Diagnostics 2023, 13, 622. [Google Scholar] [CrossRef]
  23. Li, X.; Du, J.; Yang, J.; Li, S. When Mobilenetv2 Meets Transformer: A Balanced Sheep Face Recognition Model. Agriculture 2022, 12, 1126. [Google Scholar] [CrossRef]
  24. Kumar, D.; Sharma, S.; Mishra, M.P. Unimodal biometric identification system on Resnet-50 residual block in deep learning environment fused with serial fusion. Glob. J. Enterp. Inf. Syst. 2023, 15, 40–49. Available online: https://gjeis.com/index.php/GJEIS/article/view/707 (accessed on 10 June 2024).
  25. Jegorowa, A.; Kurek, J.; Kruk, M.; Górski, J. The Use of Multilayer Perceptron (MLP) to Reduce Delamination during Drilling into Melamine Faced Chipboard. Forests 2022, 13, 933. [Google Scholar] [CrossRef]
  26. Zhang, Z. Improved Adam Optimizer for Deep Neural Networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2. [Google Scholar] [CrossRef]
  27. Orozco-Arias, S.; Piña, J.S.; Tabares-Soto, R.; Castillo-Ossa, L.F.; Guyot, R.; Isaza, G. Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements. Processes 2020, 8, 638. [Google Scholar] [CrossRef]
Figure 1. Example of eight classes from ISIC 2019 dataset.
Figure 2. (a) Original dataset with columns C1 to C11, representing variables such as age, anatomic site, general categories, and sex. (b) After PCA with 3 components.
Figure 3. Effects of image transformations on medical skin lesion images.
Figure 4. Overall architecture of multi-input CNN model.
Figure 5. Training accuracy, training loss, validation accuracy, and validation loss in four alternative CNN models combined with MLP as the multi-input model.
Figure 6. Confusion matrix for skin lesion classification model.
Table 1. Key features and best-use cases of CNN models used in the hybrid skin lesion classification model.
Model | Key Features | Best-Use Cases
----- | ------------ | --------------
ResNet-152 | Deep with 152 layers, uses residual learning with skip connections | Suitable for very deep learning without performance loss, ideal for complex image classification tasks
MobileNetV2 | Inverted residuals, lightweight depthwise convolutions | Optimized for mobile and resource-constrained environments, balancing speed and accuracy
ResNet-50 | Shallower than ResNet-152, employs residual learning | Offers a good balance between depth and efficiency, used widely in both academia and industry
EfficientNetV2 | Balances model scaling, uses compound scaling for efficiency | Designed for high efficiency and accuracy across various devices, optimized for real-world applications
Table 2. Hyperparameters of multi-input model.
Parameter | Value
--------- | -----
Optimizer | Adam, SGD
Learning Rate | 0.01, 0.001
Batch Size | 256
Number of Epochs | 50
Loss Function | Cross-Entropy Loss
Number of Workers | 8
PCA Components | 5, 3
Table 3. Number of images per class for training, validation, and test set.
Class Name | Training | Validation | Test
---------- | -------- | ---------- | ----
Melanoma | 3160 | 877 | 400
Melanocytic Nevus | 8728 | 2525 | 1305
Basal Cell Carcinoma | 2343 | 617 | 357
Actinic Keratosis | 610 | 184 | 70
Benign Keratosis | 1793 | 547 | 258
Dermatofibroma | 171 | 48 | 20
Vascular Lesion | 171 | 62 | 20
Squamous Cell Carcinoma | 449 | 119 | 60
Table 4. Detailed results for the CNN models.
Model Name | Training Accuracy | Training Loss | Validation Accuracy | Validation Loss
---------- | ----------------- | ------------- | ------------------- | ---------------
ResNet-50 | 98.017 | 0.12 | 96.123 | 0.09
MobileNet-v2 | 96.3 | 0.051 | 95.36 | 0.045
EfficientNet-v2 | 93.97 | 0.175 | 92.45 | 0.126
ResNet-152 | 99.016 | 0.065 | 98.9 | 0.004
Table 5. Comparative performance of the model with and without PCA on test set.
Model Configuration | Accuracy | Recall | Precision | F1 Score
------------------- | -------- | ------ | --------- | --------
With PCA (3 Components) | 98.7% | 99.19% | 98.76% | 98.91%
Without PCA | 97.5% | 98.0% | 97.8% | 97.9%
Table 6. Comparison of different methods used for the classification of skin lesions.
Source | Methodology | Accuracy | Recall | Precision | F1 Score | Advantages and Limitations
------ | ----------- | -------- | ------ | --------- | -------- | --------------------------
J. Wu et al. [18] | Incorporates multi-input streams and an attention mechanism to address challenges such as class similarity and foreground-background imbalances. The model integrates downsampling for global context and targeted local views enhanced by CAM (Class Activation Mapping) to rectify imbalances, which are then processed through Dense Blocks for feature consolidation. | 88.4% | 76.7% | 96.3% | N/A | The model allows for a nuanced understanding of both detailed and global image features, but the integration of multiple data inputs and an advanced attention mechanism might complicate model tuning and make optimal performance hard to achieve without extensive hyperparameter optimization.
M. A. Kassem et al. [19] | A model leveraging transfer learning with the pretrained GoogleNet architecture. | 94.92% | 79.8% | 97% | 80.36% | The use of a pretrained GoogleNet accelerates feature extraction and training, leveraging transfer learning for rapid, high-accuracy convergence. However, reliance on GoogleNet’s architecture may restrict adaptability to new or complex datasets, requiring significant adjustments for optimal performance.
Gessert et al. [20] | Ensemble of deep learning models; addresses class imbalances and multiple input resolutions with a data-driven approach. | N/A | 71.7% | N/A | N/A | The ensemble approach, combining models like EfficientNets, SENet, and ResNeXt WSL, allows for a diverse capture of image features, enhancing the model’s ability to generalize across different types of skin lesions.
Milanteva et al. [17] | A multi-faceted deep learning approach for skin lesion classification, where each image is first segmented using R2U-Net to effectively isolate the lesion area. The segmented images are then processed using a suite of convolutional neural networks including EfficientNetB0-B7, SENet-154, ResNeXt-101 32x4d, and Inception-ResNet-v2 to enhance feature extraction and classification accuracy. | 96.86% | 85% | 82% | 83% | The methodology benefits from comprehensive segmentation and the use of an ensemble of powerful CNN architectures, enhancing the model’s ability to handle diverse and complex image characteristics. The method did not use metadata.
Our Method | Novel multi-input CNN model leveraging medical-specific augmentation and PCA for metadata, focusing on a comprehensive range of features from dermatoscopic images. | 98.7% | 99.19% | 98.76% | 98.91% | Our hybrid method employs both image and tabular data, integrated through our custom data augmentation techniques to enhance accuracy.