Article

Skin Cancer Classification Using Fine-Tuned Transfer Learning of DENSENET-121

School of Computing and Information Science, Faculty of Science and Engineering, Anglia Ruskin University, Cambridge CB1 1PT, UK
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7707; https://doi.org/10.3390/app14177707
Submission received: 15 July 2024 / Revised: 19 August 2024 / Accepted: 23 August 2024 / Published: 31 August 2024

Abstract

Skin cancer diagnosis greatly benefits from advanced machine learning techniques, particularly fine-tuned deep learning models. In our research, we explored the impact of traditional machine learning and fine-tuned deep learning approaches on prediction accuracy. Our findings reveal significant improvements in predictability and accuracy with fine-tuning, particularly evident in deep learning models. The CNN, SVM, and Random Forest Classifier achieved high accuracy. However, fine-tuned deep learning models such as EfficientNetB0, ResNet34, VGG16, Inception-v3, and DenseNet121 demonstrated superior performance. To ensure comparability, we fine-tuned these models by incorporating additional layers, including one flatten layer and three densely interconnected layers. These layers play a crucial role in enhancing model efficiency and performance. The flatten layer preprocesses multidimensional feature maps, facilitating efficient information flow, while subsequent dense layers refine feature representations, capturing intricate patterns and relationships within the data. Leveraging LeakyReLU activation functions in the dense layers mitigates the vanishing gradient problem and promotes stable training. Finally, the output dense layer with a sigmoid activation function simplifies decision making for healthcare professionals by providing a binary classification output. Our study underscores the significance of incorporating additional layers in fine-tuned neural network models for skin cancer classification, offering improved accuracy and reliability in diagnosis.

1. Introduction

One of the most widespread cancers globally is skin cancer, significantly impacting the quality of human life. The primary cause is excessive exposure of the skin to ultraviolet (UV) light from the sun. Melanocytes, the pigment-producing cells present in the skin’s top layer, are the origins of melanoma, a relatively common form of cancer. These cells can undergo malignant mutations due to damage from UV light exposure, both from the sun and tanning beds. However, melanoma can also manifest in areas of the body not typically exposed to sunlight [1]. Nonmelanoma pertains to skin tumors originating in the epidermis, the outermost skin layer, and the affected skin region is often referred to as a lesion. Every year, the global count of reported nonmelanoma skin cancer cases ranges between 2 and 3 million [2]. Cutaneous melanoma ranks as the 17th most prevalent cancer worldwide. Overall, it stands as the 15th most common cancer, holding the 13th position among men and the 15th among women [3].
Skin cells undergo continuous processes of growth and aging on a daily basis. Nevertheless, there are instances where the normal progression of skin growth becomes disrupted, leading to the development of surplus cells, referred to as carcinomas. Among the diverse array of skin cancer types, melanoma emerges as the most malignant and severe variant, significantly contributing to cutaneous mortality rates [4]. Both melanoma and nonmelanoma skin tumors can frequently result in fatalities, as highlighted by [5]. While it holds true that the majority of skin tumors globally are benign and possess a low probability of evolving into skin cancer, it is crucial to recognize the existence of potentially perilous malignant skin tumors. If left untreated, these tumors can spread throughout the body and lead to the demise of the host [6].
Doctors utilize a range of methods for skin cancer detection [7]. Visual assessment serves as an initial approach to gauge the likelihood of the disease. To assist in the preliminary screening of melanoma, the American Center for the Study of Dermatology has developed the ABCD guide, which stands for asymmetry, border, color, and diameter [8]. This guide aids doctors in evaluating the morphology of suspected melanomas. When a potentially concerning skin lesion is identified, a biopsy is conducted on the visible anomaly. Subsequently, the biopsy sample is scrutinized under a microscope to ascertain its benign or malignant nature, as well as to identify the specific type of skin cancer [9]. This methodology faces several challenges, such as the inability to discern lesion characteristics due to hair presence and the resemblance between cancerous and noncancerous lesions. Additionally, certain communities lack access to adequate healthcare resources.
In diverse literature, skin cancer classification has incorporated various machine learning techniques, including support vector machines (SVM), neural networks, naïve Bayes classifiers, and decision trees. These methodologies have effectively classified distinct skin cancer types. One limitation of machine learning approaches is their dependence on manually engineered features. However, over the past decade, transfer learning methods have gained prominence due to their capacity for automated feature extraction. As a result, they have found widespread application in research studies [10]. The aim of this study is to develop an automated system for the classification of skin lesion images, distinguishing between benign and potentially cancerous cases. Our approach stands apart from similar studies due to the following distinctive features:
  • We employ a transfer learning strategy, utilizing the Densenet architecture, to classify skin cancer as malignant or benign.
  • Enhancing the Densenet model, we introduce a flatten layer, two dense layers with LeakyReLU activation functions, and an output dense layer with a sigmoid activation function. These modifications aim to boost model accuracy.
  • Our methodology incorporates preprocessing techniques, including data augmentation to balance the dataset, the removal of hair, and the adjustment of image lighting. These steps are undertaken to further enhance model accuracy.
  • Through utilizing the pretrained Densenet model, which dynamically extracts features from images, and subsequently passing them through the flatten and dense layers for prediction, we achieve an impressive accuracy of 87% and F-measure scores of 87%.

2. Literature Review

An expert dermatologist typically follows a process that begins with visual inspection of potential lesions with the unaided eye, followed by dermoscopy, which examines the lesion under magnification, after which a biopsy can be undertaken. This approach is subjective and prone to errors, so automated approaches such as AI are desirable to enhance the accuracy and efficiency of diagnosis.
AI encompasses the discipline of crafting intelligent machines, particularly sophisticated computer programs. An alternative definition of AI involves the emulation or recreation of cognitive processes through computer systems, endowing them with the capability for logical reasoning and behavior akin to humans. The prevalence of AI in scholarly discourse escalated after the 1950s. Industries spanning communication, information technology, healthcare, agriculture, logistics, education, and aviation are among the many domains that leverage AI [11,12,13,14,15]. Transfer learning in machine learning refers to the practice of repurposing a model that has previously been trained on a distinct problem. This approach enables a computer to leverage its comprehension of one task to enhance its ability to make generalizations about another. Transfer learning involves employing the early and intermediate layers of a model, while only retraining the latter layers. This technique proves valuable in capitalizing on the labeled data from the original task on which the model was trained [16].
The study conducted by [17] employed a MobileNet model pretrained on around 1,280,000 images from the 2014 ImageNet Challenge. This model was subsequently fine-tuned using transfer learning on a dataset comprising 10,015 dermoscopy images from the HAM10000 dataset. The outcome of this approach yielded an impressive overall accuracy of 83.1%. Brinker et al. [18] employed a pretrained architecture called ResNet50 for classifying skin lesions as either melanoma or nevi. The model they proposed attained sensitivity and specificity rates of 77.9% and 82.3%, respectively. Esteva et al. [19] employed a pretrained model known as Inception-v3 for the classification of skin lesions. They expanded the testing process, and the resulting classification model achieved an accuracy of 71.2%.
Milton used transfer learning methods with the HAM10000 dataset; the chosen pretrained model, PNASNet-5-Large, achieved an accuracy of 76%. It is important to note that the HAM10000 dataset is imbalanced, with a notable difference in the number of images per class, which makes it difficult to generalize the characteristics of the lesions [20]. A customized CNN model was created by Nugroho [21] and trained on the HAM10000 dataset; the model achieved a 78% accuracy rate when categorizing the skin lesions in the dataset. In [22], the authors built their own model on top of a pretrained MobileNet model, also using the HAM10000 dataset, and achieved a categorical accuracy of up to 80.81%. In [23], the authors proposed a method to improve image quality for extracting Regions of Interest (ROIs) and used a deep residual model to classify these images; their system achieved an accuracy of 85.5%. Codella et al. [24] proposed using a CNN pretrained on ImageNet to distinguish between healthy and melanoma images by extracting high-level feature representations.
In a study by [25], a deep learning-based classification model was developed for pigmented skin lesions. The model’s training employed datasets encompassing six classes, including malignant tumors such as malignant melanoma and basal cell carcinoma, as well as benign tumors like nevus, seborrheic keratosis, senile lentigo, and hematoma/hemangioma. The researchers utilized a Faster Region-Based Convolutional Neural Network (FRCNN) for training and evaluating the dataset. The test outcomes demonstrated an accuracy of 86.2% in classifying skin lesions across the six categories. The methods proposed in [26] focus on the precise detection and classification of skin cancer, employing a specialized implementation of deep learning techniques. The approaches detailed in that work involve a comprehensive analysis of two datasets, notably MNIST: HAM10000, featuring seven distinct skin lesion types with a sample set of 10,015 images, and the PH2 dataset, encompassing 200 skin lesion images. These methods incorporate data augmentation while training models based on prominent deep learning architectures such as MobileNet and VGG-16 [27]. The resulting accuracy was 81.52% with MobileNet and 80.07% with VGG-16.
By reviewing several papers within the literature, it becomes evident that previous research has made significant progress in understanding skin lesion classification. However, there remain certain limitations, particularly in achieving consistently high levels of accuracy in predicting whether lesions are benign or malignant. To address this challenge, our research employs a transfer learning approach by fine-tuning pretrained convolutional neural network models. Transfer learning has emerged as a promising technique in various domains, leveraging the knowledge learned from one task to improve performance on another. By fine-tuning pretrained models on skin lesion images, we aim to capitalize on the wealth of information already captured by these networks, thereby enhancing the accuracy of lesion classification.
This approach offers several advantages, including reduced training time and the ability to work effectively with smaller datasets. Moreover, by leveraging the rich feature representations learned from large-scale image datasets, we anticipate that our model will exhibit improved generalization and robustness, leading to more accurate predictions of skin lesion malignancy.
Overall, through the application of transfer learning via fine-tuning, our research holds the promise of significantly advancing the field of skin lesion classification, ultimately contributing to improved diagnostic accuracy and patient care.

3. Materials and Methods

3.1. Data Collection and Preprocessing

The dataset used in this study is a combination of two subsets: the Human Against Machine (HAM10000) and the International Skin Imaging Collaboration (ISIC 2020), both obtained from Kaggle. From this combined dataset, we selected a total of 3297 RGB images, consisting of 1800 labeled as Benign and 1497 labeled as Malignant, with each image having a dimension of (224 × 224 × 3) [28,29]. The dataset was split into training, testing, and validation sets as follows: the training set has dimensions (2818, 224, 224, 3), the test set has dimensions (330, 224, 224, 3), and the validation set has dimensions (149, 224, 224, 3), as shown in Table 1 below.
In Figure 1, the image shows a sample of a Benign and Malignant image before being preprocessed.

3.2. Data Augmentation

Since deep learning requires a large training dataset for good accuracy, we performed data augmentation by generating augmented samples and labels, appending them to the original data to increase the number of training samples. We set a rotation range of 90 degrees, used a width and height shift range of 0.15, and allowed images to be flipped horizontally and vertically. We set the brightness range within 0.8 to 1.1 and set the fill mode to nearest, ensuring that when applying transformations, any areas needing filling due to shifts or rotation are filled using the nearest available pixels. Subsequently, we added Gaussian noise for variability and replaced the original background with a synthetic one to enhance the dataset for model training. The augmented image is shown in Figure 2 below.
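To make these augmentation settings concrete, the following is a minimal sketch assuming the Keras ImageDataGenerator API; the noise level used for the Gaussian-noise step is an illustrative assumption, and the synthetic-background replacement step is omitted here.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings described above: 90-degree rotation range, 0.15 shifts,
# horizontal/vertical flips, brightness in [0.8, 1.1], nearest-pixel fill.
datagen = ImageDataGenerator(
    rotation_range=90,
    width_shift_range=0.15,
    height_shift_range=0.15,
    horizontal_flip=True,
    vertical_flip=True,
    brightness_range=(0.8, 1.1),
    fill_mode="nearest",
)

def augment_dataset(x_train, y_train, noise_std=5.0):
    """Generate one augmented copy per training image and append it to the originals.

    x_train: array of shape (N, 224, 224, 3) with pixel values in [0, 255];
    noise_std is an illustrative choice for the Gaussian-noise step.
    """
    it = datagen.flow(x_train, y_train, batch_size=len(x_train), shuffle=False)
    x_aug, y_aug = next(it)
    x_aug = np.clip(x_aug + np.random.normal(0.0, noise_std, x_aug.shape), 0, 255)
    return np.concatenate([x_train, x_aug]), np.concatenate([y_train, y_aug])
```

Doubling the training set in this way matches the counts reported in Table 2 (2818 images before augmentation, 5636 after).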
Table 2 below shows the training dataset count before and after augmentation. The training dataset count before augmentation was 2818, and after augmentation, it increased to 5636. This substantial increase in the dataset size demonstrates the effectiveness of data augmentation in enriching the training set and improving the robustness of the deep learning model. By introducing variations in the training samples through augmentation techniques such as rotation, shifting, flipping, brightness adjustment, and gamma correction, we ensure that the model learns to generalize well to different variations and scenarios it may encounter during inference. This not only enhances the model’s performance but also helps mitigate overfitting, leading to more reliable and accurate predictions in real-world applications.

3.3. Model Flow Chart

Figure 3 shows the flow chart of the different models that were used in this study. The images were loaded from the drive, after which they were split. A total of 83% were used for training, 10% for testing, and 7% for validation. However, there was a need for more training images to be generated. The image generator was used to produce additional images. Subsequently, a normalization technique was applied to ensure that every image was divided by 255 before being fed into the different models used to predict whether a skin lesion is benign or malignant.
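As a rough sketch of this split-and-normalize step, the snippet below uses scikit-learn's train_test_split with the set sizes from Table 1; the stratification and the random seed are assumptions rather than details taken from the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# images: (3297, 224, 224, 3), labels: (3297,) with 0 = Benign and 1 = Malignant.
# Split into the training/test/validation sets of Table 1 (2818/330/149 images),
# i.e. roughly the 83/10/7 proportions described above.
def split_and_normalise(images, labels, seed=42):
    x_rest, x_test, y_rest, y_test = train_test_split(
        images, labels, test_size=330, stratify=labels, random_state=seed)
    x_train, x_val, y_train, y_val = train_test_split(
        x_rest, y_rest, test_size=149, stratify=y_rest, random_state=seed)
    # Normalization: scale every pixel value by 1/255 before feeding the models.
    scale = lambda a: a.astype(np.float32) / 255.0
    return (scale(x_train), y_train), (scale(x_val), y_val), (scale(x_test), y_test)
```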

3.3.1. Convolutional Neural Network (CNN)

Convolutional neural networks (CNNs) have a wide range of applications in image processing tasks such as image recognition and feature categorization. A CNN’s design typically includes three types of layers: convolutional layers, pooling layers, and fully connected layers. The model parameters are shown in Table 3.
The CNN model underwent training for 20 epochs to expedite convergence while effectively preventing overfitting. This approach allowed the model to achieve commendable accuracy even without extensive training epochs. It utilized the Adam optimizer with a batch size of 64 and incorporated a sigmoid activation function in the output dense layer. Additionally, binary cross-entropy served as the designated loss function, as illustrated in Table 4.
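A minimal Keras sketch of a CNN consistent with Table 3 and Table 4 is given below; the 3 × 3 kernels are inferred from the output shapes in Table 3, and the ReLU activations in the hidden layers are an assumption.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(224, 224, 3)):
    # Layer sizes follow Table 3: three Conv2D/MaxPooling2D blocks (32, 64, 128
    # filters), a flatten layer, a 64-unit dense layer, and a sigmoid output.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # benign (0) vs. malignant (1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Training setup from Table 4: 20 epochs, Adam optimizer, batch size 64.
# model = build_cnn()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20, batch_size=64)
```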

3.3.2. Support Vector Machine (SVM)

SVM is a supervised machine learning model used to classify a dataset into two classes by constructing a hyperplane. This hyperplane acts as a boundary, similar to a dividing line or surface, separating the data points. The prediction of the SVM model depends on which side of this hyperplane a data point falls. Due to its simplicity and effectiveness, SVM is highly beneficial for working with large amounts of data. The SVM classifier was trained using the Radial Basis Function (RBF) kernel, which is known to capture the nonlinear relationship between the input features and the target classes.
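For reference, a brief scikit-learn sketch of the RBF-kernel SVM is shown below; flattening each image into a feature vector is an assumption about how the inputs were presented to the classifier.

```python
from sklearn.svm import SVC

# Flatten each 224x224x3 image into a single feature vector (an illustrative choice).
x_train_flat = x_train.reshape(len(x_train), -1)
x_test_flat = x_test.reshape(len(x_test), -1)

svm_clf = SVC(kernel="rbf", probability=True)  # RBF kernel, as described above
svm_clf.fit(x_train_flat, y_train)
print("SVM test accuracy:", svm_clf.score(x_test_flat, y_test))
```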

3.3.3. Random Forest Classifier

The Random Forest Classifier is a mathematical tool that makes collective predictions using a versatile classification algorithm. It achieves this by employing an ensemble of decision trees trained using the bootstrap approach and incorporating randomization during tree growth. During tree construction, the algorithm selects the best characteristics from a randomly chosen group of features. The Random Forest Classifier was trained with 100 estimators and a random state of 42.
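A corresponding scikit-learn sketch with the stated settings (100 estimators, random state 42) follows; as with the SVM, flattened pixel vectors are assumed as the input features.

```python
from sklearn.ensemble import RandomForestClassifier

# Ensemble of 100 bootstrap-trained decision trees with a fixed random state.
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(x_train_flat, y_train)
print("Random Forest test accuracy:", rf_clf.score(x_test_flat, y_test))
```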

3.3.4. Fine-Tuned VGG-16

VGG16, an architecture proposed by K. Simonyan and A. Zisserman from the University of Oxford in their paper titled “Very Deep Convolutional Networks for Large-Scale Image Recognition”, is a convolutional neural network model. This model gained significant recognition after its submission to ILSVRC-2014. It introduced improvements over the AlexNet model by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with a series of multiple 3 × 3 kernel-sized filters. VGG16 underwent several weeks of training using NVIDIA Titan Black GPUs. For this study, the model was then fine-tuned by adding one flatten layer and three dense layers. The first dense layer has 32 neurons with a LeakyReLU activation function, while the second dense layer has 16 neurons with a LeakyReLU activation function, and the output dense layer has one neuron with a sigmoid activation function, as shown in Table 5. The pretrained VGG16 extracted features from the input image and transferred them to the layers to classify the skin lesion as either Benign or Malignant. Traditionally, the VGG-16 model is designed to accept input images of size 224 × 224 pixels. For our fine-tuned VGG-16 model, we adhere to this convention by setting the input size to 224 × 224 pixels.
The model parameters are shown in Table 5.
The fine-tuned VGG16 model underwent training for 20 epochs using the Adam optimizer with a batch size of 64. The activation function employed for the output dense layer was sigmoid, and binary cross-entropy served as the designated loss function.

3.3.5. Fine-Tuned ResNet-34

ResNet, a revolutionary architecture, was formulated by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun from Microsoft Research. It introduces a residual learning framework that facilitates training deeper neural networks.
The ResNet architecture consists of multiple layers, typically divided into several blocks. Each block contains a set of convolutional layers followed by batch normalization and ReLU activation. The key innovation of ResNet lies in the introduction of residual connections or skip connections. These connections allow the network to learn residual functions by adding the original input of a layer to its output. This way, the network can focus on learning the residual mapping, making it easier to train deeper models.
The basic building block of ResNet is the residual block, which includes two convolutional layers with smaller filter sizes (usually 3 × 3) and the skip connection. In addition to the residual blocks, ResNet also incorporates pooling layers, fully connected layers, and a softmax layer for classification.
By leveraging the residual connections, ResNet models can achieve remarkable depths, with variants ranging from 18 layers to over 150 layers; the deepest configuration reported in the original ResNet paper is ResNet-152, which consists of 152 layers [30]. In this study, the ResNet-34 variant was fine-tuned by adding one flatten layer and three dense layers. The first dense layer has 32 neurons with a LeakyReLU activation function, the second dense layer has 16 neurons with a LeakyReLU activation function, and the output dense layer has one neuron with a sigmoid activation function. The pretrained ResNet-34 extracted features from the input image and transferred them to these layers to classify the skin lesion as either benign or malignant. The model parameters are shown in Table 6.
The fine-tuned ResNet-34 model underwent training for 20 epochs using the Adam optimizer with a batch size of 64. The activation function employed for the output dense layer was sigmoid, and binary cross-entropy served as the designated loss function.

3.3.6. Fine-Tuned DenseNet-121 (Proposed Model)

In the realm of skin cancer classification, transfer learning techniques offer a compelling advantage by leveraging pretrained neural network models, such as DenseNet-121, ResNet-34, and VGG-16, which have been trained on large datasets like ImageNet. This approach capitalizes on the knowledge learned from these diverse datasets, enabling the models to extract meaningful features from skin lesion images and thus improve classification accuracy [31]. By transferring the learned representations to the task of skin cancer classification, these models can effectively distinguish between benign and malignant lesions, contributing to early detection and improved patient outcomes.
Earlier techniques in skin cancer classification have been constrained by several limitations. Traditional machine learning approaches often rely on handcrafted features, which may not adequately capture the complex patterns present in skin lesion images. Additionally, these methods may struggle with scalability and generalization to unseen data due to their fixed feature extraction processes [32].
DenseNet-121 adopts an approach wherein each layer is connected to every other layer in a forward manner. Unlike traditional convolutional networks, which have only one connection between each layer and its subsequent layer, DenseNet-121 forms direct connections that total L(L+1)/2, where L is the number of layers. In DenseNet-121, each layer receives the feature maps from all preceding layers as inputs and also passes its own feature maps to all subsequent layers. This design choice brings several advantages. Firstly, it mitigates the vanishing-gradient problem that can hinder the training of deep networks. Secondly, it promotes effective feature propagation throughout the network. Thirdly, it encourages feature reuse, enabling efficient information flow. Finally, it significantly reduces the number of parameters, making it a computationally efficient architecture [33]. It was then fine-tuned by adding one flatten layer and three dense layers. The first dense layer has 32 neurons with a LeakyReLU activation function, while the second dense layer has 16 neurons with a LeakyReLU activation function, and the output dense layer has one neuron with a sigmoid activation function. The model has 1,606,209 trainable parameters and the epochs were reduced to 20; additionally, the DenseNet-121 architecture itself offers regularization benefits by mitigating the vanishing gradient problem and promoting efficient feature reuse throughout the network. These architectural choices, combined with the use of LeakyReLU activation functions in the dense layers, were helpful for faster training and preventing overfitting, allowing the model to achieve better accuracy than if it were trained for more epochs, where overfitting might occur. Additionally, the model takes only a few seconds to run. The pretrained DenseNet-121 extracts features from the input image and transfers them to the layers to classify the skin lesion as either benign or malignant. The model parameters are shown in Table 7.
The fine-tuned DenseNet-121 model underwent training for 20 epochs using the Adam optimizer with a batch size of 64. The activation function employed for the output dense layer was sigmoid, and binary cross-entropy served as the designated loss function.
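The following is a minimal Keras sketch of this fine-tuned head (flatten, 32- and 16-unit dense layers with LeakyReLU, and a sigmoid output) on top of an ImageNet-pretrained DenseNet-121; the same head is reused for the other backbones. Freezing the backbone so that only the roughly 1.6 million added parameters of Table 7 are trained, and keeping the default LeakyReLU slope, are assumptions.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

def build_finetuned_densenet(input_shape=(224, 224, 3)):
    # ImageNet-pretrained backbone without its original classification top.
    base = DenseNet121(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # only the added head (~1.6M parameters, Table 7) is trained

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(32),
        layers.LeakyReLU(),
        layers.Dense(16),
        layers.LeakyReLU(),
        layers.Dense(1, activation="sigmoid"),  # benign (0) vs. malignant (1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Training setup described above: 20 epochs, Adam optimizer, batch size 64.
# model = build_finetuned_densenet()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20, batch_size=64)
```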

3.3.7. Fine-Tuned Inception v3

Inception v3 explores methods for scaling convolutional networks that prioritize computational effectiveness. The emphasis is on techniques such as factorized convolutions and aggressive regularization to make the most effective use of the available computational resources. The goal is to balance enhanced quality and computational efficiency, particularly in applications like mobile vision and big data scenarios.
By employing the proposed strategies, remarkable results are achieved in terms of error rates. For single-frame evaluation, the network achieves top-1 and top-5 error rates of 21.2% and 5.6%, respectively, while utilizing a computational cost of 5 billion multiply–adds per inference and maintaining a parameter count of fewer than 25 million. Additionally, leveraging ensemble learning and employing multicrop evaluation further enhances performance, resulting in impressive top-5 error rates of 3.5% on the validation set (3.6% on the test set) and a top-1 error rate of 17.3% on the validation set [34].
It was then fine-tuned by adding one flatten layer and three dense layers. The first dense layer has 32 neurons with a LeakyReLU activation function, the second dense layer has 16 neurons with a LeakyReLU activation function, and the output dense layer has one neuron with a sigmoid activation function, as shown in Table 8. The pretrained Inception-v3 extracts features from the input image and transfers them to these layers to classify the skin lesion as either Benign or Malignant.
The fine-tuned Inception-v3 model underwent training for 20 epochs using the Adam optimizer with a batch size of 64. The activation function employed for the output dense layer was sigmoid, and binary cross-entropy served as the designated loss function.

3.3.8. EfficientNetB0

Within the EfficientNet family of convolutional neural networks (ConvNets), EfficientNet-B0 is a specific model. It aims to strike a good balance between model accuracy and computational efficiency. As the smallest version of the EfficientNet series, EfficientNet-B0 provides a baseline architecture for comparison and scaling. Despite its compact size, EfficientNet-B0 delivers competitive performance. It has been trained and evaluated on various image classification tasks, including benchmark datasets like ImageNet. The model demonstrates strong accuracy while maintaining a relatively low computational cost, making it suitable for applications and scenarios with limited resources [35]. It was then fine-tuned by adding one flatten layer and three dense layers. The first dense layer has 32 neurons with a LeakyReLU activation function, the second dense layer has 16 neurons with a LeakyReLU activation function, and the output dense layer has one neuron with a sigmoid activation function, as shown in Table 9. The pretrained EfficientNetB0 extracted features from the input image and transferred them to these layers to classify the skin lesion as either Benign or Malignant.
The fine-tuned EfficientNetB0 model underwent training for 20 epochs using the Adam optimizer with a batch size of 64. The activation function employed for the output dense layer was sigmoid, and binary cross-entropy served as the designated loss function.

4. Results and Discussion

4.1. Performance Indicators

Our research, which involved both traditional machine learning and fine-tuned deep learning approaches, demonstrates that fine-tuning deep learning models can significantly enhance predictability and accuracy. The results reveal varied performance among different classifiers, reflecting their distinct strengths and weaknesses.
As shown in Table 10, the CNN achieved an accuracy of 82%, while the SVM reached 83%. These results indicate that traditional machine learning methods, while effective, often fall short compared with advanced deep learning techniques. The Random Forest Classifier recorded a slightly lower accuracy of 81%, suggesting that its ensemble approach may not capture the complex patterns in skin cancer data as effectively as deep learning models. EfficientNetB0, a fine-tuned deep learning model, achieved an accuracy of 84%. This improvement over the CNN and SVM highlights EfficientNetB0’s capability to leverage an efficient model architecture for better performance in complex classification tasks. ResNet34 also performed well with an accuracy of 82%, showing that while residual networks offer significant advantages, their performance can be highly dependent on specific tuning parameters and dataset characteristics. VGG16 showed notable performance with an accuracy of 85%, demonstrating the efficacy of its deep network architecture in learning intricate features from the data. Inception_v3 and DenseNet121 achieved accuracies of 81% and 87%, respectively. DenseNet121’s superior accuracy suggests that its dense connectivity pattern effectively facilitates feature reuse and gradient flow, leading to enhanced model performance.
For all fine-tuned models, we introduced a flatten layer followed by three dense layers. The flatten layer transforms multidimensional feature maps into a one-dimensional vector, ensuring efficient data processing. The dense layers refine feature representation, capturing intricate patterns and relationships within the data. The use of LeakyReLU activation functions in the dense layers addresses the vanishing gradient problem and ensures stable training. Finally, the output dense layer with a sigmoid activation function simplifies binary classification, providing clear decision boundaries for healthcare professionals. The incorporation of these additional layers improves the efficiency and performance of the neural network models, leading to enhanced accuracy and reliability in skin cancer classification. The comparative performance of the classifiers underscores the importance of model architecture and tuning in achieving optimal results.
Precision is a metric that evaluates the accuracy of predictions by measuring the proportion of correctly classified positive instances out of all predicted positive instances. It is computed by dividing the number of true positive predictions by the sum of true positive and false positive predictions. A high precision value indicates a well-performed model. Precision is defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
where TP is true positive and FP is false positive.
Recall is the ratio of all positively classified classes with a positive outcome that is correctly predicted. A good model should have a high recall rate. Recall is defined as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
where FN is a false negative. A high F1-score indicates high precision and recall because the score contains information about these two variables. It is defined as follows:
$$F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The ROC curve is a graphical representation of a model’s ability to distinguish between different classes. It plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 − Specificity) at various threshold settings. A perfect model would reach the top-left corner, indicating a high True Positive Rate and a low False Positive Rate.
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
against
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
where the following definitions are used:
  • TPR: True Positive Rate (Sensitivity or Recall).
  • FPR: False Positive Rate.
  • TP: True Positives.
  • FP: False Positives.
  • TN: True Negatives.
  • FN: False Negatives.
AUC is the area under the ROC curve and provides a single metric to summarize the model’s performance. The AUC value ranges from 0 to 1, where 1 indicates a perfect model, 0.5 suggests a model with no discriminative ability (equivalent to random guessing), and 0 indicates a completely wrong model.
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}(\mathrm{FPR})\, d(\mathrm{FPR})$$
where the following definitions are used:
  • TPR(FPR): The True Positive Rate as a function of the False Positive Rate.
  • d(FPR): The infinitesimal change in the False Positive Rate.
The Precision–Recall curve is a plot of Precision (Positive Predictive Value) against Recall (Sensitivity) for different threshold values. Precision measures the proportion of true positive predictions among all positive predictions, while Recall measures the proportion of true positives among all actual positives. This curve is especially useful when dealing with imbalanced datasets, where one class is much more common than the other.
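These metrics and curves can be computed directly from a model’s predicted probabilities; the short scikit-learn sketch below shows one way to do so, with illustrative variable names.

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_curve, roc_auc_score, precision_recall_curve)

# y_test: ground-truth labels; y_prob: predicted probability of the malignant class.
y_pred = (y_prob >= 0.5).astype(int)

precision = precision_score(y_test, y_pred)               # TP / (TP + FP)
recall = recall_score(y_test, y_pred)                      # TP / (TP + FN)
f1 = f1_score(y_test, y_pred)                              # harmonic mean of the two

fpr, tpr, _ = roc_curve(y_test, y_prob)                    # points of the ROC curve
auc = roc_auc_score(y_test, y_prob)                        # area under the ROC curve
prec_c, rec_c, _ = precision_recall_curve(y_test, y_prob)  # Precision-Recall curve
```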
The confusion matrices shown in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 are summarized as follows:
  • CNN: Benign: 141/186 (76%), Malignant: 122/144 (85%)
  • SVM: Benign: 152/186 (82%), Malignant: 123/144 (85%)
  • Random Forest Classifier: Benign: 151/186 (81%), Malignant: 119/144 (83%)
  • Fine-tuned VGG-16: Benign: 153/186 (82%), Malignant: 133/144 (92%)
  • Fine-tuned ResNet-34: Benign: 159/186 (86%), Malignant: 101/144 (70%)
  • Fine-tuned DenseNet-121: Benign: 163/186 (88%), Malignant: 122/144 (85%)
  • Fine-tuned Inception-v3: Benign: 161/186 (87%), Malignant: 108/144 (75%)
  • Fine-tuned EfficientNetB0: Benign: 155/186 (83%), Malignant: 123/144 (85%)
The fine-tuned DenseNet-121 achieved the highest correct classification rate for benign lesions (88%) and performed well on malignant lesions (85%). The fine-tuned VGG-16 also showed strong performance, with 82% for benign lesions and 92% for malignant lesions. ResNet-34, despite a high correct classification rate for benign lesions (86%), had the lowest performance for malignant lesions (70%). The other models demonstrated competitive performance, with varying strengths in benign and malignant classifications.
Fine-tuned DenseNet-121 has the highest AUC of 0.87, indicating the best overall performance in distinguishing between benign and malignant lesions. Fine-tuned VGG-16 follows closely with an AUC of 0.86, reflecting strong classification ability. SVM and EfficientNetB0 also show strong performance with AUCs of 0.84. ResNet-34 has the lowest AUC of 0.78, suggesting it is less effective in distinguishing between the classes compared with other models, as shown in Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18 and Figure 19.
Fine-tuned VGG-16 and DenseNet-121 exhibit strong performance with high precision and recall, making them well suited for applications requiring reliable positive classification. Models like SVM and EfficientNetB0 also show high precision and recall but are slightly lower compared with VGG-16 and DenseNet-121, as shown in Figure 20, Figure 21, Figure 22, Figure 23, Figure 24, Figure 25, Figure 26 and Figure 27.

4.2. Detector Fusion

Detector fusion refers to the combination of different detectors, all deciding on the same two hypotheses H1 and H0, with the aim of improving on the individual performance. Detector fusion can be broadly classified into hard and soft fusion. In hard fusion, the individual binary decisions are combined to obtain a final decision. In soft fusion, continuous statistics generated by the detectors are combined into a single fused statistic, which is then used to obtain the final decision [36]. In [37], the authors proposed separated score integration (SSI), a new method based on alpha integration to perform soft fusion of scores in multiclass classification problems, which inspired the present study. We first convert the probability outputs of each classifier—VGG, ResNet, DenseNet, InceptionNet, EfficientNet, CNN, Random Forest, and SVM—into binary predictions by applying a threshold of 0.5. This converts each probability score into a definitive class label of 0 or 1. Next, we aggregate the binary predictions from all models using a majority voting scheme. Specifically, we average the binary predictions across all models and apply a rounding function to determine the final class label. If more than half of the models predict a class as positive, that class is chosen for the final combined prediction. Finally, we evaluate the accuracy of the combined classifier by comparing its predictions to the true labels, providing a measure of how well the integrated approach performs relative to the individual models. This method aims to leverage the strengths of each model to enhance overall classification accuracy. The fused classifier achieved an accuracy of 0.86, indicating that it correctly classified 86% of the test samples.
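A sketch of this hard majority-vote fusion, assuming each model’s malignant-class probabilities are stacked into a NumPy array, is given below.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def fuse_by_majority_vote(prob_outputs, y_true, threshold=0.5):
    """prob_outputs: array of shape (n_models, n_samples) holding each model's
    predicted probability of the malignant class."""
    binary_preds = (prob_outputs >= threshold).astype(int)  # per-model hard decisions
    # Average the binary votes and round: more than half positive votes gives class 1.
    fused = np.round(binary_preds.mean(axis=0)).astype(int)
    return fused, accuracy_score(y_true, fused)

# Example usage with the eight classifiers (variable names are illustrative):
# all_probs = np.stack([vgg_p, resnet_p, densenet_p, inception_p,
#                       effnet_p, cnn_p, rf_p, svm_p])
# fused_pred, fused_acc = fuse_by_majority_vote(all_probs, y_test)
```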

4.3. Statistical Significance

The statistical measures used in this study are mean accuracy, standard deviation of accuracy, and confidence interval.
Mean accuracy is the average performance of the model over multiple tests. If the model is tested several times, the mean accuracy indicates, on average, how often it correctly predicts the outcome.
$$\mathrm{Mean\ Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{Accuracy}_i$$
where the following definitions are used:
  • N is the number of tests or experiments.
  • $\mathrm{Accuracy}_i$ is the accuracy obtained in the i-th test.
  • The factor $\frac{1}{N}$ averages the sum of accuracies to provide the mean.
The confidence interval gives a range of values within which one can be fairly sure the true accuracy of the model lies; a 95% confidence interval, for example, means we are 95% confident that the model’s true accuracy falls between the two bounds.
$$\mathrm{CI} = \mathrm{Mean\ Accuracy} \pm t \times \frac{\mathrm{Standard\ Deviation}}{\sqrt{N}}$$
where the following definitions are used:
  • Mean Accuracy is the average accuracy calculated previously.
  • t is the t-score (or z-score) for the desired confidence level (e.g., 1.96 for a 95% confidence level).
  • Standard Deviation is the measure of variability in accuracy.
  • $\sqrt{N}$ is the square root of the number of tests, scaling the standard deviation to reflect sample size.
The standard deviation of accuracy measures how much the accuracy varies across different tests, indicating whether the model’s performance is consistent or fluctuates considerably.
$$\mathrm{Standard\ Deviation} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left(\mathrm{Accuracy}_i - \mathrm{Mean\ Accuracy}\right)^2}$$
where the following definitions are used:
  • Mean Accuracy is the average accuracy calculated previously.
  • N is the number of tests or experiments.
  • $\mathrm{Accuracy}_i$ is the accuracy obtained in the i-th test.
  • The term $\frac{1}{N-1}$ normalizes the sum of squared deviations to account for sample size.
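A small NumPy sketch of these three measures, applied to the accuracies collected over repeated runs, is shown below; the z-score of 1.96 corresponds to the 95% confidence level used in Table 11, and the example accuracies are hypothetical.

```python
import numpy as np

def summarise_accuracies(accuracies, z=1.96):
    """accuracies: list of per-run test accuracies for one model."""
    acc = np.asarray(accuracies, dtype=float)
    mean_acc = acc.mean()
    std_acc = acc.std(ddof=1)                  # sample standard deviation (N - 1)
    margin = z * std_acc / np.sqrt(len(acc))   # half-width of the confidence interval
    return mean_acc, std_acc, (mean_acc - margin, mean_acc + margin)

# Example with hypothetical accuracies from five runs:
# summarise_accuracies([0.86, 0.87, 0.88, 0.87, 0.86])
```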
Fine-tuned VGG16: The mean accuracy of 0.87 indicates that, on average, the model correctly classifies 87% of the test samples. The standard deviation of 0.01 reflects very little variability in its performance across different experiments. The 95% confidence interval, ranging from 0.86 to 0.87, suggests that if we repeated the experiments multiple times, the true accuracy would fall within this range 95% of the time, as shown in Table 11.
Fine-tuned ResNet-34: With a mean accuracy of 0.79, ResNet-34 correctly identifies 79% of the test samples on average. The standard deviation of 0.02 shows slightly more variation in performance across tests. The confidence interval between 0.78 and 0.80 indicates that we can be 95% confident the true accuracy lies within this narrow range, as shown in Table 11.
Fine-tuned DenseNet-121: This model also achieves a mean accuracy of 0.87, meaning it performs similarly to VGG16 by correctly classifying 87% of the samples on average. The standard deviation of 0.02 suggests some minor variability. The 95% confidence interval from 0.86 to 0.87 confirms consistent performance, with the true accuracy likely falling within this range, as shown in Table 11.
Fine-tuned Inception_v3: With a mean accuracy of 0.82, this model correctly classifies 82% of the samples on average. The standard deviation of 0.03 indicates greater variability in results. The confidence interval of 0.81 to 0.83 suggests a moderate range where the true accuracy is likely to be found, as shown in Table 11.
Fine-tuned EfficientNetB0: Achieving a mean accuracy of 0.85, this model correctly identifies 85% of the samples on average. The standard deviation of 0.012 indicates very consistent performance across different tests. The confidence interval, from 0.84 to 0.85, suggests that the true accuracy is tightly confined within this range, as shown in Table 11.
CNN: With a mean accuracy of 0.83, the CNN correctly classifies 83% of the samples on average. The standard deviation of 0.03 indicates some variability. The confidence interval from 0.85 to 0.87 suggests a range in which the true accuracy likely lies, though this range is broader than some other models, as shown in Table 11.
SVM: This model achieves a mean accuracy of 0.83, correctly identifying 83% of the samples on average. The standard deviation of 0.02 suggests minimal variability. The confidence interval from 0.83 to 0.84 indicates that the true accuracy is likely to fall within this narrow range, as shown in Table 11.
Random Forest Classifier: With a mean accuracy of 0.82, RFC correctly classifies 82% of the test samples on average. The standard deviation of 0.02 shows consistent performance. The confidence interval between 0.81 and 0.82 suggests that the true accuracy is within this range with 95% confidence, as shown in Table 11.

5. Conclusions

This study investigated the efficacy of classification machine learning models and pretrained transfer learning models for skin cancer classification. Among the evaluated models, DenseNet-121 emerged as the most promising choice. Its unique architecture facilitates feature reuse and enables efficient information flow, with each layer receiving feature maps from preceding layers and passing its own feature maps to subsequent ones. By fine-tuning the DenseNet-121 model and incorporating additional layers, a noteworthy accuracy rate of 87% was achieved. This result highlights the suitability of DenseNet-121 for skin cancer classification, emphasizing the value of leveraging its intricate layer connections to enhance performance.
In essence, this study underscores the significance of utilizing advanced convolutional neural network designs such as DenseNet-121 for dermatological tasks like skin cancer classification. The study holds significant managerial implications for healthcare organizations and dermatological practices. Implementing DenseNet-121 as part of a computer-aided diagnosis system could enhance the efficiency and accuracy of skin cancer diagnosis processes. By leveraging DenseNet-121’s unique architecture, which promotes feature reuse and efficient information flow, healthcare professionals can potentially streamline the diagnosis workflow, reduce diagnostic errors, and improve patient outcomes. Healthcare organizations may consider allocating resources toward acquiring and implementing these cutting-edge technologies to enhance their diagnostic capabilities and stay at the forefront of medical innovation. Additionally, ongoing research efforts should focus on exploring alternative models and preprocessing techniques to further improve accuracy and address specific challenges in skin cancer diagnosis. However, it is important to acknowledge that the study faced a limitation of data. The publicly available dataset used in this research did not cover all skin colors, which may limit the generalizability of the findings. Therefore, acquiring more diverse datasets that encompass a broader range of skin colors and ethnicities would be essential to train the model more comprehensively and achieve greater accuracy in real-world applications.

Author Contributions

Conceptualization, A.B., S.-C.N. and M.-F.L.; methodology, A.B., S.-C.N. and M.-F.L.; software, A.B.; validation, A.B., S.-C.N. and M.-F.L.; formal analysis, A.B.; investigation, A.B.; resources, S.-C.N. and M.-F.L.; data curation, A.B.; writing—original draft preparation, A.B.; writing—review and editing, A.B., S.-C.N. and M.-F.L.; visualization, A.B.; supervision, S.-C.N. and M.-F.L.; project administration, S.-C.N.; funding acquisition, S.-C.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of the study are available from the first author upon request. The author’s email address is [email protected].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Difference between Melanoma & Nonmelanoma Skin Cancer. Moffitt Cancer Center, 2023. Available online: https://moffitt.org/cancers/skin-cancer-nonmelanoma/faqs/what-is-the-difference-between-melanoma-and-nonmelanoma-skin-cancer/ (accessed on 5 June 2023).
  2. Radiation: Ultraviolet (UV) Radiation and Skin Cancer (No Date) World Health Organization. Available online: https://www.who.int/news-room/questions-and-answers/item/radiation-ultraviolet-(uv)-radiation-and-skin-cancer (accessed on 5 June 2023).
  3. Skin Cancer Statistics: World Cancer Research Fund International. WCRF International, 2022. Available online: https://www.wcrf.org/cancer-trends/skin-cancer-statistics/ (accessed on 5 June 2023).
  4. Vatekar, K.; Phapale, S.; Bhor, A.; Patel, C.; Tiwary, A. Skin cancer prediction using Deep Learning. Int. J. Adv. Res. Sci. Commun. Technol. 2023, 3, 570–574. [Google Scholar] [CrossRef]
  5. Mühr, L.S.A.; Hultin, E.; Dillner, J. Transcription of human papillomaviruses in non-melanoma skin cancers of the immunosuppressed. Int. J. Cancer 2021, 149, 1341–1347. [Google Scholar] [CrossRef]
  6. Vardasca, R.; Magalhaes, C. Towards an effective imaging-based decision support system for skin cancer. In Handbook of Research on Applied Intelligence for Health and Clinical Informatics; IGI Global: Hershey, PA, USA, 2022; pp. 354–382. [Google Scholar]
  7. Larre Borges, A.; Nicoletti, S.; Dufrechou, L.; Nicola Centanni, A. Dermatoscopy in the Public Health Environment. In Dermatology in Public Health Environments; Springer International Publishing: Cham, Switzerland, 2017; pp. 1157–1188. [Google Scholar] [CrossRef]
  8. Chatterjee, S.; Dey, D.; Munshi, S.; Gorai, S. Extraction of features from cross correlation in space and frequency domains for classification of skin lesions. Biomed. Signal Process. Control 2019, 53, 101581. [Google Scholar] [CrossRef]
  9. Yin, W.; Li, Y.-W.; Gu, Y.-Q.; Luo, M. Nanoengineered targeting strategy for cancer immunotherapy. Acta Pharmacol. Sin. 2020, 41, 902–910. [Google Scholar] [CrossRef]
  10. Lu, S.; Lu, Z.; Zhang, Y.D. Pathological brain detection based on AlexNet and transfer learning. J. Comput. Sci. 2019, 30, 41–47. [Google Scholar] [CrossRef]
  11. Yu, F.; Chen, Z.; Jiang, M.; Tian, Z.; Peng, T.; Hu, X. Smart clothing system with multiple sensors based on digital twin technology. IEEE Internet Things J. 2022, 10, 6377–6387. [Google Scholar] [CrossRef]
  12. Bello, A.; Ng, S.-C.; Leung, M.-F. A Bert framework to sentiment analysis of Tweets. Sensors 2023, 23, 506. [Google Scholar] [CrossRef]
  13. Che, H.; Pan, B.; Leung, M.-F.; Cao, Y.; Yan, Z. Tensor Factorization with Sparse and Graph Regularization for Fake News Detection on Social Networks. IEEE Trans. Comput. Soc. Syst. 2023, 11, 4888–4898. [Google Scholar] [CrossRef]
  14. Zhao, T.; He, J.; Lv, J.; Min, D.; Wei, Y. A comprehensive implementation of road surface classification for vehicle driving assistance: Dataset, models, and deployment. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8361–8370. [Google Scholar] [CrossRef]
  15. Yu, F.; Yu, C.; Tian, Z.; Liu, X.; Cao, J.; Liu, L.; Du, C.; Jiang, M. Intelligent wearable system with motion and emotion recognition based on digital twin technology. IEEE Internet Things J. 2024, 11, 26314–26328. [Google Scholar] [CrossRef]
  16. Donges, N. What Is Transfer Learning? Exploring the Popular Deep Learning Approach. Built In. 2022. Available online: https://builtin.com/data-science/transfer-learning (accessed on 9 June 2023).
  17. Chaturvedi, S.S.; Gupta, K.; Prasad, P.S. Skin Lesion Analyser: An Efficient Seven-Way Multi-Class Skin Cancer Classification Using MobileNet. arXiv 2021, arXiv:1907.03220. [Google Scholar]
  18. Brinker, T.J.; Hekler, A.; Enk, A.H.; Berking, C.; Haferkamp, S.; Hauschild, A.; Weichenthal, M.; Klode, J.; Schadendorf, D.; Holland-Letz, T.; et al. Deep neural networks are superior to dermatologists in melanoma image classification. Eur. J. Cancer 2019, 119, 11–17. [Google Scholar] [CrossRef] [PubMed]
  19. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [Google Scholar] [CrossRef] [PubMed]
  20. Milton, M.A.A. Automated Skin Lesion Classification Using Ensemble of Deep Neural Networks in ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection Challenge. arXiv 2019, arXiv:1901.10802. [Google Scholar]
  21. Nugroho, A.A.; Slamet, I.; Sugiyanto, S. Skin cancer identification system of HAM10000 skin cancer dataset using convolutional neural network. AIP Conf. Proc. 2019, 2202, 020039. [Google Scholar]
  22. Agrahari, P.; Agrawal, A.; Subhashini, N. Skin Cancer Detection Using Deep Learning. In Futuristic Communication and Network Technologies; Springer: Singapore, 2022; pp. 179–190. [Google Scholar]
  23. Yu, L.; Chen, H.; Dou, Q.; Qin, J.; Heng, P.A. Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Trans. Med. Imaging 2016, 36, 994–1004. [Google Scholar] [CrossRef]
  24. Codella, N.; Cai, J.; Abedini, M.; Garnavi, R.; Halpern, A.; Smith, J.R. Deep learning, sparse coding, and SVM for melanoma recognition in Dermoscopy Images. In Proceedings of the 6th International Workshop on Machine Learning in Medical Imaging, Munich, Germany, 5–9 October 2015; pp. 118–126. [Google Scholar] [CrossRef]
  25. Panchal, R. A review on protection against fileless malware attacks using gateway. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 7302–7307. [Google Scholar]
  26. Uçkuner, M.; Erol, H. A New Deep Learning Model for Skin Cancer Classification. In Proceedings of the 2021 6th International Conference on Computer Science and Engineering (UBMK), Ankara, Turkey, 15–17 September 2021; pp. 27–31. [Google Scholar]
  27. Filali, Y.; El Khoukhi, H.; Sabri, M.A.; Aarab, A. Analysis and classification of skin cancer based on deep learning approach. In Proceedings of the 2022 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 18–20 May 2022; pp. 1–6. [Google Scholar]
  28. Fanconic. Skin Cancer—Malignant vs. Benign. Kaggle. 2022. Available online: https://www.kaggle.com/datasets/fanconic/skin-cancer-malignant-vs-benign (accessed on 10 February 2023).
  29. Dhankhar, N. ISIC 2020 JPG 224X224 Resized, Kaggle. 2024. Available online: https://www.kaggle.com/datasets/nischaydnk/isic-2020-jpg-224x224-resized/data (accessed on 18 August 2024).
  30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  31. Xin, C.; Liu, Z.; Zhao, K.; Miao, L.; Ma, Y.; Zhu, X.; Zhou, Q.; Wang, S.; Li, L.; Yang, F.; et al. An improved transformer network for skin cancer classification. Comput. Biol. Med. 2022, 1, 105939. [Google Scholar] [CrossRef]
  32. Murugan, A.; Nair, D.H.; Kumar, D.S. Research on SVM and KNN classifiers for Skin cancer detection. Int. J. Eng. Adv. Technol. 2019, 9, 4627–4632. [Google Scholar] [CrossRef]
  33. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  34. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
  35. Hassanien, A.; Bhatnagar, R.; Darwish, A. (Eds.) Advanced Machine Learning Technologies and Applications. In Advances in Intelligent Systems and Computing; Springer: Singapore, 2020; Volume 1141. [Google Scholar] [CrossRef]
  36. Salazar, A.; Safont, G.; Vergara, L.; Vidal, E. Graph regularization methods in soft detector fusion. IEEE Access 2023, 11, 144747–144759. [Google Scholar] [CrossRef]
  37. Safont, G.; Salazar, A.; Vergara, L. Multiclass alpha integration of scores from multiple classifiers. Neural Comput. 2019, 31, 806–825. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Image of Benign and Malignant before preprocessing.
Figure 2. Benign and Malignant sample after augmentation.
Figure 3. Model flow chart.
Figure 4. CNN confusion matrix.
Figure 5. Fine-tuned DenseNet-121 confusion matrix.
Figure 6. Fine-tuned EfficientNetB0 confusion matrix.
Figure 7. Fine-tuned ResNet-34 confusion matrix.
Figure 8. Fine-tuned VGG-16 confusion matrix.
Figure 9. Fine-tuned Inception v3 confusion matrix.
Figure 10. SVM confusion matrix.
Figure 11. Random Forest Classifier confusion matrix.
Figure 12. CNN Receiver Operating Characteristic.
Figure 13. Fine-tuned DenseNet-121 Receiver Operating Characteristic.
Figure 14. Fine-tuned EfficientNetB0 Receiver Operating Characteristic.
Figure 15. Fine-tuned Inception v3 Receiver Operating Characteristic.
Figure 16. Fine-tuned ResNet-34 Receiver Operating Characteristic.
Figure 17. Random Forest Classifier Receiver Operating Characteristic.
Figure 18. SVM Receiver Operating Characteristic.
Figure 19. Fine-tuned VGG-16 Receiver Operating Characteristic.
Figure 20. CNN Precision–Recall curve.
Figure 21. Fine-tuned DenseNet-121 Precision–Recall curve.
Figure 22. Fine-tuned EfficientNetB0 Precision–Recall curve.
Figure 23. Fine-tuned Inception v3 Precision–Recall curve.
Figure 24. Fine-tuned ResNet-34 Precision–Recall curve.
Figure 25. Fine-tuned VGG-16 Precision–Recall curve.
Figure 26. RFC Precision–Recall curve.
Figure 27. SVM Precision–Recall curve.
Table 1. Dataset description.

Dataset Label | Training Set | Test Set | Validation Set | Total
Benign | 1534 | 186 | 80 | 1800
Malignant | 1284 | 144 | 69 | 1497
Total | 2818 | 330 | 149 | 3297
Table 2. Dataset training before and after augmentation.

Dataset Labels | Before Augmentation | After Augmentation
Benign | 1534 | 3068
Malignant | 1284 | 2568
Total | 2818 | 5636
Table 3. Parameters of CNN.

Layer | Output Shape | Parameters Trained
Conv2D | (None, 222, 222, 32) | 896
MaxPooling2D | (None, 111, 111, 32) | 0
Conv2D | (None, 109, 109, 64) | 18,496
MaxPooling2D | (None, 54, 54, 64) | 0
Conv2D | (None, 52, 52, 128) | 73,856
MaxPooling2D | (None, 26, 26, 128) | 0
Flatten | (None, 86,528) | 0
Dense | (None, 64) | 5,537,856
Dense | (None, 1) | 65
Total Parameters | | 5,631,169
Trainable Parameters | | 5,631,169
Nontrainable Parameters | | 0
Table 4. CNN hyperparameters.

Parameter | Value
Epoch | 20
Optimizer | Adam
Batch size | 64
Activation | Sigmoid
Loss | Binary cross-entropy
Table 5. Parameters of fine-tuned VGG16.

Layer | Output Shape | Parameters Trained
VGG16 | (None, 7, 7, 512) | 14,714,688
Flatten | (None, 25,088) | 0
Dense_1 | (None, 32) | 802,848
LeakyReLU_1 | (None, 32) | 0
Dense_2 | (None, 16) | 528
LeakyReLU_2 | (None, 16) | 0
Dense_3 | (None, 1) | 17
Total Parameters | | 15,518,081
Trainable Parameters | | 803,393
Nontrainable Parameters | | 14,714,688
Table 6. Parameters of fine-tuned ResNet-34.

Layer | Output Shape | Parameters Trained
ResNet-34 | (None, 7, 7, 2048) | 23,587,712
Flatten | (None, 100,352) | 0
Dense_1 | (None, 32) | 3,211,296
LeakyReLU_1 | (None, 32) | 0
Dense_2 | (None, 16) | 528
LeakyReLU_2 | (None, 16) | 0
Dense_3 | (None, 1) | 17
Total Parameters | | 26,799,553
Trainable Parameters | | 3,211,841
Nontrainable Parameters | | 23,587,712
Table 7. Parameters of fine-tuned DenseNet-121.

Layer | Output Shape | Parameters Trained
DenseNet-121 | (None, 7, 7, 1024) | 7,037,504
Flatten | (None, 50,176) | 0
Dense_1 | (None, 32) | 1,605,664
LeakyReLU_1 | (None, 32) | 0
Dense_2 | (None, 16) | 528
LeakyReLU_2 | (None, 16) | 0
Dense_3 | (None, 1) | 17
Total Parameters | | 8,643,713
Trainable Parameters | | 1,606,209
Nontrainable Parameters | | 7,037,504
Table 8. Parameters of fine-tuned Inception-v3.

Layer | Output Shape | Parameters Trained
Inception_v3 | (None, 5, 5, 2048) | 21,802,784
Flatten | (None, 51,200) | 0
Dense_1 | (None, 32) | 1,638,432
LeakyReLU_1 | (None, 32) | 0
Dense_2 | (None, 16) | 528
LeakyReLU_2 | (None, 16) | 0
Dense_3 | (None, 1) | 17
Total Parameters | | 23,441,761
Trainable Parameters | | 1,638,977
Nontrainable Parameters | | 21,802,784
Table 9. Parameters of fine-tuned EfficientNetB0.

Layer | Output Shape | Parameters Trained
EfficientNetB0 | (None, 7, 7, 1280) | 4,049,571
Flatten | (None, 62,720) | 0
Dense_1 | (None, 32) | 2,007,072
LeakyReLU_1 | (None, 32) | 0
Dense_2 | (None, 16) | 528
LeakyReLU_2 | (None, 16) | 0
Dense_3 | (None, 1) | 17
Total Parameters | | 6,057,188
Trainable Parameters | | 2,007,617
Nontrainable Parameters | | 4,049,571
Table 10. Model performance metrics.

Models | Accuracy | P (Benign) | P (Malignant) | R (Benign) | R (Malignant) | F1 (Benign) | F1 (Malignant)
CNN | 82% | 0.87 | 0.73 | 0.76 | 0.85 | 0.81 | 0.78
SVM | 83% | 0.88 | 0.78 | 0.83 | 0.83 | 0.85 | 0.81
Random Forest Classifier | 81% | 0.85 | 0.77 | 0.81 | 0.82 | 0.83 | 0.79
Fine-tuned EfficientNetB0 | 84% | 0.88 | 0.80 | 0.83 | 0.85 | 0.86 | 0.83
Fine-tuned ResNet-34 | 82% | 0.83 | 0.79 | 0.84 | 0.78 | 0.84 | 0.79
Fine-tuned VGG16 | 85% | 0.90 | 0.79 | 0.82 | 0.88 | 0.86 | 0.84
Fine-tuned Inception_v3 | 81% | 0.81 | 0.80 | 0.86 | 0.74 | 0.84 | 0.77
Fine-tuned DenseNet-121 | 87% | 0.87 | 0.87 | 0.90 | 0.82 | 0.88 | 0.84
Table 11. Statistical measures.

Models | Mean Accuracy | Standard Deviation | 95% Confidence Interval
Fine-tuned VGG16 | 0.87 | 0.01 | 0.86, 0.87
Fine-tuned ResNet-34 | 0.79 | 0.02 | 0.78, 0.80
Fine-tuned DenseNet-121 | 0.87 | 0.02 | 0.86, 0.87
Fine-tuned Inception_v3 | 0.82 | 0.03 | 0.81, 0.83
Fine-tuned EfficientNetB0 | 0.85 | 0.012 | 0.84, 0.85
CNN | 0.83 | 0.03 | 0.85, 0.87
SVM | 0.83 | 0.02 | 0.83, 0.84
Random Forest Classifier | 0.82 | 0.02 | 0.81, 0.82