1. Introduction
Skin cancer (SC) is one of the most prevalent types of cancer in the current decade [
1]. According to several reports, the number of new SC patients and the number of deaths from SC have increased in many countries in recent years. Among the various types of SC, e.g., basal cell carcinoma, squamous cell carcinoma and melanoma, melanoma is one of the deadliest [
2]. The majority of malignant cases such as melanoma are distinguished by growth of the lesion accompanied by an asymmetric shape and multiple colors, together with a history of variation in texture, color, structure and/or size. Exposure to UV radiation is one of the primary causes of SC. Melanoma exhibits the highest number of cases among the various types of SC. Prior to age 50, incidence rates are higher in women than in men, but after that they become progressively higher among men. Treating melanoma early reduces patient mortality, so early detection is crucial. The “ABCDE” rule and the “7-point checklist” are two of the most widely used techniques for diagnosis. Both rely strongly on asymmetry, uneven borders, color variation, diameter, evolution, inflammation and sensory alterations.
To identify melanoma SC, an experienced dermatologist typically follows a series of steps: visual inspection of suspicious lesions, then dermoscopy and, if necessary, a biopsy. In a biopsy, a sample of the suspected cutaneous lesion is obtained and examined in a pathology laboratory to ascertain whether it is cancerous. This procedure is time-consuming and painful [
1]. Because of the increasing number of cases, inter-observer variation and infrastructural constraints, manual examination has not been found to be reliable. Moreover, melanoma is hard to notice in its early stages because it is visually quite similar to benign lesions. For all these reasons, manual diagnosis is usually challenging. Dermoscopy is an effective imaging technique for skin lesions in which an enlarged high-resolution image is captured while backscattering from the skin surface is suppressed. Although dermoscopy provides improved high-resolution images of skin lesions, dermatologists still have difficulty achieving higher SC diagnostic accuracy. Inter-expert variation in opinion, time constraints and the limited availability of experienced dermatologists all restrict the manual examination of dermoscopic images. An automatic computer-aided diagnosis (CAD) system that works on dermoscopic images could therefore help to discriminate melanoma SC from benign cases in the early phases and may be utilized as a second opinion for experts.
In this study, we construct a hybrid deep feature set in order to capture comprehensive features and improve generalization capacity. The most appropriate features are then selected through an effective binary tree growth (BTG) algorithm. Our contributions are summarized below:
We fine-tuned the Densenet-201 and Inception-v3 models on skin lesion data. The deep features from each model are extracted and blended together to form a powerful hybrid deep-feature set in order to obtain improved discriminating features. It is observed that the features extracted from each model are complementary to each other.
To further improve the classification outcomes, redundant or irrelevant features are removed from the hybrid feature set using a BTG-based optimization procedure. The BTG-based optimization employs an SVM classifier, which is used to compute the classification error and to maintain a balance between the classification categories. Extensive tests have been performed to verify the benefits of the suggested method.
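As a minimal sketch of the two steps above, the following Python fragment concatenates two per-image feature matrices into a hybrid set and then keeps only the columns flagged by a binary mask. The feature widths (1920 and 2048), the random placeholder features and the helper names `fuse_features` and `apply_mask` are assumptions made for illustration; they are not taken from the implementation used in this work, and the random mask merely stands in for a mask that a BTG search would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder deep features for 8 images. Real features would come from the
# fine-tuned networks; the widths below are assumed for illustration only.
n_images = 8
densenet_feats = rng.normal(size=(n_images, 1920))   # hypothetical Densenet-201 features
inception_feats = rng.normal(size=(n_images, 2048))  # hypothetical Inception-v3 features


def fuse_features(a, b):
    """Concatenate two per-image feature matrices column-wise."""
    if a.shape[0] != b.shape[0]:
        raise ValueError("both feature sets must describe the same images")
    return np.concatenate([a, b], axis=1)


def apply_mask(features, mask):
    """Keep only the feature columns whose mask bit is 1."""
    mask = np.asarray(mask, dtype=bool)
    if mask.shape[0] != features.shape[1]:
        raise ValueError("mask length must equal the number of features")
    return features[:, mask]


hybrid = fuse_features(densenet_feats, inception_feats)

# Random bits stand in for the mask a feature selection search would return.
mask = rng.integers(0, 2, size=hybrid.shape[1])
selected = apply_mask(hybrid, mask)

print("hybrid feature set:", hybrid.shape)
print("after selection:   ", selected.shape)
```

The selected matrix would then be passed to the classifier in place of the full hybrid set.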
The remainder of this paper is organized as follows. Section 2 presents the literature survey on skin lesion classification studies and feature selection techniques. Section 3 provides an overview of the datasets used, and the proposed methodology is described in Section 4. Section 5 presents the experimental results and related discussion, and finally, Section 6 concludes the paper.
2. Related Work
In [
3], the authors presented a deep learning (DL)-based CAD technique in which they carried out image augmentation on a dermoscopic training dataset, and the fine tuning of pre-trained CNN (Resnet-50 and Resnet-101) models was performed to classify the skin lesion images. They extracted deep features from the global average pooling (GAP) layer and performed feature fusion along with feature selection followed by machine learning classification. They had encouraging outcomes on the HAM10000 dataset. Using transfer learning with pre-trained GoogleNet, Kassem et al. [
2] developed a technique for the ISIC 2019 challenge dataset. Even with varying numbers of images in each category, the suggested technique can reliably classify eight types of lesions. A few filters in some layers were added to improve the features. Popescu et al. [
4] presented a CAD system to classify SC using the collective intelligence of nine CNNs. The judgement made by each neural network is fused into a single one using a weight matrix to create a decision fusion module. Compared to the best performing individual network, the authors demonstrate a considerable increase in accuracy. Zhao et al. [
5] introduced a novel SC classification framework where high-quality synthetic images were generated using StyleGAN. The structure of noise input and style control in the actual generator of the module was reorganized and a new loss function was implemented in order to boost balanced multi-class accuracy. In [
6], Al-Masni et al. developed a novel framework in which full-resolution CNN (FrCN) [
7] was used to segment the lesions and different pre-trained CNN models such as Inception-v3, Resnet-50, Inception-Resnet-v2 and Densenet-201 were applied on segmented lesions in order to classify them. The analysis revealed that ResNet-50 is the best performing model for identifying SC, followed by the Inception-ResNet-v2 model. The authors also show that their method performs better with augmented and balanced datasets than unaugmented and imbalanced datasets. A framework [
8] is developed to collect a skin sample through a mobile device and label it according to its lesion. The lesion is initially segmented using a 16-layered CNN architecture with an enhanced high dimension contrast transform technique. This method reduced computational time while increasing segmentation accuracy compared to contemporary methods. On a well-known dataset of dermoscopic images, i.e., HAM10000, Kousis et al. [
9] examined 11 pre-trained networks to diagnose skin cancer. Their findings showed that the Densenet-169 performs better than other networks. A two-class Densenet-169 mapping model was developed which has shown excellent results. A mobile application was also developed to assist users in obtaining a preliminary understanding of their skin lesions based on the basic 2-class model. Ali et al. [
10] suggested an architecture that used sigmoid as the output activation layer. The HAM10000 dataset was used to evaluate the suggested framework. The authors achieved better training and testing accuracy than other transfer learning models already in use. The dataset was balanced to enhance classification accuracy across all the classes. Lan et al. [
11] introduced a capsule network called FixCaps. Applying a large kernel of up to 31 × 31 at the bottom convolution layer allows FixCaps to achieve a large receptive field. The convolutional block attention module and group convolution were adopted to reduce spatial information loss and to prevent model underfitting in the capsule layer, respectively. FixCaps can increase detection accuracy while requiring fewer computations than other available techniques. Almaraz-Damian et al. [
12] presented a new CAD system combining handcrafted features with deep features to discriminate between melanoma and nevus lesions. The suggested approach combines the features using a mutual information metric to pick up the most important features. In order to improve the efficacy of skin lesion classification, the authors [
13] proposed a new deep CNN model, CSLNet, which employs numerous layers and different filter sizes with fewer filters and parameters. Unlike many other algorithms, CSLNet does not require rigorous pre-processing or handcrafted features to classify skin lesions effectively. In [
14], the authors combined image deep features, hand-designed features and some patient related meta-data in order to accurately detect skin cancer. In [
15], Kadirappa et al. proposed an efficient skin lesion segmentation technique called SASegNet which is based on a new U-Net and spatial attention blocks. The segmented output data are then fed to EfficientNet B1 in order to produce the local features. The global features are produced by feeding the actual preprocessed images to the EfficientNet B1 network and eventually both local and global features are combined to produce the best features for accurate classification. A novel 56-layered residual deep CNN, i.e., RDCNN for the detection of skin cancers is introduced in [
16]. Golnoori et al. [
17] optimized the hyper parameters of a few pre-trained networks using metaheuristic algorithms and blended the features extracted from them. A KNN classifier is employed to classify the skin lesions. Alsahafi et al. [
18] introduced a 54-layered deep residual network for skin lesion categorization. This method captures multi-level features by employing variable filter dimensions. The authors carried out cross-channel correlation and neglected the spatial dimensions. Jasil and Ulagamuthalvi [
19] offered a multi-class classification system for skin lesion categorization that uses a Densenet and residual-based architecture. In order to enhance the discriminativeness of the features, they blended the layers from Densenet 121 and the residual-based network. The features from both networks are blended together and are fed to the convolutional layers and eventually to the classifier. It is shown that the powerful pre-processing procedures before classification can notably enhance the overall performance of the system. In order to increase the system’s ability to recognize skin lesions, Juan et al. [
20] employed optimized transfer learning versions of the Densenet-201, Inception-Resnet-v2 and Inception-v3 models. Experiments with and without augmentation, and with and without optimization, were carried out to handle the class imbalance issue. The experiments achieved an accuracy of up to 98% on the HAM10000 dataset and up to 93% on the ISIC2019 dataset. Hassan et al. [
21] demonstrated the benefits of a decision fusion scheme that utilizes the accuracy associated with a deep network model for SC classification. The authors combined several such schemes to obtain a global decision framework with much higher accuracy than any single classifier. The models were fine-tuned to discriminate between the various categories of skin lesions. In [
22], Samia et al. examined the efficiency of seventeen deep CNN architectures for feature extraction and twenty-four different classifiers for skin lesion categorization. Densenet-201 coupled with a cubic SVM was observed to achieve superior results on ISIC2019. In [
23], Rashid et al. introduced a deep feature framework based on the Mobilenet-v2 model for skin cancer detection. Their model classifies the skin lesions into malignant or benign categories. Data augmentation was carried out to tackle the class imbalance problem. A deep feature framework called MSLANet for skin lesion categorization, which is made of multiple long attention networks, is proposed in [
24]. Each network utilizes context details and enhances the crucial information description via long attention technique. The global context and local scale details are captured through the MSLANet model. A deep data augmentation scheme is introduced to improve the overall performance of the framework.
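Several of the studies surveyed above, e.g., [4] and [21], fuse the decisions of multiple networks rather than their features. As a rough illustration of weight-matrix decision fusion, the sketch below multiplies each network's class probabilities by a per-network, per-class weight and sums the results. The exact fusion rules used in those works are not given here, so the function `fuse_decisions`, the probability values and the weights are all hypothetical.

```python
import numpy as np


def fuse_decisions(probs, weights):
    """Weighted decision fusion.

    probs:   (n_models, n_classes) class probabilities from each network
    weights: (n_models, n_classes) trust placed in each network per class
    Returns the fused class scores and the index of the winning class.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if probs.shape != weights.shape:
        raise ValueError("probs and weights must have the same shape")
    scores = (weights * probs).sum(axis=0)
    return scores, int(np.argmax(scores))


# Three hypothetical networks scoring one image over three lesion classes.
probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
])

# Hypothetical weights, e.g., per-class validation accuracy of each network.
weights = np.array([
    [0.9, 0.5, 0.5],
    [0.6, 0.8, 0.7],
    [0.5, 0.9, 0.6],
])

scores, label = fuse_decisions(probs, weights)
print("fused scores:", scores)
print("predicted class index:", label)
```

In this toy case the first network alone would pick class 0, but the weighted sum favors class 1 because the two networks that agree on it also carry high weights for that class.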
Depending on its architectural design, each pre-trained deep network captures some unique features from the input images and consequently makes its own misclassifications. Blending the features of different deep models can exploit their respective strengths and reduce wrong predictions in SC classification.
In [25,26,27,28,29,30,31,32,33,34,35], it was observed that the presence of irrelevant or redundant features may lead to the incorrect prediction of images and may bring down the classification results. Many techniques [25,26,27,28,29,30,31,32,33,34,35] employ feature selection (FS) algorithms for selecting the most appropriate features in order to maximize the classification performance. In [
25], the authors suggested a technique in which the best features from a fine-tuned NASNET-large network are selected through a hybrid of the whale optimization algorithm (WOA) and an entropy mutual information technique, fused using a modified canonical correlation scheme and finally classified through an extreme learning machine (ELM). In [
26], the authors introduced an SC categorization technique that uses a grasshopper optimization algorithm for optimized FS. In [
27], Khan et al. used entropy coupled with Bhattacharyya distance and variance as an FS scheme to capture the significant features. Khan et al. in [
28] used an FS technique based on distance-guided entropy for SC categorization. An iterative Newton-Raphson-based FS scheme is utilized in [
29] for skin lesion categorization. In [
31], Wen et al. introduced a technique for the categorization of meta-spectral remote sensing images in which ant colony optimization (ACO) is applied for feature selection. ACO imitates the foraging behavior of ants. In [
32], Venkata proposed an effective framework for the detection of kidney carcinoma. The method comprises region of interest segmentation, image pre-processing, feature extraction, optimal feature selection and finally categorization. The optimal feature selection is carried out using the dragonfly algorithm, which is modeled on the swarming behavior of dragonflies. When hunting, dragonflies form a static swarm, characterized by small groups changing their movement quickly within a small area, whereas during migration they form a dynamic swarm. An amalgamation of deep GoogleNet features and a nature-inspired optimization scheme, i.e., particle swarm optimization (PSO), is used for autonomous vehicle categorization [
33]. Canonical PSO is inspired by the evolutionary behavior of living creatures and is based on the social coordination and flocking behavior of birds and fish schools. An effective scheme that combines the butterfly optimization and ant lion procedures in order to reduce the feature dimensions by eliminating redundant features is proposed in [
36]. The selected features are then used to predict the benign/malignant condition of breast tissues using different classifiers. In [
37], an efficient metaheuristic procedure called the tree growth algorithm (TGA), which is inspired by the competition among trees for light and nourishment, is proposed. Zhong et al. in [
34] introduced a binary TGA and a linearly increasing parameter adjustment scheme to tune the parameter value in TGA. In [
38], Khasanov et al. proposed an integrated optimization scheme utilizing TGA and the power loss sensitivity factor (PLSF) to identify the best size and location of distributed generation units in distribution systems in order to reduce the total power losses. In [
35], Too et al. proposed a framework for myoelectric signal categorization utilizing binary TGA-based feature selection. In any image classification framework, the choice of the feature selection (FS) algorithm plays a crucial role. Although many effective feature selection-based image classification frameworks exist in the literature, more powerful frameworks that facilitate improved and robust feature selection are still needed.
Several meta-heuristic procedures have been employed in feature selection applications. The monarch butterfly optimization algorithm (MBOA) [
39] is a swarm intelligence algorithm motivated by the migration behavior of monarch butterflies. Individuals in MBOA are updated through the migration operator and the butterfly adjusting operator. The performance of MBOA was compared with that of five other metaheuristic optimization schemes on 38 benchmark problems. In terms of mean results, MBOA performs at the fifth-best level on six of the 38 benchmarks. The very recent RIME optimization algorithm [
40] imitates the formation of the soft-rime and hard-rime layers of rime ice, and builds a soft-rime search strategy and a hard-rime puncture mechanism to implement the exploration and exploitation behaviors of optimization methods. In [
41], Wang introduced a new general-purpose metaheuristic approach called moth search, in which the phototaxis and Lévy flights of moths in the natural environment are abstracted and modeled. The moth search implementation was demonstrated to be simple and flexible. Iman et al. [
42] introduced a new optimizer called weighted mean of vectors to solve various optimization problems. This approach is an improved weighted-mean method that uses the weighted mean idea for a solid structure and three key operators to update the location of the vectors: (i) an updating rule, (ii) vector combining and (iii) a local search. This technique has been demonstrated to converge to 0.99% of the overall best solution. In [
43], a nature-inspired, population-based algorithm called the Harris hawks optimization algorithm (HHOA) is proposed. The primary inspiration for HHOA is the cooperative behavior and chasing style of Harris hawks in their natural environment, known as the surprise pounce. In this clever strategy, several hawks work together to attack prey from various directions in an effort to surprise it. The algorithm mathematically mimics these dynamic patterns and behaviors to create an optimization procedure. This technique exhibits promising and occasionally competitive results when compared to popular relevant schemes. The Runge-Kutta optimization scheme proposed by Iman et al. [
44] can be applied in many real-world applications. The Runge-Kutta optimization scheme is based on a logical search mechanism for global optimization that uses the slope variations computed by the Runge-Kutta method. To explore the promising regions of the feature space and to progress towards the global optimum, this search mechanism uses two active phases, i.e., exploration and exploitation. This method has shown good results and faster convergence. Motivated by the behavior of animals in a starving situation, in [
45], a hunger game search algorithm (HGSA) is introduced. The HGSA integrates the concept of hunger into the feature process: an adaptive weight based on the concept of hunger is designed and used to simulate the effect of hunger on each search step. The key benefits of this algorithm over other approaches are its dynamic behavior, straightforward structure, good convergence and acceptable solution quality. In [
46], Li et al. introduced a slime mould (SM) optimization algorithm based on the oscillation mode of slime mould in nature. This algorithm has several novel features and a distinctive mathematical model that uses adaptive weights to imitate the positive and negative feedback of the SM propagation wave, built on a bio-oscillator, to form the optimal path for connecting food with very good exploratory capability and exploitation tendency. To validate the effectiveness of this SM-based technique, it was tested on four classical engineering problems, where it frequently demonstrated the best results on various search landscapes. An optimization method called the colony predation algorithm, which utilizes the cooperative predation of animals, is proposed in [
47]. This algorithm uses a mathematical model that mimics the tactics employed by animal hunting groups, such as dispersing prey, encircling prey, supporting the hunter with the best chance of success and seeking alternative prey. This algorithm has shown good performance over a few other meta-heuristics on some criteria.
From the literature, it can be seen that the TGA [
37] is a straightforward meta-heuristic that has proven to be more effective than many others. Its performance was tested and found to be satisfactory in solving different engineering optimization problems. Its convergence behavior on two standard functions demonstrates that TGA converges quickly and robustly and can detect the global optimum in very few iterations. Moreover, the trade-off between intensification and diversification can be achieved by adjusting very few parameters. Therefore, motivated by the encouraging performance of TGA, we implement a binary tree growth (BTG) algorithm, which simulates the behavior of a flourishing tree, for feature selection with an application to the skin lesion classification problem.
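In a binary wrapper approach such as BTG, each candidate solution is a bit vector over the features and is scored by training a classifier on the selected columns. The sketch below shows one common form of such a fitness function from the feature selection literature, a weighted sum of the classification error and the fraction of features kept. It is an illustration under stated assumptions and not the objective used in this work: the weight `alpha = 0.99` is assumed, the toy data are synthetic, and a simple nearest-centroid classifier is swapped in for the SVM so that the example depends only on NumPy.

```python
import numpy as np


def nearest_centroid_error(X_train, y_train, X_test, y_test):
    """Hold-out error of a nearest-centroid classifier (stand-in for the SVM)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every centroid.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predictions != y_test))


def fitness(mask, X_train, y_train, X_test, y_test, alpha=0.99):
    """Wrapper fitness: alpha * error + (1 - alpha) * selected-feature ratio.

    Lower is better. alpha is an assumed trade-off weight.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0  # empty subset: assign the worst possible score
    error = nearest_centroid_error(X_train[:, mask], y_train,
                                   X_test[:, mask], y_test)
    ratio = mask.sum() / mask.size
    return alpha * error + (1.0 - alpha) * ratio


# Synthetic data: two informative features followed by two pure-noise features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
informative = y[:, None] * 4.0 + rng.normal(size=(100, 2))
noise = rng.normal(size=(100, 2))
X = np.hstack([informative, noise])

idx = rng.permutation(100)
train, test = idx[:70], idx[70:]

good = fitness([1, 1, 0, 0], X[train], y[train], X[test], y[test])
bad = fitness([0, 0, 1, 1], X[train], y[train], X[test], y[test])
print("informative subset:", good)
print("noise-only subset: ", bad)
```

A search procedure would evaluate many such masks and retain the ones with the lowest fitness. Both masks here keep half of the features, so any difference between the two scores comes from the error term alone.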
Feature selection is a search-space problem that requires a careful balance between the diversification (exploration) and intensification (exploitation) phases. Although a good number of works in the literature show a good balance between these two phases, schemes that strike a more appropriate balance for feature selection problems are still needed.
3. Datasets
We have considered the ISIC2018 [48,49] and ISIC2019 [49,50] datasets for the experiments in this study.
The ISIC2018 dataset contains 10,015 dermoscopic images. The images were collected from the Medical University of Vienna (MUV), Austria, and the C. R. skin cancer practice in Queensland, Australia. It took twenty years to put this collection together. Before digital cameras became widely available, lesion photographs were taken and stored at the Department of Dermatology, MUV, Austria. A Nikon-Coolscan-5000-ED scanner was used to digitize these prints, which were then converted into 8-bit color JPEG images at a quality of 300 DPI. After the necessary editing, the images were saved at 72 DPI with a resolution of 800 × 600 pixels. This dataset comprises seven classes, i.e., (i) vascular lesions (VASC), (ii) actinic keratosis (AKIEC), (iii) melanoma (MEL), (iv) benign keratosis (BKL), (v) melanocytic nevus (NV), (vi) basal cell carcinoma (BCC) and (vii) dermatofibroma (DF), with 142, 327, 1113, 1099, 6705, 514 and 115 images in each class, respectively.
The ISIC2019 dataset contains 25,331 dermoscopic images and comprises eight classes, i.e., (i) melanoma (MEL), (ii) squamous cell carcinoma (SCC), (iii) basal cell carcinoma (BCC), (iv) dermatofibroma (DF), (v) melanocytic nevus (NV), (vi) vascular lesion (VASC), (vii) benign keratosis (BKL) and (viii) actinic keratosis (AKIEC), with 4522, 628, 3323, 239, 12,875, 253, 2624 and 867 images in each class, respectively. Because photographs from previous ISIC challenges were reused in subsequent challenges, ISIC images were grouped by their original datasets, as recorded in the ISIC metadata, to avoid the same image being considered more than once in the analysis.
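The class counts quoted above can be checked against the stated dataset sizes, and they also make the degree of class imbalance explicit. The short script below does both using only the numbers given in the text; the helper names `class_shares` and `imbalance_ratio` are introduced here for illustration.

```python
# Class counts as reported in the text for each dataset.
ISIC2018 = {"VASC": 142, "AKIEC": 327, "MEL": 1113, "BKL": 1099,
            "NV": 6705, "BCC": 514, "DF": 115}
ISIC2019 = {"MEL": 4522, "SCC": 628, "BCC": 3323, "DF": 239,
            "NV": 12875, "VASC": 253, "BKL": 2624, "AKIEC": 867}


def class_shares(counts):
    """Fraction of the dataset that falls in each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())


for name, counts in (("ISIC2018", ISIC2018), ("ISIC2019", ISIC2019)):
    print(name, "total images:", sum(counts.values()))
    for cls, share in sorted(class_shares(counts).items(),
                             key=lambda item: item[1], reverse=True):
        print(f"  {cls:<6}{share:6.1%}")
    print(f"  largest/smallest class ratio: {imbalance_ratio(counts):.1f}")
```

In both datasets the melanocytic nevus class is the largest and dermatofibroma the smallest, with more than fifty nevus images for every dermatofibroma image.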
Sample images from each class of the ISIC2019 dataset are shown in Figure 1.