1. Introduction
The skin is the largest organ in the human body. It is composed of several layers (the epidermis, the dermis, and the subcutaneous tissue) and contains blood vessels, lymphatic vessels, nerves, and muscles. The ability of the skin to act as a barrier can be strengthened by applying fluids that slow the breakdown of lipids in the epidermis. Diseases of the skin can be brought on by fungal growth, bacteria that are not visible to the naked eye, allergic responses, bacteria that change the texture of the skin, or pigmentation disorders [1]. Chronic skin conditions can, on rare occasions, develop into cancerous tissue, so skin disorders must be treated as soon as they appear to keep them from spreading and progressing [2]. Imaging-based methods for assessing the consequences of various skin diseases are in high demand. Because the signs of several skin illnesses may take months to manifest before a patient is diagnosed, treatment is often difficult. Previous dermatological computer-aided classification work has lacked generalization due to limited data and a concentration on routine tasks such as dermoscopy, the examination of the skin surface under magnification. Computer-aided diagnosis can be used to identify skin illnesses and guide treatment based on a patient’s symptoms [3]. Skin illnesses can be accurately identified using supervised procedures that reduce the cost of diagnosis, and the progression of diseased growth can be monitored using a grey-level co-occurrence matrix. An accurate diagnosis is critical for more effective treatment and lower pharmaceutical costs.
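The grey-level co-occurrence matrix (GLCM) mentioned above tallies how often pairs of grey levels co-occur at a fixed pixel offset; scalar texture statistics such as contrast are then derived from it and can be tracked as a lesion evolves. A minimal illustrative sketch (the 4-level toy image, the horizontal offset, and the contrast statistic are our own choices for demonstration, not details from the cited works):

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix for offset (dy, dx)."""
    h, w = img.shape
    m = np.zeros((levels, levels), dtype=float)
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1
    return m / m.sum()

def contrast(m):
    """GLCM contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(m.shape)
    return float(((i - j) ** 2 * m).sum())

# toy image quantized to 4 grey levels
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
m = glcm(img, levels=4)
print(contrast(m))  # a single scalar texture feature to monitor over time
```

In practice, several offsets and statistics (energy, homogeneity, correlation) are combined into one feature vector per lesion.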
There is a large disparity between the number of people with skin illnesses and the number of professionals trained to treat them, and resources such as dermatologists, equipment, drugs, and researchers are in short supply. According to the World Health Organization, people living in rural areas suffer the most from this lack of resources. Automated expert systems for the classification of early skin lesions are necessary because of the massive imbalance between skin patients and available expertise, and in resource-constrained locations such classification algorithms can aid in the early detection of skin lesions [4,5]. Computer vision algorithms have been proposed in the literature as comprehensive solutions for early skin lesion diagnosis and the challenges outlined above [6]. Decision trees (DTs), support vector machines (SVMs), and artificial neural networks (ANNs) are just a few of the many approaches available for classification [7,8]; a comprehensive evaluation of various strategies is provided in Reference [9]. However, many machine learning approaches rely on images with low noise and high contrast, conditions that skin cancer data do not satisfy. Color, texture, and structural traits all play a role in skin classification, and because skin lesions exhibit a significant degree of inter-class homogeneity and intra-class heterogeneity, classification with weak feature sets may lead to incorrect findings [10]. Since skin cancer data are not normally distributed, the usual parametric methodologies cannot be applied, and such approaches are further limited because each lesion has a unique pattern. Using deep learning approaches for skin classification, dermatologists can diagnose lesions accurately; deep learning’s role in medical applications has been explored in depth in several studies [11,12].
Basal cell carcinoma, squamous cell carcinoma, and melanoma are the most common subtypes of skin cancer [13]. Basal cell carcinoma, the most prevalent kind, is characterized by sluggish progression and does not metastasize to other areas of the body; because it often recurs, removing it thoroughly is essential. Squamous cell carcinoma is a different type of skin cancer that can spread to other parts of the body and penetrates deeper into the skin than basal cell carcinoma. When the skin is exposed to sunlight, melanocytes, the cells that make the skin dark or tan, produce melanin; cancerous moles, also known as melanoma, arise when the melanin within these cells accumulates. Melanocyte-based malignancies are classified as malignant and can be life-threatening because of the damage they inflict on surrounding tissues. The International Skin Imaging Collaboration (ISIC) archive is one of the datasets used most frequently for this kind of study [14]. According to the ISIC 2016–2020 data, lesions may be broken down into four categories: nevus (NV), seborrheic keratosis (SK), benign (BEN), and malignant melanoma (MEL). NV lesions can appear on the trunk, arms, and legs in varying hues of pink, brown, and tan. SK, when not malignant, can have a waxy brown, black, or tan appearance. BEN is a type of lesion that is not malignant; it neither penetrates nearby tissues nor spreads to other parts of the body, and a lesion is said to be BEN if it possesses both NV and SK components. MEL, the final and most important category, typically appears as a large brown mole with dark speckles that can bleed or change color over time; it is an aggressive cancer that spreads quickly throughout the body, with subtypes including acral, nodular, and superficial spreading melanoma. The primary objective of this research is to differentiate between MEL and BEN lesions. Our major contributions are as follows: (i) an improved convolutional neural network is proposed that uses variable kernel sizes and activation functions across the network; moreover, fewer kernels are used in the first three layers than in the last two, resulting in efficient utilization of kernels; (ii) the ReLU activation function is used in the first three layers of the network, whereas LeakyReLU is used in the last two layers to improve skin lesion classification performance; (iii) class-wise balancing of the data has been performed to avoid biased training; and (iv) the model achieves high accuracy with fewer parameters and less computational time than other state-of-the-art models and existing works.
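Contribution (ii) hinges on the difference between the two activation functions: ReLU zeroes out all negative responses, while LeakyReLU lets a small fraction of them through, which helps keep gradients alive in the deeper layers. A minimal NumPy sketch (the slope value 0.01 is an illustrative default, not necessarily the value used in our network):

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), used in the first three convolutional layers."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """LeakyReLU: negative inputs are scaled by alpha instead of being
    zeroed, used in the last two convolutional layers."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
r = relu(x)         # negatives clipped to zero
lr = leaky_relu(x)  # negatives scaled by alpha
```

Because LeakyReLU has a nonzero gradient for negative inputs, units in the last two layers cannot "die" the way ReLU units can.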
2. Background of Study
Skin cancer is prevalent all over the world and causes a huge number of deaths each year [15]. To preserve lives, early identification of this aggressive disease is critical. Clinical professionals apply the ABCDE criteria [16], followed by several histopathology tests. Preprocessing, feature extraction, segmentation, and classification are standard steps that can be automated using artificial-intelligence-based algorithms. Several classification algorithms have relied heavily on handcrafted feature sets, which lack generalizability for dermoscopic skin images [17,18]. Because of their similarities in color, shape, and size, lesions are highly correlated, resulting in inadequate feature information [19,20].
The ABCD scoring method was applied to the data to extract features, and lesion classification was completed by combining existing approaches. In [21], lesion thickness was used to classify melanoma: lesions were first classified as thin or thick, and then as thin, medium, or thick, with logistic regression and artificial neural networks proposed for classification. In [22], a median filter was applied separately to each of the RGB channels to enhance the lesions, and a deformable model was then used to segment them. A segmentation approach built on the Chan–Vese model was developed in [23].
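The channel-wise median filtering of [22] can be sketched as follows; the 3×3 window, edge padding, and toy image are illustrative assumptions rather than details from the cited work:

```python
import numpy as np

def median_filter_channel(ch, k=3):
    """Median-filter one 2-D channel with a k x k sliding window."""
    pad = k // 2
    padded = np.pad(ch, pad, mode="edge")  # replicate border pixels
    out = np.empty_like(ch)
    h, w = ch.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

def median_filter_rgb(img, k=3):
    """Apply the median filter to each RGB channel independently."""
    return np.stack(
        [median_filter_channel(img[..., c], k) for c in range(3)], axis=-1
    )

# demo: an isolated bright noise pixel is removed by the 3x3 median
noisy = np.zeros((5, 5, 3), dtype=np.uint8)
noisy[2, 2, :] = 255
clean = median_filter_rgb(noisy)
```

Filtering each channel separately preserves color edges better than filtering a single grey-scale conversion, which matters when lesion borders are defined by hue rather than intensity.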
A support vector machine (SVM) was then used to classify these features. The paraconsistent logic (PL) method was used to classify melanoma (MEL) and basal cell carcinoma (BCC) [24]; the authors determined the strength of the evidence, the pattern of formation, and the diagnostic contrast, distinguishing BCC and MEL using spectral values of 30, 96, and 19. In [25], the binary mask of the regions of interest (ROIs) was extracted using Delaunay triangulation. By removing the granular layer boundary, the authors of [26] were able to identify only two lesion types in the histological images. Alam et al. [27] presented an SVM to automate the detection of eczema: the acquired image was segmented, texture-based features were selected for more accurate predictions, and an SVM was ultimately used to evaluate the progression of eczema, as reported by I. Immagulate [28]. SVM modeling is, however, inappropriate for noisy image data [29]. When working with an SVM, it is essential to find suitable, feature-dependent parameter values, and if each feature vector contains more dimensions than there are training samples, performance will be subpar.
Artificial neural networks (ANNs) and convolutional neural networks (CNNs) are the methods most frequently employed to detect and diagnose abnormalities in radiological imaging data [30,31]. CNN-based diagnosis of skin diseases has produced good results [32]. However, CNN models are not scale- or rotation-invariant, which makes working with images taken on a smartphone or digital camera difficult. Both neural network approaches require enormous amounts of training data to achieve high performance, which in turn necessitates substantial computational effort [33]. Neural-network-based models are also more abstract, and we are unable to modify them to suit our own requirements. Additionally, the number of trainable parameters in an ANN skyrockets as image resolution increases, necessitating massive training effort to achieve accurate results, and shrinking and exploding gradients cause further problems for ANN models. In a CNN’s findings, an object’s magnitude and size are not correctly interpreted [34,35].
J. Zhang et al. [35] proposed a CNN for skin lesion classification. A deep convolutional neural network (DCNN) was used to investigate the network’s inherent self-attention ability: with the use of attention maps at lower layers, each attention residual learning (ARL) block develops residual learning mechanisms that help it better categorize input data. On the basis of the ISIC 2017–2019 datasets, Iqbal et al. [36] developed a DCNN model for classifying multi-class skin lesions; the model transmits feature information from the top to the bottom of the network and employs 68 convolutional layers organized into interconnected blocks. A similar approach was used by Jinnai and colleagues [37], who classified melanoma from 5846 clinical photographs rather than dermoscopy using the faster region-based CNN (FRCNN) algorithm, manually drawing borders around lesion locations to prepare the training dataset. The FRCNN outperformed ten board-certified dermatologists and ten dermatology trainees in accuracy. Barata et al. [38] investigated boosting the accuracy of an ensemble CNN model: the fusion of outputs from four separate classification layers (GoogleNet, AlexNet, VGG, and ResNet) was used to create an ensemble model for three-class classification. Classification accuracy can be improved by taking the patient’s metadata into account, as proposed by Jordan Yap et al. [39]: dermoscopic and macroscopic images were both fed into a ResNet50 network and classified jointly, and this multimodal classification outperformed the simple macroscopy-based model with an AUC of 0.866. The ISIC 2019 dataset was used by Gessert et al. [40] to develop an ensemble model that incorporated EfficientNet, SENet, and ResNeXt WSL; they used a cropping approach to handle images of different resolutions across models and devised a loss-balancing technique to deal with unbalanced datasets. On the HAM10000 dataset, Srinivasu et al. [41] classified lesions by employing a DCNN that combined MobileNetV2 with long short-term memory (LSTM). MobileNetV2 is a CNN model that offers several advantages over existing CNN models, including a lower computational cost, a smaller network size, and compatibility with mobile devices; in the LSTM network, each MobileNetV2 feature was given a timestamp as it was stored. Combining MobileNetV2 with LSTM improved accuracy to 85.34 percent.
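The loss balancing used by Gessert et al. [40] for unbalanced datasets is commonly realized as class-weighted cross-entropy, in which each class’s loss term is scaled inversely to its frequency. A NumPy sketch under that assumption (the normalization and toy batch are our own illustrative choices, not details from [40]):

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency class weights, normalized so they average to 1."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy with each sample scaled by its class weight."""
    p = probs[np.arange(len(labels)), labels]  # probability of true class
    return float(np.mean(-weights[labels] * np.log(p + 1e-12)))

# toy batch: class 0 is three times more frequent than class 1
labels = np.array([0, 0, 0, 1])
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.2, 0.8]])
w = class_weights(labels, 2)
loss = weighted_cross_entropy(probs, labels, w)
```

Errors on the rare class are penalized more heavily, so the optimizer cannot reach a low loss by simply predicting the majority class.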
Histogram equalization is a straightforward and efficient method for improving images. However, because the equalization method can dramatically alter the luminance of an image in certain circumstances, it had not previously been used in video systems. To address this, a novel histogram equalization method known as equal-area dualistic sub-image histogram equalization was proposed in [42].
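The equal-area method splits the image at its median grey level into two equal-area sub-images and equalizes each within its own grey range, which limits the luminance shift of plain equalization. A simplified 8-bit sketch (details such as the exact range endpoints are our own simplifications of the method in [42]):

```python
import numpy as np

def equalize_range(vals, lo, hi):
    """Histogram-equalize a set of 8-bit values into [lo, hi] via their CDF."""
    hist = np.bincount(vals, minlength=256).astype(float)
    cdf = np.cumsum(hist) / max(hist.sum(), 1)
    return (lo + cdf[vals] * (hi - lo)).astype(np.uint8)

def dsihe(img):
    """Equal-area dualistic sub-image histogram equalization (simplified):
    split at the median grey level, equalize each half in its own range."""
    med = int(np.median(img))
    low, high = img <= med, img > med
    out = np.empty_like(img)
    out[low] = equalize_range(img[low], 0, med)
    out[high] = equalize_range(img[high], med + 1, 255)
    return out

# toy 8-bit image with a dark half and a bright half
img = np.array([[10, 10, 20, 20],
                [200, 200, 240, 240]], dtype=np.uint8)
out = dsihe(img)
```

Since each half of the histogram stays on its own side of the median, the overall mean brightness moves far less than under global equalization.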
Huang et al. [43] applied deep learning techniques to create a lightweight skin cancer classification model that might be used to improve medical care. They examined the clinical images and medical records of patients who had received a histological diagnosis of basal cell carcinoma, squamous cell carcinoma, melanoma, seborrheic keratosis, or melanocytic nevus in the Department of Dermatology at Kaohsiung Chang Gung Memorial Hospital between 2006 and 2017. To develop the classification model, they used deep learning models to differentiate between malignant and benign skin tumors in the KCGMH and HAM10000 datasets via both binary and multi-class classification. The deep learning model achieved an accuracy of 89.5% for binary classification (benign vs. malignant) on the KCGMH dataset and 85.8% on the HAM10000 dataset.
Thurnhofer-Hemsi et al. [44] introduced a deep learning model, based on MobileNetV2 and long short-term memory, to identify skin cancer. Experiments were conducted on the HAM10000 dataset, a sizable collection of dermatoscopic images, with data augmentation techniques used to boost results. The investigation’s findings indicate that the DenseNet201 network is well suited to the task at hand, as it achieved high classification accuracies and F-measures while simultaneously reducing the number of false negatives [45].
Ioannis Kousis et al. [46] presented a convolutional neural network (CNN) approach for detecting skin cancer: using the HAM10000 dataset, they trained and evaluated 11 different CNN architectures for identifying seven distinct types of skin lesions. To combat the class-imbalance issue and the great similarity between images of some skin lesions, they employed data augmentation (during training), transfer learning, and fine-tuning. According to their results, the DenseNet169 transfer model outperformed the other ten CNN architecture variants.