*Article* **Equilibrium Optimization Algorithm with Ensemble Learning Based Cervical Precancerous Lesion Classification Model**

**Rasha A. Mansouri <sup>1</sup> and Mahmoud Ragab <sup>2,3,\*</sup>**


**Abstract:** Recently, artificial intelligence (AI) with deep learning (DL) and machine learning (ML) has been extensively used to automate labor-intensive and time-consuming work and to assist in prognosis and diagnosis. AI's role in biomedical and biological imaging is an emerging field of research that reveals future trends. Cervical cell (CCL) classification is crucial in screening cervical cancer (CC) at an earlier stage. Unlike traditional classification methods, which depend on hand-engineered or crafted features, a convolutional neural network (CNN) usually categorizes CCLs through learned features. However, the latent correlation of images might be disregarded in CNN feature learning, thereby weakening the representative capability of the CNN features. This study develops an equilibrium optimizer with ensemble learning-based cervical precancerous lesion classification on colposcopy images (EOEL-PCLCCI) technique. The presented EOEL-PCLCCI technique mainly focuses on identifying and classifying cervical cancer on colposcopy images. In the presented EOEL-PCLCCI technique, the DenseNet-264 architecture is used as the feature extractor, and the EO algorithm is applied as a hyperparameter optimizer. An ensemble of weighted voting classifiers, namely long short-term memory (LSTM) and gated recurrent unit (GRU), is used for the classification process. A comprehensive simulation analysis is performed on a benchmark dataset, and the results demonstrate the superiority of the EOEL-PCLCCI algorithm over other DL models.

**Keywords:** medical imaging; healthcare; decision making; cervical cancer; ensemble learning

## **1. Introduction**

Cervical cancer (CC) ranks as the fourth most common cancer in females. According to WHO statistics, approximately 604,000 new cases occurred worldwide in 2020, accounting for 6.5% of cancer cases in females [1]. Although the cure rate of early-stage CC is high, the lack of symptoms and signs hinders initial diagnosis. An effective screening program may prevent CC deaths and decrease the incidence and persistence of the disease. Statistical reports state that over 311,000 CC deaths occur annually [2]. Because of a shortage of trained healthcare staff and inadequate screening funds, CC screening facilities are very scarce in developing nations [3]. Thus, employing effective and automated screening techniques is essential to reduce the cost of early CC detection. CC screening follows this workflow: colposcopy, HPV test, biopsy, and PAP smear test or cytology.

Numerous tools reinforce the task, making it inexpensive, practical, and very effective [4]. PAP smear image screening can be used for the detection of CC; however, it requires several microscopic analyses to distinguish cancerous from non-cancerous patients, and since it takes considerable time and necessitates skilled professionals, there is a chance of missing positive cases with the traditional screening technique [5]. HPV testing and PAP smears are expensive tests and offer low sensitivity. In contrast, colposcopy

**Citation:** A. Mansouri, R.; Ragab, M. Equilibrium Optimization Algorithm with Ensemble Learning Based Cervical Precancerous Lesion Classification Model. *Healthcare* **2023**, *11*, 55. https://doi.org/10.3390/ healthcare11010055

Academic Editor: Mahmudur Rahman

Received: 11 November 2022 Revised: 17 December 2022 Accepted: 21 December 2022 Published: 25 December 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

treatment can be broadly employed in developing nations. Colposcopy screening is employed to address the limitations of HPV testing and PAP smear images [6]. Cervical and other cancers can likely be treated at an early stage. However, the lack of symptoms at this phase hinders early diagnosis. CC deaths can be averted by effective screening methods, which lower mortality and morbidity [7]. CC screening facilities are very sparse in low- and middle-income countries due to a lack of educated and experienced healthcare professionals and inadequate funding for screening mechanisms.

Some important advancements of deep learning (DL) span various applications, such as battery health monitoring, natural language processing (NLP), forecasting, and computer vision (CV) [8]. Medical image processing, which includes registration, classification, segmentation, and identification, plays a significant role in disease diagnosis. Medical images of blood smears, MRI, ultrasound, and CT constitute the major part of the processed image data [9]. The multilayer neural network perception of DL extracts richer features from images and is anticipated to overcome the challenges plaguing standard CAD systems. Still, DL methods have to be reinforced with a wide range of datasets, particularly for positive cases [10]. Several ensemble learning and transfer learning (TL) methods have been used to solve this problem [11–13].

This study develops an equilibrium optimizer with ensemble learning-based cervical precancerous lesion classification on colposcopy images (EOEL-PCLCCI) technique. The presented EOEL-PCLCCI technique mainly focuses on identifying and classifying cervical cancer on colposcopy images. In the presented EOEL-PCLCCI technique, the DenseNet-264 architecture is used as the feature extractor. Since the trial-and-error method for hyperparameter tuning is tedious and error-prone, metaheuristic algorithms can be applied. Therefore, in this work, we employ the EO algorithm for the parameter selection of the DenseNet model. An ensemble of weighted voting classifiers, namely long short-term memory (LSTM) and gated recurrent unit (GRU), is used for the classification process. A comprehensive simulation analysis is performed on a benchmark dataset to depict the enhanced performance of the EOEL-PCLCCI algorithm.

#### **2. Related Works**

Khamparia et al. [14] developed a new Internet of Health Things (IoHT)-based DL algorithm for classifying and recognizing CC in pap smear images with a TL model. Then, CNN was fused with traditional ML approaches. In this work, feature extraction from cervical images is carried out by pre-trained CNN modules such as ResNet50, InceptionV3, VGG19, and SqueezeNet, whose outputs are fed into flattened and dense layers for the classification of normal and abnormal CCLs. Shi et al. [15] recommend a GCN-based model for the classification of CCLs. The study aims at exploring the possible relations of CCL images to enhance classification accuracy. The CNN feature of each CCL image is clustered initially, and the inherent relationship of images is exposed early through the clustering. A graph model is then constructed to further capture the fundamental correlation among the clusters.

Allehaibi et al. [16] propose CCL segmentation with a mask regional CNN (Mask R-CNN) and categorization by a smaller VGG-like net. As the backbone of Mask R-CNN, ResNet10 uses prior knowledge and spatial information. Chen et al. [17] developed a TL-based snapshot ensemble (TLSE) technique by incorporating the two in a unified and coordinated manner. The SE technique offers ensemble advantages within a single model training process, whereas TL addresses the small-sample problems in CCL classification. Archana and Panicker [18] advise a new methodology for the multiclass classification of CCLs with less computing power, optimum feature extraction, and minimal parameters. The application of a ConvNet with the TL method validates substantial diagnoses of cancer cells.

Dong et al. [19] proposed a cell classification technique which combines artificial and Inceptionv3 features and considerably enhances the performance of CCL detection. Furthermore, the study inherits the stronger learning capability of TL to address under-fitting problems, performs effective DL training with a small quantity of medical datasets, and accomplishes precise and effective CCL image classification based on the Herlev data. Li et al. [20] introduced an L-PCNN which incorporates a global context dataset and attention module for categorizing CCLs. The cell image is passed to the improved ResNet50 model for extracting DL features. For extracting deep features, every convolutional block presents an attention module for guiding the network to emphasize the cell region. Next, the network includes a pyramid pooling layer and an LSTM for aggregating image features in distinct areas.

#### **3. The Proposed Model**

In this study, we introduced an automated cervical cancer classification model, the EOEL-PCLCCI technique, on colposcopy images. The EOEL-PCLCCI technique uses a DenseNet-264 feature extractor, EO hyperparameter optimizer, and weighted voting classifier. Figure 1 illustrates the working process of the EOEL-PCLCCI system.


**Figure 1.** Working process of EOEL-PCLCCI system.

#### *3.1. Feature Extraction*

In the presented EOEL-PCLCCI technique, the DenseNet-264 architecture is used for the feature extraction. In a typical CNN, every layer is sequentially interconnected, making it difficult for the network to go deeper and wider; meanwhile, it suffers from a gradient exploding or vanishing problem [21]. Consequently, DenseNet modifies the module by successively concatenating the feature maps output from every prior layer, as follows:

$$\mathbf{x}\_{l} = H\_{l}(\mathbf{x}\_{l-1}) \tag{1}$$

$$\mathbf{x}\_{l} = H\_{l}(\mathbf{x}\_{l-1}) + \mathbf{x}\_{l-1} \tag{2}$$

$$\mathbf{x}\_{l} = H\_{l}([\mathbf{x}\_{0}, \mathbf{x}\_{1}, \mathbf{x}\_{2}, \dots, \mathbf{x}\_{l-1}]) \tag{3}$$

*H* indicates the nonlinear function in the expression, and *l* denotes the layer index. *x<sup>l</sup>* symbolizes the feature of the *l*-th layer. DenseNet concatenates all the feature maps from previous layers, meaning that all the feature maps are propagated toward the last layer and connected to the new feature maps. The DenseNet design has certain benefits, namely feature reuse and a reduction in gradient exploding or vanishing problems. Once the size of the feature maps changes, the concatenation function can no longer be implemented directly; therefore, transition layers exist among the dense blocks, comprising convolution, pooling, and BN operations. Meanwhile, each layer receives feature maps from all the previous layers. Note that *k* feature maps are constructed by each *H<sup>l</sup>* operation; hence, if there are five layers, we obtain *k*<sup>0</sup> + 4*k* feature maps, where *k*<sup>0</sup> symbolizes the number of feature maps of the input.

However, since there exists a huge quantity of inputs, bottleneck layers are introduced in the DenseNet, implemented as a 1 × 1 convolution layer before the 3 × 3 convolution layers, which helps save computational cost and decrease the number of feature maps. Subsequently, considering model compactness, a transition layer is applied to reduce the feature maps: assume *m* feature maps are constructed by a DenseBlock and a compression factor *θ* ∈ (0, 1) is used; if *θ* = 1, the quantity of feature maps remains unchanged. The DenseNet module encompasses input layers, Dense Blocks, transition layers, and global average pooling (GAP). The transition layer comprises a BN layer, a 1 × 1 convolution, and a 2 × 2 average pooling layer with a stride of 2.

To adjust the hyperparameters of the DenseNet-264 model, the EO algorithm is exploited in this work. The fundamental idea of the single-objective EO was established based on the dynamic mass balance [22]. This characteristic maintains the balance between exploitation and exploration and retains flexibility among individual solutions. In the initialization, EO uses a certain population, where each particle represents a concentration vector containing a solution to the problem:

$$Y\_j^{initial} = lb + rand\_j(ub - lb), \; j = 0, 1, 2, 3, \dots, n \tag{4}$$

*Y initial j* denotes the concentration vector of the *j*-th particle, *ub* and *lb* represent the upper and lower boundaries of each parameter, *rand<sup>j</sup>* indicates a random number within [0, 1], and *n* shows the number of particles. EO designates the four best particles of the population, together with their average, as equilibrium candidates. These five equilibrium candidates assist EO in the exploitation and exploration processes: the first four candidates seek better exploration, whereas the 5th candidate, with the average values, seeks a change toward exploitation.

$$\vec{C}\_{eq,pool} = \left\{ \vec{C}\_{eq(1)}, \vec{C}\_{eq(2)}, \vec{C}\_{eq(3)}, \vec{C}\_{eq(4)}, \vec{C}\_{eq(ave)} \right\} \tag{5}$$

The update of the concentration enables EO to balance exploitation and exploration equally:

$$\vec{F} = e^{-\vec{\lambda}(t - t\_0)} \tag{6}$$

In Equation (6), the vector *λ* indicates a random vector within [0, 1], and *t* decreases as the iteration count increases:

$$t = \left(1 - \frac{It}{\mathbf{Max\\_it}}\right)^{\left(a\_2 \frac{It}{\mathbf{Max\\_it}}\right)}\tag{7}$$

*It* and Max\_*it* denote the current and maximal iteration counts, and *a*<sup>2</sup> is a constant controlling the exploitation ability. Another constant, *a*1, is employed to balance exploration and exploitation:

$$t\_0 = \frac{1}{\vec{\lambda}} \ln\left(-a\_1\, \mathrm{sign}\left(\vec{r} - 0.5\right)\left[1 - e^{-\vec{\lambda} t}\right]\right) + t \tag{8}$$

The generation rate, denoted as *G*, enhances exploitation:

$$\vec{G} = \vec{G}\_0 e^{-\vec{\lambda}(t - t\_0)} \tag{9}$$

In Equation (9), the decay rate is the random vector *λ* within [0, 1] from Equation (6), and the initial generation rate is represented by *G*<sup>0</sup>:

$$\vec{G}\_0 = \vec{GCP}\left(\vec{C}\_{eq} - \vec{\lambda}\,\vec{C}\right) \tag{10}$$

$$\vec{GCP} = \begin{cases} 0.5\, r\_1, & r\_2 \ge GP \\ 0, & r\_2 < GP \end{cases} \tag{11}$$

In the expression, *r*<sup>1</sup> and *r*<sup>2</sup> are random numbers between zero and one. The vector *GCP* represents the generation rate control parameter applied in the updating phase:

$$\vec{C} = \vec{C}\_{eq} + \left(\vec{C} - \vec{C}\_{eq}\right)\cdot\vec{F} + \frac{\vec{G}}{\vec{\lambda}V}\left(1 - \vec{F}\right) \tag{12}$$

The value of *V* is set to 1.
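To make the update rules above concrete, here is a minimal NumPy sketch of one single-objective EO run. The constants *a*1 = 2, *a*2 = 1, *GP* = 0.5, and *V* = 1 are commonly used EO defaults and are assumptions here, and the sphere function is only a toy objective; the paper instead applies EO to the DenseNet hyperparameters.

```python
import numpy as np

def equilibrium_optimizer(f, lb, ub, n=30, dim=5, max_it=200, seed=0):
    """Minimal single-objective EO following Eqs. (4)-(12)."""
    rng = np.random.default_rng(seed)
    a1, a2, GP, V = 2.0, 1.0, 0.5, 1.0          # assumed EO constants
    C = lb + rng.random((n, dim)) * (ub - lb)   # Eq. (4): initial concentrations
    eq = np.zeros((4, dim))
    eq_fit = np.full(4, np.inf)
    for it in range(max_it):
        fit = np.array([f(c) for c in C])
        for i in range(n):                       # keep the 4 best-ever particles
            for j in range(4):
                if fit[i] < eq_fit[j]:
                    eq_fit[j + 1:] = eq_fit[j:-1]
                    eq[j + 1:] = eq[j:-1]
                    eq_fit[j] = fit[i]
                    eq[j] = C[i].copy()
                    break
        pool = np.vstack([eq, eq.mean(axis=0)])          # Eq. (5): pool of 5
        t = (1 - it / max_it) ** (a2 * it / max_it)      # Eq. (7)
        for i in range(n):
            lam = rng.random(dim)
            r = rng.random(dim)
            Ceq = pool[rng.integers(5)]                   # random pool member
            # Eqs. (6) and (8) folded into the exponential term F
            F = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)
            GCP = 0.5 * rng.random() if rng.random() >= GP else 0.0  # Eq. (11)
            G0 = GCP * (Ceq - lam * C[i])                 # Eq. (10)
            G = G0 * F                                    # Eq. (9), update form
            C[i] = Ceq + (C[i] - Ceq) * F + (G / (lam * V)) * (1.0 - F)  # Eq. (12)
            C[i] = np.clip(C[i], lb, ub)
    return eq[0], eq_fit[0]

# toy run on the sphere function
best, best_fit = equilibrium_optimizer(lambda x: float(np.sum(x ** 2)), -10.0, 10.0)
print(best, best_fit)
```

In the paper's setting, each particle dimension would instead encode one DenseNet hyperparameter, with *f* returning a validation loss.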

#### *3.2. Weighted Voting-Based Ensemble Classification*

An ensemble of weighted voting classifiers, GRU and LSTM, is used for the classification process. The DL algorithms are combined, and the maximal result is selected by the weighted voting method [23]. Given *D* base classifiers and *n* classes for voting, the predicted class *c<sup>k</sup>* of weighted voting for every instance *k* can be defined by:

$$c\_k = \underset{j}{\text{argmax}} \sum\_{i=1}^{D} (\Delta\_{ji} \times w\_i) \tag{13}$$

In the expression, ∆ji indicates a binary variable: if the *i*-th base classifier classifies instance *k* into the *j*-th class, then ∆ji = 1; otherwise, ∆ji = 0. *w*<sup>i</sup> shows the weight of the *i*-th base classifier:

$$Acc = \frac{\sum\_{k} \{1 | c\_k \text{ is the true class of instance } k\}}{\text{Size of test instances}} \times 100\%. \tag{14}$$
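Equations (13) and (14) can be sketched in a few lines of NumPy. The per-instance predictions and the 0.6/0.4 weights below are hypothetical, standing in for the outputs of the LSTM and GRU branches.

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """Eq. (13): argmax_j of sum_i delta_ji * w_i for each instance."""
    D, N = predictions.shape
    scores = np.zeros((N, n_classes))
    for i in range(D):
        scores[np.arange(N), predictions[i]] += weights[i]
    return scores.argmax(axis=1)

def accuracy(pred, true):
    """Eq. (14): fraction of test instances classified correctly, in percent."""
    return 100.0 * np.mean(pred == true)

# hypothetical class predictions of the two base classifiers on 4 instances
lstm_pred = np.array([0, 1, 2, 1])
gru_pred = np.array([0, 2, 2, 0])
true = np.array([0, 1, 2, 0])
pred = weighted_vote(np.stack([lstm_pred, gru_pred]),
                     weights=[0.6, 0.4], n_classes=3)
print(pred, accuracy(pred, true))  # disagreements go to the heavier classifier
```

With these weights, whenever the two classifiers disagree, the LSTM branch (weight 0.6) wins the vote.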

#### 3.2.1. GRU Model

GRU is a variant of the LSTM network which inherits the advantages of RNN: it learns features automatically and effectively models long-dependency data, and it has been utilized for short-term traffic prediction. Intuitively, the input and forget gates are integrated into a reset gate in GRU, which determines how to combine the novel input with the memory of the previous time step. The other gate in GRU is the update gate; it determines how much information is carried from the previous time step to the current one. Therefore, GRU has one gate fewer than LSTM. This gives the GRU network a faster training speed and fewer variables, and it needs less data to generalize efficiently:

$$z\_n = \sigma(W\_z \cdot [h\_{n-1}, \mathbf{x}\_n]) \tag{15}$$

$$r\_n = \sigma(W\_r \cdot [h\_{n-1}, \mathbf{x}\_n]) \tag{16}$$

$$\overline{h}\_n = \tanh\left(W \cdot [r\_n \* h\_{n-1}, \mathbf{x}\_n]\right) \tag{17}$$

$$h\_n = (1 - z\_n) \* h\_{n-1} + z\_n \* \overline{h}\_n \tag{18}$$

Equations (15) and (16) illustrate how the reset and update gates, *r<sup>n</sup>* and *zn*, are evaluated. *W<sup>z</sup>* is the weight of *zn*, *σ* denotes the sigmoid function, and *W<sup>r</sup>* is the weight of *rn*. A larger value of *z<sup>n</sup>* denotes that more data are retained by the present cell, whereas *r<sup>n</sup>* equal to 0 means that the data from the prior cell are eliminated. Equations (17) and (18) demonstrate the estimation of the candidate output *h<sup>n</sup>* and the final output of the GRU-NN. *W* is the weight of the candidate state, *hn*−<sup>1</sup> denotes the output of the preceding cell, and tanh denotes the hyperbolic tangent function. The candidate is obtained by multiplying *hn*−<sup>1</sup> of the prior cell by *rn*, concatenating with *xn*, multiplying by *W*, and applying tanh; the final *h<sup>n</sup>* is the sum of the two weighted terms in Equation (18).
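A minimal NumPy sketch of Equations (15)–(18) follows. The weight shapes, random initialization, and bias-free form are assumptions made to mirror the equations above, not the trained GRU used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_n, h_prev, Wz, Wr, W):
    """One GRU step: update gate z, reset gate r, candidate h_bar, new h."""
    concat = np.concatenate([h_prev, x_n])
    z = sigmoid(Wz @ concat)                                 # Eq. (15)
    r = sigmoid(Wr @ concat)                                 # Eq. (16)
    h_bar = np.tanh(W @ np.concatenate([r * h_prev, x_n]))   # Eq. (17)
    return (1 - z) * h_prev + z * h_bar                      # Eq. (18)

rng = np.random.default_rng(1)
d_in, d_h = 3, 4
Wz, Wr, W = (rng.standard_normal((d_h, d_h + d_in)) for _ in range(3))
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):   # run over a short sequence
    h = gru_cell(x, h, Wz, Wr, W)
print(h.shape)
```

Because Equation (18) is a convex combination of the previous state and a tanh-bounded candidate, every component of the hidden state stays within (−1, 1).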

#### 3.2.2. LSTM Model

The RNN approach is widely employed for predicting and analyzing time sequence data. However, RNN often suffers from the gradient vanishing problem; hence, it is hard for it to remember long-past data, namely the long-term dependency problem. To overcome these problems, the LSTM was introduced; it applies a gate-controlling method to alter the data flow and systematically determines how much of the received data is retained at each time step. Figure 2 represents the architecture of the LSTM.

**Figure 2.** The architecture of LSTM.


The architecture of the LSTM unit encompasses a storing unit and three control gates (forget, input, and output gates). *x<sup>z</sup>* and *h<sup>z</sup>* correspond to the input and hidden state at time *z*. *fz*, *iz*, and *o<sup>z</sup>* determine the forget, input, and output gates. *C̃<sup>z</sup>* indicates the candidate data for the input.



$$f\_z = \sigma\left(W\_f \cdot [h\_{z-1}, \mathbf{x}\_z] + b\_f\right) \tag{19}$$

$$i\_z = \sigma(W\_i \cdot [h\_{z-1}, \mathbf{x}\_z] + b\_i) \tag{20}$$

$$o\_z = \sigma(W\_o \cdot [h\_{z-1}, \mathbf{x}\_z] + b\_o) \tag{21}$$

$$\tilde{C}\_z = \tanh\left(W\_c \cdot [h\_{z-1}, \mathbf{x}\_z] + b\_c\right) \tag{22}$$

$$C\_z = f\_z \cdot C\_{z-1} + i\_z \cdot \tilde{C}\_z \tag{23}$$

$$h\_z = o\_z \cdot \tanh(C\_z) \tag{24}$$


*W<sup>f</sup>* , *W<sup>i</sup>* , *Wo*, and *W<sup>c</sup>* and *b<sup>f</sup>* , *b<sup>i</sup>* , *bo*, and *b<sup>c</sup>* correspondingly denote the weight matrices and bias vectors of the forget, input, output, and update states. *x<sup>z</sup>* represents the time sequence data at the current time interval *z*, and *hz*−<sup>1</sup> denotes the output of the memory unit from the previous time interval *z* − 1.
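The gate equations (19)–(24) can be sketched in NumPy as follows. The dictionary layout of weights and biases, the dimensions, and the zero bias initialization are illustrative assumptions, not the trained LSTM branch of the ensemble.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_z, h_prev, c_prev, W, b):
    """One LSTM step; W and b hold one (weight, bias) pair per gate f, i, o, c."""
    concat = np.concatenate([h_prev, x_z])
    f = sigmoid(W['f'] @ concat + b['f'])      # Eq. (19): forget gate
    i = sigmoid(W['i'] @ concat + b['i'])      # Eq. (20): input gate
    o = sigmoid(W['o'] @ concat + b['o'])      # Eq. (21): output gate
    c_bar = np.tanh(W['c'] @ concat + b['c'])  # Eq. (22): candidate state
    c = f * c_prev + i * c_bar                 # Eq. (23): new cell state
    h = o * np.tanh(c)                         # Eq. (24): new hidden state
    return h, c

rng = np.random.default_rng(2)
d_in, d_h = 3, 4
W = {g: rng.standard_normal((d_h, d_h + d_in)) for g in 'fioc'}
b = {g: np.zeros(d_h) for g in 'fioc'}
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((6, d_in)):       # run over a short sequence
    h, c = lstm_cell(x, h, c, W, b)
print(h.shape, c.shape)
```

Unlike the GRU, the cell state *C<sup>z</sup>* is carried alongside the hidden state, which is what lets the forget gate preserve information over long time spans.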

#### **4. Results and Discussion**

The experimental results of the EOEL-PCLCCI model are tested using the Herlev database [21]; Figure 3 demonstrates some sample images. The proposed model is simulated using Python 3.6.5 on a PC with an i5-8600k CPU, GeForce 1050Ti 4 GB, 16 GB RAM, 250 GB SSD, and 1 TB HDD. The parameter settings are learning rate: 0.01, dropout: 0.5, batch size: 5, epoch count: 50, and activation: ReLU.

**Figure 3.** Sample images. (**a**) Superficial squamous (SSE), (**b**) intermediate squamous (ISE), (**c**) columnar (CE), (**d**) mild dysplasia (MS-NKD), (**e**) moderate dysplasia (MOS-NKD), (**f**) severe dysplasia (SS-NKD), (**g**) carcinoma in situ (SCCSI).

In Figure 4, the confusion matrices of the EOEL-PCLCCI model on cervical cancer classification performance are provided. The figure implies that the EOEL-PCLCCI model detected all cervical cancer classes.

Table 1 and Figure 5 demonstrate the overall cervical cancer classification results of the EOEL-PCLCCI technique on the entire dataset. The experimental values indicate that the EOEL-PCLCCI method has recognized all the different class labels. It is observed that the EOEL-PCLCCI approach has reached an average *accu<sup>y</sup>* of 98.94%, *prec<sup>n</sup>* of 96%, *reca<sup>l</sup>* of 95.61%, *Fscore* of 95.80%, and MCC of 95.18%.

**Figure 4.** Confusion matrices of EOEL-PCLCCI system on cervical cancer classification; (**a**) entire database, (**b**) 70% of TR database, and (**c**) 30% of TS database.





**Figure 5.** Result analysis of the EOEL-PCLCCI system on the entire database in terms of different measures (**a**) *Accuy*, (**b**) *Precn*, (**c**) *Reca<sup>l</sup>* , (**d**) *Fscore*, and (**e**) MCC.

**Table 1.** CC outcome of EOEL-PCLCCI system with various classes under entire database.

| Labels | *Accuy* | *Precn* | *Reca<sup>l</sup>* | *Fscore* | MCC |
|--------|---------|---------|--------------------|----------|-----|
| SSE | 99.35 | 97.22 | 94.59 | 95.89 | 95.55 |
| ISE | 98.69 | 92.65 | 90.00 | 91.30 | 90.61 |
| CE | 99.24 | 95.96 | 96.94 | 96.45 | 96.02 |
| Average | 98.94 | 96.00 | 95.61 | 95.80 | 95.18 |

Table 2 and Figure 6 illustrate the overall cervical cancer classification results of the EOEL-PCLCCI technique on the TR database. The simulation values exhibited that the EOEL-PCLCCI approach recognized all the different class labels. The EOEL-PCLCCI algorithm attained an average *accu<sup>y</sup>* of 98.84%, *prec<sup>n</sup>* of 95.65%, *reca<sup>l</sup>* of 95.09%, *Fscore* of 95.34%, and MCC of 94.68%.

Table 3 and Figure 7 show the overall cervical cancer classification results of the EOEL-PCLCCI approach on the TS database. The simulation values indicate that the EOEL-PCLCCI approach has recognized all the different class labels. The EOEL-PCLCCI technique gained an average *accu<sup>y</sup>* of 99.17%, *prec<sup>n</sup>* of 97.02%, *reca<sup>l</sup>* of 97.05%, *Fscore* of 96.96%, and MCC of 96.51%.


**Table 2.** CC outcome of EOEL-PCLCCI system with various classes under TR database.

**Figure 6.** Result analysis of EOEL-PCLCCI system on 70% of TR database in terms of different measures (**a**) *Accuy*, (**b**) *Precn*, (**c**) *Reca<sup>l</sup>* , (**d**) *Fscore*, and (**e**) MCC.


**Table 3.** CC outcome of EOEL-PCLCCI system with various classes under TS database.

**Figure 7.** Result analysis of EOEL-PCLCCI system on 30% of TS database in terms of different measures (**a**) *Accuy*, (**b**) *Precn*, (**c**) *Reca<sup>l</sup>* , (**d**) *Fscore*, and (**e**) MCC.

The TACC and VACC of the EOEL-PCLCCI method on CC classification performance are investigated in Figure 8. The figure implies that the EOEL-PCLCCI methodology has exhibited improved performance with increased values of TACC and VACC. It is noted that the EOEL-PCLCCI approach has reached maximum TACC outcomes.

The TLS and VLS of the EOEL-PCLCCI method on CC classification performance are tested in Figure 9. The figure indicates that the EOEL-PCLCCI approach has revealed better performance with minimal values of TLS and VLS. It is noted that the EOEL-PCLCCI approach has resulted in reduced VLS outcomes.

**Figure 9.** TLS and VLS analysis of EOEL-PCLCCI system.

A clear precision-recall inspection of the EOEL-PCLCCI system under the test database is shown in Figure 10. The precision-recall curve shows the tradeoff between precision and recall for different thresholds. A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate. The figure shows the EOEL-PCLCCI method has resulted in superior precision-recall values in all the class labels.



**Figure 10.** Precision-recall analysis of EOEL-PCLCCI system.

The detailed ROC analysis of the EOEL-PCLCCI system under the test database is shown in Figure 11. ROC curves summarize the trade-off between the true positive rate and false positive rate for a predictive model using different probability thresholds. The outcomes exhibited by the EOEL-PCLCCI methodology signify its ability to categorize distinct classes in the test database.
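The ROC computation described above can be sketched as follows; the label and score arrays are illustrative stand-ins, not the study's outputs:

```python
# Sketch: ROC curve points and AUC on made-up data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                     # stand-in labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.3])  # stand-in probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # trade-off across thresholds
auc_val = roc_auc_score(y_true, y_score)
print(f"ROC-AUC: {auc_val:.4f}")
```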


**Figure 11.** ROC curve analysis of EOEL-PCLCCI system.

The experimental results of the EOEL-PCLCCI model are compared with other DL models in Table 4 and Figure 12 [24,25]. The result implies that the ShuffleNet and ShuffleNet\_SE models have shown lower performance, whereas the ResNet34 and DenseNet121 models have reported moderately improved performance.


**Table 4.** Comparative analysis of EOEL-PCLCCI algorithm with recent approaches.


**Figure 12.** Comparative analysis of EOEL-PCLCCI algorithm with recent approaches.

In contrast, the Mor-27 and ResNet-101 models obtain reasonable outcomes. Although the GCN model shows near-optimal performance, the EOEL-PCLCCI model achieves enhanced results with an accuracy of 99.17%, a precision of 97.02%, a recall of 97.05%, and an F-score of 96.96%. Therefore, the EOEL-PCLCCI model shows superior results over the other models.

#### **5. Conclusions**

In this study, we introduced an automated cervical cancer classification method, named the EOEL-PCLCCI algorithm, on colposcopy images. In the presented EOEL-PCLCCI technique, the DenseNet-264 architecture is used for feature extraction, and the EO algorithm is applied as a hyperparameter optimizer. For the classification process, an ensemble of weighted voting classifiers, namely GRU and LSTM, is used. A widespread simulation analysis was performed on a benchmark dataset to depict the superior performance of the EOEL-PCLCCI technique, and the results demonstrate the superiority of the EOEL-PCLCCI algorithm over other DL models, with a maximum accuracy of 99.17%. Thus, the EOEL-PCLCCI approach can be used effectively for cervical cancer classification. In the future, the performance of the EOEL-PCLCCI technique could be enhanced by deep instance segmentation.
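For readers who want the flavor of the weighted-voting step, the sketch below combines the class probabilities of two classifiers, in the spirit of the GRU/LSTM ensemble; the probability matrices and the equal weights are illustrative assumptions, not the tuned values from this work:

```python
# Hedged sketch of weighted soft voting over two classifiers' class probabilities.
import numpy as np

p_lstm = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.5, 0.3]])   # hypothetical per-class probabilities (LSTM)
p_gru  = np.array([[0.6, 0.3, 0.1],
                   [0.1, 0.7, 0.2]])   # hypothetical per-class probabilities (GRU)

w_lstm, w_gru = 0.5, 0.5               # voting weights (assumed equal here)
p_ens = w_lstm * p_lstm + w_gru * p_gru
labels = p_ens.argmax(axis=1)          # final class index per sample
print(labels)
```

In practice, the weights would be tuned on validation data rather than fixed a priori.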

**Author Contributions:** Conceptualization, R.A.M.; Methodology, R.A.M.; Software, M.R.; Formal analysis, R.A.M.; Investigation, R.A.M.; Resources, M.R.; Data curation, R.A.M.; Writing—original draft, R.A.M.; Writing—review & editing, M.R.; Supervision, M.R.; Funding acquisition, M.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project was financed by the Deanship of Scientific Research (DSR) at King Abdul-Aziz University (KAU), Jeddah, Saudi Arabia, under grant no. (G: 246-247-1443). The authors, therefore, thank DSR for technical and financial support.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** This article does not contain any studies with human participants performed by authors.

**Data Availability Statement:** Data sharing does not apply to this article as no datasets were generated during the current study.

**Conflicts of Interest:** The authors declare that they have no conflict of interest. The manuscript was written through the contributions of all authors. All authors have approved the final version of the manuscript.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Melanoma Detection Using Deep Learning-Based Classifications**

**Ghadah Alwakid 1,\*, Walaa Gouda <sup>2</sup> , Mamoona Humayun <sup>3</sup> and Najm Us Sama <sup>4</sup>**


**Abstract:** One of the most prevalent cancers worldwide is skin cancer, and it is becoming more common as the population ages. As a general rule, the earlier skin cancer can be diagnosed, the better. As a result of the success of deep learning (DL) algorithms in other industries, there has been a substantial increase in automated diagnosis systems in healthcare. This work proposes DL as a method for extracting a lesion zone with precision. First, the image is enhanced using Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) to improve the image's quality. Then, segmentation is used to segment Regions of Interest (ROI) from the full image. We employed data augmentation to rectify the data disparity. The image is then analyzed with a convolutional neural network (CNN) and a modified version of Resnet-50 to classify skin lesions. This analysis utilized an unequal sample of seven kinds of skin cancer from the HAM10000 dataset. With an accuracy of 0.86, a precision of 0.84, a recall of 0.86, and an F-score of 0.86, the proposed CNN-based Model outperformed the earlier study's results by a significant margin. The study culminates with an improved automated method for diagnosing skin cancer that benefits medical professionals and patients.

**Keywords:** deep learning; machine learning; convolutional neural network; HAM10000; skin lesion; ESRGAN

## **1. Introduction**

Cells that proliferate and divide uncontrollably are referred to as "cancer"; they can quickly spread and invade nearby tissues if left untreated. Any sort of cancer, not just skin cancer, has the most significant probability of developing into a malignant tumor [1,2]. Melanoma (mel), Basal-cell carcinoma (BCC), nonmelanoma skin cancer (NMSC), and squamous-cell carcinoma (SCC) are the most common forms of skin cancer. It should be noted that some kinds of skin cancer, such as actinic keratosis (akiec), Kaposi sarcoma (KS), and sun keratosis (SK) are scarce [3]. Skin cancer of all varieties is increasing, as illustrated in Figure 1.

Malignant and non-malignant skin cancers are the most common [1,5]. The presence of cancerous lesions exacerbates cancer morbidity and healthcare expenses. Consequently, scientists have focused their efforts on creating algorithms that are both highly precise and flexible when it comes to spotting early signs of cancer in the skin. Malignant melanocyte cells proliferate, invade, and disseminate rapidly; therefore, early detection is critical [6]. Dermoscopy and epiluminescence microscopy (ELM) are frequently used by specialists to identify if a skin lesion is benign or cancerous.

**Citation:** Alwakid, G.; Gouda, W.; Humayun, M.; Sama, N.U. Melanoma Detection Using Deep Learning-Based Classifications. *Healthcare* **2022**, *10*, 2481. https:// doi.org/10.3390/healthcare10122481

Academic Editor: Mahmudur Rahman

Received: 14 November 2022 Accepted: 5 December 2022 Published: 8 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Figure 1.** According to Reference [4], a number of different types of skin cancer are widespread.

A magnifying lens and light are used in dermoscopy to better see medical patterns, such as hues, veils, pigmented nets, globules, and ramifications [7,8]. This reveals morphological structures that are otherwise hidden from the naked eye.
These include the ABCD (Asymmetrical form, Border anomaly, Color discrepancy, Diameter, and Evolution) [9], 7-point checklist [10], and pattern analysis [11]. Non-professional dermoscopic images have a predictive value of 75% to 80% for Melanoma, but the interpretation takes time and is highly subjective, depending on the experience of the dermatologist [12]. Computer-Aided Diagnosis (CAD) approaches have made it easier to overcome these difficulties [8,12]. CAD of malignancies made a giant leap forward thanks to Deep Learning (DL)-based Artificial Intelligence (AI) [13,14]. In rural areas, dermatologists and labs are in poor supply; therefore, using DL approaches to classify skin lesions could help automate skin cancer screening and early detection [15,16]. To classify images in the past, dermoscopic images strongly depended on the extraction of handcrafted characteristics [17,18]. Throughout these promising scientific advances, the actual deployment of DCNN-based dermoscopic pictures has yielded amazing results. Still, future development of diagnosis accuracy is hampered by various obstacles, such as inadequate training data and imbalanced datasets, especially for rare and comparable lesion types. Regardless of the restrictions of the dataset, it is vital to maximize the performance of DCNNs for the correct Classification of skin lesions [14,19].

Models such as a CNN and a modified ResNet-50 are used in this research. We found that the proposed CNN model beats existing DCNNs in classification accuracy when tested on the HAM10000 dataset. To select the best network for diverse medical imaging datasets, it may be necessary to conduct multiple experiments. Accordingly, the paper's primary contributions can be summarized in this way:


6. The recommended technique's overall effectiveness has been enhanced due to this change. Overfitting is prevented by using an alternative training process supported by applying various training strategies (e.g., batch size, learning rate, validation patience, and data augmentation).

This study provides an optimization strategy incorporating a CNN model and a transfer learning model for detecting multiple skin lesions. Additionally, we utilized a revised form of ResNet-50 to train the weights of each model before using it. Comparing the models' output using images of skin lesions from the HAM10000 dataset is necessary. The dataset has a class imbalance, necessitating an oversampling approach. The paper proceeds as follows: Section 2 describes the relevant research work; after that, Section 3 illustrates the dataset and the proposed approach. Section 4 provides and analyzes the outcomes of the technique described in Section 3; the study concludes with Section 5.

#### **2. Related Work**

The development of a CAD procedure for skin cancer has been the basis of several investigations [21,22]. CAD systems have followed the standard medical image analysis pipeline, using classical machine learning approaches for skin lesion image processing [21]. In this pipeline, image preparation, fragmentation, feature extraction, and classification have all been tried numerous times with little success. In skin cancer research, image processing, machine learning, CNNs, and DL have all been used in the past [23]. Traditional image identification algorithms necessitate feature estimation and extraction, whereas deep learning can automatically exploit the images' deep nonlinear relationships [24,25]. The CNN was the first DL model employed for skin lesion image processing. Some of the most recent deep learning studies are summarized below.

For instance, Haenssle et al. [26] analyzed a Google Inception V4 deep learning model against 58 dermatologists' diagnoses. The data collection includes one hundred patients' images (dermoscopic and digitalized) and medical records. Additional research presented by Albahar [24] generated an improved DL model for detecting malignant melanoma. Model results were compared to dermatologist diagnoses from 12 German hospitals, where 145 dermatologists used the model to arrive at their conclusions. Li et al. [27] reviewed CNN deep learning models and reported accuracy 99.5 percent of the time; residual learning and separable convolution were found to be the best methods for constructing the most accurate model. This level of precision, however, was only possible because the problem was binary in nature.

For automated diagnosis, Pacheco et al. [25] developed a smartphone app that used images of skin lesions and clinical data to identify them. The study looked at the skin lesions of 1641 persons with six types of cancer. An experimental three-layer convolutional neural network was compared with GoogleNet, ResNet, VGGNet, and MobileNet by the researchers. Initially, images of lesions taken with smartphones were used as training data, but later, both sorts of input were included (clinical descriptions and images of skin lesions). The original model's accuracy was 0.69, but clinical data increased that to 0.764. To improve upon Pacheco's findings, a new study was proposed. Based on dermal cell images, a model-driven framework for melanoma diagnosis was created by Kadampur and Riyaee [27]. With the help of the HAM10000 dataset, several deep-learning models attained an area under the curve (AUC) of 0.99. To categorize malignant and benign skin lesions, two CNN models were employed by Jinnai et al. [28]. The results of the model were compared to dermatologists' diagnoses and found to show superior classification accuracy.

Furthermore, Prassanna et al. [29] proposed a deep learning-based system for high-level skin lesion segmentation and malignancy detection by building a neural network. It accurately recognizes the edge of a significant lesion and designs a mobile phone model using deep neural network transfer learning and fine-tuning to improve prediction accuracy. Another approach, presented by Panja et al. [30], classifies skin cancer as melanoma or benign; feature extraction was used to retrieve damaged skin cell features using a CNN model after segmenting skin images. In [31], researchers classify ISIC 2019 dataset photos into eight classes. ResNet-50 was used to train the model by evaluating initial parameter values and altering them using transfer training. Images outside these eight classifications are classified as unknown.

Skin cancer detection relied heavily on the transfer learning idea. According to Kassem et al. [32], a study utilizing the GoogleNet pre-trained model for eight categories of skin cancer lesions produced an accuracy of 0.949. This time, a dermatoscope, a medical device used to examine skin lesions, was used to test the proposed YOLOv2-SqueezeNet's segmentation and detection performance. Using the equipment considerably improved the capacity to make an early diagnosis. Table 1 shows that several deep-learning models have been implemented to categorize skin cancer in recent history.


**Table 1.** Existing methods, data, and results for skin cancer detection.

#### **3. Research Methodology**

The authors of this study developed a smart classification algorithm and an automated skin lesion segmentation based on dermoscopic images. We used Resnet-50 and a CNN to perform machine learning in this case.

#### *3.1. Dataset Overview*

Skin Cancer MNIST: HAM10000 [20] provided the benchmark dataset used in this investigation. The CC-BY-NC-SA-4.0 licensed dataset is a reliable source of information for skin cancer diagnosis. Kaggle's public Imaging Archive was used to gather the data. A total of 10,015 JPEG skin cancer training images from two locations, one in Vienna, Austria, and the other in Queensland, Australia, were compiled into a single dataset for training purposes. The Australian site used PowerPoint files and Excel databases to hold images and metadata. The Austrian site started collecting images with pre-digital cameras and preserved them in several formats. Based on the research, a variety of approaches are endorsed [31–40]. Using data from this benchmark, ResNet-50 and the suggested CNN are trained to identify skin cancer in this study. This dataset includes all of the essential diagnostic categories for pigmented lesions: akiec, benign keratosis-like lesions (bkl), bcc, dermatofibroma (df), melanocytic nevi (nv), mel, and vascular lesions (vasc). The HAM10000 dataset is presented in illustrative form in Figure 2.
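As a hedged illustration of working with the dataset's class labels, the sketch below tallies diagnostic categories with pandas; the tiny DataFrame is a stand-in for the real HAM10000 metadata table, which is assumed to carry the seven class codes in a `dx` column:

```python
# Sketch: counting per-class images from HAM10000-style metadata (stand-in data).
import pandas as pd

meta = pd.DataFrame({"dx": ["nv", "nv", "mel", "bkl", "bcc", "akiec", "df", "vasc", "nv"]})
counts = meta["dx"].value_counts()          # images per diagnostic category
print(counts["nv"], len(counts))
```

On the real metadata, such counts expose the heavy class imbalance (e.g., nv dominating) that motivates the oversampling described later.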


**Figure 2.** Examples of HAM10000 Dataset.

#### *3.2. Proposed Methodology*

Figure 3 depicts the overall process of the suggested method, based on the dataset mentioned in this article, which was used to develop an automatic skin lesion classification model. Dermoscopic skin lesion images are utilized to aid in the classification of skin cancer, and the proposed model's entire operational approach displays the functional architecture of that module. After preprocessing, classification and ResNet-50/CNN-based training are the primary steps in the given model's functioning. ESRGAN is used to perform the initial preprocessing step, which includes image quality improvement. Ground truth images are then used to determine an augmented image's region of interest (ROI) for each segmented lesion. Lastly, the dermoscopy image is sent to the ResNet-50/CNN models for instantaneous skin lesion and smart classification training and exposure. An intelligent classification model and an automated procedure for segmenting skin lesions are used to create the following sections of the research study, which describe each stage of the process in depth.

#### 3.2.1. ESRGAN Preprocessing

It was important to improve the quality of dermoscopic images and eliminate multiple kinds of noise from skin lesion images in order to carry out the proposed strategy. Ensuring that the image is as clear as possible is critical to creating a reliable skin lesion categorization model. In this step, first, we apply ESRGAN to improve the overall quality of the image; after that, data augmentation is used to overcome the problem of class imbalance; then, all the images are resized to 224 × 224 × 3; and, finally, normalization is performed.
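The resize-and-normalize step can be sketched minimally in numpy; nearest-neighbour index sampling stands in for a real image-resize routine, and the input array is synthetic:

```python
# Sketch: resize a stand-in image to 224x224x3 and normalize pixels to [0, 1].
import numpy as np

raw = (np.random.rand(448, 600, 3) * 255).astype(np.uint8)   # stand-in lesion image
rows = np.linspace(0, raw.shape[0] - 1, 224).astype(int)     # nearest-neighbour rows
cols = np.linspace(0, raw.shape[1] - 1, 224).astype(int)     # nearest-neighbour cols
resized = raw[np.ix_(rows, cols)]                            # spatial resize
x = resized.astype(np.float32) / 255.0                       # normalization
print(x.shape)
```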

SRGAN [43], Enhanced SRGAN, and other approaches can help improve skin lesion images. The Enhanced Super-Resolution GAN is an improved version of the Super-Resolution GAN [44]. Regarding model gradients, a convolution trunk or a basic residual network is unnecessary. In addition, there is no batch normalization layer in the model to smooth out the image. As a result, ESRGAN images can better resemble the sharp edges of image artifacts. ESRGAN employs a relativistic discriminator to decide whether an image is true or false [45], which yields more accurate results. Relativistic average loss and pixelwise absolute difference are used as loss functions in training. The abilities of the generator are honed through a two-stage training process. First, local minima are avoided by reducing the pixelwise L<sup>1</sup> distance between the source and target high-resolution images. Second, the smallest artifacts are improved and refined: the adversarially trained model and the L<sup>1</sup>-trained model are interpolated for a photo-realistic reconstruction of the original scene.
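The interpolation between the L<sup>1</sup>-trained and adversarially trained generators can be sketched as a per-parameter blend of their weights; the tiny weight dictionaries and the factor `alpha` below are illustrative stand-ins for real ESRGAN checkpoints, which hold many layers:

```python
# Hedged sketch of ESRGAN-style network interpolation between two generators.
import numpy as np

theta_l1  = {"conv1.weight": np.ones((3, 3))}        # L1-trained weights (stand-in)
theta_gan = {"conv1.weight": np.full((3, 3), 0.5)}   # adversarially trained (stand-in)

alpha = 0.8  # interpolation factor toward the adversarial model
theta_interp = {k: (1 - alpha) * theta_l1[k] + alpha * theta_gan[k]
                for k in theta_l1}
print(theta_interp["conv1.weight"][0, 0])
```

Varying `alpha` trades off pixelwise fidelity against perceptual sharpness.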

**Figure 3.** An overall process of how to recognize skin cancer.

In order to discriminate between super-resolved images and genuine photo images, a discriminator network was trained. Lesion images were improved by rearranging brightness values in the histogram of the original image using an adaptive contrast enhancement technique. As a result, the procedure in Figure 4 improves the appearance of the picture's borders and arcs while also raising the image's contrast.


**Figure 4.** Proposed image-enhancement algorithm results; (**a**) image in its raw form; (**b**) an enhanced version of that image.

#### 3.2.2. Segmentation

Following the protocol for preparing images, the ROI from the dermoscopy image is segmented. To generate the ROI in each image, a ground truth mask, which was provided by the HAM10000 dataset for general-purpose usage, is applied to the enhanced image, as demonstrated in Figure 5.

**Figure 5.** Samples of (**a**) original Image, (**b**) ground truth, and (**c**) the segmented ROI.
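The masking step above can be sketched with numpy; the arrays are synthetic stand-ins for an enhanced dermoscopy image and its HAM10000 ground-truth mask:

```python
# Sketch: extracting the ROI by applying a binary ground-truth mask.
import numpy as np

image = np.random.rand(8, 8, 3)            # enhanced dermoscopy image (stand-in)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                      # lesion region from the ground truth

roi = image * mask[..., None]              # zero out everything outside the lesion
print(roi.shape)
```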

#### 3.2.3. Data Augmentation

We performed data augmentation on the training set before exposing the deep neural network to the original dataset images in order to boost the dataset's image count and address the issue of an imbalanced dataset. Adding more training data to deep learning models improves their overall performance. We can use the nature of dermatological images to apply many alterations to each image. The deep neural network does not suffer if the image is magnified, flipped horizontally/vertically, or rotated by a specific number of degrees. Regularizing the data and reducing overfitting are two goals of data augmentation, as well as addressing the dataset imbalance issue. The horizontal shift augmentation is one of the transformations used in this study; it adjusts the image pixels horizontally while maintaining the image dimension, using a fraction between zero and one to indicate the step size for this process. Rotation is another transformation; a rotation angle between 0 and 180 degrees is selected, and then the image is rotated randomly. The images were resized with a zoom range of 0.1, a rescale of 1.0/255, and a recommended input size of 244 × 244 × 3. In order to generate new samples for the network, all previous modifications are applied to the training set's images. Figure 6 demonstrates how adding slightly changed copies of either current data or new synthetic data produced from the existing data is the primary goal of data augmentation.
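A minimal numpy sketch of the transformations described above (horizontal flip, rotation in 90-degree steps as a stand-in for arbitrary angles, and rescaling by 1/255); real pipelines typically delegate this to a library such as Keras' `ImageDataGenerator`:

```python
# Sketch: simple image augmentations on a synthetic stand-in image.
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(244, 244, 3)).astype(np.uint8)

flipped = image[:, ::-1]                     # horizontal flip
k = int(rng.integers(0, 4))                  # random rotation in 90-degree steps
rotated = np.rot90(image, k=k)
rescaled = image.astype(np.float32) / 255.0  # rescale pixel values to [0, 1]
print(flipped.shape, rotated.ndim)
```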



**Figure 6.** Samples of image augmentation for the same image.



Using data augmentation approaches, researchers can overcome the problem of inconsistent sample sizes and complex classifications. This dataset, the HAM dataset, clearly illustrates the term "imbalanced class", which refers to the unequal distribution of samples across distinct classes, as described in Table 2 and Figure 7. Following the augmentation approaches, the new dataset is shown in Figure 8. The classes are clearly balanced after using augmentation techniques on the dataset.

**Table 2.** A balanced dataset resulting from applying Augmentation (oversampling) techniques. As part of the data expansion, segmented photos were included.

| Class | Number of Training Images |
|-------|---------------------------|
| Akiec | 5684 |
| Bcc | 5668 |
| Vasc | 5570 |
| Nv | 5979 |
| Df | 4747 |
| Bkl | 5896 |
| Total | 39,430 |

**Figure 7.** Unbalanced dataset before applying augmentation techniques (Nv 67%, Bkl 11%, Mel 11%, Bcc 5%, Akiec 3%, Vasc 2%, Df 1%).

**Figure 8.** Balanced dataset after applying augmentation techniques (Akiec 15%, Mel 15%, Bkl 15%, Nv 15%, Vasc 14%, Bcc 14%, Df 12%).

#### 3.2.4. Learning Models

This section describes the basic theory of the adopted approaches, and the proposed DL approach is presented in the next sections.

## Model Training Using CNN

Dermoscopic images of a single skin lesion were utilized for training the model using a CNN classifier. A suitable input set for the CNN comprises many skin cancers, such as melanoma and nonmelanoma types: basal cell carcinoma, squamous cell carcinoma, Merkel cell carcinoma, cutaneous T-cell lymphoma, and Kaposi sarcoma.

As depicted in Figure 9, the proposed CNN architecture includes a classification model that strengthens the accuracy of the proposed mechanism's classification. Among artificial neural networks (ANNs), CNNs are the most advanced thanks to their deep architectures. It was not until 1989 that LeCun et al. [46,47] presented the notion of the CNN, an enhanced and more complex version of the ANN with a deep architectural structure, as presented in Figure 9. The segmented ROIs are sent as source data to the convolutional layer of the CNN, where they are convolved with a set of trainable filters to map out the attributes.

Convolution, activation, pooling, and fully connected layers are all part of the basic structure of a CNN, as depicted in Figure 9. The proposed CNN model has four main layers and an output layer. It comprises three convolution layers, with a kernel size of three for the first two convolution layers and a kernel size of five for the final one; a stride of one is used for the first two convolution layers and a stride of two for the final one; the ReLU activation function is used for all layers; and, finally, there are three max-pooling layers with a pool size of three and a stride of one. The convolution layer acts as a "filter", taking the observed pixel values from the input image and transforming them into a single value via the convolution operation. When the convolution layer is applied, the original images are reduced to a smaller matrix. Backpropagation training is then used to improve the filtered images. Down-sampling and shrinking the matrix size help speed up training, which is the purpose of the pooling layer. Finally, the classification results are output by the fully connected layer (a typical multilayer perceptron).

*Healthcare* **2022**, *10*, x FOR PEER REVIEW 10 of 19

**Figure 9.** Proposed CNN architecture.

## Model Training Using Modified Resnet-50

The fundamental architecture of the proposed system is founded on the Resnet-50 model. DL models must account for a staggering number of structures and hyperparameters (e.g., number of frozen layers, batch size, epochs, and learning rate). The effect of numerous hyperparameter settings on system functioning is investigated. The Resnet-50 [48] model is updated in this part to serve as a basis for a possible solution. Residual learning, a novel attribute of the CNN design, was created in 2015 by He K. et al. [48]. A residual unit differs from a standard layer by a skip connection: by connecting directly to the layer's output, the skip connection makes it possible for an input signal to pass throughout the network.

The residual units made possible a 152-layer model, which won the ILSVRC 2015 competition. Its novel residual structure makes gradient flow and training easier and more efficient. An error rate of less than 3.6 percent is among the best in the field. Other variants of ResNet have 34, 50, or 101 layers. Figure 10 shows the original Resnet-50 model and its modified variants, which we analyze in this study. Figure 10a shows the initial Resnet-50 model.

**Figure 10.** Versions of the Resnet-50 model that were modified; (**a**) the initial pre-trained model; (**b**) the addition of one FC.

Figure 10b demonstrates how the proposed two versions are built: we add a fully connected (FC) layer and two more FC layers, replacing the existing FC and softmax layers in both versions. The model's first layers were trained on the ImageNet dataset [49], which is why the additional layers' weights are initially chosen at random. The weights of all models are then updated using backpropagation, the key algorithm for training neural network architectures. As Figure 10b shows, Resnet-50's initial FC layer was deleted and substituted by a new FC layer of size 512, followed by another FC layer of size three, and the original softmax layer was replaced with a new softmax layer. The system needs more FC layers for tiny datasets than for larger ones [50,51].

In the fully connected layer, all neurons are coupled to all the neurons in the layers above and below it. Classification is determined by an activation function that accepts the output from the final FC layer. One of the most popular classifiers in DNNs is softmax, which calculates the probability distribution over the n output classes. Only the high computational cost of adding a single FC layer prevents this approach from being widely adopted.
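For illustration, the softmax classifier mentioned above turns the final FC layer's n raw outputs into a probability distribution; a minimal, numerically stable sketch:

```python
import numpy as np

def softmax(logits):
    """Map the final FC layer's raw outputs to a probability distribution."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# e.g., three logits from an FC layer of size three
p = softmax([2.0, 1.0, 0.1])     # probabilities summing to 1
```

Subtracting the maximum logit before exponentiating changes nothing mathematically but prevents overflow for large activations.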

The first FC layer has 1024 units; the third FC layer has three. We employ batch normalization to combat network overfitting, which takes place when a model does a great job of retaining information from its training data but fails to transfer that knowledge to novel testing data. This problem is more likely to arise when the training dataset is small. To account for the inherent randomness of the algorithm's numerous phases, deep neural networks (DNNs) always produce somewhat variable results [52]. Ensemble learning can be used to maximize the performance of DNN algorithms. We present the "many-runs ensemble" as a means to achieve stacked generalization through numerous training iterations of the same framework.
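The "many-runs ensemble" idea can be sketched as follows. The combination rule is our assumption (the text does not fix one): here the per-run softmax outputs are averaged before taking the argmax, one common way to combine several runs of the same model.

```python
import numpy as np

def many_runs_ensemble(prob_runs):
    """Combine several training runs of the same model.

    prob_runs: list of (n_samples, n_classes) softmax outputs, one per run.
    Returns the predicted class index per sample from the averaged probabilities.
    """
    mean_probs = np.mean(np.stack(prob_runs), axis=0)
    return mean_probs.argmax(axis=1)

# two runs disagree on sample 0; averaging settles the prediction
runs = [np.array([[0.6, 0.4], [0.2, 0.8]]),
        np.array([[0.3, 0.7], [0.1, 0.9]])]
pred = many_runs_ensemble(runs)   # averaged probs: [[0.45, 0.55], [0.15, 0.85]]
```

Because each run starts from different random weights, its predictions differ slightly; averaging smooths out that run-to-run variability.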

## **4. Experimental Results**

#### *4.1. Training and Configuration of Resnet-50 and the Proposed CNN*

DL systems have been tested on the HAM10000 dataset to see how well they work and how they compare to current best practices. The data are split into two groups: 90% for training (9016 images) and 10% for testing (984 images). A total of 10% of the training set is used for validation (992 images). All images were scaled to 227 × 227 × 3, and the training set was augmented to 39,430 images. The TensorFlow Keras implementation was tested on a Linux PC with an RTX 3060 GPU and 8 GB of RAM. A random 90 percent image set served as the basis of the proposed DL systems' training; after training, 10 percent of the training data was used as a validation set, and the most accurate weight combinations were saved for future use. The proposed framework is pre-trained on the HAM10000 dataset using the Adam optimizer and a learning-rate schedule that slows down learning when validation performance stagnates for a prolonged span of time (i.e., validation patience). The following hyperparameters were fed into the Adam optimizer during training: batch sizes range from 2 to 64, doubling at each step; epochs are 50; patience is 10; and momentum is 0.9 for this simulation. Finally, training samples were fed to the network in batches.
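The validation-patience policy described above (reduce the learning rate when validation performance stagnates for `patience` epochs) can be sketched in plain Python. The scaling factor of 0.1 is our assumption; the text does not state it.

```python
def reduce_lr_on_plateau(val_losses, lr=1e-4, patience=10, factor=0.1):
    """Scale lr down each time the validation loss fails to improve
    for `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:      # stagnation: reduce the rate
                lr *= factor
                wait = 0
    return lr

# no improvement after epoch 2 -> one reduction with patience=2
final_lr = reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9], lr=1e-4, patience=2)
```

Keras ships this behavior as the `ReduceLROnPlateau` callback, which is presumably what the TensorFlow Keras setup above uses.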

#### *4.2. Set of Criteria for Evaluation*

This section provides an in-depth description of the evaluation metrics and their results. A popular metric for gauging classification efficiency is classifier accuracy (Ac), defined as the number of correctly classified instances (images) divided by the dataset's total number of examples, as in Equation (1). When analyzing the efficiency of image-categorization algorithms, precision (Pre) and recall (Rec) are the two most commonly used criteria. The greater the number of accurately labeled images, the greater the precision, as in Equation (2). Recall is the ratio of images of a class that were successfully categorized to all images actually belonging to that class, as in Equation (3). A higher F-score indicates that the system is better at prediction than a lower one; a system's effectiveness cannot be measured solely on the basis of precision or recall. Equation (4) shows how the F-score (Fsc) is calculated. The last metric is top-N accuracy, in which the model's N highest-probability answers from the softmax distribution are compared with the expected label; a classification is considered correct if at least one of the N predictions matches the label being sought.

$$\text{Ac} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \tag{1}$$

$$\text{Pre} = \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{2}$$

$$\text{Rec} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{3}$$

$$\text{Fsc} = 2 \ast \left( \frac{\text{Pre} \ast \text{Rec}}{\text{Pre} + \text{Rec}} \right) \tag{4}$$

True positives (TP) are positive cases that were correctly predicted, while true negatives (TN) are negative cases that were correctly predicted. False positives (FP) are negative cases that were mistakenly predicted as positive, and false negatives (FN) are positive cases that were wrongly predicted as negative.
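Equations (1)–(4) and top-N accuracy translate directly to code; the counts and probabilities below are illustrative, not taken from the paper's results:

```python
import numpy as np

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-score from Equations (1)-(4)."""
    ac = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    fsc = 2 * (pre * rec) / (pre + rec)
    return ac, pre, rec, fsc

def top_n_accuracy(probs, labels, n=2):
    """Fraction of samples whose true label is among the n most probable classes."""
    top = np.argsort(probs, axis=1)[:, -n:]   # indices of the n largest probs per row
    return float(np.mean([y in row for y, row in zip(labels, top)]))

# illustrative confusion counts: 3 TP, 4 TN, 1 FP, 2 FN
ac, pre, rec, fsc = metrics(tp=3, tn=4, fp=1, fn=2)   # ac=0.7, pre=0.75, rec=0.6
```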

#### *4.3. Performance of Various DCNN Models*

Data from the HAM10000 skin-lesion categorization challenge dataset are used to train and evaluate a variety of DCNNs (the proposed CNN and Resnet-50). The results of multiple assessments of the HAM10000 dataset for the suggested systems are shown using a 90–10 split between training and testing; this division was chosen to minimize the time needed to complete the project. Models were trained for 50 epochs using 10% of the training set as a validation set, a batch size of 2 to 64, and learning rates of $1 \times 10^{-4}$, $1 \times 10^{-5}$, and $1 \times 10^{-6}$ for the CNN and Resnet-50. Resnet-50 was further fine-tuned by freezing varying numbers of layers to reach the best accuracy possible. A model ensemble was created by executing a number of runs of the same model with the same parameters; because the weights are initialized randomly for each run, the accuracy varies from run to run. Only the highest run outcome is stored, as illustrated in Tables 3 and 4 for CNN and Resnet-50 training on the HAM10000 dataset, respectively. The best results obtained using the CNN and Resnet-50 are 86% and 85.3%, respectively. Figures 11 and 12 demonstrate the confusion matrices of the CNN and Resnet-50, respectively, obtained by applying the proposed approach to the test set. According to the confusion matrix, the suggested technique can identify nv lesions with 97% accuracy (770 correctly classified images out of 790 total) using the CNN model, which is extremely desirable for real-world applications, and 94% (749 correctly classified images out of 790) using the modified version of Resnet-50.
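The per-class figures quoted above (e.g., 770 of 790 nv images, roughly 97%) are row-wise recalls of the confusion matrix; a small sketch with hypothetical counts (only the nv row mirrors the text):

```python
import numpy as np

def per_class_recall(cm):
    """Diagonal of the confusion matrix divided by each row's total
    (rows = true classes, columns = predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# hypothetical 2-class matrix: 770 of 790 nv images classified correctly
cm = [[770, 20],
      [30, 170]]
rec = per_class_recall(cm)        # rec[0] = 770/790, about 0.97
```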



**Table 4.** Best accuracy after fine-tuning using modified Resnet-50 transfer learning model.


**Figure 11.** Best confusion matrix of CNN.

**Figure 12.** Best confusion matrix of Resnet-50.



Figure 13 shows two successful examples of classifying two images, one belonging to the Nv class and the other to the Akiec class. The total number of images used for each class in the HAM10000 dataset is shown in Tables 5 and 6, which also reveal the number of images used for testing in each class. According to the results, the Nv class has the largest number of images (795); its Pre, Rec, and Fsc are all very high, at 91, 97, and 94 percent, respectively, using the CNN model. The values of these parameters are 94, 94, and 94 percent, respectively, using the modified Resnet-50 model.

**Figure 13.** Example of testing classification phase.


**Table 5.** Detailed results for each class using CNN learning model.

**Table 6.** Detailed results for each class using modified Resnet-50 learning model.


(Average across the 984 test images: Pre 0.86, Rec 0.85, Fsc 0.85.)

Using lesion images to help dermatologists diagnose infections more accurately and reduce their workload has now been proven feasible in real-world settings.

#### *4.4. Evaluation with Other Methods*

A comparison of the proposed technique's efficacy with that of other approaches is conducted. Table 7 indicates that our technique outperforms the other approaches in terms of efficiency and effectiveness. Overall, the proposed model achieves an 86 percent accuracy rate, surpassing the current methods.



#### *4.5. Discussion*

As we discovered, other methods could not match our degree of accuracy. We believe one of three contributing elements is responsible for this: the general resolution enhancement of ESRGAN. In addition, we deploy a variety of architectures, each with a varied ability to generalize and adapt to diverse types of data. Transfer-learning architectures could not classify medical images more accurately due to a lack of distinctive features. Resnet-50's classification accuracy was worse than the proposed CNN's when applied to medical images, even though it was better at identifying natural images. The more generalizable qualities of CNN's shorter networks suggest that they can be used for a wider range of images. Deeper networks such as Resnet-50, on the other hand, can learn abstract properties that can be used in any sector. CNN features are more generalizable and adaptable for medical imaging because they lack semantic relevance to natural images (compared to Resnet-50). Fine-tuning the networks, in turn, made the two models more accurate; the CNN's accuracy improved the most compared to Resnet-50's. Deep networks, as opposed to shallow ones, were found to be more likely to pick up significant information when trained on a smaller dataset. The results of the indicated processes, shown in Figures 11 and 12, were adequate. Table 7 displays that ResNet and CNN, in references [55,56], yielded 77% and 78% accuracy, respectively. We evaluated the accuracy of our model against the results of these two research projects, which used the same dataset and trained their models using the same methods (convolutional neural networks and Resnet-50), so that comparisons could be made easily. Further demonstrating its robustness, the proposed CNN model outperforms two other referenced works [14,57] while being trained on a significantly smaller dataset (9016 images vs. 100,000 in the ImageNet dataset).

#### **5. Conclusions**

Researchers devised a method for promptly and accurately diagnosing seven different types of cancer by analyzing skin lesions. The suggested method uses image-enhancing techniques to brighten the lesion image and remove noise. Preprocessed lesion medical imaging was used to train the CNN and modified Resnet-50 to avoid overfitting and to boost the overall competence of the suggested DL approaches. The proposed approach was challenged using a dataset of lesion images known as the HAM10000 dataset. When employing the CNN and a modified Resnet-50, the model achieved accuracy rates of 85.98 percent (≈86 percent) and 85.3 percent, respectively, comparable to the accuracy rate of professional dermatologists. In addition, the research's originality and contribution lie in its use of ESRGAN as a pre-processing step with the various models (the designed CNN and modified Resnet-50). Compared to the pre-trained model, our new model performs similarly. Current models are outperformed by the proposed system, as demonstrated by comparison studies. Experiments on a big and complicated dataset, including future cancer cases, are required to demonstrate the efficacy of the suggested method. In the future, DenseNet, VGG, or AlexNet may be utilized to evaluate the cancer dataset. Moreover, a skin lesion is not always caused by skin cancer, which may be a confounding factor in clinical diagnosis; in the future, we will add such cases to the dataset to test the effectiveness of the model further.

**Author Contributions:** Conceptualization, W.G., G.A., M.H. and N.U.S.; methodology, W.G., G.A., M.H. and N.U.S.; software, W.G., G.A., M.H. and N.U.S.; validation, W.G., G.A., M.H. and N.U.S.; formal analysis, W.G., G.A., M.H. and N.U.S.; investigation, W.G., G.A., M.H. and N.U.S.; resources, W.G., G.A., M.H. and N.U.S.; data curation, W.G., G.A., M.H. and N.U.S.; writing—original draft preparation, W.G., G.A., M.H. and N.U.S.; writing—review and editing, W.G., G.A., M.H. and N.U.S.; visualization, W.G., G.A., M.H. and N.U.S.; supervision, W.G., G.A., M.H. and N.U.S.; project administration, W.G., G.A., M.H. and N.U.S.; funding acquisition, W.G., G.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the Deanship of Scientific Research at Jouf University under grant No (DSR2022-NF-04).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. These data can be found here: (https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000?select= HAM10000\_metadata.csv (accessed on 13 April 2022)).

**Acknowledgments:** The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through Research Grant No (DSR2022-NF-04).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

