1. Introduction
Skin cancer is the most common cancer in the United States and worldwide [1]. Although most works in the literature focus on melanoma detection, the most common malignant skin lesion is non-melanoma skin cancer (NMSC). Over 95% of NMSC cases are Basal Cell Carcinoma (BCC) and cutaneous Squamous Cell Carcinoma (SCC) [2]. Specifically, BCC has an incidence higher than 70% among all skin cancers [3], it has the best validated clinical criteria for its diagnosis [4], and it presents the highest variability in the presence of these dermoscopic criteria.
The detection of NMSC can be performed by visual inspection by a skilled dermatologist, but many benign lesions can be confused with NMSC, leading to unnecessary biopsies, at a rate of about five biopsies per actual cancer case [5].
The increasing incidence of BCC is overloading dermatologists. In the Andalusian Health System, teledermatology is being implemented. Nowadays, 315 teledermatology consultation requests are received per month, and 210 of them meet diagnostic criteria for BCC. Thus, a Computer Aided Diagnosis (CAD) tool that assists general practitioners and prioritizes teledermatology consultations would be of great utility.
Different kinds of images have traditionally been used to classify NMSC automatically (spectroscopy, optical coherence tomography, etc.). However, the simplest and most widely used is digital dermoscopy, that is, a digital color photograph enhanced by a dermoscope.
Lately, owing to the availability of databases released through the challenges proposed by ISIC [6], artificial intelligence methods, and in particular deep learning neural networks, have become very popular in dermatology. In this sense, this paper focuses on machine learning algorithms, in particular deep learning ones, using dermoscopic images from the ISIC challenges [6].
Most published works have focused on melanoma segmentation and classification [7,8]. On the contrary, much less work has been devoted to NMSC detection. Marka et al. performed a systematic review of existing methods for the automatic detection of NMSC in 2019. They concluded that, although most methods attain an accuracy similar to the reported diagnostic accuracy of a dermatologist, all of them require a clinical study to assess their validity in a real clinical scenario [9]. According to this study, three methods attain the best classification metrics. Wahba et al., in 2017, reached 100% in all the metrics, but their test set comprised only 10 images [10]. The same authors, in 2018, tested their method on an extended database, obtaining the same results [11]. Møllersen et al. also achieved 100% sensitivity, but their specificity was 12% [12]. Sarkar et al. applied deep neural networks to differentiate between BCC, SCC and benign lesions, achieving AUROC scores of 0.997, 1 and 0.998, respectively [13]. Pangti et al. analyzed the performance of a deep learning-based application for the diagnosis of BCC, as compared to dermatologist and non-dermatologist physicians [14].
Han et al. classified 12 skin diseases, one of them BCC, employing a deep learning algorithm. They used three databases and concluded that the tested algorithm's performance is comparable to that of 16 dermatologists [15]. Along the same lines, Carcagni et al. [16] and Zhou et al. [17] also proposed deep learning-based methods for the multiclass classification of skin diseases. Carcagni et al. proposed an ensemble approach and compared it with the original DenseNet-121, obtaining better performance.
Sies et al. [18] tested two market-approved tools, one based on a conventional Machine Learning (ML) technique and the other on a Convolutional Neural Network (CNN). Although they tested 1981 skin lesions, only 28 were BCC. The ML algorithm detected only 5 out of 28 BCC lesions, whereas the CNN-based algorithm detected 27 out of 28 [18]. Dorj et al. used a pre-trained AlexNet convolutional network to extract features that feed an SVM classifier, in order to classify among four kinds of cancer, including BCC [19].
Recent advances in histopathological and microscopic image analysis for BCC detection can be found in [20,21,22], where the authors use deep learning techniques to detect and classify BCC and identify its patterns. In contrast, our approach addresses BCC classification on dermoscopic images, focusing on distinguishing BCC from nevus. From a clinical point of view, differentiating BCC from nevus is very interesting, because they are the most frequent skin lesions seen at primary health centers, and good detection of these types of lesions could lead to more efficient clinical management by providing a first prioritization of the images that arrive from Primary Health Centers via teledermatology.
The main contribution of this paper is a thorough analysis of deep learning techniques applied to BCC segmentation and classification.
In addition, to the best of our knowledge, no previous work has evaluated the influence of a prior segmentation on the classification of skin lesions with a deep neural network.
2. Materials and Methods
In order to segment and classify the lesion, several experiments have been conducted to evaluate how important segmentation is for a subsequent classification. A comparison between deep learning methods and classical segmentation algorithms is presented.
Regarding the classification task, different deep learning architectures are tested in the following two classification scenarios: BCC vs. Nevus and BCC vs. All lesions.
2.1. Lesion Segmentation
Skin lesion segmentation is a challenging task due to the presence of hair, bubbles, varying illumination conditions, blurry boundaries, blood vessels, scars or different skin colors; thus, the segmentation step becomes a very delicate and complex process.
Over the years, many techniques that successfully overcome these segmentation challenges have been developed. Unsupervised segmentation methods, such as thresholding, edge-based, region-based or energy minimization-based ones, and supervised methods, such as support vector machines (SVM), Bayes-based or deep learning-based segmentation methods (DLBSM), have been successfully tested on all kinds of images [23,24].
Lately, regarding dermoscopic images, many works have focused on deep learning-based methodologies [25]. This kind of segmentation technique combines low-level feature information with high-level semantic information [26] and takes advantage of its learning capacity to identify structures that allow the image to be segmented. DLBSM make it possible to segment images with low contrast, different intensity distributions or artifacts [27].
In this paper, we compare unsupervised with supervised segmentation techniques. In particular, we compare the performance of one unsupervised method based on energy minimization and two segmentation methods based on deep learning. More specifically, the two supervised methods are the following: (1) a CNN used as a feature extractor combined with a classic segmentation method (thresholding), and (2) a semantic segmentation neural network (SegNet). These three methods were tested on the ISIC-2017 database [28].
2.1.1. Unsupervised Method: Energy Minimization Based Algorithm
Unsupervised methods do not require a labelled training dataset. One of their main advantages is low computational cost [29]. Another is that they do not need a large database, as no training is performed. In contrast, their performance may not be robust for low-quality images, or some user interaction may be needed to achieve good performance.
There are many state-of-the-art unsupervised algorithms in the literature, and some of them have been explored in this paper (edge-based active contours, region-based active contours, segmentation based on convex optimization). However, for the final analysis an energy minimization method was chosen because it is less dependent on the parameter setting. In this kind of algorithm, an energy measure that includes region and boundary information is minimized to solve the segmentation problem. Over recent years, energy minimization algorithms based on convex relaxation have been developed [29,30]. This paper presents an algorithm using convex relaxation based on previous work by the authors [31,32]. The original idea was proposed by Papadakis and Rabin [33]. It consists of posing segmentation as the minimization of a convex energy function, in which the distance between the histogram of each region within the image and a histogram model is minimized.
Two histogram models were defined for each dermoscopic image, one for the foreground (the lesion) and the other for the background (the skin). To generate these two histogram models, the algorithm requires a manual selection of a sample patch from each region (Figure 1).
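The histogram-model data term can be sketched as follows. This is a minimal illustration, not the exact formulation of the convex-relaxation algorithm: the patch coordinates, bin count and the L1 histogram distance are illustrative assumptions.

```python
import numpy as np

def histogram_model(patch, bins=32):
    """Normalized intensity histogram of a manually selected patch."""
    h, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def data_term(region_pixels, model, bins=32):
    """L1 distance between a candidate region's histogram and its model.

    The energy minimized by the algorithm combines a distance of this
    kind (computed for foreground and background) with a boundary term.
    """
    h = histogram_model(region_pixels, bins)
    return np.abs(h - model).sum()

# Toy grayscale image: brighter lesion on darker skin (illustrative values).
rng = np.random.default_rng(0)
img = 0.3 + 0.05 * rng.standard_normal((64, 64))
img[20:44, 20:44] = 0.7 + 0.05 * rng.standard_normal((24, 24))

# Manually selected sample patches for each region (hypothetical coordinates).
fg_model = histogram_model(img[28:36, 28:36])   # inside the lesion
bg_model = histogram_model(img[0:8, 0:8])       # healthy skin

# A candidate region covering the lesion scores a lower energy against
# the foreground model than against the background model.
lesion = img[20:44, 20:44]
assert data_term(lesion, fg_model) < data_term(lesion, bg_model)
```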
2.1.2. Supervised Methods
Supervised methods need a training dataset in order to fix the parameters of the classifier. Some of these segmentation methods are based on SVMs, Bayes classifiers, decision trees (DTs) or artificial neural networks (ANN) [34].
As supervised algorithms, two different methods were chosen. The first was chosen because it segments by employing the information provided by the deep features of a CNN; this has the advantage that only a small training database is required, and even a pre-trained CNN may be utilized. The second is a fully convolutional neural network, which has been demonstrated to be effective in medical image segmentation. More specifically, SegNet was chosen because it is the state of the art in the field.
Segmentation from Feature Images of a CNN
A CNN possesses convolutional layers that provide rich information about the global and local features of an image. The deepest convolutional layers contain information on global, abstract and conceptual features, whereas the lower convolutional layers give information about the local structure, which is relevant for the segmentation process [35]. Likewise, convolutional layers can be used to obtain the image features [36].
We used a VGG-16 pre-trained on the ImageNet database. A set of feature images from the fourth convolutional layer of the VGG-16 network was extracted. These images were normalized and filtered with a Gaussian filter with standard deviation equal to 2 before being summed. Finally, a threshold obtained with Otsu's method and morphological operations (dilation and hole filling) were applied to obtain the final segmentation result. A scheme of this segmentation algorithm is presented in Figure 2.
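The steps above can be sketched as follows. For a self-contained illustration, the feature maps here are synthetic stand-ins (in the actual method they are extracted from the fourth convolutional layer of the pre-trained VGG-16), and the Otsu implementation and morphology parameters are minimal assumptions.

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(img, nbins=256):
    """Otsu's method: the threshold maximizing between-class variance."""
    hist, edges = np.histogram(img, bins=nbins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                       # mass below each threshold
    w1 = w0[-1] - w0                           # mass above
    s = np.cumsum(hist * centers)
    mu0 = s / np.maximum(w0, 1e-12)            # mean below
    mu1 = (s[-1] - s) / np.maximum(w1, 1e-12)  # mean above
    between = np.where((w0 > 0) & (w1 > 0),
                       w0 * w1 * (mu0 - mu1) ** 2, 0.0)
    return centers[np.argmax(between)]

def segment_from_feature_maps(maps, sigma=2.0):
    """Normalize each map, smooth it with a Gaussian (std = 2), sum the
    results, threshold with Otsu's method, then dilate and fill holes."""
    acc = np.zeros(maps[0].shape)
    for m in maps:
        m = (m - m.min()) / (m.max() - m.min() + 1e-12)  # normalize to [0, 1]
        acc += ndimage.gaussian_filter(m, sigma)         # filter, then add
    mask = acc > otsu_threshold(acc)
    mask = ndimage.binary_dilation(mask)                 # morphology
    return ndimage.binary_fill_holes(mask)

# Synthetic stand-ins for the feature maps: a central blob plus noise.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 150.0)
maps = [blob + 0.1 * rng.standard_normal((64, 64)) for _ in range(6)]

mask = segment_from_feature_maps(maps)
assert mask[32, 32] and not mask[2, 2]   # blob segmented, background rejected
```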
Semantic Segmentation with SegNet Deep Neural Network
Semantic segmentation identifies objects in an image by classifying each pixel into a labeled class, which is called pixel-wise labeling. As described by Badrinarayanan et al. [37], SegNet is a Fully Convolutional Network (FCN) architecture whose encoder is topologically identical to the convolutional layers of VGG-16, but without its fully connected layers. The convolutions are performed with a filter bank. The last layer of the decoder acts as a softmax classifier, which outputs the predicted segmentation label for each pixel, where each label is associated with an existing class. SegNet accepts a feature map or an image as input.
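A distinctive feature of SegNet is that its decoder upsamples using the max-pooling indices memorized by the encoder, rather than learned interpolation. A minimal numpy sketch of this pooling/unpooling mechanism (not the full network) is:

```python
import numpy as np

def maxpool2x2_with_indices(x):
    """2x2 max pooling that also records the argmax position of each
    window -- the information SegNet's encoder passes to its decoder."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2), dtype=int)   # flat index into x
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            win = x[i:i + 2, j:j + 2]
            k = int(np.argmax(win))               # position within the window
            pooled[i // 2, j // 2] = win.flat[k]
            idx[i // 2, j // 2] = (i + k // 2) * w + (j + k % 2)
    return pooled, idx

def unpool2x2(pooled, idx, shape):
    """SegNet-style unpooling: each value returns to its memorized argmax
    location; every other position stays zero."""
    out = np.zeros(shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 3., 0., 2.],
              [4., 2., 1., 0.],
              [0., 1., 5., 2.],
              [2., 0., 1., 3.]])
p, idx = maxpool2x2_with_indices(x)
up = unpool2x2(p, idx, x.shape)
# The window maxima (4, 2, 2, 5) reappear exactly where they came from.
assert p.tolist() == [[4.0, 2.0], [2.0, 5.0]]
assert up[1, 0] == 4.0 and up[2, 2] == 5.0
```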
2.2. Lesion Classification
The classification part of the paper aims to differentiate between different types of skin lesions, the detection of BCC being the motivation of the paper.
We present the following two types of classifications:
BCC vs. Nevus;
BCC vs. All, where the term “All” groups the following skin lesions: nevus, benign keratosis, dermatofibroma, melanoma, SCC, actinic keratosis and vascular lesion.
From a clinical point of view, it is very interesting to differentiate BCC from nevus, because both represent the most frequent skin lesions appearing at primary health centers, and a good detection of these types of lesions could lead to a more efficient clinical management.
In both classification experiments, we tested how introducing previously segmented images affects the classification.
A large number of experiments were conducted in order to determine which configuration best solves this difficult problem. To this end, the following classification approaches were proposed:
The use of a VGG-16 neural network. VGG-16 consists of 16 convolutional layers and is very appealing because of its very uniform architecture [38].
The use of a ResNet50 neural network. It is a convolutional neural network with 50 layers and a type of Residual Network, the architecture that introduced the concept of residual (skip) connections [39].
The use of an InceptionV3 neural network. InceptionV3 is another type of CNN, developed by Google, that is 48 layers deep [40].
The use of an ensemble of the three neural networks using the maximum argument. The ArgMax ensemble calculates, for each image, the probability of each class from each neural network, and it selects as the output class the one with highest probability among all the neural networks.
The use of an ensemble of the three neural networks using the mean. In this case the average of the three probabilities for each class belonging to each neural network is calculated. The output class selected is the one with the maximum average value.
In Figure 3, the ArgMax ensemble configuration is shown. After training the three DNNs mentioned above, a vector with the probabilities of each class is obtained for each DNN (Vi, i = 1, 2, 3). Each of these vectors has dimension m × 1, where m is the number of classes. Finally, a new vector V = [V1, V2, V3] of dimension 3m × 1 is formed. The class with the highest probability, denoted by n, is chosen as the predicted class of the lesion.
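The two ensemble rules described above can be sketched as follows. The per-class probabilities are hypothetical and deliberately chosen so that the two rules disagree, illustrating that they are not equivalent.

```python
import numpy as np

def argmax_ensemble(prob_vectors):
    """Concatenate the m x 1 outputs of the networks into one 3m x 1
    vector V = [V1, V2, V3] and pick the class holding the single
    highest probability."""
    v = np.concatenate(prob_vectors)
    m = len(prob_vectors[0])
    return int(np.argmax(v) % m)      # map the position in V back to a class

def mean_ensemble(prob_vectors):
    """Average the probability vectors per class and pick the class
    with the highest mean probability."""
    return int(np.argmax(np.mean(prob_vectors, axis=0)))

# Hypothetical per-class probabilities (m = 2, e.g. BCC vs. nevus) from
# VGG-16, ResNet50 and InceptionV3.
V1 = np.array([0.8, 0.2])
V2 = np.array([0.8, 0.2])
V3 = np.array([0.1, 0.9])

# ArgMax: the single largest probability is 0.9, so class 1 wins.
assert argmax_ensemble([V1, V2, V3]) == 1
# Mean: class 0 averages 0.567 vs. 0.433, so class 0 wins.
assert mean_ensemble([V1, V2, V3]) == 0
```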
4. Discussion
Few papers in the literature are devoted to the detection of BCC [10,11,12,13], and fewer still apply deep neural networks to this problem. One of the main reasons is the lack of public databases with BCC lesions, contour delineations or labelled dermoscopic criteria.
From a clinical point of view, it is very convenient to distinguish between BCC and nevus, due to the high incidence of these two types of lesions. Specifically, in primary health centers, it would be desirable to have an automatic tool to help non-specialists with the diagnosis and to establish a good priority for attendance at dermatology services. To the best of our knowledge, this is the first time that a classification between BCC and nevus has been performed.
Most works devoted to segmenting skin lesions claim that an accurate segmentation is necessary to achieve a proper extraction of features and consequent lesion characterization [47]. However, in this paper, we showed that, when using deep learning methods, it is not advantageous to include a segmentation step before classifying the lesion. In fact, we obtained worse results when segmenting the lesion prior to classification, showing that, with a large database, prior segmentation of the lesion does not improve the classification results. This suggests that the healthy skin surrounding the lesion may contain information significant for the classification. Other works, such as the one by Teixeira et al., support this statement [48].
The main limitation of our method is the lack of explainability of the classification. An explanation of the classification, by providing the automatic detection of dermoscopic criteria of BCC, would considerably improve the utility of the method for physicians. To this purpose, we are working on developing a database with the dermoscopic criteria of BCC and a system for the automatic detection of these dermoscopic criteria.
As future research, a clinical study to assess the validity of the methods in a real clinical scenario would be desirable.
5. Conclusions
In this paper, two analyses have been performed. First, a comparison between an unsupervised segmentation method and two supervised, deep learning-based segmentation methods has been carried out. Second, the identification of BCC amongst other types of skin lesions has been performed in the following two scenarios: with a prior segmentation of the lesion and without segmenting it. For this second task, different deep neural networks have been tested.
Experiments comparing the different segmentation methods show that the SegNet architecture achieved the best performance, obtaining 94% accuracy.
The ISIC 2019 public database [6] was used to carry out the classification task. We obtained 98% accuracy, 0.84 sensitivity and 0.96 specificity when distinguishing BCC from nevus, and 95% accuracy, 0.68 sensitivity and 0.97 specificity when classifying BCC vs. all lesions. Furthermore, the proposed algorithm outperforms the winner of the ISIC 2019 challenge in almost all metrics when lesions are classified into eight classes.
In summary, this paper adds important comparative studies, applied to the analysis of BCC, that had not been performed previously. These studies are of interest because BCC is the skin cancer with the highest incidence. First, an analysis of the utility of BCC segmentation for improving classification is carried out, leading to the conclusion that prior segmentation does not improve the classification. Second, a tool for discriminating between BCC and nevus, the most common pigmented lesion, is provided. Finally, we have demonstrated that an ensemble of well-known CNNs can attain results that compete with the best methods in the ISIC challenge.