Article

Image-Based Classical Features and Machine Learning Analysis of Skin Cancer Instances

Aeshah Almutairi and Rehan Ullah Khan
Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(13), 7712; https://doi.org/10.3390/app13137712
Submission received: 6 May 2023 / Revised: 22 June 2023 / Accepted: 27 June 2023 / Published: 29 June 2023

Abstract

Skin conditions influence people of all ages and genders and impose an enormous strain on worldwide public health. For efficient management and medical treatment, skin disorders must be accurately categorized. However, the conventional method of classifying skin conditions can be arbitrary and time-consuming, delaying diagnosis and treatment. In this research, we examine the application of traditional machine learning models and conventional image characteristics for the classification of skin cancer based on picture features. Specifically, we employ six feature extraction approaches, which we model using six classical classifiers. To evaluate our approach, we address skin cancer detection as both a seven-class problem and a two-class problem comprising 21 permutations of skin cancer instances. Our experimental results demonstrate that Random Forest achieves the highest performance, followed by Support Vector Machines. Additionally, our analysis reveals that the Edge Histogram and Fuzzy Opponent Histogram feature sets perform best in learning the skin cancer model. Our comprehensive evaluation of various models provides practitioners with valuable insights when selecting appropriate models for similar problems. Our findings demonstrate that acceptable detection performance can be achieved even with simple feature extraction and non-deep classifiers. We argue that classical features are not only easier and faster to extract than deep features but can also be combined with classical machine learning models to save time and valuable resources.

1. Introduction

Skin diseases influence people of all ages, races, and genders and are a prevalent and substantial worldwide public health burden. The World Health Organization (WHO, Geneva, Switzerland) lists skin conditions as one of the top 10 global causes of disability. The accurate categorizing of skin conditions is essential for effective management and treatment. Classifying skin conditions using traditional techniques can be subjective and time-consuming, which can delay diagnosis and treatment. With several studies demonstrating its potential for accurate diagnosis and classification of various skin diseases like melanoma, psoriasis, and eczema, the use of machine learning models has recently shown great promise in improving the precision and efficacy of skin disease classification [1,2,3].
In 2018, approximately 300,000 new instances of melanoma were recorded globally, making it the most prevalent cancer in both men and women. In addition to melanoma, nearly 1 million cases of Basal Cell Carcinoma (BCC) and Squamous Cell Carcinoma (SCC) were reported in 2018 [4]. More skin cancers are diagnosed annually in the United States than all other cancers combined, according to [5]. Fortunately, the probability of recovery is substantially higher if the cancer is detected early. According to [4], the 5-year survival rate for melanoma is 99%. The patient has a 20% probability of survival if the cancer spreads to other organs. Because early indications of skin cancer are not always apparent, diagnostic outcomes largely depend on the dermatologist’s ability [6]. An automated diagnostic system is a useful instrument that helps less-experienced medical personnel make more accurate diagnoses. However, the naked-eye diagnosis of skin cancer is highly subjective and rarely generalizable [7]. Thus, it is necessary to develop a more precise, less expensive, and faster automated classification system for skin cancer [8]. In addition, the implementation of such automated diagnosis technologies can significantly reduce skin cancer mortality, benefiting both patients and healthcare systems [9].
Artificial intelligence (AI) techniques have revolutionized disease diagnosis by leveraging machine learning and deep learning algorithms to analyze medical images. These algorithms can detect patterns and abnormalities that may not be easily discernible to the human eye. For example, in the field of radiology, AI models have demonstrated remarkable accuracy in detecting and classifying abnormalities in chest X-rays, such as lung nodules indicative of lung cancer [10]. Similarly, AI algorithms have been developed to identify signs of diabetic retinopathy from retinal images, enabling early diagnosis and treatment [11]. By providing automated and accurate analysis of medical images, AI systems help improve diagnostic efficiency, reduce human error, and enhance patient outcomes.
AI systems offer valuable support in healthcare management by optimizing resource allocation, improving patient flow, and aiding in clinical decision making. Through the analysis of vast amounts of patient data, AI algorithms can predict hospital readmissions, identify patients at risk of developing complications, and recommend appropriate interventions. For example, AI-based early warning systems can analyze vital signs and clinical data to detect signs of sepsis, a life-threatening condition, allowing for timely intervention and improved patient outcomes [12]. Additionally, AI-powered decision support systems can assist clinicians by providing evidence-based recommendations for diagnosis and treatment, taking into account relevant patient data and medical guidelines. By streamlining healthcare processes and improving decision making, AI contributes to more efficient and effective healthcare delivery.
AI plays a crucial role in treatment planning and precision medicine by analyzing large datasets and providing personalized treatment recommendations. By analyzing patient electronic health records, genomics data, and medical literature, AI algorithms can identify patterns and correlations that enable the prediction of disease progression and the selection of optimal treatment options for individual patients. For instance, AI models have been developed to predict the response of cancer patients to different treatment regimens, allowing clinicians to tailor therapies to specific patient characteristics [13]. This approach holds the potential to improve treatment outcomes, minimize adverse effects, and optimize therapy effectiveness. The contributions of AI in healthcare are constantly evolving, and ongoing research and advancements continue to push the boundaries of its applications. By leveraging the power of AI, the healthcare industry aims to enhance diagnostic accuracy, optimize treatment strategies, and improve overall patient care.
The work in [14] demonstrates the use of a deep neural network for skin cancer classification, achieving performance comparable to dermatologists in distinguishing between malignant melanomas and benign skin lesions. The authors of [15] compare the diagnostic performance of a deep learning convolutional neural network (CNN) with 58 dermatologists in recognizing melanoma based on dermoscopic images. The results demonstrate the potential of deep learning models for accurate melanoma detection. In Reference [16], the accuracy of human readers and machine learning algorithms, including deep learning models, is compared for the classification of pigmented skin lesions. It highlights the potential of machine learning algorithms in improving diagnostic accuracy. Reference [17] proposes a deep learning approach for automated melanoma recognition in dermoscopy images using very deep residual networks. The results demonstrate the effectiveness of deep learning models in achieving accurate detection. Reference [18] describes the ISIC 2017 Skin Lesion Analysis Towards Melanoma Detection Challenge, where deep learning algorithms were applied to classify skin lesions. The study provides insights into the latest developments and challenges in using deep learning for skin lesion analysis. Reference [19] presents a deep transfer learning method for the classification of skin cancer. The study highlights the advantages of leveraging pre-trained models and fine-tuning them for the specific task of skin cancer classification. The proposed approach shows promising results and has the potential to contribute to improved diagnostic accuracy in the field of dermatology. In Reference [20], the authors explore the application of transfer learning algorithms to compare different methods of stroke classification in the field of computer science. The paper provides an overview of the experimental setup and methodology used for the comparison. The results and findings of the study are discussed, emphasizing the effectiveness of transfer learning in stroke classification.
The aim of this article is to present an exhaustive set of experiments on a skin cancer image dataset in order to explore numerous expressive parameters. By examining the image instances in detail, we conducted a thorough investigation into the extraction of conventional features from the images and subsequently assessed their effectiveness using traditional machine learning models. We target the evaluation of seven diseases: Actinic Keratosis/Intraepithelial Carcinoma (AKIE), Basal Cell Carcinoma (BCC), Benign Keratosis-like Lesions (BLK), Dermatofibroma (DF), Melanoma (MEL), Nevus (NV), and Vascular Lesions (VASC).
As such, in this article, we employ six feature extraction approaches, which we model using six classical classifiers. We address skin cancer detection as both a seven-class problem and a two-class problem comprising 21 permutations of skin cancer instances. Our experimental results demonstrate that Random Forest achieves the highest performance, and the Edge Histogram (EH) and Fuzzy Opponent Histogram (FoH) feature sets perform best in learning the skin cancer model. The experimental evaluation demonstrates that acceptable detection performance can be achieved even with simple feature extraction and non-deep classifiers. Additionally, the positive outcomes show the effectiveness and precision of the machine learning paradigm in automatically identifying cases of skin cancer from imagery of abnormal skin using a small dataset. Despite the limited information currently available, research on skin cancer continues in the hope of finding biomarkers that are already visible in skin cancer images and that might be utilized for the detection and diagnosis of skin cancer specifically, as well as other diseases more generally.
The rest of the paper is organized as follows: Section 2 presents the literature in the concerned domain. Section 3 presents in detail the methodology employed, with a discussion of the classifiers used for modeling the instances. Section 4 discusses the results and the evaluation paradigm and presents the rationale behind them. Section 5 concludes the article.

2. Literature Review

Skin diseases are among the most widespread human illnesses. They can have detrimental effects on the skin and on health as a whole. Whether it begins as a small blemish or a major lesion, a skin condition can develop into a dangerous or malignant form, which can be fatal. In spite of these repercussions, the majority of people consider such illnesses trivial. As skin disorders have become a huge worldwide burden, there is an urgent need to bring them under control. Relative to the number of patients, the number of dermatologists (who treat skin disorders) is remarkably low [21].
Bhadula et al. [22] select five machine learning approaches that can be applied to various skin disease datasets in order to determine the particular type of skin ailment. These algorithms were able to distinguish between three forms of skin problems: lichen planus, SJS-TEN, and acne. The dataset was first partitioned and preprocessed: 80% of the data was used for training and 20% for testing. For the three diseases listed above, the test dataset was separated into three groups, namely Class A, Class B, and Class C. Five machine learning algorithms, namely Kernel Support Vector Machines, Naive Bayes, Random Forest, convolutional neural network (CNN), and logistic regression, were then employed to detect skin disorders. Ten iterations of each method were performed on the same dataset, and the outcomes were compared. The CNN achieved the highest results, with a training accuracy of 99.05% and a testing accuracy of 96%.
Wu et al. [23] used another CNN approach to develop an artificial intelligence dermatological diagnosis helper (AIDDA). In this method, a CNN model was trained using a dataset of 4740 clinical pictures, and its performance was assessed using clinical images verified by specialists and categorized according to three separate diagnoses. The photos in the collection come from the Second Xiangya Hospital, Central South University, China. The algorithm’s effectiveness was confirmed using five-fold cross-validation, with 20% of the cases used for validation, and all images used for validation and training were examined by dermatological professionals. The deep learning framework for the network was developed using PyTorch. The resulting model was able to categorize every image into one of three categories, with an accuracy of 95.60% and a sensitivity of 94.40%.
There are several diseases that might lead to skin cancer. Malignant melanoma is the deadliest of these cancers. It can be dangerous if it goes undetected or spreads across the skin. Standard symptoms include an asymmetrical area, color variation, and a diameter greater than 6 mm. Singh and Urooj created an Artificial Neural Network (ANN) model for the categorization of melanoma images, which is beneficial for diagnosing skin cancer caused by malignant melanoma [24]. In this model, images were preprocessed prior to segmentation, and a modified version of the Sobel edge detection technique was implemented for segmentation. After segmentation, the desired attributes were extracted from the segmented images and selected. These characteristics were then subdivided into n sub-feature spaces, and these sub-features were ultimately employed for classification with an ANN. Four types of training algorithms were evaluated for their accuracy, and the Bayesian regularization backpropagation training approach outperformed the remaining three algorithms [24].
A second convolutional neural network for classifying skin cancer was developed in [25] using Django. This CNN classified benign versus malignant skin tumors. Its dataset comprises 3297 skin cancer images retrieved from the Kaggle website. For this two-class problem, two separate architectural models, each with a different number of parameters, were built. The first architectural model consisted of 6,427,745 parameters and was trained for 10 epochs with 200 iterations per epoch; its accuracy ranged from 85% to 95%. The second architectural model comprises 2,796,666 parameters and was trained for 10 epochs with between 100 and 200 iterations per epoch. Due to the reduced number of parameters, the accuracy of this model was significantly lower than that of the first architectural model. The first architectural model reached a highest accuracy of 93%, while the second reached a highest accuracy of 73%. Hence, the number of parameters influences the model’s accuracy [25].
Machine learning (ML) has shown promising results in improving skin cancer detection, classification, and outcome prediction. Numerous studies have demonstrated that ML algorithms can accurately diagnose skin cancer using image analysis methods. These findings indicate that ML has great potential to assist dermatologists in accurately detecting and classifying skin cancers, as well as predicting patient outcomes. Esteva et al. [14], for example, used a CNN to classify skin lesions as benign or malignant with a diagnostic accuracy of 91%, whereas Brinker et al. [26] used deep learning algorithms to achieve a classification accuracy of 90% for skin lesions classified as cancerous or benign. ML has also been used to find novel risk variables and predict outcomes, according to Schmitt et al. [27], while Bui et al. [28] employed decision tree algorithms to identify various subtypes of melanoma. More study is needed, however, to fully comprehend the potential of ML in this domain and to guarantee that these technologies are successfully integrated into clinical practice.
The complexity of diseases, their manifestations, and the underlying mechanisms involved are indeed highly diverse. Some diseases present clear and distinguishable symptoms, facilitating accurate diagnosis, while others pose challenges due to overlapping symptoms or atypical presentations, leading to lower accuracy rates in diagnosis. Understanding the factors that influence disease diagnosis accuracy is crucial for improving healthcare outcomes.
One significant factor affecting accuracy is the availability and quality of data used for analysis. Limited or biased datasets can have a negative impact on accuracy, particularly for diseases with lower representation in the data. Biases may arise from various sources, such as underrepresentation of certain demographic groups or specific regions, leading to less reliable diagnostic outcomes for those populations.
The variability in clinical presentation among diseases further complicates accurate diagnosis. Different diseases can exhibit similar symptoms, making it challenging for clinicians to differentiate between them. This can result in misdiagnosis or delayed diagnosis, which can have serious consequences for patients. Efforts to enhance accuracy involve developing more precise diagnostic criteria and improving clinical decision support systems that consider a wider range of factors for accurate disease classification.
In addition to the clinical challenges, variations in diagnostic practices across healthcare settings or countries can contribute to differences in accuracy rates. Different regions or institutions may employ varying protocols, guidelines, or diagnostic tools, leading to discrepancies in diagnosis. These variations can affect not only the accuracy of individual diagnoses, but also the comparability of disease prevalence and outcomes between different populations or regions.
The choice of imaging methodologies, diagnostic tests, algorithms, or machine learning models can introduce biases or limitations that impact disease classification accuracy. The performance of these tools heavily depends on the quality and diversity of the data used to train them. Biases present in the training data can be inadvertently learned by the models, leading to biased predictions and lower accuracy rates. Ensuring the development and validation of these tools on diverse and representative datasets is essential for minimizing biases and improving accuracy.
Moreover, the design of studies investigating disease diagnosis, including factors such as sample size and statistical analysis techniques, can influence observed accuracy rates. Small sample sizes or inappropriate statistical analysis approaches may lead to inflated or underestimated accuracy figures. Ensuring rigorous study designs with adequate sample sizes and appropriate statistical methodologies is crucial for obtaining reliable and meaningful accuracy estimates.

3. Methodology

The skin disease dataset used in this study is obtained from Kaggle [29], a data science community that hosts a variety of publicly available datasets. Specifically, we used the dataset provided by Smiti Singhal on skin disease classification, which includes images of seven different types of skin diseases, namely AKIE, BCC, BLK, DF, MEL, NV, and VASC. This research focuses on creating and examining traditional features and machine learning models. Binary Patterns Pyramid (BPP), Color Layout (CL), Edge Histogram (EH), Fuzzy Opponent Histogram (FoH), JPEG coefficient (JPEG), and Pyramid Histogram of Oriented Gradients (PHOG) are the traditional features extracted from the images and used in the analysis. Bayesian Network (BN), Naive Bayesian (NB), Support Vector Machines (SVM), K Nearest Neighbor (KNN), Decision Tree (J48), and Random Forest (RF) are the classifiers used for model assessment. The permutations in Figure 1 are used to evaluate the features and classifiers.
The evaluation procedure is shown in Figure 2. The feature extraction module receives the input images, and the BPP, CL, EH, FoH, JPEG, and PHOG feature extraction algorithms are used to extract the features. This stage produces a feature set that the classifiers can use to assess the input data. The chosen classifier (in the ML algorithm block) constructs a model using the features extracted in the previous phase. Once a model is learned, 10-fold cross-validation ensures that the reported performance is robust and indicative of behavior on real-world data.
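To make this flow concrete, the following Python sketch extracts one feature vector per image and evaluates it with 10-fold cross-validation. It is illustrative of the pipeline in Figure 2 under stated assumptions, not the exact implementation used in this study: a simple OpenCV-based gradient-orientation histogram stands in for the MPEG-7-style Edge Histogram descriptor, the folder layout (one sub-folder per class) is hypothetical, and default Random Forest settings are used.

    import glob
    import os

    import cv2
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def edge_orientation_histogram(image_path, bins=16):
        """Gradient-orientation histogram, a simple stand-in for an edge histogram."""
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        magnitude = np.hypot(gx, gy)
        orientation = np.arctan2(gy, gx)  # radians in [-pi, pi]
        hist, _ = np.histogram(orientation, bins=bins, range=(-np.pi, np.pi),
                               weights=magnitude)
        return hist / (hist.sum() + 1e-9)  # normalize so image size does not matter

    # Hypothetical layout: one sub-folder per class (AKIE, BCC, BLK, DF, MEL, NV, VASC).
    X, y = [], []
    for class_dir in sorted(glob.glob("skin_disease_dataset/*")):
        for path in glob.glob(os.path.join(class_dir, "*.jpg")):
            X.append(edge_orientation_histogram(path))
            y.append(os.path.basename(class_dir))

    scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                             np.array(X), np.array(y), cv=10, scoring="accuracy")
    print(f"10-fold mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")

The other descriptors (BPP, CL, FoH, JPEG, PHOG) would be computed analogously and substituted into the same loop.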

3.1. Random Forest (RF)

Random Forest is a machine learning algorithm used for classification and regression tasks. It is an ensemble method that combines many decision trees to enhance model performance and handle complex problems. The technique is based on a voting mechanism that aggregates the predictions of several decision trees, each of which is trained on a bootstrap sample of the data and considers a random subset of features. This lessens overfitting and enhances the generalizability of the model. Support for large datasets, robustness to noise and outliers, and excellent accuracy are just a few of the benefits of Random Forest [30,31].
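The voting mechanism described above can be made explicit with a short, illustrative sketch: each tree is fit on a bootstrap resample of the training data (with a random feature subset considered at each split via max_features), and the forest predicts by majority vote. This is only a didactic sketch; in practice, scikit-learn's RandomForestClassifier or an equivalent implementation performs these steps far more efficiently.

    from collections import Counter

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.utils import resample

    def forest_predict(X_train, y_train, X_test, n_trees=25, seed=0):
        """Bagged decision trees combined by majority vote (illustrative only)."""
        rng = np.random.RandomState(seed)
        votes = []
        for _ in range(n_trees):
            Xb, yb = resample(X_train, y_train, random_state=rng)   # bootstrap sample
            tree = DecisionTreeClassifier(max_features="sqrt", random_state=rng)
            votes.append(tree.fit(Xb, yb).predict(X_test))
        votes = np.array(votes)  # shape: (n_trees, n_test_samples)
        # Majority vote across trees for each test sample.
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])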

3.2. J48

J48 is an ML method that builds tree-based statistical classifiers over categorical and continuous data [32]. J48 operates on the basis of decision trees, or rules derived from them, and a pruning step improves its accuracy [33]. J48 forms its splits by combining numerous characteristics of the data samples.

3.3. K-Nearest Neighbors (K-NN)

The K-NN model, which is based on feature similarity, is used to solve classification and regression problems by performing computations at prediction time and measuring the similarity between training and input samples. K-NN is a pattern recognition and supervised learning algorithm that uses the k value (the size of the distance-based neighborhood) for sample classification [34,35].
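The neighborhood idea can be illustrated in a few lines. This is an assumption-laden sketch rather than the implementation used here: a query sample is labeled by a majority vote over its k nearest training samples under Euclidean distance.

    from collections import Counter

    import numpy as np

    def knn_predict(X_train, y_train, x_query, k=5):
        """Classify one query vector by majority vote of its k nearest neighbors."""
        distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
        nearest = np.argsort(distances)[:k]                    # indices of the k closest samples
        return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]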

3.4. Support Vector Machine (SVM)

SVM classifiers are supervised learning models that have been used for classification as well as regression. After training, SVM builds a model that allocates new instances to their associated classes or regression values. This is accomplished by separating the data into classes depending on their characteristics. References [36,37] provide more insight into SVM and its application in machine learning.

3.5. Naïve Bayes (NB)

The NB method performs classification by assuming that attributes are conditionally independent given the class. NB is used in text categorization challenges, as well as spam filtering, text analysis, machine vision, and healthcare diagnostics [38,39].

3.6. Bayesian/Belief Network (BN)

The BN is a graphical machine learning model that consists of a set of random variables, represented as nodes, and their accompanying conditional probabilities. The structure of these networks is a Directed Acyclic Graph (DAG). Given its parent nodes, each node in a BN is assumed to be conditionally independent of all non-descendant nodes [40].
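As a reference point, the sketch below instantiates scikit-learn analogues of the classifiers in Sections 3.1, 3.2, 3.3, 3.4, 3.5 and 3.6. The hyper-parameters shown are defaults assumed for illustration rather than the exact settings of this study; J48 (a C4.5-style tree) is approximated with an entropy-based DecisionTreeClassifier, and a Bayesian network classifier has no direct scikit-learn counterpart (external libraries such as pgmpy provide one), so it is only noted in a comment.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    classifiers = {
        "NB":  GaussianNB(),                                  # Naive Bayes
        "SVM": SVC(kernel="rbf", C=1.0),                      # Support Vector Machine
        "KNN": KNeighborsClassifier(n_neighbors=5),           # K-Nearest Neighbors
        "J48": DecisionTreeClassifier(criterion="entropy"),   # C4.5-style tree with information-gain splits
        "RF":  RandomForestClassifier(n_estimators=100),      # Random Forest
        # "BN": a Bayesian network classifier would come from a library such as pgmpy.
    }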

4. Experimental Evaluation

This section contains the dataset and experimental analysis of the skin cancer framework. The performance of several ML algorithms is investigated and compared. The assessment employs 10-fold cross-validation. The dataset is available on Kaggle [29].
The 10-fold cross-validation method divides the data into 10 groups: nine groups are used to train a model, and the remaining group is used for testing. This process is repeated 10 times so that every group serves as the test set once. The overall accuracy is calculated as the evaluation metric by averaging the results of the 10 repetitions. The traditional image features BPP, CL, EH, FoH, JPEG, and PHOG were extracted from the images and analyzed. The classifiers employed in the model assessment are BN, NB, SVM, KNN, J48, and RF. These features and classifiers were chosen based on their overall reliable performance on similar tasks, as shown in Figure 1.
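A sketch of this protocol, under the assumption that the six feature matrices have already been computed (for example, with descriptor extractors like the edge-orientation sketch in Section 3), is shown below: every (feature set, classifier) pair is evaluated with 10-fold cross-validation and the mean accuracy is recorded, mirroring the layout of Table 1. The function and variable names are hypothetical.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def evaluate_grid(feature_sets, classifiers, y):
        """feature_sets: dict name -> (n_samples, n_features) array; y: class labels."""
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        results = {}
        for feat_name, X in feature_sets.items():
            for clf_name, clf in classifiers.items():
                scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
                # Mean accuracy over the 10 folds, in percent as reported in Table 1.
                results[(clf_name, feat_name)] = 100.0 * scores.mean()
        return results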

4.1. Classification of Skin Cancer Instances

In this section, we present the evaluation of the seven classes together as a seven-class problem. The evaluation is based on 10-fold cross-validation of the seven classes. Table 1, Figure 3, and Figure 4 show the evaluation results in different representations.
Table 1 shows the accuracy of six different classifiers (BN, NB, SVM, KNN, J48, and RF) on six types of features (BP, CL, EH, FoH, JPEG-C, and PHoG) extracted from images from the Skin Disease Classification dataset. The accuracy is the percentage of correctly classified images out of the total number of images. The higher the accuracy, the better the performance of the classifier.
From the table, we can see that RF has the highest accuracy for all feature types, ranging from 71.5% to 81.85%. This means that RF can distinguish between the skin disease classes better than the other classifiers. The lowest accuracy for RF is achieved with CL (Color Layout) features, which capture the spatial color distribution of the images. The highest accuracy for RF is achieved with BP features, which are binary patterns that capture the texture information of the images.
KNN, which has an accuracy range of 67.4% to 74.45%, is the second best classifier. KNN is a straightforward classifier that assigns a label to an image based on the labels of its closest neighbors in the feature space. The lowest accuracy for KNN is achieved with CL features as well, while the highest accuracy is achieved with JPEG-C features, which are derived from the JPEG compression coefficients of the images.
The third best classifier is SVM, which has an accuracy range of 67.2% to 71.05%. SVM is a classifier that finds a hyperplane that separates the images into two classes with the maximum margin. The lowest accuracy for SVM is achieved with CL features again, while the highest accuracy is achieved with BP features.
The fourth best classifier is J48, which has an accuracy range of 63.7% to 71.05%. J48 is a classifier that builds a decision tree based on splitting criteria that maximize information gain. The lowest accuracy for J48 is achieved with CL features once more, while the highest accuracy is achieved with BP and EH features. EH features are edge histograms that capture the edge orientation and frequency of the images.
The fifth best classifier is BN, which has an accuracy range of 36.25% to 60.9%. BN is a classifier that models the joint probability distribution of the features and classes using a directed acyclic graph. The lowest accuracy for BN is achieved with JPEG-C features, while the highest accuracy is achieved with FoH features. FoH features are fuzzy opponent histograms, i.e., color histograms computed in the opponent color space.
The worst classifier is NB, which has an accuracy range of 34.4% to 58%. NB is a classifier that assumes conditional independence between the features and applies Bayes’ theorem to calculate the posterior probabilities of the classes. The lowest accuracy for NB is achieved with JPEG-C features as well, while the highest accuracy is achieved with EH features.
From Table 1, RF outperforms all other classifiers on all feature types for skin cancer classification. We can also see that CL features are not very effective for this task, as they yield low accuracy for all classifiers. On the other hand, BP features are very effective for this task, as they yield high accuracy for most classifiers.
Figure 3 shows the average accuracy. The results show the average performance of six different classifiers (BN, NB, SVM, KNN, J48, and RF) on six types of features (BP, CL, EH, FoH, JPEG-C, and PHoG) extracted from images of skin cancer for the seven classes. The performance is measured by accuracy, which is the percentage of correctly classified instances out of the total number of instances. The higher the accuracy, the better the performance of the feature type.
From the results, we can see that EH has the highest average performance of 66.8%, followed by BP with 64.95%, PHoG with 63.6%, CL with 61.9%, JPEG-C with 61.18%, and FoH with 58.98%. This means that EH features are the most effective for the classification among the six feature types. EH features are edge histograms that capture the edge orientation and frequency of the images. They can distinguish between seven diseases based on their different edge patterns.
We can see in Figure 4 that different feature types have different levels of effectiveness for skin cancer classification. We can also see that edge-based features (EH) are more effective than color-layout (CL), texture-based (BP), compression-based (JPEG-C), or opponent-color-histogram (FoH) features for this task. As such, EH delivers the best overall performance across the seven diseases and classifiers and, thus, is selected for the evaluation of the individual diseases.

4.2. Individual Disease Analysis

Table 2 displays the different combinations of skin cancer diseases that can be evaluated using the machine learning algorithm; the experimental evaluation of the seven diseases with the EH feature set is based on this arrangement. There are 21 different combinations of the seven skin cancer diseases, and each combination represents a unique evaluation scenario that can help assess the accuracy and performance of the machine learning algorithm in classifying different skin cancer diseases.
It is important to evaluate the machine learning algorithm using different combinations of skin cancer diseases to ensure that the algorithm performs well in different scenarios and is not biased towards specific diseases. We can detect any limits or flaws in the algorithm and enhance its effectiveness by assessing it using different combinations of skin cancer diseases.
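The 21 scenarios in Table 2 are simply all unordered pairs of the seven classes (7 choose 2). The short sketch below, with hypothetical variable names, generates each pair and filters a labeled feature matrix into the corresponding binary sub-problem.

    from itertools import combinations

    import numpy as np

    CLASSES = ["AKIE", "BCC", "BLK", "DF", "MEL", "NV", "VASC"]

    def binary_subproblems(X, y):
        """Yield (name, X_pair, y_pair) for each of the 21 two-class scenarios.
        X is assumed to be a NumPy feature matrix aligned row-by-row with labels y."""
        y = np.asarray(y)
        for a, b in combinations(CLASSES, 2):  # 7 choose 2 = 21 pairs
            mask = np.isin(y, [a, b])
            yield f"{a} vs. {b}", X[mask], y[mask]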
In Table 3, the performance evaluation shows the accuracy scores for each of the skin cancer disease combinations. From these accuracy scores, we can observe that the machine learning algorithm performs well in some combinations, while its accuracy is relatively low in others. The highest accuracy scores are achieved when classifying Nevus (NV) and Basal Cell Carcinoma (BCC) against other skin cancer diseases.
The algorithm performs relatively poorly in distinguishing between certain skin cancer disease combinations, such as DF vs. VASC, BCC vs. VASC, and AKIE vs. BLK, where the accuracy scores range from 65.62% to 73.79%. These lower accuracy scores suggest that the algorithm may struggle to differentiate between certain skin cancer diseases, and further investigation and improvement may be needed to enhance its performance.
Overall, the performance evaluation table provides useful insights into the accuracy and limitations of the machine learning algorithm when classifying different skin cancer diseases and can guide future research and development in this area. Figure 5 shows the complete evaluation of all the permutations for each classifier.
Figure 6 presents the average permutation scores, which range from 65% to 94%, reflecting the effectiveness of the machine learning model on the various permutations of skin cancer cases. The instances of DF vs. NV and NV vs. VASC had the highest average permutation score of 94%, demonstrating that the model was able to diagnose these skin malignancies effectively with a high degree of stability and reliability.
However, the BCC vs. BLK and BLK vs. MEL cases showed the lowest performance, with an average permutation score of 65%. This indicates that the model has trouble reliably differentiating between these kinds of skin cancer and may require further improvement.
Figure 7 shows the average accuracy of six different classifiers (BN, NB, SVM, KNN, J48, and RF) on 21 permutations of skin disease types. The accuracy is the percentage of correctly classified images out of the total number of images. The higher the accuracy, the better the performance of the classifier. The 21 permutations of skin disease types are obtained by pairing each of the seven types (AKIE, BCC, BLK, DF, MEL, NV, and VASC) with every other type and creating a binary classification problem for each pair.
We can see that RF has the highest average accuracy of 84.41%, followed by SVM with 83.55%, KNN with 80.68%, J48 with 78.79%, BN with 75.48%, and NB with 69.15%. This means that RF is the best classifier for skin disease classification among the six classifiers. RF is a machine learning algorithm that combines multiple decision trees to create a more accurate and robust classifier.
RF is followed by SVM, which is a classifier that finds a hyperplane that separates the images into two classes with the maximum margin. SVM is slightly worse than RF, but still better than the other classifiers. Then, SVM is followed by KNN, which is a simple classifier that assigns a label to an image based on the labels of its nearest neighbors in the feature space. KNN is close to SVM in performance, but not as good as RF.
Then, we have the J48 classifier, which is a classifier that builds a decision tree based on splitting criteria that maximize information gain. J48 is worse than KNN and SVM, but better than BN and NB. Next is the BN classifier, which is a classifier that models the joint probability distribution of the features and classes using a directed acyclic graph. BN is better than NB, but worse than the other classifiers.
The worst classifier is NB, which is a classifier that assumes conditional independence between the features and applies Bayes’ theorem to calculate the posterior probabilities of the classes. NB is worse than all other classifiers for this task.

4.3. Rationale of Classical Features and Classical Models

Our comprehensive evaluation of various models has yielded valuable insights for practitioners seeking to select appropriate models for similar problems. Our findings demonstrate that even with simple feature extraction and non-deep classifiers, acceptable detection performance can be achieved. Moreover, we argue that classical features are not only easier and faster to extract than deep features, but they can also be combined with classical ML models to save time and valuable resources. In addition to being computationally efficient, these models can be trained and retrained on legacy software and hardware, making them highly applicable, implementable, and acceptable for practical purposes within the existing infrastructure of hospitals and related institutions.
One of the significant benefits of our approach is its potential to facilitate the adoption of ML models in medical settings where traditional ML models may not be practical. We can accelerate the implementation of AI solutions without making large expenditures in new hardware or software by utilizing classical characteristics and ML models. This strategy not only facilitates the adoption of ML models, but also aids in making the most use of already available infrastructure and resources.
Furthermore, our study emphasizes how critical it is to choose the right models for a given set of issues. With the help of our findings, practitioners may determine which models are best for particular use cases and forego the time-consuming and expensive trial-and-error method of testing with various models.
In conclusion, our analysis has shown the effectiveness of conventional feature extraction and non-deep classifiers for problems of this kind, and we think that this information will be helpful for practitioners looking to implement ML models in clinical contexts. By integrating traditional features with traditional ML models, we can obtain equivalent outcomes while conserving time and resources and avoiding hardware and software changes.

4.4. Variations in Evaluation

There can be several reasons for the variations in accuracy rates observed in pairwise disease comparisons within a study. Below are some possible factors.
Diseases vary greatly in their complexity, manifestations, and underlying mechanisms. Some conditions may present well-defined symptoms in images, making them easier to distinguish accurately. On the other hand, diseases with overlapping symptoms in images or atypical presentations can pose challenges for accurate diagnosis, leading to lower accuracy rates. The accuracy of disease diagnosis heavily relies on the availability and quality of data used for analysis. If the dataset used for the study is limited in terms of size or diversity, it may not capture the full spectrum of variations and complexities within different diseases. Insufficient or biased data can negatively impact the accuracy rates, particularly for diseases with lower representation in the dataset. Some diseases exhibit significant variability in their clinical presentation, making accurate diagnosis more challenging. Consequently, this variability can result in lower accuracy rates for pairwise disease comparisons. Differences in diagnostic practices across different healthcare settings or countries can also contribute to variations in accuracy rates. The specific methodologies employed to capture the skin cancer images can have an impact on the accuracy. The choice of diagnostic tests, algorithms, or machine learning models can introduce biases or limitations that affect the accuracy of disease classification. Additionally, the study design, sample size, and statistical analysis techniques can all influence the observed accuracy rates.

5. Conclusions

This article has provided a thorough analysis of how to classify skin cancer using traditional image characteristics and traditional machine learning models. The experimental findings of the study demonstrate that the EH and FoH feature sets operate optimally in learning the skin cancer model, whereas RF and SVM produce the greatest performance in identifying skin cancer. The study’s findings are valuable for practitioners looking to choose suitable models for similar problems. It challenges the belief that only complex deep learning models can produce accurate results. The study demonstrates that acceptable detection performance can be achieved using simple feature extraction techniques and non-deep classifiers. Although the study acknowledges limitations like sample size and specific dataset characteristics, its impact on improving the classification of skin cancer is substantial. The research provides practitioners with useful knowledge and enables them to make more informed decisions when implementing machine learning models in medical settings.
The study’s outcomes also offer a practical direction for healthcare practitioners interested in using ML models. By combining classical features with classical machine learning models, comparable results can be achieved while optimizing time and resource utilization. This approach eliminates the need for costly hardware and software upgrades typically associated with deep learning frameworks, making it more accessible for medical facilities with limited budgets.
In a nutshell, the study’s findings shed light on the potential of simple feature extraction and non-deep classifiers in achieving effective detection performance across domains. By leveraging these insights, practitioners can enhance the classification of skin cancer and make significant advancements in the field of medical machine learning, benefiting both patients and healthcare providers. In general, the constraints and limitations of using ML to detect diseases in general, and the limitations of this article in particular, are as follows.
The complexity, manifestations, and underlying mechanisms of diseases vary greatly. Some diseases are easily distinguishable due to well-defined symptoms, while others with overlapping symptoms or atypical presentations pose challenges for accurate diagnosis, resulting in lower accuracy rates. The accuracy of disease diagnosis depends on the availability and quality of data used for analysis. Limited or biased datasets can negatively impact accuracy, especially for diseases with lower representation. Variability in clinical presentation among diseases can make accurate diagnosis more difficult and lead to lower accuracy rates for comparing diseases. Variations in diagnostic practices across healthcare settings or countries can also contribute to accuracy rate differences. The choice of imaging methodologies, diagnostic tests, algorithms, or machine learning models can introduce biases or limitations affecting disease classification accuracy. Study design, sample size, and statistical analysis techniques also influence the observed accuracy rates.

Author Contributions

Conceptualization, A.A. and R.U.K.; methodology, A.A.; software, A.A.; validation, A.A. and R.U.K.; formal analysis, R.U.K.; investigation, R.U.K.; resources, R.U.K.; data curation, A.A.; writing—original draft preparation, A.A. and R.U.K.; writing—review and editing, A.A. and R.U.K.; visualization, A.A. and R.U.K.; supervision, R.U.K.; project administration, A.A.; funding acquisition, A.A. and R.U.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The researchers would like to thank the Deanship of Scientific Research, Qassim University for funding publication of this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Goyal, M.; Menon, V. Artificial Intelligence in Dermatology: A Primer for Clinicians. Ind. J. Dermatol. 2020, 65, 451–459. [Google Scholar] [CrossRef]
  2. Han, J.; Liu, Y.; Bai, C.; Chen, Y.; Wang, X.; Huang, K. Skin lesion classification using ensemble of deep neural networks. Patt. Recogn. Lett. 2021, 141, 70–76. [Google Scholar] [CrossRef]
  3. Lee, J.E.; Park, T.J.; Lee, J.H. Computer-Aided Diagnosis for Dermatological Images. J. Kor. Med. Assoc. 2019, 62, 331–337. [Google Scholar] [CrossRef]
  4. ACS. Cancer Facts & Figures 2018. In Cancer Facts & Figures; American Cancer Society (ACS): Atlanta, GA, USA, 2018; pp. 1–71. [Google Scholar]
  5. Rogers, H.W.; Weinstock, M.A.; Feldman, S.R.; Coldiron, B.M. Incidence Estimate of Nonmelanoma Skin Cancer (Keratinocyte Carcinomas) in the US Population, 2012. JAMA Dermatol. 2015, 151, 1081–1086. [Google Scholar] [CrossRef] [PubMed]
  6. Sheha, M.A.; Mabrouk, M.S.; Sharawy, A. Automatic Detection of Melanoma Skin Cancer Using Texture Analysis. Int. J. Comput. Appl. 2012, 42, 22–26. [Google Scholar] [CrossRef]
  7. Massone, C.; Di Stefani, A.; Soyer, H.P. Dermoscopy for Skin Cancer Detection. Curr. Opin. Oncol. 2005, 17, 147–153. [Google Scholar] [CrossRef] [PubMed]
  8. Hoang, L.; Lee, S.H.; Lee, E.J.; Kwon, K.R. Multiclass Skin Lesion Classification Using a Novel Lightweight Deep Learning Framework for Smart Healthcare. Appl. Sci. 2022, 12, 2677. [Google Scholar] [CrossRef]
  9. Anas, M.; Gupta, K.; Ahmad, S. Skin Cancer Classification Using K-Means Clustering. Int. J. Techn. Res. Applic. 2017, 5, 62–65. [Google Scholar]
  10. Ardila, D.; Kiraly, A.F.; Bharadwaj, S.J.; Choi, B.D.; Reicher, J.J.; Peng, G.E.; Tse, D.; Etemadi, M.E.; Yeung, C. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 2019, 25, 954–961. [Google Scholar] [CrossRef]
  11. Peng, V.G.L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; Kim, R.; et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016, 316, 2402–2410. [Google Scholar]
  12. Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2018, 380, 1347–1358. [Google Scholar] [CrossRef] [PubMed]
  13. Saria, S.; Goldenberg, A. Subtyping: What it is and its role in precision medicine. IEEE Intell. Syst. 2015, 30, 70. [Google Scholar] [CrossRef]
  14. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [Google Scholar] [CrossRef] [PubMed]
  15. Haenssle, H.A.; Fink, C.; Schneiderbauer, R.; Toberer, F.; Buhl, T.; Blum, A.; Kalloo, A.; Hassen, A.B.H.; Thomas, L.; Enk, A.; et al. Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann. Oncol. 2018, 29, 1836–1842. [Google Scholar] [CrossRef]
  16. Tschandl, P.; Rosendahl, C.; Kittler, H.; Cameron, J.; Kittler, K.; Wolff, C.G.D.; Zalaudek, I. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: An open, web-based, international, diagnostic study. Lancet Oncol. 2019, 20, 938–947. [Google Scholar] [CrossRef]
  17. Yu, L.; Chen, H.; Dou, Q.; Qin, J.; Heng, P. Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Trans. Med. Imag. 2017, 36, 994–1004. [Google Scholar] [CrossRef]
  18. Codella, N.C.; Gutman, D.; Celebi, M.E.; Helba, B.; Marchetti, M.A.; Dusza, S.W.; Kalloo, A.; Liopyris, K.; Mishra, N.; Kittler, H.; et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 168–172. [Google Scholar] [CrossRef]
  19. Al-saedi, D.K.A.; Savaş, S. Classification of Skin Cancer with Deep Transfer Learning Method. In Proceedings of the IDAP-2022: International Artificial Intelligence and Data Processing Symposium, İnönü, Turkey, 11–12 September 2022; pp. 202–210. [Google Scholar] [CrossRef]
  20. Alhatemi, R.A.J.; Savaş, S. Transfer Learning-Based Classification Comparison of Stroke. In Proceedings of the IDAP-2022: International Artificial Intelligence and Data Processing Symposium, İnönü, Turkey, 11–12 September 2022; pp. 192–201. [Google Scholar]
  21. Gulati, S.; Rosepreet Kaur, B. Serving the Dermatologists: Skin Diseases Detection. In Information and Communication Technology for Sustainable Development; Springer: Singapore, 2020; pp. 799–822. [Google Scholar]
  22. Bhadula, S.; Kanade, S.; Gupta, S. Machine Learning Algorithms based Skin Disease Detection. Int. J. Adv. Comp. Sci. Applic. 2021, 12, 98–104. [Google Scholar] [CrossRef]
  23. Wu, H.; Yin, H.; Chen, H.; Sun, M.; Liu, X.; Yu, Y.; Tang, Y.; Long, H.; Zhang, B.; Zhang, J.; et al. A deep learning, image based approach for automated diagnosis for inflammatory skin diseases. Ann. Transl. Med. 2020, 8, 9. [Google Scholar] [CrossRef]
  24. Singh, S.; Shabana, U. Analysis of Chronic Skin Diseases using Artificial Neural Network. Int. J. Comp. Applic. 2018, 179, 7–13. [Google Scholar] [CrossRef]
  25. Kolkur, S.; Seema, M.; Kalbande, D.; Kharkar, V. Convolution Neural Network for Feature Extraction in Skin Disease Detection. J. Adv. Res. Appl. Artif. Intell. Neural Netw. 2018, 2, 8–12. [Google Scholar]
  26. Brinker, T.J.; Hekler, A.; Enk, A.H.; Gatzka, M.; von Kalle, C.; Ellwanger, U. Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. Eur. J. Cancer 2020, 119, 86–94. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Schmitt, J.V.; Miot, H.A.; Actis, R.L. Machine learning in dermatology: Past, present, and future. J. Am. Acad. Dermatol. 2020, 82, 1499–1510. [Google Scholar]
  28. Bui, T.L.; Han, J.; Chen, S.C. Decision trees in dermatology. J. Am. Acad. Dermatol. 2018, 78, 1230–1237. [Google Scholar]
  29. Singhal, S. Skin Disease Classification. Kaggle. Available online: https://www.kaggle.com/code/smitisinghal/skin-disease-classification (accessed on 1 March 2023).
  30. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  31. Liaw, A.; Wiener, M. Classification and regression by random Forest. R News 2002, 2, 18–22. [Google Scholar]
  32. Raj, S.S.; Nandhini, M. Ensemble human movement sequence prediction model with Apriori based Probability Tree Classifier (APTC) and bagged J48 on machine learning. J. King Saud Univ. Comp. Inform. Sci. 2018, 33, 408–416. [Google Scholar] [CrossRef]
  33. Venkatesan, E.V.; Velmurugan, T. Performance analysis of decision tree algorithms for breast cancer classification. Ind. J. Sci. Technol. 2015, 8, 1–8. [Google Scholar]
  34. Xu, D.; Wang, Y.; Peng, P.; Beilun, S.; Deng, Z.; Guo, H. Real-time road traffic state prediction based on kernel-KNN. Transport. A Transp. Sci. 2020, 16, 104–118. [Google Scholar] [CrossRef]
  35. Shi, W.; Du, J.; Cao, X.; Yu, Y.; Cao, Y.; Yan, S.; Ni, C. IKULDAS: An improved kNN-based UHF RFID indoor localization algorithm for directional radiation scenario. Sensors 2019, 19, 968. [Google Scholar] [CrossRef] [Green Version]
  36. Laptin, Y.P.; Likhovid, A.P.; Vinogradov, A.P. Approaches to construction of linear classifiers in the case of many classes. Pattern Recogn. Image Anal. 2010, 20, 137–145. [Google Scholar] [CrossRef]
  37. Zhuravlev, Y.I.; Laptin, Y.; Vinogradov, A. Minimization of empirical risk in linear classifier problem. In New Trends in Classification and Data Mining; ITHEA: Sofia, Bulgaria, 2010; pp. 9–16. [Google Scholar]
  38. Kamiran, S. Naive Bayes and Text Classification: A Comprehensive Study. arXiv 2013, arXiv:1305.6143. [Google Scholar]
  39. Raschka, S. Naïve Bayes and text classification I—Introduction and theory. arXiv 2014, arXiv:1410.5329. [Google Scholar]
  40. Iqbal, K.; Yin, X.-C.; Hao, H.-W.; Ilyas, Q.M.; Ali, H. An overview of Bayesian network applications in uncertain domains. Int. J. Comp. Theory Eng. 2015, 7, 416–427. [Google Scholar] [CrossRef] [Green Version]
Figure 1. The proposed flow showing the set of experiments in the form of the permutations of the diseases and applicable classifiers for the corresponding feature set used.
Figure 2. Proposed framework for evaluation of skin cancer instances.
Figure 3. The evaluation of features and classifiers for seven skin cancer diseases.
Figure 4. The average accuracy of the six classifiers for each feature set.
Figure 5. The complete evaluation of all the permutations for each classifier.
Figure 5. The complete evaluation of all the permutations for each classifier.
Applsci 13 07712 g005
Figure 6. The average permutation of 21 cases.
Figure 7. The average accuracy of six different classifiers.
Table 1. Evaluation of the seven diseases with the classical classifiers and corresponding features.
Classifier | BP    | CL   | EH | FoH   | JPEG-C | PHoG
BN         | 48.3  | 54.6 | 53 | 60.9  | 36.25  | 47.28
NB         | 44.15 | 46.2 | 58 | 45.61 | 34.4   | 48.53
SVM        | 71.05 | 67.2 | 69 | 68.33 | 70.45  | 67.15
KNN        | 73.25 | 67.4 | 73 | 74.08 | 74.45  | 72.83
J48        | 71.05 | 63.7 | 69 | 71.45 | 71.38  | 66.88
RF         | 81.85 | 71.5 | 79 | 79.1  | 80.15  | 78.93
Table 2. The different combinations for the evaluation of seven diseases.
     | AKIE          | BCC          | BLK           | DF          | MEL          | NV
AKIE |
BCC  | AKIE vs. BCC
BLK  | AKIE vs. BLK  | BCC vs. BLK
DF   | AKIE vs. DF   | BCC vs. DF   | BLK vs. DF
MEL  | AKIE vs. MEL  | BCC vs. MEL  | BLK vs. MEL   | DF vs. MEL
NV   | AKIE vs. NV   | BCC vs. NV   | BLK vs. NV    | DF vs. NV   | MEL vs. NV
VASC | AKIE vs. VASC | BCC vs. VASC | BLK vs. VASC  | DF vs. VASC | MEL vs. VASC | NV vs. VASC
Table 3. Performance evaluation of 21 permutations.
     | AKIE  | BCC   | BLK   | DF    | MEL   | NV
AKIE |
BCC  | 66.04
BLK  | 71.06 | 65.62
DF   | 73.79 | 73.9  | 83.59
MEL  | 74.15 | 72.31 | 65.31 | 84.82
NV   | 90.94 | 86.38 | 80.53 | 94.07 | 81.33
VASC | 84.18 | 76.6  | 80.41 | 70.12 | 82.63 | 94.34
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

