Applied Sciences
  • Article
  • Open Access

11 November 2022

A Linear Discriminant Analysis and Classification Model for Breast Cancer Diagnosis

1
Department of Computer Science, College of Pure and Applied Sciences, Landmark University, Omu-Aran 251103, Nigeria
2
Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
3
MICT SETA 4IR Center of Excellence, Durban University of Technology, Durban 4000, South Africa
*
Author to whom correspondence should be addressed.

Abstract

Breast cancer is the most common malignancy among women globally, yet most cases are identified at a late stage. Moreover, mammography for the analysis of breast cancer is not routinely available at all general hospitals. Prolonging the period between detection and treatment may raise the likelihood of the disease proliferating. To speed up the diagnosis of breast cancer and lower the mortality rate, a computerized method based on machine learning was created. The purpose of this investigation was to enhance the diagnostic accuracy of machine-learning algorithms for breast cancer, allowing tumors to be classified and predicted as either benign or malignant. This investigation applies the machine-learning algorithms of random forest (RF) and the support vector machine (SVM), together with the feature extraction method of linear discriminant analysis (LDA), to the Wisconsin Breast Cancer Dataset. The SVM with LDA and RF with LDA yielded accuracy results of 96.4% and 95.6%, respectively. This research has useful applications in the medical field and enhances the efficiency and precision of a diagnostic system. Evidence from this study shows that better prediction is crucial and can benefit from machine-learning methods. The results of this study validate the use of feature extraction for breast cancer prediction when compared to the existing literature.

1. Introduction

Breast cancer is a serious and frequent reproductive cancer that primarily affects women and is caused by breast tumors. A breast tumor manifests as a nipple discharge, a lump, or a change in skin texture around the nipple area caused by the irregular growth of tissues in the breast. Tumors are classified as benign or malignant. Breast cancer is a worldwide disease that affects the lives of women in the age bracket of 25–50. Doctors have identified hormonal, lifestyle, and environmental factors that may raise an individual’s chances of developing it. More than 5% of breast cancer cases are connected to familial gene mutations that have been passed down through the generations, as well as to obesity, aging, and hormonal abnormalities after menopause [1].
There is no known prevention mechanism for breast cancer, although early discovery can greatly improve the outcome [2]. Furthermore, treatment expenses can be significantly reduced as a result. However, because cancer symptoms can be absent or uncommon in the early stages, early detection is challenging. Mammograms and self-breast examinations are essential for detecting any early anomalies before the malignancy progresses. As a result, tumor diagnosis requires the automation of diagnostic systems [3].
Numerous studies have sought to utilize machine-learning algorithms to predict cancer survivability in humans, and they have established that these systems are effective in supporting cancer diagnosis. Various machine-learning approaches have since been created for the analysis and treatment of breast cancer, as well as many aided breast cancer diagnosis approaches, to improve diagnostic accuracy and reduce the number of deaths [4].
Breast cancer is a predominant ailment that primarily affects women, and its initial detection hastens treatment. The basic objective of breast cancer treatment is to accurately forecast the presence of cancer and define the cancer category in order to determine how to treat the disease. Nevertheless, determining the kind of breast cancer is one of the typical challenges in health-related investigations. The proper classification of breast cancer would lead to its early discovery, treatment, and, if possible, elimination. In addition, the precise classification of benign tumors helps save patients from receiving unneeded therapies [5].
Throughout the past few years, numerous establishments have amassed huge stores of data obtained from a variety of sources and stored in various formats. These data could be utilized in various application sectors, including medicine, agronomy, and climate forecasting. These ever-increasing quantities of data exceed the capacity of conventional methods for evaluating and seeking hidden patterns and information for decision-making. By utilizing machine-learning methods, it is possible to examine data retrieved from medical data repositories [6]. For effective disease prediction, algorithms and their use in knowledge discovery from health information sources are invaluable resources. Numerous studies have applied machine-learning algorithms to breast cancer prediction, and such algorithms are widely used in the advancement of breast cancer prediction models to promote effective decision-making. In recent years, machine learning (ML) procedures have been employed in biomedicine to aid the fight against breast cancer [7]. Extracting knowledge from data to support the scientific analysis of breast cancer is a complex and time-consuming process. Utilizing machine learning and feature extraction methods has substantially altered the breast cancer diagnosis process [8].
These methods were successful but encountered slight setbacks in areas such as accuracy and efficiency. This study therefore improves the linear discriminant analysis algorithm for extracting latent components; support vector machine and random forest classifiers are then utilized to develop the prediction model for breast cancer analysis to aid the medical field. Evidence from this investigation shows that better prediction is crucial. The study’s results report the benefit of using feature extraction in the prediction of breast cancer, providing fresh insight into the efficiency of linear discriminant analysis (LDA) and classification for breast cancer prediction. This helps to enhance classification performance in terms of evaluation metrics such as accuracy, precision, and recall.
This paper is organized into five sections: the first section provides an overview of the study; research on categorizing breast cancer is presented in Section 2; the research materials and techniques are discussed in Section 3; the research results are presented in Section 4; and Section 5 concludes with a brief overview and discussion of the results.

3. Materials and Methods

The system in this research has two primary stages: feature extraction and classification. The dataset is loaded as input and passed to a feature extraction method, linear discriminant analysis, which extracts the relevant features from the dataset; the data are then classified using random forest and support vector machine classifiers, and the results are assessed and compared using performance metrics. Figure 1 shows the overall design of the proposed system.
Figure 1. Proposed Workflow.

3.1. Dataset

The Wisconsin (Diagnostic) Breast Cancer Dataset, obtained from Kaggle (https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data/discussion/62297, accessed on 1 February 2022), was used for this study. Breast cancer is a condition characterized by the uncontrolled growth of breast cells, and instances of breast cancer can vary greatly: which breast cells become malignant determines the subtype of breast cancer. The Wisconsin Diagnostic Breast Cancer (WDBC) Dataset is used to determine whether a tumor is malignant or benign. Attribute information: an identification number; a diagnosis label (M = malignant, B = benign); and 30 real-valued input attributes. There are 569 instances and 32 attributes in total.

3.2. Improved Linear Discriminant Analysis

Linear discriminant analysis is a method for simplifying classification problems in supervised machine learning. Using it requires distinguishing between two or more categories; thus, it is employed for modeling set differences. It is a tool for bringing features from one dimension down to a lower-dimensional space. The method’s goal is to ensure optimal class separability by transforming characteristics from a high-dimensional space into a lower-dimensional space in which the ratio of between-class variance to within-class variance is maximized [31,32]. Because of its widespread use in the preprocessing of machine-learning classification applications and its ability to transform features into a lower-dimensional space by maximizing the ratio of the between-class variance to the within-class variance, linear discriminant analysis (LDA) is the primary feature extraction technique used in this work. An enhancement of LDA (Algorithm 1) is presented herein: the sample’s independent variables are entered, and the class means and prior probabilities are calculated. We compute the mean vector for each group in the dataset. From the scatter matrices, the eigenvectors (e1, e2, …, ed) and their corresponding eigenvalues (λ1, λ2, …, λd) are computed; a d × k matrix W can then be created by sorting the eigenvectors by decreasing eigenvalue and picking the k eigenvectors with the largest eigenvalues. The data are then projected into a new subspace using the eigenvector matrix W produced above; in matrix form, this is as concise as Y = XW. The algorithm estimates the pooled covariance matrix and computes the covariance matrices for each group. Following that, it evaluates the LDA discriminant function and labels the classes.
Algorithm 1. Improved Linear Discriminant Analysis
  • E_n = { x_j ∈ U | y_j = c_n }, j = 1, …, m, n = 1, 2 //class-specific subsets
  • µ_n = mean(E_n), n = 1, 2 //class means
  • C = (µ_1 − µ_2)(µ_1 − µ_2)^T //between-class scatter matrix
  • Z_n = E_n − 1_{m_n} µ_n^T, n = 1, 2 //centered class data
  • T_n = Z_n^T Z_n, n = 1, 2 //per-class scatter matrices
  • T = T_1 + T_2 //within-class scatter matrix
  • λ_1, x = eigen(T^{-1} C) //leading eigenpair of T^{-1}C
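The scatter-matrix steps above can be sketched in Python with NumPy for the two-class case. The function name, toy data, and random seed below are illustrative assumptions, not part of the paper's implementation:

```python
import numpy as np

def lda_direction(X, y):
    """Two-class LDA: return the projection vector that maximizes
    between-class over within-class scatter (as in Algorithm 1)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Between-class scatter: C = (mu0 - mu1)(mu0 - mu1)^T
    d = (mu0 - mu1).reshape(-1, 1)
    C = d @ d.T
    # Within-class scatter: T = T1 + T2 with T_n = Z_n^T Z_n on centered data
    Z0, Z1 = X0 - mu0, X1 - mu1
    T = Z0.T @ Z0 + Z1.T @ Z1
    # Leading eigenvector of T^{-1} C gives the discriminant direction
    vals, vecs = np.linalg.eig(np.linalg.pinv(T) @ C)
    return np.real(vecs[:, np.argmax(np.real(vals))])

# Toy check: two well-separated 2-D clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
w = lda_direction(X, y)
proj = X @ w  # 1-D projections; the two classes should not overlap
```

Since the sign of an eigenvector is arbitrary, the class with the larger projected values may be either one; only the separation matters.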

3.3. Random Forest Classifiers

Random forest classifiers are made up of numerous separate decision trees that act as a team. Every tree in the random forest predicts a class, and the class with the most votes becomes the model’s forecast. While some trees will be wrong, many others will be right, letting the forest move in the correct direction [33]. The following points are requirements for a well-performing random forest (Algorithm 2):
  • The features must carry some signal, so that models built on them outperform random guessing.
  • The individual trees’ predictions (and thus their errors) must have low correlation.
Random forest pseudocode is divided into stages, namely, random forest generation pseudocode and prediction/forecast from the formed random forest classifier pseudocode.
Algorithm 2. Random Forest [34]
  • Randomly select “o” features from the total of “p” features, where o << p.
  • Among the “o” features, determine the node with the best split point.
  • Divide the node into offspring nodes using the best split.
  • Repeat steps 1 to 3 until the total of “l” nodes is obtained.
  • Build the forest by repeating steps 1 through 4 “n” times to generate “q” trees.
The random forest method begins by selecting “k” features at random from a total of “m” features. It then uses the best-split strategy to find the root node among the randomly picked “k” features and calculates the offspring nodes using the same best-split method. The first three stages are repeated until a tree with a root node and leaf nodes as targets is generated. Finally, stages 1–4 are replicated to generate “n” random trees. The random forest is made up of these randomly produced trees.
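The voting ensemble described above can be sketched with scikit-learn. The hyperparameters, random seed, and the use of scikit-learn's bundled copy of the Wisconsin diagnostic data (in place of the Kaggle download) are assumptions for illustration, so the resulting accuracy will not exactly match the paper's reported 95.6%:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Wisconsin (diagnostic) dataset: 569 samples, 30 real-valued features
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Each of the 100 trees votes; the majority class is the forest's prediction
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, rf.predict(X_te))
print(f"Random forest test accuracy: {acc:.3f}")
```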

3.4. Support Vector Machine

The support vector machine (Algorithm 3) reduces generalization error. It is a comprehensive and adjustable machine-learning model that can handle linear and nonlinear classification, regression, and outlier detection [35].
Support vector machine models can classify novel text after being fed groups of labeled training data for each class. The benefit is that more complex relationships between data components can be captured without performing complex transformations [36]. The disadvantage is that training takes much longer because of the increased computational expense. A support vector machine can be “taught” to detect fraudulent credit card activity by examining thousands of records of legitimate and fraudulent credit card transactions, and an SVM can be trained to identify handwritten digits by analyzing many scanned images of handwritten 0 s, 1 s, and other characters. Now more than ever, SVMs are finding widespread use in a wide variety of productive biological settings. Automatic classification of gene expression profiles is another common biomedical use of support vector machines. In principle, a gene expression profile derived from a peripheral fluid or tumor sample can be evaluated by a support vector machine to provide a diagnosis or prognosis. In biology, SVMs are used to categorize protein and DNA sequences, microarray expression patterns, and mass spectra [37].
Algorithm 3. Support Vector Machine [38]
  • Candidate SV = {nearest pair from opposite classes}
  • while there are violating points do
  •   Find a violator
  •   Candidate SV = Candidate SV ∪ {violator}
  •   if α_p < 0 for some point p after the addition then
  •     Candidate SV = Candidate SV \ {p}
  •     Repeat until all such points are pruned
  •   end if
  • end while
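In practice, an SVM classifier can be sketched with scikit-learn. Note that scikit-learn's SVC solves the same margin-maximization problem via an SMO-style solver rather than the incremental candidate-set scheme of Algorithm 3; the kernel choice, seed, and the built-in copy of the Wisconsin data are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Feature scaling matters for SVMs; RBF is the default nonlinear kernel
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"SVM test accuracy: {acc:.3f}")
```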

3.5. Performance Evaluations

Accuracy is the number of correct predictions made by the model over all predictions in the classification task. Accuracy is a good metric when the target classes in the data are nearly balanced.
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision is a metric that shows how many of the positive predictions are correct.
Precision = TP / (TP + FP)
Sensitivity measures the fraction of true positives that are correctly identified as positive.
Sensitivity = TP / (TP + FN)
Specificity, also known as selectivity or the true negative rate (TNR), is the percentage of genuine negatives that are correctly identified as negative.
Specificity = TN / (FP + TN)
The F1 score measures the accuracy of a test and is defined as the harmonic mean of precision and recall.
F1 score = 2TP / (2TP + FP + FN)
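These five formulas can be checked numerically against the confusion matrix counts reported for the random forest with LDA (TP = 65, FP = 2, FN = 3, TN = 44):

```python
# Counts from the paper's RF + LDA confusion matrix
TP, FP, FN, TN = 65, 2, 3, 44

accuracy    = (TP + TN) / (TP + FP + FN + TN)
precision   = TP / (TP + FP)
sensitivity = TP / (TP + FN)   # also called recall
specificity = TN / (FP + TN)
f1          = 2 * TP / (2 * TP + FP + FN)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={sensitivity:.3f} specificity={specificity:.3f} f1={f1:.3f}")
# accuracy=0.956 precision=0.970 recall=0.956 specificity=0.957 f1=0.963
```

These values reproduce the percentages reported in Section 4.2 (95.6, 97.0, 95.6, 95.7, and 96.3, respectively).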

3.6. Research Tools

The implementation was developed using Python on an Intel Core i7 processor with a 1.1 GHz clock speed, 4 GB of RAM, a 20 GB hard disk, and the Windows 7 OS.

4. Results and Discussions

In this study, a feature extraction technique and two machine-learning classifiers were executed on the Jupyter platform. This section presents the outcomes of the proposed model. This study implements LDA, a dimensionality reduction technique, with the classification procedures SVM and random forest on the raw data, the Wisconsin Breast Cancer Dataset, and also passes the raw data through the same classifiers without LDA. The data consist of 569 instances and 32 attributes. A common practice in machine learning is to divide the available data into training and testing sets. In this instance, 80% of the loaded dataset was used for training and 20% was used for testing. We then employed feature scaling, a technique used to standardize the range of the data’s independent variables or features.
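The split-and-scale step can be sketched as follows; the random seed is an assumption, and scikit-learn's bundled copy of the dataset (569 instances, 30 real-valued features) stands in for the Kaggle file:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
assert X.shape == (569, 30)

# 80/20 train/test split, as used in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only to avoid information leakage
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
print(X_tr_s.shape, X_te_s.shape)  # (455, 30) (114, 30)
```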

4.1. Feature Extraction

Linear discriminant analysis (LDA), a feature extraction technique, was implemented to reduce the dimensionality of the data: it extracted the most important columns, identified the least important columns, and removed them, dropping the ten (10) least important columns. After passing through LDA, the reduced data, containing the most important columns of the dataset, were then passed to the machine-learning classifiers, random forest and SVM.
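An LDA-then-classify pipeline can be sketched with scikit-learn. Note that scikit-learn's two-class LDA projects the data onto a single discriminant component rather than dropping ten columns as described above, so this is only an approximation of the paper's improved LDA; the hyperparameters, seed, and built-in dataset copy are assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scale -> reduce with LDA -> classify, for both classifiers in the paper
accs = {}
for name, clf in [("RF", RandomForestClassifier(random_state=42)),
                  ("SVM", SVC())]:
    pipe = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(), clf)
    pipe.fit(X_tr, y_tr)
    accs[name] = accuracy_score(y_te, pipe.predict(X_te))
print(accs)
```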

4.2. Random Forest Classifier

In a random forest classifier, many individual decision trees collaborate to produce a single output. The random forest is a collection of trees, each of which makes a classification prediction; the classification with the most votes determines the model’s prediction. In the first step after LDA, this classifier is applied to the newly acquired data. The confusion matrix for the random forest classification is displayed in Figure 2: the counts are 65 true positives (TP), two false positives (FP), three false negatives (FN), and 44 true negatives (TN). The confusion matrix for data fed directly into the random forest classifier without previously being processed by LDA is displayed in Figure 3; it reveals 64 true positives (TP), three false positives (FP), three false negatives (FN), and 44 true negatives (TN).
Figure 2. Confusion Matrix for Wisconsin Breast Cancer Dataset with LDA Using Random Forest (TP = 65, FP = 2, FN = 3, TN = 44).
Figure 3. Confusion Matrix for Wisconsin Breast Cancer Dataset Without LDA Using Random Forest (TP = 64, FP = 3, FN = 3, TN = 44).
On the LDA-reduced dataset, the random forest classifier achieved an accuracy of 95.6%, a precision of 97.0%, a recall/sensitivity of 95.6%, an F1 score of 96.3%, and a specificity of 95.7%.
The decision boundary of a support vector machine is optimized to reduce generalization error. It is a flexible machine-learning model that handles linear and nonlinear classification, regression, and outlier detection with ease. Once the LDA-refined dataset is ready, it is fed into a support vector machine classifier.
The confusion matrix for the SVM classification is displayed in Figure 4: 66 true positives (TP), one false positive (FP), three false negatives (FN), and 44 true negatives (TN). When the dataset was fed directly into the SVM classifier without first being run through LDA, the results indicated 62 true positives (TP), five false positives (FP), four false negatives (FN), and 43 true negatives (TN); Figure 5 shows these results for SVM without LDA.
Figure 4. Confusion Matrix Results for Wisconsin Breast Cancer Dataset with LDA Using SVM (TP = 66, FP = 1, FN = 3, TN = 44).
Figure 5. Confusion Matrix for Wisconsin Breast Cancer Dataset Without LDA Using SVM (TP = 62, FP = 5, FN = 4, TN = 43).
After applying the SVM classifier to the LDA-reduced dataset (Figure 4), the results show an accuracy of 96.4%, a precision of 99%, a recall/sensitivity of 96%, an F1 score of 97%, and a specificity of 98%.
In this research, data classification is carried out using random forest and SVM, therefore the data are sent into these classifiers after having been subjected to feature extraction using LDA. Table 1 displays the evaluation metrics used to analyze the experimental confusion matrices.
Table 1. Performance Metrics Table for LDA+ classifiers and classifiers without LDA.
Several experiments were conducted in this study, and Table 1 shows their evaluations, with LDA and SVM performing best at 96.4% accuracy. Table 2 shows comparisons with the results obtained by several other related studies.
Table 2. Comparison with Related Works.
The accuracy of this study’s prediction method was commensurate with its predecessors (as shown in Table 2). A high accuracy rate in determining breast cancer was the goal of the proposed methodology, and this goal was met when compared to other predictions in the literature.

5. Conclusions

In this study, the machine-learning classifiers of random forest and support vector machine were applied to the Wisconsin Breast Cancer Dataset of 569 instances and 32 attributes, obtained from Kaggle. The dataset was pre-processed and then separated into a training set and a testing set. To create a reliable predictive model, we first used the training set, comprising 80% of the data, to find the best possible combination of variables, and then used the testing set, comprising the remaining 20% of the data, to conduct a fair evaluation of the final model’s performance. After applying linear discriminant analysis, a feature extraction technique for dimensionality reduction that selectively extracted the features needed to provide improved performance, the new dataset was run through the random forest and support vector machine classifiers, with the former achieving an accuracy of 95.6% and the latter 96.4%; the results were then compared to prior related works. The findings of this study can aid in the detection of breast cancer. If breast cancer can be diagnosed and treated early, it could save the lives of thousands of women and men every year. Future work could explore other machine-learning methods, including naive Bayes, k-nearest neighbor, and decision trees, paving the way for further development of breast cancer diagnosis. LDA feature extraction improves breast cancer prediction systems: the suggested model builds a new dataset based on extracted features and improves performance. This paper suggests researching further feature extraction strategies on datasets and computing important information to improve prediction accuracy. The efficiency of LDA motivates further feature extraction approaches to improve the model’s prediction accuracy and performance on malignant datasets.

Author Contributions

Conceptualization, M.O.A. (Micheal Olaolu Arowolo) and M.O.A. (Marion Olubunmi Adebiyi); methodology, M.O.A. (Micheal Olaolu Arowolo), M.D.M., M.O.A. (Marion Olubunmi Adebiyi); software, M.O.A. (Micheal Olaolu Arowolo), M.D.M.; validation, M.O.A. (Marion Olubunmi Adebiyi), O.O.O.; formal analysis, M.O.A. (Marion Olubunmi Adebiyi), M.O.A. (Micheal Olaolu Arowolo); investigation, M.O.A. (Micheal Olaolu Arowolo), M.O.A. (Marion Olubunmi Adebiyi); resources, M.O.A. (Micheal Olaolu Arowolo), M.O.A. (Marion Olubunmi Adebiyi), M.D.M., O.O.O.; writing—original draft preparation, M.D.M., M.O.A. (Micheal Olaolu Arowolo); writing—review and editing, M.D.M., M.O.A. (Micheal Olaolu Arowolo) and M.O.A. (Marion Olubunmi Adebiyi); supervision, O.O.O.; project administration, O.O.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Labrèche, F.; Goldberg, M.S.; Hashim, D.; Weiderpass, E. Breast Cancer. In Occupational Cancers; Springer International Publishing: Cham, Switzerland, 2020; pp. 417–438. [Google Scholar]
  2. Hailu, T.; Berhe, H.; Hailu, D. Awareness of Breast Cancer and Its Early Detection Measures among Female Students, Northern Ethiopia. Int. J. Public Health Sci. 2016, 5, 213. [Google Scholar] [CrossRef]
  3. Akram, M.; Iqbal, M.; Daniyal, M.; Khan, A.U. Awareness and Current Knowledge of Breast Cancer. Biol. Res. 2017, 50, 33. [Google Scholar] [CrossRef]
  4. Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine Learning Applications in Cancer Prognosis and Prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17. [Google Scholar] [CrossRef] [PubMed]
  5. Egwom, O.J.; Hassan, M.; Tanimu, J.J.; Hamada, M.; Ogar, O.M. An LDA–SVM Machine Learning Model for Breast Cancer Classification. BioMedInformatics 2022, 2, 345–358. [Google Scholar] [CrossRef]
  6. Way, G.P.; Sanchez-Vega, F.; La, K.; Armenia, J.; Chatila, W.K.; Luna, A.; Sander, C.; Cherniack, A.D.; Mina, M.; Ciriello, G.; et al. Machine Learning Detects Pan-Cancer Ras Pathway Activation in The Cancer Genome Atlas. Cell Rep. 2018, 23, 172–180.e3. [Google Scholar] [CrossRef] [PubMed]
  7. Banegas-Luna, A.J.; Peña-García, J.; Iftene, A.; Guadagni, F.; Ferroni, P.; Scarpato, N.; Zanzotto, F.M.; Bueno-Crespo, A.; Pérez-Sánchez, H. Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey. Int. J. Mol. Sci. 2021, 22, 4394. [Google Scholar] [CrossRef]
  8. Fogliatto, F.S.; Anzanello, M.J.; Soares, F.; Brust-Renck, P.G. Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection. Cancer Control 2019, 26, 107327481987659. [Google Scholar] [CrossRef]
  9. Aishwarja, A.I.; Eva, N.J.; Mushtary, S.; Tasnim, Z.; Khan, N.I.; Islam, M.N. Exploring the Machine Learning Algorithms to Find the Best Features for Predicting the Breast Cancer and Its Recurrence. In Proceedings of the International Conference on Intelligent Computing & Optimization, Hua Hin, Thailand, 30–31 December 2021; pp. 546–558. [Google Scholar]
  10. Asri, H.; Mousannif, H.; Al Moatassime, H.; Noel, T. Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis. Procedia Comput. Sci. 2016, 83, 1064–1069. [Google Scholar] [CrossRef]
  11. Bazazeh, D.; Shubair, R. Comparative Study of Machine Learning Algorithms for Breast Cancer Detection and Diagnosis. In Proceedings of the 2016 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA), Ras Al Khaimah, United Arab Emirates, 6–8 December 2016; IEEE: Manhattan, NY, USA, 2016; pp. 1–4. [Google Scholar]
  12. Agarap, A.F.M. On Breast Cancer Detection. In Proceedings of the 2nd International Conference on Machine Learning and Soft Computing—ICMLSC ’18, Phu Quoc Island, Vietnam, 2–4 February 2018; ACM Press: New York, NY, USA, 2018; pp. 5–9. [Google Scholar]
  13. Sharma, S.; Aggarwal, A.; Choudhury, T. Breast Cancer Detection Using Machine Learning Algorithms. In Proceedings of the 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, 21–22 December 2018; IEEE: Manhattan, NY, USA, 2018; pp. 114–118. [Google Scholar]
  14. Nindrea, R.D.; Aryandono, T.; Lazuardi, L.; Dwiprahasto, I. Diagnostic Accuracy of Different Machine Learning Algorithms for Breast Cancer Risk Calculation: A Meta-Analysis. Asian Pac. J. Cancer Prev. 2018, 19, 1747–1752. [Google Scholar] [CrossRef]
  15. Tomar, D.; Agarwal, S. Hybrid Feature Selection Based Weighted Least Squares Twin Support Vector Machine Approach for Diagnosing Breast Cancer, Hepatitis, and Diabetes. Adv. Artif. Neural Syst. 2015, 2015, 265637. [Google Scholar] [CrossRef]
  16. Madhavi, B.; Reddy, R. Detection and Diagnosis of Breast Cancer Using Machine Learning Algorithm. Int. J. Adv. Sci. Technol. 2019, 28, 228–237. [Google Scholar]
  17. Dhahri, H.; Al Maghayreh, E.; Mahmood, A.; Elkilani, W.; Faisal Nagi, M. Automated Breast Cancer Diagnosis Based on Machine Learning Algorithms. J. Healthc. Eng. 2019, 2019, 4253641. [Google Scholar] [CrossRef] [PubMed]
  18. Bhise, S.; Gadekar, S.; Gaur, A.S.; Bepari, S.; Deepmala Kale, D.S.A. Breast Cancer Detection Using Machine Learning Techniques. Int. J. Eng. Res. Technol. 2021, 10. [Google Scholar] [CrossRef]
  19. Silva, J.; Lezama, O.B.P.; Varela, N.; Borrero, L.A. Integration of Data Mining Classification Techniques and Ensemble Learning for Predicting the Type of Breast Cancer Recurrence. In Proceedings of the International Conference on Green, Pervasive, and Cloud Computing, Uberlândia, Brazil, 26–28 May 2019; pp. 18–30. [Google Scholar]
  20. Jadhav, S.; Channe, H. Comparative Study of K-NN, Naive Bayes and Decision Tree Classification Techniques. Int. J. Sci. Res. 2013, 5, 1842–1845. [Google Scholar]
  21. Macaulay, B.O.; Aribisala, B.S.; Akande, S.A.; Akinnuwesi, B.A.; Olabanjo, O.A. Breast Cancer Risk Prediction in African Women Using Random Forest Classifier. Cancer Treat. Res. Commun. 2021, 28, 100396. [Google Scholar] [CrossRef]
  22. Ak, M.F. A Comparative Analysis of Breast Cancer Detection and Diagnosis Using Data Visualization and Machine Learning Applications. Healthcare 2020, 8, 111. [Google Scholar] [CrossRef]
  23. Vaka, A.R.; Soni, B.; Reddy, S. Breast Cancer Detection by Leveraging Machine Learning. ICT Express 2020, 6, 320–324. [Google Scholar] [CrossRef]
  24. Abdar, M.; Zomorodi-Moghadam, M.; Zhou, X.; Gururajan, R.; Tao, X.; Barua, P.D.; Gururajan, R. A New Nested Ensemble Technique for Automated Diagnosis of Breast Cancer. Pattern Recognit. Lett. 2020, 132, 123–131. [Google Scholar] [CrossRef]
  25. Kousalya, K.; Krishnakumar, B.; Shanthosh, C.I.; Sharmila, R.; Sneha, V. Diagnosis of Breast Cancer Using Machine Learning Algorithms. Int. J. Adv. Sci. Technol. 2020, 29, 970–974. [Google Scholar]
  26. El-Nabawy, A.; El-Bendary, N.; Belal, N.A. A Feature-Fusion Framework of Clinical, Genomics, and Histopathological Data for METABRIC Breast Cancer Subtype Classification. Appl. Soft Comput. 2020, 91, 106238. [Google Scholar] [CrossRef]
  27. El-Nabawy, A.; Belal, N.A.; El-Bendary, N. A Cascade Deep Forest Model for Breast Cancer Subtype Classification Using Multi-Omics Data. Mathematics 2021, 9, 1574. [Google Scholar] [CrossRef]
  28. Jessica, E.O.; Hamada, M.; Yusuf, S.I.; Hassan, M. The Role of Linear Discriminant Analysis for Accurate Prediction of Breast Cancer. In Proceedings of the 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, 20–23 December 2021; IEEE: Manhattan, NY, USA, 2021; pp. 340–344. [Google Scholar]
  29. Polaka, I.; Bhandari, M.P.; Mezmale, L.; Anarkulova, L.; Veliks, V.; Sivins, A.; Lescinska, A.M.; Tolmanis, I.; Vilkoite, I.; Ivanovs, I.; et al. Modular Point-of-Care Breath Analyzer and Shape Taxonomy-Based Machine Learning for Gastric Cancer Detection. Diagnostics 2022, 12, 491. [Google Scholar] [CrossRef] [PubMed]
  30. Naji, M.A.; El Filali, S.; Aarika, K.; Benlahmar, E.H.; Abdelouhahid, R.A.; Debauche, O. Machine Learning Algorithms For Breast Cancer Prediction And Diagnosis. Procedia Comput. Sci. 2021, 191, 487–492. [Google Scholar] [CrossRef]
  31. Tharwat, A.; Gaber, T.; Ibrahim, A.; Hassanien, A.E. Linear Discriminant Analysis: A Detailed Tutorial. AI Commun. 2017, 30, 169–190. [Google Scholar] [CrossRef]
  32. Zhang, D.; Jing, X.-Y.; Yang, J. Linear Discriminant Analysis. Biometric Image Discrim. Technol. 2011, 41–64. [Google Scholar] [CrossRef]
  33. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  34. Cateni, S.; Vannucci, M.; Vannocci, M.; Colla, V. Variable Selection and Feature Extraction Through Artificial Intelligence Techniques. Multivar. Anal. Manag. Eng. Sci. 2013, 6, 103–118. [Google Scholar] [CrossRef]
  35. Awad, M.; Khanna, R. Support Vector Machines for Classification. In Efficient Learning Machines; Apress: Berkeley, CA, USA, 2015; pp. 39–66. [Google Scholar]
  36. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A Comprehensive Survey on Support Vector Machine Classification: Applications, Challenges and Trends. Neurocomputing 2020, 408, 189–215. [Google Scholar] [CrossRef]
  37. Arowolo, M.O.; Adebiyi, M.O.; Nnodim, C.T.; Abdulsalam, S.O.; Adebiyi, A.A. An Adaptive Genetic Algorithm with Recursive Feature Elimination Approach for Predicting Malaria Vector Gene Expression Data Classification Using Support Vector Machine Kernels. Walailak J. Sci. Technol. 2021, 18, 9849. [Google Scholar] [CrossRef]
  38. Huang, M.-W.; Chen, C.-W.; Lin, W.-C.; Ke, S.-W.; Tsai, C.-F. SVM and SVM Ensembles in Breast Cancer Prediction. PLoS ONE 2017, 12, e0161501. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
