Article

Dynamic Candidate Solution Boosted Beluga Whale Optimization Algorithm for Biomedical Classification

1
Faculty of Computers and Information, Minia University, Minia 61519, Egypt
2
Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(3), 707; https://doi.org/10.3390/math11030707
Submission received: 26 December 2022 / Revised: 26 January 2023 / Accepted: 28 January 2023 / Published: 30 January 2023

Abstract

Artificial Intelligence (AI) and Machine Learning (ML) now help solve complicated problems in many fields. Beluga Whale Optimization (BWO) is one of the more recent Metaheuristic (MH) algorithms applied to problems across various domains. However, BWO lacks diversity, which can lead to entrapment in local optima and premature convergence. This study enhances the basic BWO algorithm in two stages. The first stage applies Opposition-Based Learning (OBL) to BWO, yielding OBWO, which expedites the search process and improves the learning methodology so that a better generation of candidate solutions is chosen for the basic BWO. The second stage, referred to as OBWOD, combines the Dynamic Candidate Solution (DCS) with OBWO, based on the k-Nearest Neighbor (kNN) classifier, to boost diversity and improve the consistency of the selected solution by giving potential candidates a chance to solve the given problem with a high fitness value. To evaluate OBWOD, a comparison study with existing optimization algorithms for single-objective bound-constrained optimization problems was conducted on the 2022 IEEE Congress on Evolutionary Computation (CEC’22) benchmark test suite with a range of dimension sizes. The statistical significance tests confirmed that the proposed algorithm is competitive with the compared optimization algorithms. In addition, OBWOD surpassed seven other algorithms, with an overall classification accuracy of 85.17% on 10 medical datasets with different dimension sizes according to the performance evaluation matrix.

1. Introduction

The availability of various medical data leads us to wonder if there are any effective and efficient ways to analyze these data and derive possibly novel and practical knowledge. Diagnosing various diseases presents one of the most-severe challenges for data analytics. The researchers focus their work in various ways, including creating high-accuracy prediction models, extracting if–then rules, and experimenting with new cut-off values for significant input variables. Diagnostic errors can result from a lack of sufficient information for an accurate diagnosis, a disruption in patient–clinician communication, a delayed or inaccurate diagnosis, or even a high degree of diagnostic complexity in a constrained amount of time. Clinicians may misdiagnose a patient, which may impact the treatment’s result. The surroundings can also cause diagnostic mistakes and affect the equipment employed for the diagnosis [1].
A new trend has emerged in recent years: using computational intelligence to diagnose medical conditions [2]. Intelligent data classification activities can be used to categorize various medical condition diagnosis techniques. The categorization strategies can be split into two groups based on the number of continually distributed groups. Binary classification (two-class task) is the initial classification distribution, which only distinguishes the data between the two classes. The second classification separates data from more than two classes and is called multi-classification (multi-class task) [3].
Machine Learning (ML) techniques are being used to help clinicians make decisions to improve healthcare services, such as reducing diagnostic errors and giving patients the right treatments [4]. Various ML techniques have been employed to diagnose illnesses. Artificial Neural Networks (ANNs) achieved an accuracy of 95.63% in preventing diarrhea, one of the leading causes of death worldwide [5]. Using clinical data, a Decision Tree (DT) diagnosed and categorized thyroid conditions with a 97.35% accuracy rate [6]. Random Forest (RF) achieved 82.7% accuracy in identifying multiple indicators for the breast cancer survival rate, including the stage of the disease and the size of the tumor [7], and a Support Vector Machine (SVM) achieved an accuracy of more than 90% in predicting influenza [8]. A hybrid algorithm combines two or more ML algorithms. Such hybrid models can achieve remarkable results and address issues that an individual algorithm cannot [9]. Better results may also be obtained using other ML techniques such as Latent Dirichlet Allocation (LDA) [10], Recurrent Neural Networks (RNNs) [11], Naive Bayes (NB) [12], and Logistic Regression (LR) [13].
Medical disease identification is a rapidly expanding area of research in the science of AI. Many efforts have been made in recent years to improve medical disease identification since mistakes and issues with disease diagnosis can result in seriously incorrect medical therapy. In the biomedical field, MH algorithms have been used frequently to diagnose medical disorders and promise improved perception and disease prediction accuracy [14,15].
Multiple medical tasks can now be improved with AI [16]. AI algorithms can find patterns in a dataset to achieve diagnostic tools. AI has already demonstrated good diagnostic accuracy in other medical specialties. In some specialized domains, it can match present diagnostic capabilities. AI could be employed in psychiatry for diagnostic purposes to support ongoing patient assessments or medication recommendations. AI has also been investigated for enhancing categorization and diagnosis capacities. It has also been employed in identifying suicide risk and the diagnosis of mood disorders [17].
AI can be employed for a variety of purposes. ML algorithms are often categorized as either supervised or unsupervised learning. Supervised learning was the primary class of ML algorithms employed in the included studies. Supervised learning techniques find patterns in a dataset that are connected to an outcome. Supervised algorithms are divided into classification and regression. Classification techniques assign data to discrete groups; for example, patients can be categorized into groups based on the patterns found. Classification tasks can be carried out using DT, SVM, and RF. Regression methods are used to forecast quantitative data. This class includes methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and LR [16].
Overfitting refers to fitting an AI model based on data noise or inaccuracy rather than the true relationship. It is one of AI’s drawbacks. Small data samples or having too many parameters for the data can contribute to overfitting. One method for reducing overfitting is cross-validation. Using this method, the dataset is divided into several groups, then into training and validation data. Because of this, a different dataset is used to train and evaluate the statistical model for each group. This method lessens the possibility of having an overly optimistic estimate [18].
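As a minimal sketch of the splitting step described above, the following Python snippet partitions sample indices into folds and, for each fold, returns the indices used for training and for validation. The function name `k_fold_indices`, the number of folds, and the seed are illustrative assumptions and are not taken from the original study.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f takes every n_folds-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        val_idx = folds[f]
        # All remaining folds together form the training set for this round.
        train_idx = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=10, n_folds=5))
```

Each sample appears in exactly one validation fold, so every model is evaluated on data it was not trained on.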
Other strategies, such as the dropout rate, are also applied to reduce overfitting. Dropout is a regularization method for Neural Networks (NNs) to lessen overfitting and enhance generalization [19]. During training, it is intended to arbitrarily ignore the neurons (and connections they form) in the NN. The same input data can result in various outputs because the NN architecture changes at every inference. The idea is that it forces the units to be stronger and less codependent. Cross-validation and dropout differ primarily in the source of randomization. In cross-validation, the data are randomly split into training and validation sets, whereas in dropout, the neural units are randomly removed. The majority of recent research demonstrating AI’s ability to detect mental health disorders on social media has been published.
In this paper, we developed an improved BWO called OBWOD by integrating the Opposition-Based Learning (OBL) strategy and Dynamic Candidate Solution (DCS) with the original BWO. Based on the proposed OBWO, the classifier kNN was established for medical diagnosis. OBWOD was compared with the Whale Optimization Algorithm (WOA) [20], the Hunger Games Search (HGS) [21], the Sine Cosine Algorithm (SCA) [22], Harris Hawks Optimization (HHO) [23], the weIghted meaN oF vectOrs (INFO) [24] algorithm, Moth–Flame Optimization (MFO) [25], and the original BWO [26] algorithm based on the kNN classifier. The performance of the proposed OBWOD method was evaluated using the CEC’22 test suite. Then, OBWOD was combined with the kNN classifier to choose the best features in 10 biomedical datasets with varying dimensions. Performance measures assessed how well the developed model performed. The contributions of this paper are summarized as follows:
  • BWO has a lack of diversity, which could lead to being trapped in local optima.
  • OBL can enhance the exploration of BWO by searching more promising regions, which produces OBWO.
  • DCS and OBWO boost variety and improve the consistency of the selected solution by giving potential candidates a chance to solve the given problem with a high fitness value, which produces the OBWOD approach.
  • In comparison to the other seven MH algorithms, OBWOD is used to resolve global optimization issues based on the CEC’22 test suite.
  • The OBWOD approach based on the kNN classifier was built for biomedical classification tasks and estimated on ten disease datasets with different dimension sizes extracted from the UCI repository.
  • The experimental results demonstrated the superiority of OBWOD over seven other competitors according to the performance evaluation matrix.
The remainder of this paper is organized as follows. A summary of relevant work is included in Section 2, while Section 3 discusses the BWO algorithm and other methodologies. The proposed OBWOD approach is explained in Section 4, and the experimental assessment and discussion are presented in Section 5. The conclusions and future research are covered in Section 6.

2. Literature Reviews

In [27], Hameed et al. proposed an ML technique to support the initial clinical diagnosis by using 3473 registers of patients older than 6 years who were treated for possible influenza and then submitted a sample to a PCRRT test to confirm the diagnosis. Support vector machine was the superior ML technique, with a sensitivity of 0.9715 and a specificity of 0.9285.
In addition, in [28], Bhattacharya et al. proposed the first use of ML for identifying Hypertrophic Cardiomyopathy (HC) patients with Ventricular Arrhythmias (VArs) using clinical variables, employing the LR and NB classifiers. While resolving the imbalance in the clinical data, the proposed model outperformed other Sudden Cardiac Death (SCD) prediction algorithms already in use (C-index).
In [29], Ouyang et al. proposed a unique online attention module using a 3D Convolutional Neural Network (CNN) to concentrate on the infection regions in the lungs. The sizes of the infection regions are unevenly distributed between COVID-19 and Community-Acquired Pneumonia (CAP), partially as a result of the quick progression of COVID-19 following the onset of symptoms. The results demonstrated that this method can recognize COVID-19 images with an AUC of 0.944 and an accuracy of 87.5%. With this performance, the proposed algorithm may help radiologists distinguish COVID-19 from CAP, especially at the beginning of the COVID-19 outbreak.
Mohan et al. [30] proposed a new technique for predicting cardiovascular diseases: Hybrid RF with a Linear Model (HRFLM). The approach uses a hybrid model made up of RF and a linear model based on clinical data. The model’s performance was better than previous ML and soft-learning-based models. It had an 88.7% accuracy rate.
For the dimension reduction of datasets, Ghazal and Taleb [31] proposed that Advanced Harmony Searching Optimization (AHSO) can improve the ovarian cancer diagnosis method. The model put out in this study is also capable of an early cancer diagnosis with high accuracy and a low root-mean-squared error (RMSE). The RMSE, SOM, and NN methods demonstrated detection and precision of ovarian cancer of 94% and 0.029, respectively. An effective classification method with a lower failure rate was developed using optimization (AHSO).
In addition, Calp and Hanefi proposed SVM combined with a Cognitive Development Optimization Algorithm (SVM-CoDOA) for general medical diagnosis in [32]. An SVM, trained by the CoDOA, makes up the system. Optimization algorithms must be used to train and enhance ML approaches and demonstrate the efficacy of the SVM-CoDOA hybrid formation.
For concentrating on features, in [33], Rahman et al. proposed the LASSO, mutual information, and recursive feature elimination techniques in addition to three classification algorithms (kNN, RF, and NB). These were carried out in this study using a 10-fold cross-validation approach. They also used RF and the bagging ensemble technique to enhance the result. Through recursive feature elimination with bagging, the proposed model achieved an accuracy of 85.18%.
It is critical to have a lightweight and reliable ML solution for categorization because there is a vast amount of labeled or unlabeled data. Several optimizers have been implemented to improve the inclusive performance of ML models [34,35]. ML algorithms have used an increasing amount of medically unstructured data to anticipate intuitions. However, drawing much intuition from that data is difficult. ML researchers have therefore used modern optimizers and cutting-edge feature selection strategies to overcome and improve the performance accuracy [36]. This study aimed to determine how ML and optimization algorithms affect medical diagnosis by providing an overview of the usage of ML and optimization algorithms to diagnose various diseases through a thorough evaluation of research papers, as shown in Table 1.

3. Preliminaries

3.1. K-Nearest Neighbor

kNN, also known as nearest neighbor classification, is based on the idea that the patterns closest to a target pattern $X$, for which a label is sought, provide useful information about that label. kNN assigns the class label held by the majority of the $K$ nearest patterns in the data space. This requires a similarity measure in the data space [50,51], which is given by Equation (1).
$$\left\lVert X - X_j \right\rVert_{p} = \left( \sum_{i=1}^{q} \left| X_i - X_{ij} \right|^{p} \right)^{1/p} \tag{1}$$
For $p = 2$, this corresponds to the Euclidean distance. The remaining question is how to select $K$, i.e., for which neighborhood size the best classification result is achieved. This problem is referred to as model selection, and various methods, such as cross-validation, can be used to pick the optimal model and parameters. We set $k = 5$ following the literature [52,53].
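The following Python sketch illustrates the classification rule above: it computes the distance of Equation (1) and returns the majority label among the $k = 5$ closest training patterns. The function names and the toy data are illustrative assumptions, and ties are broken arbitrarily by the order in which labels are encountered.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Distance of Equation (1); p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=5, p=2):
    """Return the majority label among the k training patterns nearest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: minkowski(pair[0], query, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-class data (hypothetical values, for illustration only).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (1.0, 0.8)]
train_y = [0, 0, 0, 1, 1, 1, 1]
label = knn_predict(train_X, train_y, query=(0.95, 0.95), k=5)
```

In this toy example, four of the five nearest neighbors of the query belong to class 1, so the query is assigned to class 1.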

3.2. Opposition-Based Learning

The idea of OBL was first presented by Tizhoosh [54]. Its primary concept is to evaluate an estimate and its corresponding opposite estimate simultaneously, since one of the two may lie closer to the global optimum, in order to discover a better candidate solution. It was adopted relatively quickly in numerous soft computing fields. Consider a point $x \in [y, z]$ with $y, z \in \mathbb{R}$. The opposite point of $x$, denoted $x^{op}$, is given by Equation (2).
$$x^{op} = y + z - x \tag{2}$$
Many researchers use the idea of opposing numbers to improve their understanding of, ability to use, and optimization of MH algorithms [55].
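A small Python sketch may make Equation (2) concrete. It mirrors each coordinate of a candidate inside its bounds and then keeps the fittest half of the combined set. The selection rule (keep the best $N$ of the $2N$ candidates, for a minimization problem) is a commonly used OBL scheme assumed here for illustration; the function names, bounds, and objective are hypothetical.

```python
def opposite(x, lower, upper):
    """Opposite point of x, coordinate by coordinate, as in Equation (2)."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_select(population, lower, upper, fitness):
    """Keep the N fittest among the population and its opposites (minimization)."""
    candidates = population + [opposite(x, lower, upper) for x in population]
    candidates.sort(key=fitness)
    return candidates[:len(population)]


def shifted_sphere(x):
    """Hypothetical objective with its minimum at (7, 7)."""
    return sum((v - 7.0) ** 2 for v in x)


population = [[1.0, 2.0], [6.0, 8.0]]
selected = obl_select(population, [0.0, 0.0], [10.0, 10.0], shifted_sphere)
```

Here the opposite of `[1.0, 2.0]` is `[9.0, 8.0]`, which is much closer to the optimum and therefore replaces the original candidate.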

3.3. Dynamic Candidate Solution

This section presents the dynamic candidate solution mechanism. Exploration and exploitation are the two critical stages of MH algorithms, and maintaining a healthy balance between them is crucial. In the dynamic version, each solution renews its position dynamically from the best-obtained solution during the optimization process to emphasize exploration and exploitation [56]. The DCS function captures the decreasing percentage of candidate solutions: its value is lowered at each iteration, as shown by Equations (3) and (4).
$$DCS(0) = 1 - \sqrt{\frac{it}{Max_{it}}} \tag{3}$$
$$DCS(it + 1) = DCS(it) \times 0.99 \tag{4}$$
where $it$ is the current iteration and $Max_{it}$ is the maximum number of iterations.
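The decay described by Equations (3) and (4) can be sketched in Python as follows. The sketch assumes that Equation (3) has the form $1 - \sqrt{it / Max_{it}}$ and that it is evaluated once, at the starting iteration, after which the value is multiplied by 0.99 per iteration; both readings are assumptions, as are the function and parameter names.

```python
import math


def dcs_schedule(max_it, start_it=0, decay=0.99):
    """Return the DCS value used at each of max_it iterations."""
    # Assumed form of Equation (3), evaluated at the starting iteration.
    dcs = 1.0 - math.sqrt(start_it / max_it)
    schedule = []
    for _ in range(max_it):
        schedule.append(dcs)
        dcs *= decay  # Equation (4): shrink by 1% per iteration
    return schedule


schedule = dcs_schedule(max_it=100)
```

Under these assumptions the value starts at 1 and decreases geometrically, so the share of candidates affected by DCS shrinks as the search progresses.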

3.4. Beluga Whale Optimization Algorithm

BWO [26] is a swarm-based algorithm for solving optimization problems that draws inspiration from the behaviors of beluga whales, namely swimming, hunting for prey, and whale fall. To improve the convergence capability of BWO, the exploitation phase uses the Levy flight function. The mathematical model of BWO is established as follows. Because BWO is population-based, beluga whales are considered the search agents, and each beluga whale is a candidate solution that is updated during optimization. The matrix of search agent positions is represented by Equation (5).
$$X = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,D} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,D} \\ X_{3,1} & X_{3,2} & \cdots & X_{3,D} \\ \vdots & \vdots & \ddots & \vdots \\ X_{N,1} & X_{N,2} & \cdots & X_{N,D} \end{bmatrix} \tag{5}$$
where $N$ is the beluga whale population size and $D$ is the dimension. The balance factor $B_f$ described by Equation (6) determines whether the BWO algorithm switches from exploration to exploitation.
$$B_f = B_0 \left( 1 - \frac{T}{2 T_{max}} \right) \tag{6}$$
  • Exploration phase: The exploration phase of BWO is modeled on the swimming behavior of beluga whales. Beluga whales can engage in social–sexual behaviors in various postures, as evidenced by observations of beluga whales kept in human care, such as a pair of closely spaced beluga whales swimming in a coordinated manner. As a result, the positions of the beluga whales are updated as shown in Equation (7).
    $$\begin{cases} X_{i,j}^{T+1} = X_{i,p_j}^{T} + \left( X_{r,p_1}^{T} - X_{i,p_j}^{T} \right) (1 + r_1) \sin(2 \pi r_2), & j = \text{even} \\ X_{i,j}^{T+1} = X_{i,p_j}^{T} + \left( X_{r,p_1}^{T} - X_{i,p_j}^{T} \right) (1 + r_1) \cos(2 \pi r_2), & j = \text{odd} \end{cases} \tag{7}$$
    where $T$ is the current iteration, $X_{i,j}^{T+1}$ denotes the new position of the $i$th beluga whale in the $j$th dimension at the next iteration, $p_j$ is a dimension selected at random, $r$ is the index of a randomly chosen beluga whale, and $r_1$ and $r_2$ are random numbers in (0, 1).
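  • Exploitation phase: The Levy flight strategy is employed in the exploitation phase of BWO to improve convergence. Assuming that the whales hunt using the Levy flight strategy, the mathematical model is shown in Equation (8).
    $$X_{i}^{T+1} = r_3 X_{best}^{T} - r_4 X_{i}^{T} + C_1 \cdot L_F \cdot \left( X_{r}^{T} - X_{i}^{T} \right) \tag{8}$$
    where $C_1$ is calculated by Equation (9), the Levy flight function $L_F$ is calculated by Equation (10), and $\sigma$ is calculated by Equation (11).
    $$C_1 = 2 r_4 \left( 1 - \frac{T}{T_{max}} \right) \tag{9}$$
    $$L_F = 0.05 \times \frac{u \times \sigma}{|\upsilon|^{1/\beta}} \tag{10}$$
    $$\sigma = \frac{\Gamma(1 + \beta) \times \sin\left( \frac{\pi \beta}{2} \right)}{\Gamma\left( \frac{1 + \beta}{2} \right) \times \beta \times 2^{\frac{\beta - 1}{2}}} \tag{11}$$
    where the default constant $\beta$ is set to 1.5, and $u$ and $\upsilon$ are normally distributed random values calculated by Equations (12) and (13) [57].
    $$u = randn(1, dim) \times \sigma \tag{12}$$
    $$\upsilon = randn(1, dim) \tag{13}$$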
  • Whale fall phase: To represent the behavior of whale fall in each iteration, the whale fall probability is taken from the individuals of the population as a subjective assumption that simulates slight changes in the groups. These beluga whales are presumed to have either relocated or been shot and dropped into the deep ocean. To keep the population size constant, the updated position is determined from the positions of the beluga whales and the step size of the whale fall, as calculated by Equation (14).
    $$X_{i}^{T+1} = r_5 X_{i}^{T} - r_6 X_{r}^{T} + r_7 X_{step} \tag{14}$$
    where $X_{step}$ is calculated by Equation (15).
    $$X_{step} = (u_b - l_b) \exp\left( -C_2 \frac{T}{T_{max}} \right) \tag{15}$$
    where $C_2$ is calculated by Equation (16).
    $$C_2 = 2 W_f \times n \tag{16}$$
    where $W_f$ is calculated by Equation (17).
    $$W_f = 0.1 - 0.05 \frac{T}{T_{max}} \tag{17}$$
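The three position updates above can be sketched in plain Python as follows. This is an illustrative reading of Equations (6) to (17), not the authors' implementation. The function names are hypothetical, dimensions are 0-based so parity is taken on `j + 1`, $\sigma$ is applied once (inside $u$, as in Equation (12)), and the optional `mantegna_root` flag, which raises $\sigma$ to the power $1/\beta$ as in the usual statement of Mantegna's method, is an assumption of this sketch.

```python
import math
import random


def balance_factor(t, t_max, rng):
    """Equation (6), with B0 drawn from (0, 1)."""
    return rng.random() * (1.0 - t / (2.0 * t_max))


def explore(X, i, rng):
    """Equation (7): swimming-inspired update of whale i."""
    d = len(X[i])
    r = rng.randrange(len(X))                   # randomly chosen whale
    p = [rng.randrange(d) for _ in range(d)]    # random dimensions p_j
    r1, r2 = rng.random(), rng.random()
    new = []
    for j in range(d):
        trig = math.sin if (j + 1) % 2 == 0 else math.cos  # even: sin, odd: cos
        base = X[i][p[j]]
        new.append(base + (X[r][p[0]] - base) * (1.0 + r1) * trig(2.0 * math.pi * r2))
    return new


def levy_sigma(beta=1.5, mantegna_root=False):
    """Equation (11); mantegna_root=True applies the extra 1/beta power (assumption)."""
    sigma = (math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
             / (math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)))
    return sigma ** (1.0 / beta) if mantegna_root else sigma


def levy_flight(d, rng, beta=1.5):
    """Equations (10), (12), and (13), one step per dimension."""
    sigma = levy_sigma(beta)
    steps = []
    for _ in range(d):
        u = rng.gauss(0.0, 1.0) * sigma
        v = max(abs(rng.gauss(0.0, 1.0)), 1e-12)  # guard against division by zero
        steps.append(0.05 * u / v ** (1.0 / beta))
    return steps


def exploit(X, i, best, t, t_max, rng):
    """Equations (8) and (9): Levy-flight update around the best whale."""
    d = len(X[i])
    r = rng.randrange(len(X))
    r3, r4 = rng.random(), rng.random()
    c1 = 2.0 * r4 * (1.0 - t / t_max)
    lf = levy_flight(d, rng)
    return [r3 * best[j] - r4 * X[i][j] + c1 * lf[j] * (X[r][j] - X[i][j])
            for j in range(d)]


def whale_fall(X, i, lb, ub, t, t_max, rng):
    """Equations (14) to (17): whale fall update of whale i."""
    n, d = len(X), len(X[i])
    wf = 0.1 - 0.05 * t / t_max
    c2 = 2.0 * wf * n
    r = rng.randrange(n)
    r5, r6, r7 = rng.random(), rng.random(), rng.random()
    return [r5 * X[i][j] - r6 * X[r][j]
            + r7 * (ub[j] - lb[j]) * math.exp(-c2 * t / t_max)
            for j in range(d)]


def clip(x, lb, ub):
    """Project a position back into the search bounds."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, lb, ub)]


rng = random.Random(1)
lb, ub = [-5.0] * 4, [5.0] * 4
X = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(6)]
bf = balance_factor(10, 100, rng)
new_explore = clip(explore(X, 0, rng), lb, ub)
new_exploit = clip(exploit(X, 0, X[1], 10, 100, rng), lb, ub)
new_fall = clip(whale_fall(X, 0, lb, ub, 10, 100, rng), lb, ub)
```

Clipping to the bounds after each update corresponds to the constraint check performed before the fitness is re-evaluated.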

4. The Proposed OBWOD Approach

The OBWOD optimization algorithm is explained in this section. Firstly, a discussion of the BWO method’s drawbacks is provided. Then, numerous crucial concepts including initialization, updating, and evaluation, which are all components of the OBWOD algorithm, are next covered. After that, the OBWOD termination process is described.

4.1. Drawbacks of the Original BWO

The BWO method suffers from premature convergence, which traps the algorithm in a local optimum and prevents it from discovering the best solution. OBWOD addresses this in two ways. OBL is used to choose a better generation of candidate solutions for the core BWO, thereby accelerating the search process and improving the learning approach. DCS is then combined with OBWO, based on the kNN classifier, to increase the variety and strengthen the consistency of the chosen solution by giving potential candidates a chance to solve the problem with a high fitness value. The three phases of the proposed method (exploration, exploitation, and whale fall) are shown in the flowchart of the OBWOD algorithm in Figure 1. The process of classifying 10 medical datasets using the proposed OBWOD algorithm is shown in Figure 2.

4.2. Fitness Function

The fitness function ($f_{obj}$) evaluates how close a given solution is to the ideal solution of the target problem, i.e., how appropriate the solution is, as shown in Equation (18).
$$Fit_i = \alpha \times Err_i + \beta \times \frac{d_i}{D} \tag{18}$$
where $\beta = 1 - \alpha$ and $\alpha = 0.7$. The factor $\alpha$ balances the classification error rate ($Err_i$) against the ratio of the number of selected features ($d_i$) to the total number of features ($D$).
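As a worked illustration of Equation (18), the following Python sketch evaluates the fitness of a single candidate. The error rate and feature counts are hypothetical numbers chosen for the example, and the function name is an assumption.

```python
def fitness(error_rate, n_selected, n_features, alpha=0.7):
    """Equation (18): weighted sum of error rate and selected-feature ratio."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_features)


# Hypothetical candidate: 15% classification error using 8 of 20 features.
fit = fitness(error_rate=0.15, n_selected=8, n_features=20)
```

For these values the fitness is $0.7 \times 0.15 + 0.3 \times 0.4 = 0.225$. Lower values are better, so a candidate improves either by reducing the error rate or by selecting fewer features.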

4.3. The Major Stages of the OBWOD

We now describe how the OBWOD algorithm works. OBWOD resolves the premature convergence issue and enhances BWO’s capacity for local search. Algorithm 1 provides the pseudo-code for the OBWOD algorithm.
The proposed OBWOD solution comprises the following three phases.

4.3.1. Initialization Phase

Due to the population-based nature of BWO, beluga whales are regarded as the search agents, and each one is a potential solution that is modified throughout the optimization. The fitness function ($f_{obj}$) is computed by Equation (18), and the initial population (beluga whales $X$) is generated as in Equation (5).

4.3.2. Solution Update Phase

The solution update phase contains two major steps, as shown:
  • The first step: To select a better generation of candidate solutions for the basic BWO, OBL helps to speed up the search process and improve the learning approach by using Equation (2). We allowed each population to evolve following an OBWOD position update for all individuals to increase the solution accuracy, accelerate convergence, and avoid becoming stuck in the local optimum. The exploratory sub-population increases its location to enlarge the search area and increase its global exploration potential. The exploitative sub-population performs a deep local search close to the current best solution to speed up convergence and improve the quality of the solution. The tiny population revises its position in response to OBL, which produces new individuals to replace those who are less fit and further broadens the population’s genetic variation.
  • The second step: We used DCS to increase the variety and improve the consistency of the chosen solution by giving potential candidates a chance to solve the given problem with a high fitness value by using Equations (3) and (4) in the whale fall phase instead of using Equation (14). The fitness values of the new population are then calculated to find the optimum solution. This process is repeated until the termination condition (i.e., the maximum number of iterations) is met.

4.3.3. Classification Phase

The OBWOD approach returns the best candidate solution $P^{*}$ from the prior stage. Only the features whose entries are 1 in $P^{*}$ are retained from the original data. We used the hold-out approach, which randomly divides the dataset into two parts: 80% for the training set and 20% for the testing set. Each experiment was run 20 times independently to obtain reliable results. Following the literature, the number of neighbors for the kNN classifier was set to $k = 5$ [52,53].
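The two data-preparation steps of this phase can be sketched in Python as follows: a binary mask keeps only the selected feature columns, and a random 80/20 hold-out split separates training and testing samples. The function names, the seed, and the toy data are assumptions made for illustration.

```python
import random


def apply_mask(rows, mask):
    """Keep only the feature columns whose mask entry is 1."""
    return [[v for v, keep in zip(row, mask) if keep == 1] for row in rows]


def hold_out_split(rows, labels, test_ratio=0.2, seed=0):
    """Randomly split samples into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


# Toy dataset with three features; the mask selects the first and third.
rows = [[i, i + 1, i + 2] for i in range(10)]
labels = [i % 2 for i in range(10)]
reduced = apply_mask(rows, mask=[1, 0, 1])
X_train, y_train, X_test, y_test = hold_out_split(reduced, labels)
```

Repeating the split with different seeds would correspond to the independent runs mentioned above.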

4.4. The Computational Complexity of OBWOD

The computational complexity of BWO, a crucial indicator of its efficiency, comprises three processes: initialization, fitness assessment, and updating of the beluga whales. The computational complexity of the initialization process is $O(n)$, and using OBL does not increase it. The computational cost of the exploration and exploitation phases is estimated to be $O(n \cdot T_{max})$, where $n$ is the population size and $T_{max}$ is the maximum number of iterations. The whale fall phase, whose cost depends on the whale fall probability $W_f$ and the balance factor $B_f$, contributes approximately $O(0.1 \cdot n \cdot T_{max})$. As a result, the computational complexity of OBWOD is estimated to be around $O(n \cdot (1 + 1.1 \cdot T_{max}))$.
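The total can be obtained by summing the three contributions stated above. The numerical example uses a hypothetical population size of $n = 30$ together with $T_{max} = 100$, chosen only to illustrate the arithmetic.

```latex
\begin{aligned}
O(n) + O(n \cdot T_{max}) + O(0.1 \cdot n \cdot T_{max})
  &= O\big(n + 1.1 \cdot n \cdot T_{max}\big) \\
  &= O\big(n \cdot (1 + 1.1 \cdot T_{max})\big), \\
n = 30,\ T_{max} = 100 \;\Rightarrow\; n \cdot (1 + 1.1 \cdot T_{max})
  &= 30 \times 111 = 3330 .
\end{aligned}
```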
Algorithm 1 The pseudo-code of the proposed OBWOD algorithm.
  • Inputs: The algorithm parameters, including the population size $N$ and the maximum number of iterations $T_{max}$.
  • Outputs: Best solution $P^{*}$.
  • Initialize the population of $N$ beluga whales, apply OBL using Equation (2), and evaluate the fitness values.
  • while ($T < T_{max}$) do
  •     Obtain the probability of $DCS$ by Equations (3) and (4), and the balance factor $B_f$ by Equation (6)
  •     for (each beluga whale ($X_i$)) do
  •         if $B_f(i) > 0.5$ then
  •            Generate $p_j$ ($j = 1, 2, \ldots, d$) randomly from the dimension
  •            Choose a beluga whale $X_r$ randomly
  •            Update the new position of $X_i$ by Equation (7)
  •         else if $B_f(i) \le 0.5$ then
  •            Update $C_1$, and calculate the Levy flight function
  •            Update the new position of $X_i$ by Equation (8)
  •         end if
  •         Check the constraints of the new positions, and evaluate $f_{obj}$
  •     end for
  •     for (each beluga whale ($X_i$)) do
  •         if $B_f(i) \le DCS$ then
  •            Update the step factor $C_2$ by Equation (16)
  •            Calculate $X_{step}$
  •            Update the new position of $X_i$ by Equation (14)
  •            Check the constraints of the new positions, and evaluate $f_{obj}$
  •         end if
  •     end for
  •     Find the current best position, and update $P^{*}$
  •     $T = T + 1$
  • end while
  • Return $P^{*}$.

5. Experimental Evaluation and Discussion

The numerical data produced by the OBWOD approach and the competing algorithms based on the kNN classifier were statistically examined using the Friedman test. The performance of the OBWOD algorithm and the other optimization algorithms was compared fairly using the Friedman test on the mean and STD of the best solutions. The effectiveness of the OBWOD algorithm was evaluated using the CEC’22 single-objective benchmark functions and 10 disease datasets with various feature sizes taken from the UCI repository [58]. The final row of each results table reports Friedman’s rank. In the following tables, the labels $\mu$ and $\sigma$ refer to the mean and standard deviation of the function values, respectively. The bold type represents the best values.

5.1. Algorithm Configurations and Datasets

5.1.1. Parameter Settings

Table 2 demonstrates the parameter sets employed in constrained optimization algorithms. Numerous researchers have used these factors [59]. Table 3 summarizes our experiments. To find all possible super-quality solutions, OBWOD, BWO, HHO, HGS, SCA, WOA, MFO, and INFO have to be adequately repeated. A total of 20 runs were performed for this purpose, with each run having a maximum of 100 iterations.

5.1.2. Dataset Description

Diagnostic errors may occasionally result from human error or an incorrect interpretation of the data. To avoid these issues, this article offers a practical computer-aided diagnosis technique with intelligent learning models. A computer-based functional simulation is proposed to improve predictive effectiveness. The experiments use 10 disease datasets with different dimension sizes from the University of California, Irvine (UCI) repository [58], as shown in Figure 2, with more detail in Table 4.
The training dataset had data cleaning performed on it. We substituted missing values for improved computation, whether null, nil, or NA [60]. The null data were removed for such attributes that were found. An attribute is eliminated from the dataset if most of its values are missing, as shown in Figure 2.
Table 4 contains comprehensive details about the used datasets.

5.1.3. Performance Matrix

To determine which optimization approach was the most effective, we ran each under identical conditions, as given in Table 3.
A performance matrix is the most straightforward way to gauge how well a classifier performs when the output can include two or more classes. It is a table with two dimensions, actual and predicted, whose cells count the True Positives ($TP$s), True Negatives ($TN$s), False Positives ($FP$s), and False Negatives ($FN$s). Every ML project requires an evaluation of the algorithm. A model might achieve satisfactory results when measured against one metric, such as the accuracy score, but unsatisfactory results when measured against another, such as the specificity score. Classification accuracy is the metric most often used to gauge the effectiveness of a model, but it is not sufficient on its own for a fair evaluation. The evaluation measures used in this paper are as follows:
  • Mean accuracy ($\mu_{Acc}$): Accuracy is the most typical performance metric for classification algorithms. It is the proportion of correct predictions among all predictions made, as shown in Equation (19).
    $$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{19}$$
    $\mu_{Acc}$ is calculated by Equation (20), where $M$ is the number of runs and $Acc_{*}^{j}$ is the accuracy obtained in run $j$.
    $$\mu_{Acc} = \frac{1}{M} \sum_{j=1}^{M} Acc_{*}^{j} \tag{20}$$
  • Mean best fitness ($\mu_{Fit}$): The fitness metric measures the algorithm’s effectiveness and, as stated in Equation (18), combines the reduction in the feature selection (FS) ratio with the reduction in the classification error rate. It is computed by Equation (21), where a lower value represents better fitness.
    $$\mu_{Fit} = \frac{1}{M} \sum_{j=1}^{M} Fitness_{*}^{j} \tag{21}$$
  • Mean feature selection size ($\mu_{FS}$): This metric, given by Equation (22), indicates the average size of the selected feature subset.
    $$\mu_{FS} = \frac{1}{M} \sum_{j=1}^{M} f_{*}^{j} \tag{22}$$
    According to Equation (23), the overall FS ratio is obtained by dividing the FS size $f_{*}^{j}$ by the total number of features $F$ in the original dataset.
    $$Overall_{FS} = \frac{1}{M} \sum_{j=1}^{M} \frac{f_{*}^{j}}{F} \tag{23}$$
  • Mean sensitivity ($\mu_{SE}$): The proposed model’s sensitivity gauges how well it can identify positive events. Another name for it is the recall or True Positive Rate ($TPR$). Sensitivity ($SE$) is used to assess the performance of models since it gives the proportion of actual positive instances that the model correctly classified as positive, as shown in Equation (24).
    $$Sensitivity = \frac{TP}{TP + FN} \tag{24}$$
    The $\mu_{SE}$ metric is calculated as shown in Equation (25).
    $$\mu_{SE} = \frac{1}{M}\sum_{j=1}^{M} SE_{*}^{j} \tag{25}$$
  • Mean specificity ($\mu_{SP}$): As indicated in Equation (26), specificity is the percentage of actual negatives that the model correctly detects as $TN$s. The remaining actual negatives are predicted to be positive and are referred to as $FP$s. The True Negative Rate ($TNR$) is another name for specificity. Specificity and the $FP$ rate always sum to one. Low specificity indicates that the model mislabels many negative cases as positive, whereas high specificity means that the model correctly identifies most negative cases.
    $$Specificity = \frac{TN}{TN + FP} \tag{26}$$
    The $\mu_{SP}$ metric is calculated as shown in Equation (27).
    $$\mu_{SP} = \frac{1}{M}\sum_{j=1}^{M} SP_{*}^{j} \tag{27}$$
  • Mean precision ($\mu_{PPV}$): The precision is measured as the proportion of $TP$s among all samples classified as positive (either correctly or incorrectly). The precision gauges how accurately the model classifies a sample as positive, as shown in Equation (28).
    $$Precision = \frac{TP}{TP + FP} \tag{28}$$
    Equation (29) is used to calculate the $\mu_{PPV}$ metric.
    $$\mu_{PPV} = \frac{1}{M}\sum_{j=1}^{M} PPV_{*}^{j} \tag{29}$$
  • Standard Deviation (STD): The STD $\sigma$ over the $M$ executions measures the variation of the results for each optimization algorithm. It is calculated using Equation (30).
    $$\sigma_x = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\left(S_{*}^{j} - \mu_x\right)^{2}} \tag{30}$$
    $\sigma_x$ was calculated for all measurements in the performance matrix.
  • Mean time consumption ($\mu_{Time}$): Using $\mu_{Time}$, each optimization algorithm’s average consumption time (in seconds) was determined, as indicated in Equation (31).
    $$\mu_{Time} = \frac{1}{M}\sum_{j=1}^{M} Time_{*}^{j} \tag{31}$$
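The measures above can be illustrated with a short Python sketch. It computes accuracy, sensitivity, specificity, and precision from confusion-matrix counts and then averages a metric over $M$ runs with the population standard deviation. The function names and the three sets of counts are hypothetical and chosen only for illustration; they are not taken from the experiments in this paper.

```python
from math import sqrt

def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and precision from confusion-matrix counts.

    Assumes every denominator is non-zero (each class occurs and is predicted at least once).
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

def mean_and_std(values):
    """Mean and population standard deviation (divisor M) over M runs."""
    m = len(values)
    mu = sum(values) / m
    sigma = sqrt(sum((v - mu) ** 2 for v in values) / m)
    return mu, sigma

# Hypothetical (TP, TN, FP, FN) counts for three independent runs.
runs = [(40, 45, 5, 10), (42, 44, 6, 8), (38, 46, 4, 12)]
accs = [confusion_metrics(*run)["accuracy"] for run in runs]
mu_acc, sigma_acc = mean_and_std(accs)
print(f"mean accuracy = {mu_acc:.4f}, std = {sigma_acc:.4f}")
```

The same `mean_and_std` helper can be reused for the fitness, feature-subset size, and time measurements, since each of them is a plain average over the runs.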

5.2. Experimental Series 1: Global Optimization Using CEC’22 Benchmark Test Functions

Quantitative measures from the CEC’22 benchmark set [61] were used to assess the performance of the proposed OBWOD method. The mean ($\mu$) and standard deviation ($\sigma$) of the best solutions obtained by the proposed OBWOD algorithm and the other optimization algorithms were the quantitative measures applied. To ensure a fair assessment, the proposed OBWOD outputs were contrasted with those of seven other algorithms: the original BWO algorithm, HHO, HGS, SCA, WOA, MFO, and INFO.

5.2.1. Description of the CEC’22 Test Suite

Every year, the CEC competitions are held to evaluate stochastic search optimization techniques in custom-built test environments. These benchmark functions were created from a collection of widely used benchmark functions, including those from Ackley, Griewank, Rastrigin, Rosenbrock, Schwefel, and many more [61]. Single-objective optimization algorithms are the foundation for more complex methods such as multi-objective, niching, and constrained optimization algorithms. Therefore, improvements to single-objective optimization techniques are essential since they have the potential to affect other domains as well. One source of motivation for these algorithmic breakthroughs is the feedback from experiments with single-objective benchmark functions, which act as the basic building blocks for more complex tasks. As algorithms advance, more challenging functions must be created. The interaction between approaches and problems drives innovation; hence, the CEC’22 Special Session on Real-Parameter Optimization was organized to strengthen this symbiosis. The same search range, $[-100, 100]^D$, was set for all test functions, as seen in Table 5.
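As a rough illustration of what a bound-constrained test problem looks like, the following Python sketch samples candidate solutions uniformly inside $[-100, 100]^D$ and evaluates them on the basic Rastrigin function. This is only an assumption-laden stand-in: the actual CEC’22 functions are shifted, rotated, and composed variants that are not reproduced here, and the helper names are hypothetical.

```python
import math
import random

LOWER, UPPER = -100.0, 100.0  # search range shared by all test functions

def rastrigin(x):
    """Basic (unshifted, unrotated) Rastrigin function; global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def random_population(n, dim, rng):
    """Sample n candidate solutions uniformly inside [LOWER, UPPER]^dim."""
    return [[rng.uniform(LOWER, UPPER) for _ in range(dim)] for _ in range(n)]

def clip(x):
    """Bring a candidate back inside the bounds after an update step."""
    return [min(max(xi, LOWER), UPPER) for xi in x]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pop = random_population(n=5, dim=10, rng=rng)
best = min(pop, key=rastrigin)
print(f"best initial fitness: {rastrigin(best):.2f}")
```

Clipping is one simple way to enforce the bound constraints after each position update; other boundary-handling schemes are equally possible.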

5.2.2. Statistical Results’ Analysis

The evaluation criteria for the CEC’22 were used to evaluate OBWOD on complex problems and further demonstrate the proposed algorithm’s efficacy. The performance of OBWOD was compared to that of seven optimization algorithms: INFO, HHO, HGS, SCA, MFO, WOA, and the original BWO. The results of OBWOD and the other algorithms for CEC’22 test functions from CEC-01 to CEC-10 are shown in Table 6. According to Friedman’s rank, the OBWOD received the best average values for all functions, while SCA and WOA obtained the best STD values for the CEC-02 function. WOA also obtained the best STD values for the CEC-01, CEC-08, and CEC-10 functions.

5.2.3. Convergence Behavior Analysis

The proposed algorithm OBWOD was evaluated on complex problems, specifically the CEC’22 evaluation criteria, to demonstrate the method’s efficacy further. The convergence curves of the proposed OBWOD algorithm, HHO, HGS, SCA, WOA, MFO, INFO, and the original BWO for the CEC’22 test functions are shown in Figure 3. The proposed algorithm attained a stable position for all functions. This behavior demonstrates the convergence of the given algorithm. The proposed OBWOD algorithm also quickly achieved the lowest average of the best solutions for most functions. Because of its quick convergence to the near-optimal solution, the proposed OBWOD algorithm is a promising tool for solving problems that require fast computing, such as online optimization issues.
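A convergence curve of this kind is usually the best-so-far fitness plotted against the iteration number, averaged over the independent runs. The Python sketch below shows one way to build such a curve for a minimization problem and to find the last iteration at which it still improved. The helper names and the sample values are hypothetical.

```python
from itertools import accumulate

def convergence_curve(best_per_iteration):
    """Best-so-far fitness after each iteration (minimization)."""
    return list(accumulate(best_per_iteration, min))

def mean_curve(curves):
    """Average several equal-length curves point by point."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]

def last_improvement(curve, tol=1e-12):
    """Index of the last iteration at which the curve improved by more than tol."""
    last = 0
    for i in range(1, len(curve)):
        if curve[i - 1] - curve[i] > tol:
            last = i
    return last

# Hypothetical per-iteration best fitness values from two runs.
run_a = convergence_curve([5, 3, 4, 2, 6])
run_b = convergence_curve([7, 6, 6, 1, 1])
print(run_a, run_b, mean_curve([run_a, run_b]), last_improvement(run_a))
```

A curve that stops improving early at a poor value suggests premature convergence, whereas a curve that flattens at a low value indicates that the search has stabilized near a good solution.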

5.3. Experimental Series 2: Biomedical Classification Tasks

The eight optimization algorithms were executed in the same computing environment, as shown in Table 3. The results were validated by the Friedman test using the classification performance metrics. With the same parameter settings as those indicated in Table 2, the results of OBWOD were compared with those of the other seven optimization algorithms (BWO, HHO, HGS, SCA, WOA, MFO, and INFO) based on the kNN classifier.
The following sections compare the experimental results of the OBWOD-based kNN classifier with those of the seven other optimization algorithms (HHO, HGS, SCA, WOA, MFO, INFO, and the original BWO; see Table 2), based on the kNN classifier, on ten disease datasets with various dimensionality sizes, which are presented in Table 4. The Friedman test was used to validate the results statistically. Each algorithm was executed for 20 independent runs of 100 iterations each. The measures listed in Section 5.1.3 served as the foundation for the comparative assessment.

5.3.1. Best Fitness and Convergence Evaluation

A fitness function was used in each generation to assess every beluga whale. The beluga whale with the best fitness function value was then selected, as shown in Equation (21). After 100 generations and 20 runs, the value of the fitness function, as depicted in Table 7, decreased. It should be noted that the best fitness of the proposed OBWOD algorithm for classifying the 10 disease datasets with different dimension sizes was 1.40, with the mean and best fitness values converging. The OBWOD approach outperformed the other seven optimization algorithms with better mean fitness values, while the second-best optimization technique (BWO) remained competitive on the mean best fitness. If optimization is viewed as a process that creates candidate solutions, convergence is the stable point at the end of the process at which no more changes or improvements are expected. Premature convergence is a failure mode of an optimization algorithm in which the process ends at a stable point that does not necessarily represent the best answer. The OBWOD approach is superior to the other seven optimization algorithms based on the kNN classifier, as shown in Figure 4.
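A wrapper fitness of this type is commonly written as a weighted sum of the classification error rate and the ratio of selected features, so that lower values are better. The Python sketch below follows that common form. The weight `alpha = 0.99`, the 0.5 threshold for turning a continuous position into a feature mask, and the function names are all assumptions made for illustration; the exact form used in this paper is the one given by Equation (18).

```python
def position_to_mask(position, threshold=0.5):
    """Turn a continuous position vector into a boolean feature mask (assumed threshold)."""
    return [value > threshold for value in position]

def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and feature-selection ratio (lower is better).

    alpha weights the error rate and (1 - alpha) weights the selected-feature ratio.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Hypothetical candidate: three features, two selected, 20% kNN error rate.
mask = position_to_mask([0.9, 0.1, 0.6])
fit = wrapper_fitness(error_rate=0.2, n_selected=sum(mask), n_total=len(mask))
print(mask, round(fit, 6))
```

With a weight close to one, the error rate dominates and the feature ratio mainly breaks ties between candidates of similar accuracy.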

5.3.2. Accuracy and Boxplots’ Evaluation

The accuracy of OBWOD and the other seven optimization algorithms based on the kNN classifier is compared in Table 8 under a typical scenario. On all disease datasets, the OBWOD technique performed at least as well as the other algorithms, reaching a classification accuracy of 100% on the Prostate Tumors dataset and an accuracy ranging from 90% to 99.8% on the Leukemia2, Parkinson’s, and Immunotherapy datasets. INFO, HHO, HGS, and MFO also achieved an accuracy of 100% on the Prostate Tumors dataset.
OBWOD obtained the highest Friedman rank for accuracy (7.20), followed by INFO (6.40). A boxplot, also called a box-and-whisker plot, is a single box that provides a visual representation of five components of a dataset: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. This helps describe the dispersion and central tendency of a dataset. The properties of the data distribution obtained with the boxplot analysis can be seen in Figure 5.
For many distributions and datasets, more information is required than just the measures of central tendency (median, mean, and mode); information regarding the data’s variability or dispersion is also needed. A boxplot is a graph that effectively conveys the distribution of the values in the data. Although boxplots seem simplistic compared to a histogram or density plot, they have the benefit of taking up less room, which helps compare distributions over numerous groups or datasets. Figure 5 shows that the OBWOD approach produced the highest boxplots for all disease datasets with varying feature sizes.
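The five components that a boxplot summarizes can be computed directly. The following Python sketch uses the standard library to obtain the minimum, Q1, median, Q3, and maximum of a list of accuracy values. The sample accuracies are hypothetical, and the `inclusive` quantile method is one of several conventions, so plotting libraries may report slightly different quartiles.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, and maximum of a sample."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical accuracies from five independent runs of one optimizer.
accuracies = [0.91, 0.88, 0.93, 0.90, 0.89]
low, q1, med, q3, high = five_number_summary(accuracies)
print(f"min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}, IQR={q3 - q1:.3f}")
```

A narrow interquartile range (Q3 minus Q1) across runs indicates a stable optimizer, which is the property the boxplots are meant to expose.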

5.3.3. Analyzing Qualitative Information

  • Feature Selection (FS) evaluation:
    FS, which involves removing redundant or unnecessary features from disease datasets, can make classifiers more efficient, quick, and accurate. Regarding medical decisions, FS has an obvious advantage in increasing understandability. Choosing a subset of all attributes has several advantages: it makes data understanding and visualization more accessible, and it decreases the time needed for training and usage as well as the measurement and storage requirements. By applying FS, the OBWOD algorithm can be fed data with lower dimensionality and produce a more accurate result.
    Table 9 shows that the OBWOD approach outperformed the other optimizers on seven datasets. Meanwhile, the WOA achieved the best results on the Arrhythmia dataset. OBWOD obtained the best Friedman rank for the FS ratio (2.50), followed by the second-best optimization method, HHO (3.40).
  • Sensitivity (SE) evaluation:
    Sensitivity is the percentage of positive cases that were correctly classified as positive ($TP$s). The remaining positive cases are misdiagnosed as negative ($FN$s), which can also be expressed as the $FN$ rate. The sensitivity and the $FN$ rate always add up to one.
    Table 10 shows that OBWOD achieved the best results on seven datasets. OBWOD achieved 100% on the Arrhythmia, Leukemia2, and Prostate Tumors datasets and values ranging from 99.7% to 99.9% on the CKD, Parkinson’s, and Lymphography datasets.
    Notably, the Leukemia2 and Prostate Tumors datasets were classified with 100% sensitivity by OBWOD, INFO, HHO, HGS, and MFO (Table 10). Regarding Friedman’s rank sensitivity, OBWOD came in first with a score of 7.30, followed by HHO (6.50).
  • Specificity (SP) evaluation:
    Specificity is the percentage of actual negative cases that were predicted as negative ($TN$s). The remaining negative cases are predicted as positive ($FP$s). The specificity and the $FP$ rate always add up to one.
    Table 11 shows that the OBWOD approach outperformed the other seven optimization algorithms on all datasets. OBWOD achieved 100% on the Leukemia2 and Prostate Tumors datasets and values ranging from 99.7% to 99.9% on the Primary Tumor, Parkinson’s, and Immunotherapy datasets.
    Notably, the Leukemia2 and Prostate Tumors datasets were classified with 100% specificity by OBWOD, HHO, HGS, and MFO (Table 11). Regarding Friedman’s rank specificity, the OBWOD came in first with a score of 7.30, followed by HHO (6.50).
  • Precision (PPV) evaluation:
    To evaluate the effectiveness of ML models or overall AI solutions, precision is a frequently used statistic. It aids in understanding how accurate the predictions made by models are. The percentage of accurate positive predictions is how precision is calculated. Table 12 shows that the OBWOD approach outperformed the other seven algorithms on all datasets. The OBWOD achieved 100% on the Prostate Tumors dataset and achieved a range from 91.9% to 99.9% on the Primary Tumor, Parkinson’s, and Immunotherapy datasets.
    Notably, the Prostate Tumors dataset was classified with 100% precision by OBWOD, HHO, HGS, and MFO (Table 12). Regarding Friedman’s rank precision, OBWOD came in first with a score of 7.65, followed by HHO (7.15).
  • Time consumption evaluation:
    Making a precise estimate of the amount of time and money needed to train a proposed model is crucial. This is particularly true when a cloud environment is used to train the model on a sizable amount of data. Knowing the length of the training period helps in making crucial decisions about a proposed approach.
    As shown in Table 13, the OBWOD approach outperformed the other seven algorithms in all experiments except for the time consumption test.
    The statistics also showed that OBWOD did an excellent job balancing exploration and exploitation. Due to its premature convergence issue, BWO outperformed all other methods in terms of time consumption. This premature convergence prevented BWO from discovering the best candidate solution, since the algorithm became stuck in a local optimum, as seen in Figure 3.
    BWO was the fastest optimizer, ranking first in Friedman’s rank for time consumption, whereas OBWOD ranked eighth.
  • Friedman test: To objectively compare the performance of the OBWOD algorithm with that of the other seven algorithms, Friedman’s non-parametric test [62] was applied to the mean and STD of the best solutions in Tables 6–12.
    In Table 6, for the Friedman test, the worst-performing algorithm is rated highest, and the one with the best performance is rated lowest. Conversely, in Tables 8 and 10–12, the algorithm that performs the best is ranked first, and the algorithm that performs the worst is ranked last.
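The ranking step behind a Friedman comparison can be sketched in a few lines of Python: each dataset ranks the algorithms, ties share the average rank, and the ranks are then averaged per algorithm. In this sketch rank 1 is assigned to the best score, which is an assumption and may be the opposite of the direction used in some of the tables in this paper. The score table is hypothetical, and the statistic is the basic chi-square form without tie correction.

```python
def average_ranks(scores, higher_is_better=True):
    """Rank one dataset's scores; rank 1 is the best and ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def friedman_mean_ranks(table, higher_is_better=True):
    """table[d][a] is the score of algorithm a on dataset d; returns mean rank per algorithm."""
    per_dataset = [average_ranks(row, higher_is_better) for row in table]
    return [sum(col) / len(table) for col in zip(*per_dataset)]

def friedman_statistic(mean_ranks, n_datasets):
    """Friedman chi-square statistic (no tie correction) from the mean ranks."""
    k = len(mean_ranks)
    total = sum(r * r for r in mean_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (total - k * (k + 1) ** 2 / 4)

# Hypothetical accuracies: 3 datasets (rows) x 3 algorithms (columns).
table = [[0.90, 0.80, 0.70], [0.95, 0.85, 0.85], [0.80, 0.90, 0.70]]
mean_ranks = friedman_mean_ranks(table)
print(mean_ranks, friedman_statistic(mean_ranks, len(table)))
```

For error-like measures such as fitness or time, `higher_is_better=False` reverses the ranking direction so that the smallest value receives rank 1.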

5.4. Discussion

The purpose of this section is to go over the various algorithms for biomedical classification. Table 14 summarizes the comparison of OBWOD and the MH algorithms on biomedical classification and proves its superiority. This study’s goal was to suggest an effective search technique for the FS problem that takes into account both low- and high-dimensional datasets. The study suggested integrating OBL into the BWO exploration phase and enhancing the BWO exploitation strategy with DCS. The experimental analysis and comparative study proved that the proposed methodology is effective.
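For readers unfamiliar with OBL, its core idea is to evaluate, for each candidate $x$ in $[lb, ub]$, the opposite point $lb + ub - x$ and to keep whichever candidates are fitter. The Python sketch below shows this generic step only. How OBL is wired into the BWO exploration phase in OBWOD is not reproduced here, and the function names, bounds, and sphere fitness are hypothetical.

```python
def opposite(x, lower, upper):
    """Opposite point of x inside the box defined by lower and upper (per dimension)."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]

def obl_select(population, lower, upper, fitness):
    """Keep the fittest len(population) candidates among the population and its opposites."""
    candidates = population + [opposite(x, lower, upper) for x in population]
    return sorted(candidates, key=fitness)[: len(population)]

def sphere(x):
    """Simple minimization objective used only for this illustration."""
    return sum(xi * xi for xi in x)

lower, upper = [0.0, 0.0], [10.0, 10.0]
population = [[9.0, 9.0], [2.0, 3.0]]
selected = obl_select(population, lower, upper, sphere)
print(selected)
```

Because the opposite of a poor candidate may lie much closer to the optimum, this step can improve the starting population at the cost of one extra fitness evaluation per candidate.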
The proposed OBWOD approach provides the following advantages:
  • OBWOD is well able to resolve global optimization issues based on the CEC’22 test suite. OBWOD generates optimization solutions with better fitness values than the other seven MH algorithms (see Table 6). The proposed hybridization also proves to enhance the convergence ability of the algorithm; see Figure 3.
  • The datasets used for this study’s analysis range in feature size from 8 to 10510 features, offering a sufficient testing environment for an optimization technique. Here, OBWOD has the highest rank of the FS ratio of 2.50, on average, on all datasets, which is better than the other seven MH algorithms; see Table 9.
  • As for accuracy, Table 8 and Figure 5 show that OBWOD chose the subset of features that enabled the learning method k-NN to achieve an average accuracy of 85.17% across all classification datasets.
  • Analyzing the qualitative information of the performance evaluation metrics produced proves the superiority of OBWOD (see Table 10 for the overall metrics measuring sensitivity (93.87%), Table 11 for specificity (78%), and Table 12 for precision (76%)).
  • Further improvements to OBWOD can be easily implemented because of its straightforward architecture.
  • OBWOD proved its superiority compared to the other optimization algorithms on biomedical classification, as illustrated in Table 14.
Along with the advantages, the proposed OBWOD also has some limitations, which are detailed below:
  • The features selected by OBWOD may change each time it is run because it is an optimization strategy based on randomization. As a result, there is no assurance that the feature subset chosen in one run will be chosen in another.
  • Because OBWOD adds the OBL and DCS stages to BWO, it is computationally more expensive than the other seven MH algorithms (see Table 13).
  • kNN was used as the learning algorithm because of its simplicity. However, kNN has several drawbacks, such as being a lazy learner and being vulnerable to noisy data.

6. Conclusions and Future Work

This paper proposed OBWOD, an enhanced version of the BWO algorithm. The framework of OBWOD’s search operations consists of two stages, Opposition-Based Learning (OBL) and Dynamic Candidate Solution (DCS), which together balance exploration and exploitation and help the algorithm escape local optima. During the population evolution, the exploratory sub-population improves the exploration capacity, the exploitative sub-population hastens convergence and raises the precision of the solutions, and the small OBL-based sub-population improves the population diversity. OBWOD was evaluated on the ten benchmark functions CEC-01 to CEC-10 of the CEC’22 test suite to gauge its capacity to resolve challenging global optimization issues, and, according to the Friedman rank, it proved superior to the other seven optimization algorithms. The proposed OBWOD approach was also applied to ten disease datasets with various feature sizes. Its classification accuracy was superior to that of the other well-known optimization algorithms based on the kNN classifier, namely the BWO, HHO, HGS, SCA, WOA, MFO, and INFO algorithms, as validated by the Friedman test. OBWOD was also better than BWO in terms of convergence speed and solution accuracy.
Future research can use the OBWOD algorithm to classify biomedical data using different classifiers, such as SVM and NNs. These studies can evaluate the classification precision and generality of the chosen features in various contexts. Other real-world issues can be addressed using the proposed OBWOD approach.

Author Contributions

E.H.H.: supervision, software, methodology, conceptualization, formal analysis, investigation, visualization, writing—review and editing. A.S.: supervision, methodology, conceptualization, investigation, visualization, formal analysis, writing—review and editing. All authors read and approved the final paper. All authors have read and agreed to the published version of the manuscript.

Funding

Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.

Institutional Review Board Statement

This article does not contain any studies with human participants or animals performed by any authors.

Data Availability Statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Acknowledgments

This research work was funded by Institutional Fund Projects under grant no. (IFPIP: 532-611-1443). The authors gratefully acknowledge technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

  1. Houssein, E.H.; Mohamed, R.E.; Ali, A.A. Machine learning techniques for biomedical natural language processing: A comprehensive review. IEEE Access 2021, 9, 140628–140653.
  2. Houssein, E.H.; Saber, E.; Ali, A.A.; Wazery, Y.M. Centroid mutation-based Search and Rescue optimization algorithm for feature selection and classification. Expert Syst. Appl. 2022, 191, 116235.
  3. Chang, P.C.; Lin, J.J.; Liu, C.H. An attribute weight assignment and particle swarm optimization algorithm for medical database classifications. Comput. Methods Programs Biomed. 2012, 107, 382–392.
  4. Houssein, E.H.; Abdelminaam, D.S.; Hassan, H.N.; Al-Sayed, M.M.; Nabil, E. A hybrid barnacles mating optimizer algorithm with support vector machines for gene selection of microarray cancer classification. IEEE Access 2021, 9, 64895–64905.
  5. Abubakar, I.R.; Olatunji, S.O. Computational intelligence-based model for diarrhea prediction using Demographic and Health Survey data. Soft Comput. 2020, 24, 5357–5366.
  6. Ioniţă, I.; Ioniţă, L. Prediction of thyroid disease using data mining techniques. BRAIN. Broad Res. Artif. Intell. Neurosci. 2016, 7, 115–124.
  7. Ganggayah, M.D.; Taib, N.A.; Har, Y.C.; Lio, P.; Dhillon, S.K. Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med. Inform. Decis. Mak. 2019, 19, 48.
  8. Marquez, E.; Barrón, V. Artificial intelligence system to support the clinical decision for influenza. In Proceedings of the 2019 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, Mexico, 13–15 November 2019; pp. 1–5.
  9. Abdelrahim, M.; Merlos, C. Hybrid machine learning approaches: A method to improve expected output of semi-structured sequential data. In Proceedings of the 2016 IEEE Tenth International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 4–6 February 2016; pp. 342–345.
  10. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
  11. Mandic, D.; Chambers, J. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability; Wiley: Hoboken, NJ, USA, 2001.
  12. Zhang, H. The optimality of naive Bayes. Aa 2004, 1, 3.
  13. Peng, C.Y.J.; Lee, K.L.; Ingersoll, G.M. An introduction to logistic regression analysis and reporting. J. Educ. Res. 2002, 96, 3–14.
  14. Houssein, E.H.; Emam, M.M.; Ali, A.A.; Suganthan, P.N. Deep and machine learning techniques for medical imaging-based breast cancer: A comprehensive review. Expert Syst. Appl. 2021, 167, 114161.
  15. Houssein, E.H.; Emam, M.M.; Ali, A.A. Improved manta ray foraging optimization for multi-level thresholding using COVID-19 CT images. Neural Comput. Appl. 2021, 33, 16899–16919.
  16. Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358.
  17. Gao, S.; Calhoun, V.D.; Sui, J. Machine learning in major depression: From classification to treatment outcome prediction. CNS Neurosci. Ther. 2018, 24, 1037–1052.
  18. Bey, R.; Goussault, R.; Grolleau, F.; Benchoufi, M.; Porcher, R. Fold-stratified cross-validation for unbiased and privacy-preserving federated learning. J. Am. Med. Inform. Assoc. 2020, 27, 1244–1251.
  19. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  20. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  21. Yang, Y.; Chen, H.; Heidari, A.A.; Gandomi, A.H. Hunger games search: Visions, conception, implementation, deep analysis, perspectives, and towards performance shifts. Expert Syst. Appl. 2021, 177, 114864.
  22. Mirjalili, S. SCA: A sine cosine algorithm for solving optimization problems. Knowl.-Based Syst. 2016, 96, 120–133.
  23. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
  24. Ahmadianfar, I.; Heidari, A.A.; Noshadian, S.; Chen, H.; Gandomi, A.H. INFO: An Efficient Optimization Algorithm based on Weighted Mean of Vectors. Expert Syst. Appl. 2022, 195, 116516.
  25. Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015, 89, 228–249.
  26. Zhong, C.; Li, G.; Meng, Z. Beluga whale optimization: A novel nature-inspired metaheuristic algorithm. Knowl.-Based Syst. 2022, 251, 109215.
  27. Hameed, N.; Shabut, A.; Hossain, M.A. A Computer-aided diagnosis system for classifying prominent skin lesions using machine learning. In Proceedings of the 2018 10th Computer Science and Electronic Engineering (CEEC), Colchester, UK, 19–21 September 2018; pp. 186–191.
  28. Bhattacharya, M.; Lu, D.Y.; Kudchadkar, S.M.; Greenland, G.V.; Lingamaneni, P.; Corona-Villalobos, C.P.; Guan, Y.; Marine, J.E.; Olgin, J.E.; Zimmerman, S.; et al. Identifying ventricular arrhythmias and their predictors by applying machine learning methods to electronic health records in patients with hypertrophic cardiomyopathy (HCM-VAr-risk model). Am. J. Cardiol. 2019, 123, 1681–1689.
  29. Ouyang, X.; Huo, J.; Xia, L.; Shan, F.; Liu, J.; Mo, Z.; Yan, F.; Ding, Z.; Yang, Q.; Song, B.; et al. Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia. IEEE Trans. Med. Imaging 2020, 39, 2595–2605.
  30. Mohan, S.; Thirumalai, C.; Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 2019, 7, 81542–81554.
  31. Ghazal, T.M.; Taleb, N. Feature optimization and identification of ovarian cancer using internet of medical things. Expert Syst. 2022, 39, e12987.
  32. Calp, M.H. Medical diagnosis with a novel SVM-CoDOA based hybrid approach. arXiv 2019, arXiv:1902.00685.
  33. Rahman, F.; Mahmood, M. A Dynamic Approach to Identify the Most Significant Biomarkers for Heart Disease Risk Prediction Utilizing Machine Learning Techniques. In Proceedings of the International Conference on Bangabandhu and Digital Bangladesh, Dhaka, Bangladesh, 30 December 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 12–22.
  34. Hashim, F.A.; Hussain, K.; Houssein, E.H.; Mabrouk, M.S.; Al-Atabany, W. Archimedes optimization algorithm: A new metaheuristic algorithm for solving optimization problems. Appl. Intell. 2021, 51, 1531–1551.
  35. Hashim, F.A.; Houssein, E.H.; Hussain, K.; Mabrouk, M.S.; Al-Atabany, W. Honey Badger Algorithm: New metaheuristic algorithm for solving optimization problems. Math. Comput. Simul. 2022, 192, 84–110.
  36. Houssein, E.H.; Hosney, M.E.; Mohamed, W.M.; Ali, A.A.; Younis, E.M. Fuzzy-based hunger games search algorithm for global optimization and feature selection using medical data. Neural Comput. Appl. 2022, 1–25.
  37. Fajri, D.M.N.; Mahmudy, W.F.; Anggodo, Y.P. Optimization of FIS Tsukamoto using particle swarm optimization for dental disease identification. In Proceedings of the 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Jakarta, Indonesia, 28–29 October 2017; pp. 261–268.
  38. Ahmad, G.N.; Fatima, H.; Ullah, S.; Saidi, A.S.; Imdadullah. Efficient medical diagnosis of human heart diseases using machine learning techniques with and without GridSearchCV. IEEE Access 2022, 10, 80151–80173.
  39. Jalali, S.M.J.; Ahmadian, M.; Ahmadian, S.; Khosravi, A.; Alazab, M.; Nahavandi, S. An oppositional-Cauchy based GSK evolutionary algorithm with a novel deep ensemble reinforcement learning strategy for COVID-19 diagnosis. Appl. Soft Comput. 2021, 111, 107675.
  40. Hsu, C.H.; Chen, X.; Lin, W.; Jiang, C.; Zhang, Y.; Hao, Z.; Chung, Y.C. Effective multiple cancer disease diagnosis frameworks for improved healthcare using machine learning. Measurement 2021, 175, 109145.
  41. Cai, X.; Li, X.; Razmjooy, N.; Ghadimi, N. Breast cancer diagnosis by convolutional neural network and advanced thermal exchange optimization algorithm. Comput. Math. Methods Med. 2021, 2021, 5595180.
  42. Bharti, R.; Khamparia, A.; Shabaz, M.; Dhiman, G.; Pande, S.; Singh, P. Prediction of heart disease using a combination of machine learning and deep learning. Comput. Intell. Neurosci. 2021, 2021, 8387680.
  43. Ghosh, D.; Ghosh, E. Breast Mammography-based Tumor Detection and Classification Using Fine-Tuned Convolutional Neural Networks. Int. J. Adv. Res. Radiol. Imaging Sci. 2022, 1.
  44. Nadakinamani, R.G.; Reyana, A.; Kautish, S.; Vibith, A.; Gupta, Y.; Abdelwahab, S.F.; Mohamed, A.W. Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques. Comput. Intell. Neurosci. 2022, 2022, 2973324.
  45. Houssein, E.H.; Emam, M.M.; Ali, A.A. An optimized deep learning architecture for breast cancer diagnosis based on improved marine predators algorithm. Neural Comput. Appl. 2022, 34, 18015–18033.
  46. Li, G.; Jimenez, G. Optimal diagnosis of the skin cancer using a hybrid deep neural network and grasshopper optimization algorithm. Open Med. 2022, 17, 508–517.
  47. Reddy, K.V.V.; Elamvazuthi, I.; Abd Aziz, A.; Paramasivam, S.; Chua, H.N.; Pranavanand, S. Prediction of Heart Disease Risk Using Machine Learning with Correlation-based Feature Selection and Optimization Techniques. In Proceedings of the 2021 7th International Conference on Signal Processing and Communication (ICSC), Noida, India, 25–27 November 2021; pp. 228–233.
  48. Mehta, P.; Petersen, C.A.; Wen, J.C.; Banitt, M.R.; Chen, P.P.; Bojikian, K.D.; Egan, C.; Lee, S.I.; Balazinska, M.; Lee, A.Y.; et al. Automated detection of glaucoma with interpretable machine learning using clinical data and multimodal retinal images. Am. J. Ophthalmol. 2021, 231, 154–169.
  49. Polat, K.; Güneş, S. Automatic determination of diseases related to lymph system from lymphography data using principles component analysis (PCA), fuzzy weighting pre-processing and ANFIS. Expert Syst. Appl. 2007, 33, 636–641.
  50. Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Wang, R. Efficient knn classification with different numbers of nearest neighbors. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1774–1785.
  51. El-Kenawy, E.S.; Eid, M. Hybrid gray wolf and particle swarm optimization for feature selection. Int. J. Innov. Comput. Inf. Control 2020, 16, 831–844.
  52. Mafarja, M.; Aljarah, I.; Heidari, A.A.; Faris, H.; Fournier-Viger, P.; Li, X.; Mirjalili, S. Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowl.-Based Syst. 2018, 161, 185–204.
  53. Subasi, A. Use of artificial intelligence in Alzheimer’s disease detection. In Artificial Intelligence in Precision Health; Elsevier: Amsterdam, The Netherlands, 2020; pp. 257–278.
  54. Tizhoosh, H.R. Opposition-based learning: A new scheme for machine intelligence. In Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06), Vienna, Austria, 28–30 November 2005; Volume 1, pp. 695–701.
  55. Houssein, E.H.; Ibrahim, I.E.; Neggaz, N.; Hassaballah, M.; Wazery, Y.M. An Efficient ECG Arrhythmia Classification Method Based on Manta Ray Foraging Optimization. Expert Syst. Appl. 2021, 181, 115131.
  56. Khodadadi, N.; Snasel, V.; Mirjalili, S. Dynamic arithmetic optimization algorithm for truss optimization under natural frequency constraints. IEEE Access 2022, 10, 16188–16208.
  57. Mantegna, R.N. Fast, accurate algorithm for numerical simulation of Levy stable stochastic processes. Phys. Rev. E 1994, 49, 4677.
  58. Frank. Machine Learning Repository. 2010. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 25 December 2022).
  59. Yan, F.; Xu, X.; Xu, J. Grey wolf optimizer with a novel weighted distance for global optimization. IEEE Access 2020, 8, 120173–120197.
  60. Gajendra, E.; Kumar, J. A novel approach of ECG classification for diagnosis of heart diseases: Review. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 2015, 4, 4096–4100.
  61. Ahrari, A.; Elsayed, S.; Sarker, R.; Essam, D.; Coello, C.A.C. Problem Definition and Evaluation Criteria for the CEC’2022 Competition on Dynamic Multimodal Optimization. In Proceedings of the IEEE World Congress on Computational Intelligence (IEEE WCCI 2022), Padua, Italy, 18–23 July 2022; pp. 1–10.
  62. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 1940, 11, 86–92.
  63. Vijayashree, J.; Sultana, H.P. A machine learning framework for feature selection in heart disease classification using improved particle swarm optimization with support vector machine classifier. Program. Comput. Softw. 2018, 44, 388–397.
  64. Ismaeel, S.; Miri, A.; Chourishi, D. Using the Extreme Learning Machine (ELM) technique for heart disease diagnosis. In Proceedings of the 2015 IEEE Canada International Humanitarian Technology Conference (IHTC2015), Ottawa, ON, Canada, 31 May–4 June 2015; pp. 1–3.
  65. Tuncer, T.; Dogan, S.; Acharya, U.R. Automated detection of Parkinson’s disease using minimum average maximum tree and singular value decomposition method with vowels. Biocybern. Biomed. Eng. 2020, 40, 211–220. [Google Scholar] [CrossRef]
  66. Sharma, S.R.; Singh, B.; Kaur, M. Classification of Parkinson disease using binary Rao optimization algorithms. Expert Syst. 2021, 38, e12674. [Google Scholar] [CrossRef]
  67. Polat, K. A hybrid approach to Parkinson disease classification using speech signal: The combination of smote and random forests. In Proceedings of the 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), Istanbul, Turkey, 24–26 April 2019; pp. 1–3. [Google Scholar]
  68. Wazery, Y.M.; Saber, E.; Houssein, E.H.; Ali, A.A.; Amer, E. An efficient slime mold algorithm combined with k-nearest neighbor for medical classification tasks. IEEE Access 2021, 9, 113666–113682. [Google Scholar] [CrossRef]
  69. Houssein, E.H.; Saber, E.; Ali, A.A.; Wazery, Y.M. Opposition-based learning tunicate swarm algorithm for biomedical classification. In Proceedings of the 2021 17th International Computer Engineering Conference (ICENCO), Cairo, Egypt, 29–30 December 2021; pp. 1–6. [Google Scholar]
Figure 1. Flowchart of proposed OBWOD algorithm.
Figure 2. The process of classifying medical datasets.
Figure 3. Convergence curves of the proposed OBWOD and the compared optimization algorithms obtained on CEC’22 test functions.
Figure 4. Convergence curves of OBWOD and other optimization algorithms based on the kNN classifier.
Figure 5. Boxplots of OBWOD and other optimization algorithms based on the kNN classifier.

Table 1. Related works in the literature based on ML and optimization algorithms.

| Ref. | Datasets | Used Techniques | Results |
|---|---|---|---|
| [37] | Dental disease | OBL, Cauchy mutation operators, and a modified version of Gaining–Sharing Knowledge (GSK) | Average accuracy = 88% |
| [38] | Human heart diseases | GridSearchCV based on extreme gradient boosting classifier | Accuracy = 99.03% |
| [39] | COVID-19 | Tsukamoto fuzzy inference system with PSO | Accuracy = 0.9914, precision = 0.9935, recall = 0.9814, F-measure = 0.9896, and AUC = 0.9903 |
| [40] | Breast, cervical, and lung cancer | Machine-learning-based feature modeling | Breast accuracy = 99.62%, cervical accuracy = 96.88%, and lung accuracy = 98.21% |
| [41] | Breast cancer | Convolutional neural networks and advanced thermal exchange optimization algorithm | Accuracy = 93.79%, specificity = 67.7%, and recall = 96.89% |
| [42] | Heart disease | kNN with ANN | Accuracy = 94.2% |
| [43] | Breast cancer | Visual Geometry Group (VGG)-16 and VGG-19 pre-trained CNNs | Accuracy = 97.1%, specificity = 97.9%, and recall = 96.3% |
| [44] | Cardiovascular disease | REP tree, M5P tree, random tree, linear regression, NB, J48, and JRIP | Random tree’s accuracy = 100% |
| [45] | Breast cancer | Optimized deep residual learning model, Improved Marine Predators Algorithm (IMPA), IMPA-ResNet50 | Accuracy = 98.32% |
| [46] | Skin cancer | Grasshopper Optimization Algorithm (GOA) with AlexNet and extreme learning machine network | Accuracy = 98% and sensitivity = 93% |
| [47] | Heart disease | SVM algorithm with $X^2$ statistical FS | Accuracy = 89.7% |
| [48] | Glaucoma clinical data | DL model built on CFP classified | Accuracy = 97% |
| [49] | Lymphography | Principal Component Analysis (PCA), fuzzy weighting pre-processing, and Adaptive Neuro-Fuzzy Inference System (ANFIS) | Accuracy = 88.83% |

Table 2. All used algorithms’ parameter settings.

| Algorithm | Parameter | Value |
|---|---|---|
| INFO [24] | c | 2 |
| | d | 4 |
| WOA [20] | $a_1$ | [2, 0] |
| | $a_2$ | [−1, 1] |
| MFO [25] | $a$ | [−1, −2] |
| SCA [22] | $b$ | [2, 0] |
| HHO [23] | escaping energy | [0.5, 0.5] |
| BWO [26] | whale fall $W_f$ | [0.1, 0.05] |
| | $\alpha$ | 0.99 |
| | $\beta$ | 0.01 |
| HGS [21] | R | |
| | $r_1$, $r_2$ | [0, 1] |

Table 3. Experimental runs’ details.

| Item | Setting |
|---|---|
| $lb$ | 0 |
| $ub$ | 1 |
| $maxit$ | 100 |
| N | 20 |
| M | 20 |
| k | 5 |

Table 4. Descriptions of disease datasets.

| Dataset | Total Features | Total Patients | Category | Feature Types | Size |
|---|---|---|---|---|---|
| Arrhythmia | 279 | 452 | Classification | Categorical, Integer, Real | High |
| Leukemia2 | 11,226 | 72 | Classification | Integer, Real | High |
| Prostate Tumors | 10,510 | 102 | Classification | Integer, Real | High |
| Statlog (Heart) | 13 | 270 | Classification | Categorical, Real | Medium |
| Chronic Kidney Disease (CKD) | 25 | 400 | Classification | Real | Medium |
| Parkinson’s | 23 | 197 | Classification | Real | Medium |
| Pima Indians Diabetes (Pima) | 8 | 768 | Classification | Integer, Real | Medium |
| Primary Tumor | 17 | 339 | Classification | Categorical | Low |
| Lymphography | 18 | 148 | Classification | Categorical | Low |
| Immunotherapy | 8 | 90 | Classification | Integer, Real | Low |

Table 5. The description of the CEC’22 test suite functions [61].

| Type | Function | Description | Search Range | $F_i^*$ |
|---|---|---|---|---|
| Unimodal Function | CEC-01 | Shifted and fully rotated Zakharov function | $[-100, 100]^D$ | 300 |
| Basic Functions | CEC-02 | Shifted and fully rotated Rosenbrock function | $[-100, 100]^D$ | 400 |
| Basic Functions | CEC-03 | Shifted and fully rotated expanded Schaffer f6 function | $[-100, 100]^D$ | 600 |
| Basic Functions | CEC-04 | Shifted and fully rotated non-continuous Rastrigin function | $[-100, 100]^D$ | 800 |
| Basic Functions | CEC-05 | Shifted and fully rotated Levy function | $[-100, 100]^D$ | 900 |
| Hybrid Functions | CEC-06 | Hybrid Function 1 (N = 3) | $[-100, 100]^D$ | 1800 |
| Hybrid Functions | CEC-07 | Hybrid Function 2 (N = 6) | $[-100, 100]^D$ | 2000 |
| Hybrid Functions | CEC-08 | Hybrid Function 3 (N = 5) | $[-100, 100]^D$ | 2200 |
| Composition Functions | CEC-09 | Composition Function 1 (N = 5) | $[-100, 100]^D$ | 2300 |
| Composition Functions | CEC-10 | Composition Function 2 (N = 4) | $[-100, 100]^D$ | 2400 |

Table 6. Comparison of the CEC’22 between OBWOD and other optimization algorithms based on the kNN classifier.

| CEC Functions | Measure | BWO | OBWOD | INFO | HHO | HGS | SCA | WOA | MFO |
|---|---|---|---|---|---|---|---|---|---|
| CEC-01 | μ | 2.494610 × 10^4 | 2.152710 × 10^4 | 5.979706 × 10^4 | 7.571684 × 10^4 | 9.463143 × 10^12 | 9.558730 × 10^12 | 8.52330 × 10^12 | 3.087340 × 10^4 |
| | σ | 1.953938 × 10^4 | 1.064252 × 10^4 | 2.311781 × 10^4 | 2.539657 × 10^4 | 9.558730 × 10^11 | 7.5746858 × 10^2 | 5.851858 × 10^3 | 1.947247 × 10^4 |
| CEC-02 | μ | 6.564376 × 10^2 | 5.086089 × 10^2 | 1.508763 × 10^3 | 7.756663 × 10^2 | 7.433591 × 10^3 | 7.508678 × 10^2 | 7.69828 × 10^3 | 6.151887 × 10^2 |
| | σ | 9.464901 × 10^2 | 1.439867 × 10^2 | 8.297016 × 10^2 | 1.040701 × 10^3 | 7.508678 × 10^2 | 9.14076 × 10^13 | 9.5896 × 10^12 | 3.057235 × 10^2 |
| CEC-03 | μ | 6.303864 × 10^2 | 6.287108 × 10^2 | 6.668264 × 10^2 | 6.408925 × 10^2 | 7.527101 × 10^2 | 7.603132 × 10^2 | 7.569872 × 10^2 | 6.421989 × 10^2 |
| | σ | 1.900605 × 10^1 | 1.244823 × 10^1 | 1.042105 × 10^1 | 2.797057 × 10^1 | 7.603132 × 10^1 | 4.570383 × 10^13 | 3.72589 × 10^13 | 2.024130 × 10^1 |
| CEC-04 | μ | 8.965367 × 10^2 | 8.449059 × 10^2 | 9.676970 × 10^2 | 9.052959 × 10^2 | 1.066585 × 10^3 | 1.077359 × 10^3 | 1.188301 × 10^4 | 9.001926 × 10^2 |
| | σ | 4.459035 × 10^1 | 2.369911 × 10^1 | 3.657403 × 10^1 | 4.566980 × 10^1 | 1.077359 × 10^2 | 2.371115 × 10^3 | 4.371115 × 10^2 | 4.345011 × 10^1 |
| CEC-05 | μ | 2.363152 × 10^3 | 2.020499 × 10^3 | 3.100224 × 10^3 | 3.369690 × 10^3 | 1.038756 × 10^4 | 1.049249 × 10^4 | 3.887064 × 10^3 | 3.666241 × 10^3 |
| | σ | 1.041466 × 10^3 | 5.239663 × 10^2 | 1.281234 × 10^3 | 1.946050 × 10^3 | 1.049249 × 10^3 | 7.312613 × 10^2 | 3.329509 × 10^3 | 8.850809 × 10^2 |
| CEC-06 | μ | 6.790918 × 10^7 | 1.521993 × 10^7 | 1.513433 × 10^8 | 1.377625 × 10^8 | 8.770613 × 10^9 | 8.859205 × 10^9 | 1.263948 × 10^8 | 2.595407 × 10^7 |
| | σ | 4.237511 × 10^8 | 1.003401 × 10^8 | 8.375683 × 10^8 | 4.769067 × 10^8 | 8.859205 × 10^8 | 1.341870 × 10^9 | 1.541873 × 10^9 | 1.113193 × 10^8 |
| CEC-07 | μ | 2.123416 × 10^3 | 2.008917 × 10^3 | 2.205811 × 10^3 | 2.094071 × 10^3 | 2.664960 × 10^3 | 2.691879 × 10^3 | 3.265476 × 10^3 | 3.452825 × 10^3 |
| | σ | 5.428055 × 10^1 | 4.403649 × 10^1 | 4.602230 × 10^1 | 7.801352 × 10^1 | 2.691879 × 10^2 | 2.691879 × 10^3 | 4.627033 × 10^2 | 4.452825 × 10^1 |
| CEC-08 | μ | 2.251655 × 10^3 | 2.208762 × 10^3 | 2.280427 × 10^3 | 2.362581 × 10^3 | 2.230307 × 10^5 | 2.69876 × 10^5 | 2.252836 × 10^5 | 2.254398 × 10^3 |
| | σ | 6.435948 × 10^1 | 4.460133 × 10^1 | 1.178931 × 10^2 | 6.281607 × 10^1 | 2.252836 × 10^4 | 3.78027 × 10^8 | 1.755027 × 10^10 | 4.393503 × 10^1 |
| CEC-09 | μ | 2.544037 × 10^3 | 2.305232 × 10^3 | 2.782409 × 10^3 | 2.533574 × 10^3 | 6.551957 × 10^3 | 6.618138 × 10^3 | 6.618138 × 10^3 | 2.509326 × 10^3 |
| | σ | 1.294362 × 10^2 | 5.583183 × 10^1 | 2.330702 × 10^2 | 1.580436 × 10^2 | 6.618138 × 10^2 | 6.618138 × 10^3 | 6.618138 × 10^3 | 7.913255 × 10^1 |
| CEC-10 | μ | 4.133289 × 10^3 | 2.803257 × 10^3 | 6.313881 × 10^3 | 4.372988 × 10^3 | 1.081208 × 10^4 | 3.45629 × 10^3 | 1.092129 × 10^4 | 2.526238 × 10^3 |
| | σ | 1.211022 × 10^3 | 1.037671 × 10^3 | 9.866894 × 10^2 | 1.439390 × 10^3 | 1.092129 × 10^3 | 1.828153 × 10^8 | 1.828153 × 10^11 | 1.725860 × 10^2 |
| Friedman Rank | μ | 2.65 | 1.10 | 4.70 | 4.00 | 6.30 | 7.20 | 6.80 | 3.25 |
| | σ | 4.75 | 2.40 | 4.70 | 6.00 | 6.40 | 4.15 | 4.35 | 3.25 |
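
The Friedman Rank rows in these tables can be read as average per-problem ranks of each algorithm. The following is a minimal Python sketch of that idea, assuming that a lower score is better and that tied scores share the average of their positions. The function name `friedman_mean_ranks` and the toy scores are hypothetical and are not taken from the paper; the authors' exact ranking procedure may differ.

```python
from typing import List


def friedman_mean_ranks(scores: List[List[float]]) -> List[float]:
    """Average rank of each algorithm (column) over all problems (rows).

    Assumption: lower score = better = smaller rank; ties share the mean rank.
    """
    n_algorithms = len(scores[0])
    rank_sums = [0.0] * n_algorithms
    for row in scores:
        # Column indices sorted from best (lowest) to worst score.
        order = sorted(range(n_algorithms), key=lambda j: row[j])
        i = 0
        while i < n_algorithms:
            # Extend k to the end of the run of values tied with position i.
            k = i
            while k + 1 < n_algorithms and row[order[k + 1]] == row[order[i]]:
                k += 1
            shared_rank = (i + k) / 2 + 1  # mean of 1-based positions i+1..k+1
            for pos in range(i, k + 1):
                rank_sums[order[pos]] += shared_rank
            i = k + 1
    return [total / len(scores) for total in rank_sums]


# Hypothetical scores: 3 problems (rows) x 3 algorithms (columns).
toy_scores = [
    [0.20, 0.10, 0.30],
    [0.50, 0.40, 0.40],
    [0.15, 0.05, 0.25],
]
mean_ranks = friedman_mean_ranks(toy_scores)
print(mean_ranks)
```

In this toy example the second algorithm has the lowest score on every problem (tied once), so it receives the smallest mean rank.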

Table 7. Comparison of the best fitness between OBWOD and the other optimization algorithms based on the kNN classifier (μ and σ of fitness per algorithm).

| Datasets | BWO μ | BWO σ | OBWOD μ | OBWOD σ | INFO μ | INFO σ | HHO μ | HHO σ | HGS μ | HGS σ | SCA μ | SCA σ | WOA μ | WOA σ | MFO μ | MFO σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arrhythmia | 0.2465 | 0.0000 | 0.2038 | 0.0000 | 0.0670 | 0.2455 | 0.0001 | 0.0013 | 0.2201 | 0.0021 | 0.4480 | 0.4982 | 0.4910 | 0.5008 | 0.4480 | 0.4982 |
| Leukemia2 | 0.0601 | 0.0307 | 0.0560 | 0.0005 | 0.1399 | 0.0606 | 0.0977 | 0.0333 | 0.1099 | 8.0834 | 0.2440 | 0.0246 | 0.2448 | 0.0006 | 0.2454 | 0.0000 |
| Prostate Tumors | 0.0524 | 0.0538 | 0.0011 | 0.0250 | 0.2133 | 0.0828 | 0.1558 | 0.0402 | 0.1781 | 0.0918 | 0.2700 | 0.0272 | 0.2992 | 0.0039 | 0.3138 | 0.0061 |
| Statlog (Heart) | 0.2149 | 0.0095 | 0.1854 | 0.0005 | 0.2439 | 0.0422 | 0.2357 | 0.0238 | 0.2243 | 0.0496 | 0.2851 | 0.0288 | 0.2709 | 0.0061 | 0.3124 | 0.0007 |
| CKD | 0.2137 | 0.0266 | 0.2037 | 0.0014 | 0.2891 | 0.0563 | 0.3120 | 0.0268 | 0.2481 | 0.0685 | 0.3627 | 0.0366 | 0.3562 | 0.0067 | 0.3994 | 0.0025 |
| Parkinson’s | 0.0926 | 0.0141 | 0.0810 | 0.0016 | 0.1375 | 0.0355 | 0.1312 | 0.0313 | 0.1327 | 0.0497 | 0.1517 | 0.0153 | 0.2025 | 0.0046 | 0.1876 | 0.0446 |
| Pima | 0.2567 | 0.0004 | 0.2006 | 0.0000 | 0.2988 | 0.1715 | 0.2657 | 0.0269 | 0.2649 | 0.0354 | 0.2972 | 0.0300 | 0.3182 | 0.0078 | 0.2911 | 0.0072 |
| Primary Tumor | 0.1891 | 0.0420 | 0.1801 | 0.0008 | 0.2251 | 0.0040 | 0.2219 | 0.0025 | 0.2275 | 0.5291 | 0.3112 | 0.0324 | 0.2790 | 0.0009 | 0.5793 | 0.0005 |
| Lymphography | 0.1227 | 0.0128 | 0.1167 | 0.0198 | 0.1290 | 0.0070 | 0.1802 | 0.0199 | 0.1318 | 0.0206 | 0.1451 | 0.0315 | 0.1849 | 0.0143 | 0.4516 | 0.0188 |
| Immunotherapy | 0.1533 | 0.0171 | 0.1219 | 0.0002 | 0.1404 | 0.0474 | 0.0129 | 0.0384 | 0.1602 | 0.0538 | 0.2234 | 0.0225 | 0.1907 | 0.0007 | 0.1198 | 0.0020 |
| Friedman Rank | 2.60 | 3.85 | 1.40 | 1.85 | 4.40 | 6.00 | 3.30 | 5.00 | 4.00 | 7.40 | 6.45 | 5.55 | 7.00 | 3.20 | 6.85 | 3.15 |
| Rank | 2 | | 1 | | 5 | | 3 | | 4 | | 6 | | 8 | | 7 | |

Table 8. Comparison of the accuracy between OBWOD and other optimization algorithms based on the kNN classifier (μ and σ of ACC per algorithm).

| Datasets | BWO μ | BWO σ | OBWOD μ | OBWOD σ | INFO μ | INFO σ | HHO μ | HHO σ | HGS μ | HGS σ | SCA μ | SCA σ | WOA μ | WOA σ | MFO μ | MFO σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arrhythmia | 0.4816 | 0.0775 | 0.6590 | 0.0014 | 0.5731 | 0.0126 | 0.5865 | 0.0212 | 0.5222 | 0.0008 | 0.4551 | 0.0487 | 0.2263 | 0.1940 | 0.4778 | 0.01949 |
| Leukemia2 | 0.7832 | 0.0132 | 0.9883 | 0.00294 | 0.8902 | 0.0460 | 0.9555 | 0.0442 | 0.8356 | 0.0328 | 0.7779 | 0.0782 | 0.4944 | 0.2184 | 0.7857 | 0.0320 |
| Prostate Tumors | 0.9091 | 0.0981 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9807 | 0.1015 | 0.5126 | 0.2067 | 1.0000 | 0.0000 |
| Statlog (Heart) | 0.5535 | 0.1065 | 0.8813 | 0.0004 | 0.8786 | 0.0238 | 0.8622 | 0.0279 | 0.7593 | 0.0050 | 0.4617 | 0.0706 | 0.4744 | 0.0649 | 0.5912 | 0.0891 |
| CKD | 0.5231 | 0.1914 | 0.7817 | 0.00171 | 0.7022 | 0.0149 | 0.7038 | 0.0175 | 0.6339 | 0.0056 | 0.5595 | 0.0617 | 0.2124 | 0.2207 | 0.6066 | 0.0164 |
| Parkinson’s | 0.7684 | 0.1090 | 0.9983 | 0.0006 | 0.9956 | 0.0110 | 0.9944 | 0.0152 | 0.9113 | 0.0129 | 0.6764 | 0.0987 | 0.6827 | 0.0615 | 0.8284 | 0.0521 |
| Pima | 0.6109 | 0.0537 | 0.8545 | 0.0006 | 0.8079 | 0.0094 | 0.8113 | 0.0098 | 0.7792 | 0.0020 | 0.5816 | 0.0625 | 0.6005 | 0.0330 | 0.6516 | 0.0604 |
| Primary Tumor | 0.6171 | 0.0398 | 0.8876 | 0.0003 | 0.8574 | 0.0348 | 0.8816 | 0.0368 | 0.7692 | 0.0750 | 0.4943 | 0.0874 | 0.5638 | 0.0525 | 0.5936 | 0.0682 |
| Lymphography | 0.5269 | 0.0133 | 0.5702 | 0.0010 | 0.5889 | 0.0229 | 0.5288 | 0.0359 | 0.5325 | 0.0520 | 0.5672 | 0.0572 | 0.5707 | 0.0070 | 0.05918 | 0.0082 |
| Immunotherapy | 0.7037 | 0.0502 | 0.9089 | 0.0001 | 0.8889 | 0.0320 | 0.8889 | 0.0420 | 0.8889 | 0.0320 | 0.6317 | 0.0700 | 0.6336 | 0.0413 | 0.6908 | 0.0879 |
| Friedman mean rank | 3.10 | 5.80 | 7.20 | 1.30 | 6.40 | 3.25 | 6.30 | 4.40 | 5.20 | 3.35 | 2.10 | 7.00 | 2.10 | 5.90 | 3.60 | 5.00 |
| Rank | 6 | | 1 | | 2 | | 3 | | 4 | | 7 | | 8 | | 5 | |

Table 9. Comparison of the FS between OBWOD and other optimization algorithms based on the kNN classifier (μ and σ of FS per algorithm).

| Datasets | BWO μ | BWO σ | OBWOD μ | OBWOD σ | INFO μ | INFO σ | HHO μ | HHO σ | HGS μ | HGS σ | SCA μ | SCA σ | WOA μ | WOA σ | MFO μ | MFO σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arrhythmia | 0.5100 | 0.4999 | 1.0000 | 0.0000 | 0.5035 | 0.5000 | 0.4956 | 0.4572 | 0.4996 | 0.5000 | 0.4973 | 0.5000 | 0.4985 | 0.5000 | 0.5210 | 0.4996 |
| Leukemia2 | 0.0448 | 0.0030 | 0.0280 | 0.0000 | 0.0529 | 0.2161 | 0.0083 | 0.0014 | NaN | NaN | 0.4976 | 0.5000 | 0.4919 | 0.5000 | 0.4942 | 0.5000 |
| Prostate Tumors | 0.0402 | 0.0000 | 0.0215 | 0.0000 | 0.0052 | 0.0278 | 0.0000 | 0.0000 | 0.3214 | 0.1259 | 0.4976 | 0.5000 | 0.4942 | 0.5000 | 0.4970 | 0.5000 |
| Statlog (Heart) | 0.2838 | 0.0890 | 0.2149 | 0.0000 | 0.2949 | 0.2758 | 0.2910 | 0.0000 | 0.2321 | 0.0081 | 0.4615 | 0.5189 | 0.3846 | 0.5064 | 0.3846 | 0.5064 |
| CKD | 0.2949 | 0.0320 | 0.2288 | 0.0000 | 0.2882 | 0.2706 | 0.0000 | 0.0006 | 0.2723 | 0.0251 | 0.4086 | 0.4925 | 0.4875 | 0.5007 | 0.5018 | 0.5009 |
| Parkinson’s | 0.1779 | 0.0090 | 0.0779 | 0.0000 | 0.1817 | 0.3945 | 0.1800 | 0.0000 | 0.1023 | 0.2769 | 0.2727 | 0.4558 | 0.3182 | 0.4767 | 0.3636 | 0.4924 |
| Pima | 0.2562 | 0.0200 | 0.2062 | 0.0000 | 0.2935 | 0.4198 | 0.3210 | 0.0800 | 0.2365 | 0.0021 | 0.3750 | 0.5175 | 0.1250 | 0.3536 | 0.3750 | 0.5175 |
| Primary Tumor | 0.1938 | 0.0200 | 0.1728 | 0.0000 | 0.1943 | 0.2961 | 0.0137 | 0.1966 | 0.1823 | 0.0020 | 0.4118 | 0.5073 | 0.4706 | 0.5145 | 0.2941 | 0.4697 |
| Lymphography | 0.4590 | 0.0300 | 0.3685 | 0.0000 | 0.3594 | 0.4486 | 0.46100 | 0.0230 | 0.4250 | 0.0230 | 0.3833 | 0.4851 | 0.3889 | 0.5016 | 0.3933 | 0.4851 |
| Immunotherapy | 0.1406 | 0.0000 | 0.1206 | 0.0000 | 0.3449 | 0.0000 | 0.3780 | 0.1303 | 0.3449 | 0.2857 | 0.4880 | 0.4286 | 0.5345 | 0.0000 | 0.1429 | 0.0000 |
| Friedman Rank | 4.20 | 3.40 | 2.50 | 1.50 | 4.15 | 4.95 | 3.40 | 2.85 | 3.35 | 3.60 | 6.25 | 6.95 | 5.65 | 6.50 | 6.50 | 6.25 |
| Rank | 5 | | 1 | | 4 | | 3 | | 2 | | 7 | | 6 | | 8 | |

Table 10. Comparison of the sensitivity between OBWOD and other optimization algorithms based on the kNN classifier (μ and σ of SE per algorithm).

| Datasets | BWO μ | BWO σ | OBWOD μ | OBWOD σ | INFO μ | INFO σ | HHO μ | HHO σ | HGS μ | HGS σ | SCA μ | SCA σ | WOA μ | WOA σ | MFO μ | MFO σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arrhythmia | 0.9178 | 0.1368 | 1.0000 | 0.0000 | 0.9178 | 0.1368 | 0.9998 | 0.0017 | 0.9848 | 0.0000 | 0.8974 | 0.0951 | 0.3851 | 0.4423 | 0.9292 | 0.0233 |
| Leukemia2 | 0.9814 | 0.0093 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9900 | 0.0995 | 0.7722 | 0.2715 | 1.0000 | 0.0000 |
| Prostate Tumors | 0.8614 | 0.1959 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9900 | 0.0995 | 0.3840 | 0.2943 | 1.0000 | 0.0000 |
| Statlog (Heart) | 0.5432 | 0.2887 | 0.9565 | 0.0178 | 0.5432 | 0.2887 | 0.9545 | 0.0203 | 0.8519 | 0.0000 | 0.1629 | 0.2547 | 0.2866 | 0.2853 | 0.6453 | 0.1052 |
| CKD | 0.7915 | 0.3283 | 0.9999 | 0.0014 | 0.9915 | 0.3283 | 0.9996 | 0.0029 | 0.9771 | 0.0044 | 0.9060 | 0.0973 | 0.2236 | 0.3740 | 0.9479 | 0.0227 |
| Parkinson’s | 0.6138 | 0.2166 | 0.9992 | 0.0006 | 0.9902 | 0.0092 | 0.9981 | 0.0138 | 0.9000 | 0.0050 | 0.4692 | 0.1913 | 0.3518 | 0.1520 | 0.6899 | 0.1290 |
| Pima | 0.8995 | 0.0023 | 0.8345 | 0.0370 | 0.8965 | 0.0073 | 0.7242 | 0.0119 | 0.7093 | 0.0714 | 0.7179 | 0.0088 | 0.7177 | 0.0066 | 0.8868 | 0.0156 |
| Primary Tumor | 0.5655 | 0.1354 | 0.7995 | 0.0090 | 0.6021 | 0.2031 | 0.7544 | 0.0856 | 0.5000 | 0.0750 | 0.0081 | 0.0418 | 0.0095 | 0.0617 | 0.1830 | 0.1446 |
| Lymphography | 0.6654 | 0.0116 | 0.9977 | 0.0157 | 0.7454 | 0.0326 | 0.9952 | 0.0355 | 0.7804 | 0.0508 | 0.5413 | 0.0862 | 0.4923 | 0.1694 | 0.6154 | 0.0554 |
| Immunotherapy | 0.5446 | 0.1418 | 0.7959 | 0.0283 | 0.5626 | 0.1418 | 0.7885 | 0.0466 | 0.6000 | 0.0000 | 0.5608 | 0.0141 | 0.0086 | 0.0688 | 0.1186 | 0.2228 |
| Friedman Rank | 3.60 | 5.60 | 7.30 | 2.45 | 5.40 | 5.00 | 6.50 | 3.60 | 5.10 | 3.15 | 2.40 | 5.00 | 1.30 | 6.20 | 4.40 | 5.00 |
| Rank | 6 | | 1 | | 3 | | 2 | | 4 | | 7 | | 8 | | 5 | |

Table 11. Comparison of the specificity between OBWOD and other optimization algorithms based on the kNN classifier (μ and σ of SP per algorithm).

| Datasets | BWO μ | BWO σ | OBWOD μ | OBWOD σ | INFO μ | INFO σ | HHO μ | HHO σ | HGS μ | HGS σ | SCA μ | SCA σ | WOA μ | WOA σ | MFO μ | MFO σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arrhythmia | 0.1010 | 0.0194 | 0.2734 | 0.0095 | 0.1230 | 0.0175 | 0.6367 | 0.3527 | 0.1441 | 0.0047 | 0.0481 | 0.0210 | 0.0571 | 0.0328 | 0.0638 | 0.0281 |
| Leukemia2 | 0.9873 | 0.0426 | 1.0000 | 0.0000 | 0.9763 | 0.0526 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9822 | 0.1028 | 0.6197 | 0.2744 | 1.0000 | 0.0000 |
| Prostate Tumors | 0.9884 | 0.0464 | 1.0000 | 0.0000 | 0.9754 | 0.0454 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9900 | 0.0995 | 0.6188 | 0.3106 | 1.0000 | 0.0000 |
| Statlog (Heart) | 0.4403 | 0.1805 | 0.8327 | 0.0724 | 0.4563 | 0.1725 | 0.8809 | 0.0806 | 0.6667 | 0.0000 | 0.2354 | 0.1136 | 0.3174 | 0.0939 | 0.4889 | 0.1096 |
| CKD | 0.1846 | 0.0175 | 0.4247 | 0.0099 | 0.1846 | 0.0175 | 0.5838 | 0.2678 | 0.2474 | 0.0152 | 0.1333 | 0.0332 | 0.1118 | 0.0535 | 0.1373 | 0.0262 |
| Parkinson’s | 0.8130 | 0.0842 | 0.9972 | 0.0096 | 0.8330 | 0.0952 | 0.9971 | 0.0111 | 0.9392 | 0.0147 | 0.7092 | 0.0868 | 0.7256 | 0.0549 | 0.8776 | 0.0583 |
| Pima | 0.2778 | 0.0883 | 0.7794 | 0.0003 | 0.2638 | 0.0436 | 0.6736 | 0.0338 | 0.5105 | 0.0025 | 0.2130 | 0.0437 | 0.2170 | 0.0560 | 0.3430 | 0.0918 |
| Primary Tumor | 0.8667 | 0.0473 | 0.9994 | 0.0003 | 0.7452 | 0.0321 | 0.9992 | 0.0068 | 0.9458 | 0.0213 | 0.6710 | 0.1143 | 0.7680 | 0.0683 | 0.7619 | 0.0889 |
| Lymphography | 0.1380 | 0.1188 | 0.6093 | 0.0081 | 0.12410 | 0.3028 | 0.6621 | 0.0765 | 0.4286 | 0.0000 | 0.0512 | 0.0486 | 0.0191 | 0.0628 | 0.1142 | 0.1176 |
| Immunotherapy | 0.9129 | 0.0334 | 0.9999 | 0.0024 | 0.9129 | 0.0334 | 0.9992 | 0.0077 | 0.9261 | 0.0150 | 0.7287 | 0.1010 | 0.7640 | 0.0703 | 0.8808 | 0.0683 |
| Friedman Rank | 4.00 | 5.60 | 7.30 | 1.60 | 3.50 | 5.30 | 7.29 | 3.80 | 6.10 | 2.10 | 1.70 | 6.20 | 1.80 | 6.10 | 4.30 | 5.30 |
| Rank | 5 | | 1 | | 6 | | 2 | | 3 | | 8 | | 7 | | 4 | |

Table 12. Comparison of the precision between OBWOD and other optimization algorithms based on the kNN classifier (μ and σ of PPV per algorithm).

| Datasets | BWO μ | BWO σ | OBWOD μ | OBWOD σ | INFO μ | INFO σ | HHO μ | HHO σ | HGS μ | HGS σ | SCA μ | SCA σ | WOA μ | WOA σ | MFO μ | MFO σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arrhythmia | 0.4043 | 0.1798 | 0.6347 | 0.0090 | 0.5243 | 0.2172 | 0.5442 | 0.0126 | 0.5039 | 0.0100 | 0.4632 | 0.0479 | 0.1922 | 0.2312 | 0.4806 | 0.0095 |
| Leukemia2 | 0.6214 | 0.0173 | 0.7765 | 0.0020 | 0.6214 | 0.0173 | 0.8605 | 0.0770 | 0.6796 | 0.0435 | 0.6188 | 0.0622 | 0.3786 | 0.1708 | 0.6250 | 0.0034 |
| Prostate Tumors | 0.9830 | 0.0766 | 1.0000 | 0.0000 | 0.9740 | 0.0526 | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 0.9900 | 0.0995 | 0.6746 | 0.2313 | 1.0000 | 0.0000 |
| Statlog (Heart) | 0.5508 | 0.0943 | 0.8322 | 0.0015 | 0.6608 | 0.0783 | 0.8514 | 0.0366 | 0.7195 | 0.0029 | 0.4619 | 0.0628 | 0.4965 | 0.0554 | 0.5808 | 0.0718 |
| CKD | 0.4851 | 0.2389 | 0.6706 | 0.0080 | 0.4851 | 0.2389 | 0.6659 | 0.0085 | 0.622 | 0.0032 | 0.5763 | 0.0598 | 0.2284 | 0.2786 | 0.6075 | 0.0078 |
| Parkinson’s | 0.8279 | 0.0757 | 0.9994 | 0.0004 | 0.8529 | 0.0432 | 0.9992 | 0.0052 | 0.9637 | 0.0007 | 0.7934 | 0.0961 | 0.7930 | 0.0375 | 0.8833 | 0.0373 |
| Pima | 0.3637 | 0.1146 | 0.7593 | 0.0004 | 0.3637 | 0.1146 | 0.7561 | 0.0161 | 0.7273 | 0.0008 | 0.2768 | 0.0685 | 0.3172 | 0.0751 | 0.4515 | 0.1420 |
| Primary Tumor | 0.5431 | 0.1414 | 0.9995 | 0.0091 | 0.6241 | 0.2030 | 0.9971 | 0.0216 | 0.8535 | 0.0525 | 0.0585 | 0.1587 | 0.1385 | 0.2114 | 0.3940 | 0.2076 |
| Lymphography | 0.2397 | 0.0017 | 0.3738 | 0.0014 | 0.2397 | 0.0017 | 0.3703 | 0.0233 | 0.2595 | 0.0131 | 0.1940 | 0.0320 | 0.1783 | 0.0449 | 0.2166 | 0.0181 |
| Immunotherapy | 0.7239 | 0.0354 | 0.9198 | 0.0004 | 0.7239 | 0.0354 | 0.9191 | 0.0132 | 0.8667 | 0.0300 | 0.6908 | 0.0703 | 0.7041 | 0.0220 | 0.7290 | 0.0555 |
| Friedman Rank | 3.25 | 5.55 | 7.65 | 1.35 | 3.95 | 5.55 | 7.15 | 3.65 | 5.95 | 2.85 | 2.10 | 6.00 | 1.40 | 6.50 | 4.55 | 4.55 |
| Rank | 6 | | 1 | | 5 | | 2 | | 3 | | 7 | | 8 | | 4 | |
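
The classification tables report accuracy (ACC), sensitivity (SE), specificity (SP), and precision (PPV). The sketch below shows the standard binary definitions of these four measures from confusion-matrix counts. It is an illustration under assumptions: the function name `binary_metrics` and the counts are hypothetical, and how the paper aggregates these measures for multi-class datasets is not stated in this section.

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "ACC": _ratio(tp + tn, tp + fp + tn + fn),  # share of correct predictions
        "SE": _ratio(tp, tp + fn),   # sensitivity (recall): positives found
        "SP": _ratio(tn, tn + fp),   # specificity: negatives correctly rejected
        "PPV": _ratio(tp, tp + fp),  # precision: predicted positives that are right
    }


# Hypothetical counts for 100 test samples (not from the paper).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

With these toy counts, accuracy is 85 correct out of 100 samples, while sensitivity and specificity separate the error rate on the positive and negative classes.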

Table 13. Comparison of the time consumption between OBWOD and the other optimization algorithms based on the kNN classifier (μ of time per algorithm).

| Datasets | BWO | OBWOD | INFO | HHO | HGS | SCA | WOA | MFO |
|---|---|---|---|---|---|---|---|---|
| Arrhythmia | 10.6230 | 17.3304 | 11.0516 | 12.3240 | 10.1320 | 9.3268 | 10.6587 | 10.6861 |
| Leukemia2 | 22.5840 | 51.7540 | 45.5153 | 32.6532 | 30.4671 | 46.0079 | 29.2826 | 51.6296 |
| Prostate Tumors | 42.1054 | 43.1311 | 62.3347 | 32.3657 | 42.6201 | 58.2549 | 44.8264 | 66.4606 |
| Statlog (Heart) | 3.4428 | 8.4855 | 7.6711 | 6.2339 | 7.0938 | 7.6333 | 7.6686 | 7.6215 |
| CKD | 7.9837 | 17.7139 | 11.1429 | 9.2948 | 8.8317 | 9.3986 | 8.8126 | 10.9263 |
| Parkinson’s | 7.3120 | 8.2712 | 7.8990 | 7.2244 | 8.0921 | 7.5191 | 7.5082 | 7.3047 |
| Pima | 7.1146 | 9.3084 | 8.7200 | 9.2549 | 8.1086 | 8.1127 | 8.1264 | 8.3065 |
| Primary Tumor | 6.0160 | 8.2361 | 7.3495 | 0.2445 | 7.0904 | 7.4478 | 7.3979 | 7.1740 |
| Lymphography | 6.3034 | 8.4789 | 7.5684 | 7.2203 | 0.0926 | 7.5350 | 7.5396 | 7.3110 |
| Immunotherapy | 6.0871 | 8.1276 | 7.5058 | 7.2314 | 7.0917 | 7.2710 | 7.2860 | 7.2984 |
| Friedman Rank | 1.70 | 7.60 | 6.30 | 3.30 | 2.90 | 4.70 | 4.40 | 5.10 |
| Rank | 1 | 8 | 7 | 3 | 2 | 5 | 4 | 6 |

Table 14. Comparison of OBWOD and MH algorithms on biomedical classification.

| Ref. | Datasets | Used Techniques | Comparative Algorithm Accuracy | OBWOD Accuracy |
|---|---|---|---|---|
| [63] | Heart disease | SVM with PSO, called PSO-SVM | 88.13% | 88.13% |
| [2] | Leukemia2 | Centroid-mutation-based search-and-rescue optimization algorithm | 95.60% | 98.83% |
| [64] | Heart disease | Extreme Learning Machine (ELM) algorithm | 80% | 88.13% |
| [65] | Parkinson’s disease | Minimum average maximum tree and SVD with kNN classifier | 92.46% | 99.83% |
| [66] | Parkinson’s disease | Binary Rao1 with a kNN classifier | 96.47% | 99.83% |
| [2] | Primary Tumor | Centroid-mutation-based search-and-rescue optimization algorithm | 44.46% | 88.76% |
| [67] | Parkinson’s disease | SMOTE + random forests | 98.32% | 99.83% |
| [68] | Leukemia2 | Slime Mold Algorithm (SMA) integrated with OBL based on kNN classifier | 90.59% | 98.83% |
| [2] | Statlog (Heart) | Centroid-mutation-based search-and-rescue optimization algorithm | 86.67% | 88.13% |
| [69] | Primary Tumor | Tunicate Swarm Algorithm (TSA) integrated with OBL based on the kNN classifier | 82.87% | 88.76% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Houssein, E.H.; Sayed, A. Dynamic Candidate Solution Boosted Beluga Whale Optimization Algorithm for Biomedical Classification. Mathematics 2023, 11, 707. https://doi.org/10.3390/math11030707
