Article

A Chaotic-Based Interactive Autodidactic School Algorithm for Data Clustering Problems and Its Application on COVID-19 Disease Detection

by Farhad Soleimanian Gharehchopogh * and Aysan Alavi Khargoush
Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia 5716963896, Iran
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(4), 894; https://doi.org/10.3390/sym15040894
Submission received: 18 February 2023 / Revised: 13 March 2023 / Accepted: 7 April 2023 / Published: 10 April 2023
(This article belongs to the Section Computer)

Abstract

In many disciplines, including pattern recognition, data mining, machine learning, image analysis, and bioinformatics, data clustering is a common analytical tool for data statistics. Most conventional clustering techniques converge slowly and frequently become trapped in local optima. In this regard, population-based metaheuristic algorithms are used to escape local optima and to increase the convergence speed. An asymmetric approach to clustering, based on the asymmetric self-organizing map, is also considered in this paper. The Interactive Autodidactic School (IAS) is one of these population-based metaheuristic and asymmetry algorithms and is used here to solve the clustering problem. The chaotic IAS algorithm further increases exploitation and generates a better population. In the proposed model, ten different chaotic maps and the intra-cluster summation fitness function are used to improve the results of the IAS. According to the simulation findings, the IAS based on the Chebyshev chaotic function outperformed the other chaotic IAS variants and other metaheuristic algorithms. The efficacy of the proposed model is finally highlighted by comparing its performance with optimization algorithms in terms of fitness function and convergence rate. The algorithm can also be applied to other engineering problems. Moreover, the Binary IAS (BIAS) is used to detect coronavirus disease 2019 (COVID-19). The results demonstrate that the accuracy of BIAS on the COVID-19 dataset is 96.25%.

1. Introduction

One of the main scientific fields of machine learning and data mining is data clustering, which involves separating a set of objects into groups of similar objects [1]. In other words, data clustering is a branch of unsupervised learning and an automatic process that divides samples into categories whose members are similar. Data clustering aims to represent an extensive dataset with fewer initial samples or clusters; this simplifies the data for modeling and plays a significant role in exploration and data mining. Clustering identifies similar classes of objects; it makes it possible to further identify the dense and scattered areas in the object space, discover the general distribution pattern, and find correlation properties in the data. Clustering techniques group the observed samples into clusters that meet two main criteria: (1) each group or cluster is homogeneous; and (2) each group or cluster must be different from the other clusters. The most important clustering techniques are hierarchical, distribution-based, partition-based, density-based, fuzzy, and graph-based clustering [2,3].
Using asymmetric similarities and dissimilarities is one solution to data clustering. To accurately reflect the hierarchical asymmetric relationships between items in the studied dataset, they must be applied in algorithms in an appropriate manner, and their use should be consistent with the data's hierarchical linkages. This can be accomplished by inserting asymmetry coefficients and cluster coefficients into the formulas for symmetric measures, i.e., by building the asymmetric measures on top of the symmetric ones. The asymmetry and cluster coefficients should guarantee the consistency of the hierarchy; in the case of similarities, they should yield greater values in the direction from a more specific notion to a more generic one.
Clustering means assigning samples to different cluster centers based on proximity and intra-cluster similarity. K-means clustering is widely used as one of the classical methods due to its easy implementation and low computational cost [4]. However, for K-means clustering, the number of clusters must be specified beforehand, while in many practical applications users usually have no information about the number of clusters. If the clustering algorithm tries to test different numbers of clusters to find the optimal state, finding the correct number becomes time-consuming and challenging. Therefore, to overcome this problem, intelligent clustering methods should automatically determine the optimal number of clusters and obtain a better partitioning [5].
Optimization algorithms are critical computational tools in engineering, and their application has grown significantly over the past decades. Optimization algorithms are broadly divided into analytical and metaheuristic methods. Analytical approaches, also called gradient-based algorithms, are deterministic and always offer the same optimal solution from the same starting point [6]. Although these numerical methods work well for many optimization problems, they have three significant drawbacks compared to metaheuristic methods [7]. First, numerical methods cannot be used when the fitness function and constraints are discrete, since their gradients are not defined. Second, numerical methods may get trapped in local minima due to their dependence on the starting point. Finally, numerical methods are unstable and unreliable when the fitness function and constraints have multiple or sharp peaks. Researchers have therefore turned to stochastic approaches with specific features instead of traditional analytical techniques to solve complex engineering optimization problems.
Metaheuristic algorithms are essential in solving optimization problems; they are among the most successful methods for various complex optimization problems. These algorithms provide near-optimal solutions at a reasonable computational cost. Metaheuristic algorithms are often inspired by systems and behaviors found in nature, such as flocks of birds, ant colonies, and fish schools; the behavior of their members mimics how the inspiring organisms search for the best food sources. Most metaheuristic optimization algorithms share similar characteristics: they are stochastic (random-walk-based), independent of gradient information, iterative, and applicable to both continuous and discrete problems. The performance of any metaheuristic algorithm depends on the complexity of the cost function and the constraints that define the feasible search space. Metaheuristic algorithms have been used to solve various optimization problems and have been successful in many of them, including clustering. Classical clustering algorithms such as k-means often converge to local optima and have slow convergence rates on larger datasets. Clustering-based algorithms use swarm-based metaheuristic methods to overcome such issues. Swarm- or population-based metaheuristic approaches strive to achieve the optimal clustering solution in a reasonable time [8].
The IAS is a novel metaheuristic algorithm proposed by Jahangiri et al. in 2020 [9]. It simulates the interactions of a group of students trying to learn without the help of a teacher, thereby forming an autodidactic school. To explore the search space for the optimal solution, the IAS, like other population-based algorithms, iteratively uses a population in which the best-performing student is called the leader and the rest of the community are called the followers. This paper implements an improved IAS based on chaotic maps on various clustering datasets.
The proposed model is appraised on different benchmark test functions to analyze its efficiency and accuracy. The experimental results demonstrate that the performance of the proposed model is improved in terms of global search and convergence rate. The proposed model is analyzed using statistical criteria such as the best, worst, average, and standard deviation. Moreover, its convergence is compared with other metaheuristic algorithms such as the Artificial Bee Colony (ABC) [10], Bat Algorithm (BA) [11], Crow Search Algorithm (CSA) [12], and Artificial Electric Field Algorithm (AEFA) [13]. Then, the IAS is extended to transfer the continuous search space to a binary one using the S-shaped transfer function. Furthermore, the BIAS was applied in a case study to detect coronavirus disease 2019 (COVID-19). The experimental results show that BIAS is more efficient than the other comparative algorithms in searching the problem space and selecting the most relevant features. The contributions of this paper are as follows:
Improving the discovery of the optimal solution in the proposed model by balancing exploration and exploitation through chaotic maps;
Providing an improved version of the IAS for the data clustering problem based on chaotic maps;
Evaluating the proposed model on 20 UCI datasets;
Assessing the proposed model based on fitness function and convergence rate;
Developing the BIAS as the binary version of the IAS using the S-shaped transfer function to find valuable features for COVID-19 detection;
Comparing the proposed model with ABC, BA, CSA, and AEFA;
Applying the BIAS in a case study to detect COVID-19.
The rest of the paper is organized as follows. In Section 2, related works in clustering by metaheuristic algorithms are surveyed. Section 3 describes material and methods such as the IAS algorithm and chaotic maps. Section 4 proposes a new version of the improved IAS algorithm based on chaotic maps for data clustering. In Section 5, the performance of the proposed model is compared with other algorithms on the clustering dataset. Section 6 establishes the actual application of the proposed BIAS for extracting essential features from the COVID-19 dataset. Finally, Section 7 provides concluding remarks and suggestions for future research.

2. Related Works

This section presents the subject’s background and related literature in data clustering using metaheuristic algorithms. Here, the aim is to review recent data clustering improvements using metaheuristic algorithms. Therefore, the related works are presented below in the order of publishing time.
Ahmadi et al. [14] presented an improved version of the Grey Wolf Optimizer (GWO) algorithm for clustering problems. A modified GWO has been proposed to address some metaheuristic algorithms’ challenges. This modification includes a balancing approach between exploring and exploiting the GWO and a local search for the best solution. The results show that the proposed model has a lower intra-cluster distance than other algorithms and a mean error of about 11%, which is the lowest among all comparison algorithms.
Ashish et al. [15] proposed a fast and efficient parallel BA for data clustering using a mapping reduction architecture. The parallel BA is very efficient and helpful since it uses an evolutionary approach to clustering instead of other algorithms, such as k-means; it also enjoys high speed due to Hadoop architecture. The results of various experiments show that the parallel BA performs better than Particle Swarm Optimization (PSO); it performs faster than other comparative algorithms when the number of nodes increases.
Eesa and Orman [16] examined the applicability of the Cuttlefish Algorithm (CFA) to clustering problems and demonstrated that the CFA can find optimal cluster centers. The technique prevents the cluster centers from readily becoming trapped in a local minimum, a significant drawback of K-means. The CFA was used as a search method to minimize the clustering metrics. The performance of the CFA-Clustering model was assessed on the Shapes and UCI real-world datasets and compared with three well-known algorithms: the Genetic Algorithm (GA), PSO, and K-means. The empirical findings show that, for the most part, the CFA-Clustering approach outperforms the other methods.
An asymmetric version of the k-means clustering algorithm [17] arises from the use of dissimilarities that are asymmetric by definition (for example, the Kullback–Leibler divergence).
Cuckoo and krill herd algorithms have been applied to k-means++ to improve cluster quality and create optimized clusters [18]. Performance measures such as accuracy, error rate, f-measure, CPU time, standard deviation, and cluster quality checks are used to assess the clustering capabilities of these algorithms. The results demonstrated the high performance of the newly designed algorithm.
Zhang et al. [19] proposed an improved K-means algorithm based on density Canopy in 2018 to improve the accuracy and stability of the K-means algorithm and to address the issue of selecting the best starting seeds and the optimal number K of clusters. The first step is to compute the density of the sample datasets, the average sample distance inside clusters, and the distance between clusters. The maximum-density sampling point is then selected as the first cluster center, and its density cluster is removed from the sample datasets. The density Canopy is used as a pre-processing step for K-means, and its output is utilized to determine the cluster number and the initial clustering centers. Comparative results show that the improved K-means algorithm based on canopy density obtains better clustering results and is less sensitive to noisy data than the K-means algorithm, the canopy-based K-means algorithm, the semi-supervised K-means++ algorithm, and the K-means-u algorithm. The clustering accuracy of the proposed canopy-density-based K-means algorithm is improved by an average of 30.7%, 6.1%, 5.3%, and 3.7%, respectively, on the UCI datasets, and by 44.3%, 3.6%, 9.6%, and 8.9%, respectively, on the simulated datasets with added noise. It therefore performs more accurately than the comparative algorithms.
To exploit the advantages of both the ABC and K-means algorithms, Kumar et al. [20] proposed a hybrid algorithm combining the two, called the MABCKM algorithm. The hybrid MABCKM algorithm modifies the solutions generated by ABC and considers them as the initial solutions for the K-means algorithm. According to the results obtained from comparing the performance of the MABCKM, K-means, and ABC algorithms on different datasets taken from the UCI repository, MABCKM outperforms the other comparative algorithms.
The Whale Optimization Algorithm (WOA) was proposed for clustering data [21]. The results of the WOA are compared with the well-known k-means clustering method and other standard stochastic algorithms such as PSO, ABC, Differential Evolution (DE), and GA clustering. The proposed model was checked using one artificial and seven real benchmark datasets from the UCI repository. Simulations have shown that the proposed model can successfully be used for data clustering.
Qaddoura et al. [22] presented an improved version of the GA’s evolutionary behavior as well as the advanced performance of the nearest neighbor search technique for clustering problems based on allocation and selection mechanisms. The success of evolutionary algorithms in solving various machine learning problems, including clustering, has been proven. The proposed model’s objective was to improve the quality of clustering results by identifying a solution that maximizes differentiation between different clusters and coherence between data points within the same cluster. Various experiments show that the proposed model works well with the Silhouette coefficient’s fitness function and outperforms other algorithms.
Zhou et al. [23] presented an enhanced version of the symbiotic organism search (SOS) algorithm to solve data clustering. It evokes the symbiotic interaction strategies used by organisms in the ecosystem to survive and spread. This paper implemented the proposed model on ten standard UCI machine-learning repository datasets. Various experiments showed that the SOS algorithm performed better than other algorithms in accuracy and precision.
Rahnema and Gharehchopogh proposed an improved ABC based on the whale optimization algorithm for data clustering in 2020 [2]. In that work, two memories, random and elite, are used in the ABC to overcome the problems of weak exploration and late convergence. The proposed model was evaluated on ten standard datasets taken from the UCI machine learning repository. Ewees et al. presented an improved version of the Chaotic Multi-Verse Harris Hawks Optimization (CMVHHO) [24]. The primary purpose of this algorithm was to use chaotic maps to determine optimized values of the main parameters of the Harris Hawks algorithm. In addition, chaos was used as a local search approach to improve the ability to exploit the search space. The method was tested using several different chaotic maps. Experimental results show that the Circle chaotic map is the best among all the tested maps, since it improved the performance of the proposed model and had a positive effect on its behavior.
Chen et al. presented a chaotic dynamic weighted PSO algorithm [25]. The proposed model introduces a chaotic map and a dynamic weight to modify the search process. The dynamic weight is based on the fitness function and increases the search accuracy and performance of the proposed model. Various experiments show that the proposed model outperformed nature-inspired and PSO algorithms on almost all functions.
To overcome the shortcomings of the Fruit Fly Optimization (FFO) algorithm [26], Zhang et al. proposed a new version of the FFO using the Gaussian mutation operator and the local chaotic search strategy. The Gaussian mutation operator is integrated into the FFO algorithm to prevent premature convergence and improve the exploration process. Then, a chaotic local search approach is adopted to increase the group’s local search ability; the results prove that the proposed model works better than the basic FFO algorithm.
In this section, important clustering literature using metaheuristic algorithms was reviewed. Most of these works have considered the clustering problem an optimization problem and applied a metaheuristic algorithm to solve it; in addition, the fitness function of the intra-cluster dataset was used as the fitness function. Some authors have used a combination of genetic operators and other methods, while others have employed chaotic and quantum mapping to improve exploitation and convergence. Considering the literature reviewed in this paper, an enhanced version of the IAS based on chaotic maps is proposed for the clustering problem.

3. Material and Method

3.1. IAS Algorithm

As with other population-based algorithms, the IAS randomly generates an initial population called students [9]. A specific problem's upper and lower limit values determine the range in which students are generated. The student with the best performance (minimum score) in each step takes the position of "leader student" or simply "leader". In IAS optimization, the best performance corresponds to the minimum value of the cost function. However, this position can be reassigned to another, more skilled student at any point in the process. The method of student generation and assessment of student eligibility in the school can be described as Algorithm 1.
Algorithm 1 The method of student generation and assessment of student eligibility
1: For i = 1 : N_student
2:   S_i = LB + r_i(0, 1) × (UB − LB);  M_i = f(S_i);
3: End For
4: f(LS) = min_i M_i
where S_i is the ith generated student; LB and UB are the lower and upper limits of the variables, respectively; r_i(0, 1) is a random number between 0 and 1; N_student is the number of students; M_i is the score of the ith student; and LS is the leader student. Autodidactic/self-learning sessions in this interactive school are held in three stages: individual training, group training, and new student challenges.
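As an illustration, the following Python sketch shows how the initialization of Algorithm 1 can be realized; the sphere cost function, the bounds, and the population size are placeholder assumptions rather than values from the paper (the authors' implementation is in MATLAB).

```python
import numpy as np

def initialize_school(f, lb, ub, n_students, rng=np.random.default_rng(0)):
    """Generate the initial students and pick the leader (cf. Algorithm 1).

    f       : cost function to minimize
    lb, ub  : lower / upper bounds of the decision variables
    """
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    # S_i = LB + r_i(0, 1) * (UB - LB)
    students = lb + rng.random((n_students, lb.size)) * (ub - lb)
    marks = np.array([f(s) for s in students])   # M_i = f(S_i)
    leader_idx = int(np.argmin(marks))           # the minimum mark defines the leader
    return students, marks, leader_idx

# Example usage with a placeholder sphere cost function
sphere = lambda x: float(np.sum(x ** 2))
S, M, lead = initialize_school(sphere, lb=[-5, -5], ub=[5, 5], n_students=20)
```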
Individual Training Session: First, a random group of two follower students is selected. Then, each of them discusses the lesson one-on-one with the leader student; the students' knowledge increases through these peer-to-peer discussions with the leader. Accordingly, an individual training session can be formulated as described in Algorithm 2:
Algorithm 2 Individual Training Session
1: For i = 1 : N_student
2:   Randomly select one student S_j, where i ≠ j
3:   TS_i* = TS_i + r_i(1, 2) × (LS − IC_i × TS_i);
4:   TS_j* = TS_j + r_j(1, 2) × (LS − IC_j × TS_j);
5: End for
6: Accept TS_i* and TS_j* if they achieve better marks than TS_i and TS_j
where TSi and TSj are the first and second follower students, respectively; ICi and ICj are the inherent competencies of the first and second students, respectively; ri (1, 2) and rj (1, 2) are two different random vectors between 1 and 2. Individual competencies (ICi and ICj) are randomly determined as 1 or 2.
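A minimal Python sketch of one individual training session follows; the students, marks, and leader arrays are assumed to come from an initialization step such as the one sketched above, and the acceptance rule keeps a candidate only when its cost improves.

```python
import numpy as np

def individual_training(students, marks, leader, f, rng=np.random.default_rng(1)):
    """One pass of Algorithm 2: pairs of followers move toward the leader."""
    n, d = students.shape
    for i in range(n):
        j = int(rng.choice([k for k in range(n) if k != i]))   # peer j != i
        for idx in (i, j):
            r = rng.uniform(1.0, 2.0, size=d)                  # r(1, 2) random vector
            ic = rng.choice([1, 2])                            # inherent competency IC
            candidate = students[idx] + r * (leader - ic * students[idx])
            mark = f(candidate)
            if mark < marks[idx]:                              # accept only better marks
                students[idx], marks[idx] = candidate, mark
    return students, marks
```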
Collective Training Session: After the individual training session, each follower student has the opportunity to review the contents of the last session and interact with other follower students in the same group to resolve the unclear points of the lesson. In addition to the knowledge level of individually trained students, their social abilities, such as communication skills, teamwork, and collaboration, referred to as collective competencies, can significantly impact the effectiveness of group learning. Accordingly, the group training session can be formulated as described in Algorithm 3.
Algorithm 3 Collective Training Session
1: For i = 1 : N_student
2:   CC_ij = (CC_i × TS_i + CC_j × TS_j) / (CC_i + CC_j);
3:   TS_i* = TS_i + r_i(1, 2) × (LS − CC_i × CC_ij);
4:   TS_j* = TS_j + r_j(1, 2) × (LS − CC_j × CC_ij);
5: End for
6: Accept TS_i* and TS_j* if they achieve better marks than TS_i and TS_j
where CCij is defined as the collective ability of the group as a team, based on the weighted average of students’ competencies. Moreover, ri (1, 2) and rj (1, 2) are two different random vectors between 1 and 2. Students’ collective competencies (CCi and CCj) are randomly set as 1 or 2.
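The collective session differs from the individual one mainly in the blended term CC_ij; a hedged Python sketch, under the same assumptions as the earlier sketches, follows.

```python
import numpy as np

def collective_training(students, marks, leader, f, rng=np.random.default_rng(2)):
    """One pass of Algorithm 3: a pair's blended knowledge CC_ij guides both members."""
    n, d = students.shape
    for i in range(n):
        j = int(rng.choice([k for k in range(n) if k != i]))
        cc_i, cc_j = rng.choice([1, 2]), rng.choice([1, 2])    # collective competencies
        cc_ij = (cc_i * students[i] + cc_j * students[j]) / (cc_i + cc_j)
        for idx, cc in ((i, cc_i), (j, cc_j)):
            r = rng.uniform(1.0, 2.0, size=d)                  # r(1, 2) random vector
            candidate = students[idx] + r * (leader - cc * cc_ij)
            mark = f(candidate)
            if mark < marks[idx]:                              # keep only improvements
                students[idx], marks[idx] = candidate, mark
    return students, marks
```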
Challenge of the New Student: In some optimization problems, due to the complex nature of the cost function, the gradual improvement of follower students may be limited to a specific area of the design space around the leader student (i.e., the current temporary/local optimum), which may still be far from the permanent/global optimum. Such a stagnating loop hinders the optimization process and will probably fail to find the global optimum. The new student challenge is introduced to make the IAS more dynamic and exploratory by creating an ongoing rebellion against the current leader. If the new student is more skilled than the current leader student, they take on the role of leader. The new student challenge can be formulated as described in Algorithm 4.
Algorithm 4 New student challenge
1: NS = LB + R × (UB − LB);
2: MF_1 = round(r(0, 1));
3: MF_2 = 1 − MF_1;
4: LS* = MF_1 × LS + MF_2 × NS;
5: Accept LS* if it achieves a better mark than LS
where NS is a new student; MF_1 and MF_2 are the first and second corrective factors, respectively; r(0, 1) is a random vector between 0 and 1. In addition, LS* is the new leader student of the school.
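A small Python sketch of the challenge step; note that MF_1 is either 0 or 1, so the candidate is either the old leader or the random newcomer, and it replaces the leader only if its mark is better.

```python
import numpy as np

def new_student_challenge(leader, leader_mark, lb, ub, f, rng=np.random.default_rng(3)):
    """One pass of Algorithm 4: a random newcomer challenges the current leader."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    ns = lb + rng.random(lb.size) * (ub - lb)     # NS = LB + R * (UB - LB)
    mf1 = round(rng.random())                     # MF1 = round(r(0, 1)) -> 0 or 1
    mf2 = 1 - mf1                                 # MF2 = 1 - MF1
    candidate = mf1 * leader + mf2 * ns           # LS* = MF1*LS + MF2*NS
    mark = f(candidate)
    if mark < leader_mark:                        # accept only if better than LS
        return candidate, mark
    return leader, leader_mark
```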
The process (including all three sessions) is repeated until the termination criteria are met. At the end of the process, each student has communicated with the leader at least once. In both individual and group training sessions, groups of two students are randomly selected in the search space to interact with the leader and with each other. Proper selection of the regulatory parameters, such as the number of students and the number of iterations, can lead to faster detection of the global optimum. The more students there are in the autodidactic school, the more likely there will be elite students among them. In addition, the number of sessions held is equal to the number of students in the school. Hence, the population size in the IAS has a significant effect on increasing the knowledge level of the students.

3.2. Chaotic Maps

Chaotic maps have been widely used in various stochastic and optimization algorithms [27]. This section introduces the ten chaotic maps used to improve the IAS. Each chaotic map has unique features, described and formulated in Table 1. All chaotic maps employed in this paper are initialized at 0.7 and exhibit different behaviors; the initial point of a chaotic map can be any number between 0 and 1.
Table 1 lists the proposed chaotic maps to improve the IAS. The proposed model uses chaotic maps to create the initial population and generate random parameters.
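As an illustration of how such sequences are produced, the sketch below implements two of the maps from Table 1 (logistic and Chebyshev) in Python; the sequence length and the rescaling of the Chebyshev output to (0, 1) are assumptions made for use inside the chaotic IAS.

```python
import numpy as np

def logistic_map(x0=0.7, c=4.0, n=100):
    """Logistic map p_{q+1} = c * p_q * (1 - p_q); values stay in (0, 1)."""
    seq, p = np.empty(n), x0
    for q in range(n):
        p = c * p * (1 - p)
        seq[q] = p
    return seq

def chebyshev_map(x0=0.7, n=100):
    """Chebyshev map p_{q+1} = cos(q * arccos(p_q)); values stay in (-1, 1)."""
    seq, p = np.empty(n), x0
    for q in range(1, n + 1):
        p = np.cos(q * np.arccos(p))
        seq[q - 1] = p
    return seq

# Maps with range (-1, 1) can be rescaled to (0, 1) before replacing a PRNG.
cheb01 = (chebyshev_map(n=10) + 1.0) / 2.0
```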

4. Proposed Model

The IAS is one of the most successful optimization algorithms. However, it can fail to work effectively in global optimization and in finding the best solution. The main reason may be the generation of an inadequate initial population and random parameters. Due to the ergodic and non-repetitive nature of chaotic maps, better global and local searches can be performed with them than with random searches that rely primarily on probability. As a result, this paper presents different versions of the IAS based on different chaotic maps to solve the clustering problem. The flowchart of the proposed model is shown in Figure 1.

4.1. Pre-Processing

The pre-processing step includes data conversion and data normalization. For datasets where the data is of string type, the label-encoder method is used to convert string data to numeric data. Once the string data is converted to numeric data, the data normalization is carried out. The MinMax method is the most popular standard normalization method that transfers data to the space between 0 and 1, as given in Equation (1).
X_{normal} = \frac{X_{value} - MinX_{value}}{MaxX_{value} - MinX_{value}}
In Equation (1), Xvalue is the initial value of a feature in the dataset, and Xnormal refers to the normalized feature. The MaxXvalue and MinXvalue parameters represent the feature’s largest and smallest values.
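A short Python sketch of this column-wise normalization; the guard for constant features is an added assumption to avoid division by zero.

```python
import numpy as np

def minmax_normalize(X):
    """Column-wise MinMax scaling from Equation (1): (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # avoid 0/0 on constant columns
    return (X - x_min) / span

# e.g. minmax_normalize([[1, 10], [2, 20], [3, 30]]) maps each column to [0, 1]
```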
A dataset D = {(x_1, L_1), (x_2, L_2), …, (x_m, L_m)} with m samples is defined according to Equation (2).
D = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1d} & L_1 \\ x_{21} & x_{22} & \cdots & x_{2d} & L_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} & L_m \end{bmatrix}
In Equation (2), (x_i, L_i) is the ith sample of D, x_i = (x_{i1}, x_{i2}, …, x_{id}) is the information of the ith sample, and L_i is the label of the ith sample.

4.2. Chaotic-Based Population Generation

First, the IAS based on chaotic maps must generate a suitable initial population to improve the algorithm’s convergence rate. Therefore, student generation and assessment of students’ competence in school can be described as Equation (3) according to the chaotic maps.
S_{ij} = lb_i + chomap_i(0, 1) \times (ub_i - lb_i)
where S_i is the ith generated student; lb and ub are the lower and upper bounds, respectively; and chomap_i(0, 1) is a number between 0 and 1 generated by one of the chaotic maps listed in Table 1. Thus, the IAS generates its population based on chaotic maps from the very beginning.
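A hedged Python sketch of Equation (3); the chaotic sequence is assumed to be pre-computed (for instance, with the logistic map sketched in Section 3.2) and to contain values in (0, 1).

```python
import numpy as np

def chaotic_population(lb, ub, n_students, chaotic_sequence):
    """Equation (3): S_ij = lb_i + chomap_i(0, 1) * (ub_i - lb_i),
    drawing numbers from a chaotic sequence instead of a PRNG."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    d = lb.size
    seq = np.asarray(chaotic_sequence, float)
    if seq.size < n_students * d:
        raise ValueError("chaotic sequence is too short for the requested population")
    chomap = seq[: n_students * d].reshape(n_students, d)   # values assumed in (0, 1)
    return lb + chomap * (ub - lb)
```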

4.3. Chaotic-Based Individual Training Session

In the second step, the IAS uses chaotic sequences instead of random numbers to improve the convergence speed of the algorithm in different iterations. Therefore, according to the chaotic maps, the individual training session can be described as Algorithm 5.
Algorithm 5 Chaotic-Based Individual Training
1: For i = 1 : N_student
2:   Randomly select one student S_j, where i ≠ j
3:   h_j = 1 + chomap_j(0, 1)
4:   h_i = 1 + chomap_i(0, 1)
5:   TS_i* = TS_i + h_i × (LS − IC_i × TS_i);
6:   TS_j* = TS_j + h_j × (LS − IC_j × TS_j);
7: End for
8: Accept TS_i* and TS_j* if they achieve better marks than TS_i and TS_j
where TSi and TSj are the first and second students, hj and hi are two different chaotic vectors between 1 and 2 that are generated by the chaotic maps (listed in Table 1). Individual competencies (ICi and ICj) are randomly set to 1 or 2, and there is no need to use chaotic maps.

4.4. Chaotic-Based Group Training Session

In the third step, the IAS uses chaotic sequences instead of random numbers to improve the convergence speed of the algorithm in different iterations. Therefore, according to the chaotic maps, the group training session can be described as Algorithm 6.
Algorithm 6 Chaotic-Based Group Training
1: For i = 1 : N_student
2:   h_j = 1 + chomap_j(0, 1)
3:   h_i = 1 + chomap_i(0, 1)
4:   CC_ij = (CC_i × TS_i + CC_j × TS_j) / (CC_i + CC_j);
5:   TS_i* = TS_i + h_i × (LS − CC_i × CC_ij);
6:   TS_j* = TS_j + h_j × (LS − CC_j × CC_ij);
7: End for
8: Accept TS_i* and TS_j* if they achieve better marks than TS_i and TS_j
where CCij is defined as the collective ability of the group as a team based on the weighted average of students’ competencies, hj and hi are two different chaotic vectors between 1 and 2 generated by the chaotic maps (listed in Table 1). Students’ collective competencies (CCi and CCj) are randomly set to 1 or 2, and there is no need to use chaotic maps.

4.5. Chaotic-Based New Student Challenge

In the fourth step of the IAS, chaotic sequences are used instead of random numbers to improve the convergence speed of the algorithm in different iterations. Therefore, according to the chaotic maps, the new student challenge can be described as Algorithm 7.
Algorithm 7 Chaotic-Based New Student
1: NS = lb_i + chomap_i(0, 1) × (ub_i − lb_i);
2: m = chomap_i(0, 1)
3: MF_1 = round(m);
4: MF_2 = 1 − MF_1;
5: LS* = MF_1 × LS + MF_2 × NS;
6: Accept LS* if it achieves a better mark than LS
In Algorithm 7, a new solution (i.e., NS) is generated entirely by chaotic maps, and MF_1 and MF_2 are the first and second corrective factors generated based on the chaotic variable m. The key point of this step is that the chaotic sequence generated by the chaotic maps is applied instead of random numbers, which increases the exploitation of the proposed model.

4.6. Formation of Clusters

In the proposed model, each student vector expresses a solution with a certain number of cluster centers, ranging from C_min to C_max. The decision variables are encoded as real-valued strings and regarded as cluster centers. Assuming that the dimension of the dataset is d, the maximum length of the student vector is C_max × d. For each student vector whose cluster number is c, the first c × d entries are evaluated as effective cluster-center solutions, and the remaining variables are invalid. Figure 2 shows the format of the students' initial population for clustering. In the IAS, a candidate solution is denoted as {X_{j1}^k, X_{j2}^k, …, X_{jd}^k}, where k = 1, 2, …, P and P is the number of iterations.
Figure 2 shows that if a dataset has two clusters, different solutions are generated to find the two clusters. In each solution, different features form the center of a cluster. Each solution is evaluated, and at the end, the solution with the best fitness (smallest distance) is selected as the optimal solution.
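The decoding of a student vector into cluster centers can be sketched as follows; passing in a fixed number of clusters is an assumption for illustration, since the encoding allows any count between C_min and C_max.

```python
import numpy as np

def decode_and_assign(student, data, n_clusters):
    """Read the first c*d entries of a student vector as c cluster centers
    and assign every sample to its nearest center (Euclidean distance)."""
    d = data.shape[1]
    centers = np.asarray(student, float)[: n_clusters * d].reshape(n_clusters, d)
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return centers, labels
```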

4.7. Fitness Function of Clustering

In the proposed model, the intra-cluster summation fitness function is employed using the Euclidean distance, the most popular and valid distance criterion in clustering. It can be calculated as Equation (4).
distance(O_i, O_j) = \left( \sum_{p=1}^{m} (O_{ip} - O_{jp})^2 \right)^{1/2}
In Equation (4), the variable m indicates the number of features, O_{ip} represents the value of feature p of the object O_i, and O_{jp} represents the value of feature p of the object O_j. This criterion minimizes the distance between each object and the center of the cluster to which it is allocated, generating compact groups. The intra-cluster summation is defined by Equation (5).
SSE = \sum_{i=1}^{k} \sum_{j=1}^{n} W_{ij} \times \sum_{p=1}^{m} (O_{jp} - O_{ip})^2
Here, if W_{ij} is 1, the object O_j is in cluster i; otherwise, O_j is not in cluster i. The variable k is the number of clusters, the variable n is the number of objects, and the variable m is the number of features. Note that O_{ip} denotes the value of feature p of the center of the ith cluster.
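Combining the decoding sketch above with Equation (5), the clustering fitness of a student can be evaluated as in the following sketch:

```python
import numpy as np

def intra_cluster_sse(data, centers, labels):
    """Equation (5): sum of squared Euclidean distances between each object
    and the center of the cluster it is assigned to (W_ij selects the center)."""
    diffs = data - centers[labels]
    return float(np.sum(diffs ** 2))

# fitness of one student: centers, labels = decode_and_assign(student, data, c)
# then sse = intra_cluster_sse(data, centers, labels)
```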

5. Results and Discussion

An IAS based on ten chaotic maps (i.e., CIAS) was presented in the previous section. In this section, statistical criteria such as the minimum value of the fitness function and the convergence rate of the fitness function are considered to compare the proposed model with other algorithms. First, the ten versions of the proposed CIAS algorithm are compared with each other in terms of statistical criteria, and the best version is considered the improved or superior version. It is then compared with other metaheuristic methods, namely the BA, CSA, ABC, and AEFA. More details about the implementation, parameters, criteria, comparison, and evaluation of the proposed CIAS algorithm for the clustering problems are given below.

5.1. Dataset

All clustering datasets used here to evaluate the improved version of the IAS based on chaotic maps are listed in Table 2; the number of features and samples taken from these 20 valid UCI clustering datasets is mentioned.

5.2. Simulation Environment and Parameters Determination

The proposed CIAS approaches and the comparative algorithms are implemented using MATLAB 2019 on a system with 8 GB of RAM, a Core i5 CPU (2.4 GHz), and a 64-bit operating system. For a fair comparison, the quantitative parameters of the proposed CIAS approaches, the BA, CSA, ABC, and AEFA are set to the same values (see Table 3). In addition, the qualitative parameters of each algorithm are set to their standard values.
Table 3 shows that the initial population size and the number of iterations are the same for all algorithms, and the other parameters are set to their standard values. The different versions of the IAS based on chaotic maps (i.e., CIAS-1, CIAS-2, …, CIAS-10) are compared in terms of various statistical criteria. The evaluations and comparisons of the different versions of the IAS based on chaotic maps are provided below. The convergence rate of the various versions of the IAS based on chaotic maps on 10 clustering datasets is presented in Figure 3.
The results related to the convergence rate of the different versions of the proposed model on the 20 datasets indicate that: (1) IAS-2 performed better on the BLOOD and DERMATOLOGY datasets, and IAS-1 performed better on the BLOOD and CANCER datasets; (2) IAS-1 performed better on the IRIS and WINE datasets, and IAS-4 performed better on the STEEL and IRIS datasets; (3) IAS-4 performed better on the GLASS, HABERMAN, and BREASTEW datasets, and IAS-1 performed better on the BREASTEW and HABERMAN datasets; (4) IAS-1 performed better on the HEART and LUNG CANCER datasets, and IAS-2 performed better on the HABERMAN dataset; (5) IAS-1 performed better on the VOWEL dataset, and IAS-2 performed better on the SEEDS dataset. The results related to the convergence rate of the different versions of the IAS based on chaotic maps over the whole set of datasets show that IAS-1, IAS-2, IAS-4, and IAS-6 improve on the other versions. To further evaluate the different versions of the IAS based on chaotic maps, the results related to the worst solution in the population of the algorithms are compared, as shown in Table 4.
IAS-1 achieved better results than the other versions on 60% of the clustering datasets; IAS-2, IAS-4, and IAS-6 each succeeded on 10% of the clustering datasets. Regarding the average fitness function results for the different versions of the IAS, IAS-1 exceeded the other versions on 67% of the clustering datasets. The chaotic functions tend to bring the search closer to the optimum of the objective function by finding better solutions.

5.3. Comparison of the Proposed Model with Other Metaheuristics

In this section, the first chaotic map-based IAS (IAS-1), called CIAS, is compared with other basic metaheuristic algorithms in terms of different statistical criteria. The results of other evaluations and comparisons are given below. The results related to the convergence rate of the proposed model and comparative metaheuristic algorithms implemented on 10 datasets are shown in Figure 4.
The results related to the convergence rate of the proposed model and the comparative algorithms show that the proposed CIAS algorithm performed better than the other metaheuristic algorithms on two datasets of the fifth group. The results related to the convergence rate of the proposed model and the comparative algorithms over the whole set of datasets indicate that the proposed CIAS approach achieves better results; the proposed CIAS model performed better on 75% of the clustering datasets. The results related to the worst, best, and average solutions in the population of the proposed model and the other comparative algorithms are presented in Table 5.
The worst, best, and average population solutions of the proposed model and the other comparative algorithms demonstrate that the proposed model outperformed the other algorithms on the clustering datasets for all three measures. In this section, the simulation setup and parameter determination were first described. Then, the different versions of the IAS based on chaotic maps (CIAS-1, CIAS-2, …, CIAS-10) were compared in terms of various statistical criteria. Further evaluations and comparisons showed that the Chebyshev chaotic map achieved better results than the other chaotic maps. The Chebyshev-based IAS was then compared with basic metaheuristic algorithms such as the BA, CSA, ABC, and AEFA. The results of various experiments indicate that the Chebyshev-based IAS has better convergence and performance than the other basic metaheuristic algorithms.

6. Real Application: Binary CIAS on COVID-19 Dataset

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19) (https://github.com/Atharva-Peshkar/Covid-19-Patient-Health-Analytics, accessed on 22 January 2023), an exceptionally infectious and dangerous illness. In December 2019, Wuhan, China, was the location of the first confirmed case, which was quickly followed by a rapid global spread. Due to the escalating number of likely COVID-19 acute respiratory cases and the disease's high fatality rates, the World Health Organization (WHO) declared COVID-19 a global pandemic. It is vital to develop effective processes that consistently identify potential cases of COVID-19 to halt its spread and partially alleviate the global crisis; this enables likely patients to be isolated from the general population. Several alternative optimization approaches are being developed as part of the response to the COVID-19 pandemic. These approaches may be separated into distinct categories: screening, monitoring, prediction, and diagnosis. Recently, a significant number of diagnostic procedures that detect COVID-19 by exploiting informative features extracted from clinical datasets have been developed. So far, various models such as BE-WOA [28], the Binary Simulated Normal Distribution Optimizer (BSNDO) [29], and Artificial Gorilla Troop Optimization (AGTO) [30] have been proposed for the diagnosis of COVID-19.
The applicability and performance of the BIAS are tested on the novel coronavirus 2019 dataset, which is a pre-processed and released version of the original COVID-19 dataset. The results of these evaluations are discussed in the sections that follow. Table 6 describes the dataset after pre-processing. In that table, the column labeled "diff_sym_hos" contains the number of days between the date on which symptoms were first observed (the column "sym_on" in the raw dataset) and the date on which the patient checked into the hospital (the column "hosp-vis" in the original dataset). All of the categorical columns in the pre-processed dataset were label-encoded by assigning a number to each distinct categorical value contained in the column. There are 864 cases and 14 attributes in this dataset.
Each experiment was repeated 20 times to evaluate the performance of the BIAS against the other algorithms. The K-Nearest Neighbor (KNN) classifier with k equal to 3 and the 10-fold cross-validation approach was used to construct the classification model for every algorithm.

6.1. Fitness Function

The main challenge is determining which features from a dataset will help a classifier correctly identify the category to which a sample belongs [31,32]. While selecting essential features, redundant ones must automatically be ruled out so that, when the selected feature subset is used for classification, the classification accuracy is maximized [33]. In this paper, BIAS is used to identify the most helpful feature subset, and then a classifier is used to determine how accurately this feature subset can be classified. Let ACC stand for the classification accuracy of the model determined with the help of a classifier, D_a for the dimension of the selected feature subset, and N_t for the total number of attributes in the initial dataset. The classification error is then denoted by 1 − ACC, and the proportion of features chosen from the complete dataset is represented by D_a / N_t. The fitness function is defined according to Equation (6).
FF = \alpha \times (1 - ACC) + (1 - \alpha) \times \frac{D_a}{N_t}
In Equation (6), α ∈ [0, 1] denotes the weight given to the classification error.
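A one-line Python sketch of Equation (6); the value of α shown here (0.99) is a common choice in feature-selection studies and is an assumption, not a value reported in the paper.

```python
def fs_fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Equation (6): FF = alpha * (1 - ACC) + (1 - alpha) * (D_a / N_t)."""
    return alpha * (1.0 - accuracy) + (1.0 - alpha) * (n_selected / n_total)

# e.g. fs_fitness(accuracy=0.9625, n_selected=6, n_total=14) -> about 0.0414
```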

6.2. Transfer Function

Since FS is a binary optimization problem, its result consists of the values 0 and 1, where 0 indicates that a feature is not selected because it is unnecessary, and 1 indicates that it is selected because it is beneficial. However, the continuous positions produced by the search agents may fall outside this range. A binarization function therefore needs to be applied to each agent to guarantee that the output always falls within the desired range. The sigmoid (S-shaped) transfer function carries out this task in BIAS. The S-shaped transfer function is defined according to Equation (7).
T(x) = \frac{1}{1 + e^{-x}}
X_d^{t+1} = \begin{cases} 1 & \text{if } rand < T(X_d^t) \\ 0 & \text{if } rand \geq T(X_d^t) \end{cases}
This function maps its input to the range [0, 1]. If the output of the transfer function is greater than rand, a random number drawn from a uniform distribution between 0 and 1, the binary value is set to 1 and the feature is selected; if it is equal to or lower than rand, the value is set to 0 and the attribute is discarded as unnecessary.
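A short Python sketch of this binarization step, applying Equations (7) and (8) element-wise to a continuous agent position:

```python
import numpy as np

def s_shaped_binarize(position, rng=np.random.default_rng(4)):
    """Map a continuous position to a 0/1 feature mask with the sigmoid
    transfer function: bit = 1 if rand < T(x), else 0."""
    t = 1.0 / (1.0 + np.exp(-np.asarray(position, float)))   # T(x) in (0, 1)
    return (rng.random(t.shape) < t).astype(int)             # 1 means the feature is kept
```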

6.3. Evaluation Criteria

The effectiveness of the BIAS was evaluated based on accuracy, recall, precision, F-measure, and the number of selected features (selection size).
Precision: Precision expresses the relevance of the results and is defined as the ratio of correctly predicted positive observations to the total number of predicted positive observations.
Precision = \frac{TP}{TP + FP}
Recall: Recall is the proportion of correctly predicted positive observations relative to the total number of observations in the actual positive class.
Recall = \frac{TP}{TP + FN}
F-measure: The F1 score is another method for assessing the correctness of an experiment. It is calculated as the harmonic mean of the precision and recall scores; as a result, this score accounts for both false positives and false negatives.
F-Measure = \frac{2 \times Precision \times Recall}{Precision + Recall}
Accuracy: Accuracy is the ratio of correctly classified observations to the total number of samples. It is the most intuitive performance measure.
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
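The four criteria can be computed directly from the confusion-matrix counts, as in this small sketch (the counts in the usage line are illustrative only):

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F-measure, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f_measure, accuracy

# e.g. classification_metrics(tp=80, tn=90, fp=5, fn=10)
```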
According to accuracy values in Table 7, BIAS could take the highest average accuracy value of 96.25%, while BABC, BBA, BCSA, and BAEFA could take 91.95%, 92.48%, 95.32%, and 94.79%, respectively.
Compared to the other algorithms, the proposed model performed significantly better in recall, precision, accuracy, and F-measure, as evidenced by the experimental findings. Within the search space, BIAS investigates regions that are relatively close to the global optimum. During the exploration and exploitation phases, BIAS searches the most promising area of the search space. According to the analyses of BIAS's search history, the distribution of candidate solution points around the global optimum is denser than that of BCSA. Figure 5 compares the performance of BIAS with the other algorithms based on recall, precision, F-measure, and accuracy.
The BIAS optimization algorithm was used to produce the best possible feature set, displayed in Table 8. BIAS identifies the best potential subset of six of the thirteen provided features; these characteristics are used to predict the presence of COVID-19 in individuals exhibiting various symptoms. Compared to the input feature set, which consisted of 14 features (Table 6), 8 features were removed. The vast majority of the removed features pertain to personal information such as age, sex, etc. The results of the suggested model, based on the selection of features, are presented in Table 8. With five features, the accuracy percentage ranges from 98.41 to 98.68. With six features, the highest accuracy percentage is 98.23 and the lowest is 98.06, while the recall and precision scores are 98.35 and 98.30, respectively. With seven features, the accuracy ranges from 97.76 to 98.31 percent, with 97.76 percent being the lowest. With eight features, the accuracy ranges from 97.52 to 97.65. When ten features from the feature space are selected, precision is 97.29 percent, recall is 97.48 percent, F-measure is 97.38 percent, and accuracy is 97.52 percent. When 11 features are selected, the highest accuracy is 96.92% and the lowest is 96.84%. According to these findings, the proposed model achieves a better accuracy percentage with fewer features.

7. Conclusions and Future Works

The IAS is a population-based metaheuristic optimization algorithm with three robust operators: the individual training session, the group training session, and the new student challenge. This paper presented an improved version of the chaotic IAS to solve data clustering problems. First, ten different chaotic maps were used to generate different versions of the IAS (i.e., CIAS-1, CIAS-2, …, and CIAS-10). Next, 20 valid UCI clustering datasets were used to evaluate the proposed approaches. In addition, the intra-cluster summation fitness function defined earlier was used as the fitness function for the proposed model and the other comparative algorithms. The improved version of the chaotic IAS was implemented in MATLAB 2019; the initial population was 20, and the number of iterations was 100. First, CIAS-1, CIAS-2, …, and CIAS-10 were compared using different criteria. The various evaluations and comparisons showed that the Chebyshev-chaotic-map-based version performed best. Finally, the Chebyshev-chaotic-map-based IAS was compared with other basic metaheuristic methods such as the BA, CSA, ABC, and AEFA. The various experiments showed that this version of the IAS has better convergence and performance than the other basic metaheuristic algorithms. Furthermore, BIAS was evaluated on a COVID-19 dataset for detecting the coronavirus disease. Future research will consider a multi-objective IAS with chaotic maps for solving high-dimensional data clustering.

Author Contributions

Conceptualization, F.S.G. and A.A.K.; methodology, F.S.G.; software, A.A.K.; validation, F.S.G. and A.A.K.; formal analysis, A.A.K.; investigation, A.A.K.; resources, F.S.G.; data curation, F.S.G.; writing—original draft preparation, A.A.K.; writing—review and editing, F.S.G.; visualization, A.A.K.; supervision, F.S.G.; project administration, F.S.G.; funding acquisition, F.S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This paper received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this manuscript were downloaded from the UCI repository.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sorkhabi, L.B.; Gharehchopogh, F.S.; Shahamfar, J. A systematic approach for pre-processing electronic health records for mining: Case study of heart disease. Int. J. Data Min. Bioinform. 2020, 24, 97–120. [Google Scholar] [CrossRef]
  2. Arasteh, B.; Abdi, M.; Bouyer, A. Program source code comprehension by module clustering using combination of discretized gray wolf and genetic algorithms. Adv. Eng. Softw. 2022, 173, 103252. [Google Scholar] [CrossRef]
  3. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S.; Ewees, A.A.; Abualigah, L.; Elaziz, M.A. MTV-MFO: Multi-Trial Vector-Based Moth-Flame Optimization Algorithm. Symmetry 2021, 13, 2388. [Google Scholar] [CrossRef]
  4. Izci, D. A novel improved atom search optimization algorithm for designing power system stabilizer. Evol. Intell. 2022, 15, 2089–2103. [Google Scholar] [CrossRef]
  5. Ekinci, S.; Izci, D.; Al Nasar, M.R.; Abu Zitar, R.; Abualigah, L. Logarithmic spiral search based arithmetic optimization algorithm with selective mechanism and its application to functional electrical stimulation system control. Soft Comput. 2022, 26, 12257–12269. [Google Scholar] [CrossRef]
  6. Arasteh, B.; Sadegi, R.; Arasteh, K. Bölen: Software module clustering method using the combination of shuffled frog leaping and genetic algorithm. Data Technol. Appl. 2021, 55, 251–279. [Google Scholar] [CrossRef]
  7. Arasteh, B.; Sadegi, R.; Arasteh, K. ARAZ: A software modules clustering method using the combination of particle swarm optimization and genetic algorithms. Intell. Decis. Technol. 2020, 14, 449–462. [Google Scholar] [CrossRef]
  8. Gharehchopogh, F.S.; Gholizadeh, H. A comprehensive survey: Whale Optimization Algorithm and its applications. Swarm Evol. Comput. 2019, 48, 1–24. [Google Scholar] [CrossRef]
  9. Jahangiri, M.; Hadianfard, M.A.; Najafgholipour, M.A.; Jahangiri, M.; Gerami, M.R. Interactive autodidactic school: A new metaheuristic optimization algorithm for solving mathematical and structural design optimization problems. Comput. Struct. 2020, 235, 106268. [Google Scholar] [CrossRef]
  10. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report-tr06; Erciyes University: Ercis, Turkey, 2005. [Google Scholar]
  11. Yang, X.-S. A new metaheuristic bat-inspired algorithm. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010); Springer: Berlin/Heidelberg, Germany, 2010; pp. 65–74. [Google Scholar]
  12. Askarzadeh, A. A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Comput. Struct. 2016, 169, 1–12. [Google Scholar] [CrossRef]
  13. Yadav, A.; Kumar, N. Artificial electric field algorithm for engineering optimization problems. Expert Syst. Appl. 2020, 149, 113308. [Google Scholar] [CrossRef]
  14. Ahmadi, R.; Ekbatanifard, G.; Bayat, P. A Modified Grey Wolf Optimizer Based Data Clustering Algorithm. Appl. Artif. Intell. 2021, 35, 63–79. [Google Scholar] [CrossRef]
  15. Ashish, T.; Kapil, S.; Manju, B. Parallel bat algorithm-based clustering using mapreduce. In Networking Communication and Data Knowledge Engineering; Springer: Berlin/Heidelberg, Germany, 2018; pp. 73–82.
  16. Eesa, A.S.; Orman, Z. A new clustering method based on the bio-inspired cuttlefish optimization algorithm. Expert Syst. 2020, 37, e12478.
  17. Olszewski, D. Asymmetric k-means algorithm. In Adaptive and Natural Computing Algorithms; Lecture Notes in Computer Science; Dobnikar, A., Lotric, U., Ster, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6594, pp. 1–10.
  18. Aggarwal, S.; Singh, P. Cuckoo and krill herd-based k-means++ hybrid algorithms for clustering. Expert Syst. 2019, 36, e12353.
  19. Zhang, G.; Zhang, C.; Zhang, H. Improved K-means algorithm based on density Canopy. Knowl. Based Syst. 2018, 145, 289–297.
  20. Kumar, A.; Kumar, D.; Jarial, S. A novel hybrid K-means and artificial bee colony algorithm approach for data clustering. Decis. Sci. Lett. 2018, 7, 65–76.
  21. Nasiri, J.; Khiyabani, F.M. A whale optimization algorithm (WOA) approach for clustering. Cogent Math. Stat. 2018, 5, 1483565.
  22. Qaddoura, R.; Faris, H.; Aljarah, I. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering analysis. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 8387–8412.
  23. Zhou, Y.; Wu, H.; Luo, Q.; Abdel-Baset, M. Automatic data clustering using nature-inspired symbiotic organism search algorithm. Knowl. Based Syst. 2019, 163, 546–557.
  24. Ewees, A.A.; Elaziz, M.A. Performance analysis of Chaotic Multi-Verse Harris Hawks Optimization: A case study on solving engineering problems. Eng. Appl. Artif. Intell. 2020, 88, 103370.
  25. Chen, K.; Zhou, F.; Liu, A. Chaotic dynamic weight particle swarm optimization for numerical function optimization. Knowl. Based Syst. 2018, 139, 23–40.
  26. Zhang, X.; Xu, Y.; Yu, C.; Heidari, A.A.; Li, S.; Chen, H.; Li, C. Gaussian mutational chaotic fruit fly-built optimization and feature selection. Expert Syst. Appl. 2019, 141, 112976.
  27. Gharehchopogh, F.S.; Nadimi-Shahraki, M.H.; Barshandeh, S.; Abdollahzadeh, B.; Zamani, H. CQFFA: A Chaotic Quasi-oppositional Farmland Fertility Algorithm for Solving Engineering Optimization Problems. J. Bionic Eng. 2022, 20, 158–183.
  28. Nadimi-Shahraki, M.H.; Zamani, H.; Mirjalili, S. Enhanced whale optimization algorithm for medical feature selection: A COVID-19 case study. Comput. Biol. Med. 2022, 148, 105858.
  29. Ahmed, S.; Sheikh, K.H.; Mirjalili, S.; Sarkar, R. Binary Simulated Normal Distribution Optimizer for feature selection: Theory and application in COVID-19 datasets. Expert Syst. Appl. 2022, 200, 116834.
  30. Piri, J.; Mohapatra, P.; Acharya, B.; Gharehchopogh, F.S.; Gerogiannis, V.C.; Kanavos, A.; Manika, S. Feature Selection Using Artificial Gorilla Troop Optimization for Biomedical Data: A Case Analysis with COVID-19 Data. Mathematics 2022, 10, 2742.
  31. Nadimi-Shahraki, M.H.; Fatahi, A.; Zamani, H.; Mirjalili, S. Binary Approaches of Quantum-Based Avian Navigation Optimizer to Select Effective Features from High-Dimensional Medical Data. Mathematics 2022, 10, 2770.
  32. Nadimi-Shahraki, M.H.; Fatahi, A.; Zamani, H.; Mirjalili, S.; Oliva, D. Hybridizing of Whale and Moth-Flame Optimization Algorithms to Solve Diverse Scales of Optimal Power Flow Problem. Electronics 2022, 11, 831.
  33. Nadimi-Shahraki, M.H.; Moeini, E.; Taghian, S.; Mirjalili, S. DMFO-CD: A Discrete Moth-Flame Optimization Algorithm for Community Detection. Algorithms 2021, 14, 314.
Figure 1. Flowchart of the proposed model.
Figure 2. Format of the initial population of students for clustering.
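Figure 2 only illustrates the encoding, so the short Python sketch below shows one common way such a population member can be realized: each "student" (candidate solution) is assumed to be the concatenation of K cluster centres, and its quality is scored with an intra-cluster distance sum of the kind used as the fitness function in the experiments. The helper names (student_to_centres, intra_cluster_sum), the flat centre layout, and the Euclidean distance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hedged sketch: decode a flat "student" vector into K centres and score it with
# an intra-cluster distance sum (smaller is better).  The flat layout
# [c1_1..c1_d, c2_1..c2_d, ...] is an assumption about Figure 2, and Euclidean
# distance is assumed for the fitness used in Tables 4 and 5.

def student_to_centres(student, k, d):
    """Reshape a flat solution vector into a (k, d) matrix of cluster centres."""
    return np.asarray(student, dtype=float).reshape(k, d)

def intra_cluster_sum(student, data, k):
    """Sum, over all samples, of the distance to the nearest cluster centre."""
    data = np.asarray(data, dtype=float)
    centres = student_to_centres(student, k, data.shape[1])
    # distance of every sample to every centre, then sum of the nearest distances
    dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
    return dists.min(axis=1).sum()

# Toy usage: two clusters in 2-D
data = [[0.1, 0.2], [0.0, 0.1], [4.0, 4.1], [4.2, 3.9]]
student = [0.05, 0.15, 4.1, 4.0]          # two centres, flattened
print(intra_cluster_sum(student, data, k=2))
```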
Figure 3. Convergence rate of different versions of IAS based on chaotic maps.
Figure 4. Convergence rate of the proposed model and other comparative algorithms.
Figure 5. Performance comparison of BIAS with other algorithms.
Table 1. Functions of chaotic maps.

Method | Chaotic Map | Mathematical Model | Range
CIAS-1 | Chebyshev | $p_{q+1} = \cos\left(q \cos^{-1}(p_q)\right)$ | (−1, 1)
CIAS-2 | Circle | $p_{q+1} = \mathrm{mod}\left(p_q + d - \frac{c}{2\pi}\sin(2\pi p_q),\ 1\right)$, $c = 0.5$, $d = 0.2$ | (0, 1)
CIAS-3 | Gauss/mouse | $p_{q+1} = \begin{cases} 1, & p_q = 0 \\ \dfrac{1}{\mathrm{mod}(p_q, 1)}, & \text{otherwise} \end{cases}$ | (0, 1)
CIAS-4 | Iterative | $p_{q+1} = \sin\left(\dfrac{c\pi}{p_q}\right)$, $c = 0.7$ | (−1, 1)
CIAS-5 | Logistic | $p_{q+1} = c\, p_q (1 - p_q)$, $c = 4$ | (0, 1)
CIAS-6 | Piecewise | $p_{q+1} = \begin{cases} \dfrac{p_q}{l}, & 0 \le p_q < l \\ \dfrac{p_q - l}{0.5 - l}, & l \le p_q < 0.5 \\ \dfrac{1 - l - p_q}{0.5 - l}, & 0.5 \le p_q < 1 - l \\ \dfrac{1 - p_q}{l}, & 1 - l \le p_q < 1 \end{cases}$ | (0, 1)
CIAS-7 | Sine | $p_{q+1} = \dfrac{c}{4}\sin(\pi p_q)$, $c = 4$ | (0, 1)
CIAS-8 | Singer | $p_{q+1} = \mu\left(7.86 p_q - 23.31 p_q^{2} + 28.75 p_q^{3} - 13.302875 p_q^{4}\right)$, $\mu = 1.07$ | (0, 1)
CIAS-9 | Sinusoidal | $p_{q+1} = c\, p_q^{2} \sin(\pi p_q)$, $c = 2.3$ | (0, 1)
CIAS-10 | Tent | $p_{q+1} = \begin{cases} \dfrac{p_q}{0.7}, & p_q < 0.7 \\ \dfrac{10}{3}(1 - p_q), & \text{otherwise} \end{cases}$ | (0, 1)
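To make Table 1 concrete, the following minimal Python sketch iterates three of the listed maps (Chebyshev, logistic, and tent) to produce the chaotic sequences that replace uniform random numbers inside the IAS update rules. The function names, the default seed, and the dispatcher are illustrative choices rather than the authors' implementation; seeds that coincide with a map's fixed points (e.g., 0 or 1 for the tent map) should be avoided.

```python
import math

# Illustrative sketch (not the authors' code): three of the chaotic maps from
# Table 1, iterated to produce a chaotic sequence in place of uniform random numbers.

def chebyshev(p, q):
    """Chebyshev map: p_{q+1} = cos(q * arccos(p_q)), values in (-1, 1)."""
    return math.cos(q * math.acos(max(-1.0, min(1.0, p))))

def logistic(p, c=4.0):
    """Logistic map: p_{q+1} = c * p_q * (1 - p_q), values in (0, 1)."""
    return c * p * (1.0 - p)

def tent(p):
    """Tent map: p_q / 0.7 if p_q < 0.7, otherwise (10/3) * (1 - p_q)."""
    return p / 0.7 if p < 0.7 else (10.0 / 3.0) * (1.0 - p)

def chaotic_sequence(n, seed=0.6, kind="chebyshev"):
    """Generate n chaotic values starting from a seed inside the map's range."""
    p, out = seed, []
    for q in range(1, n + 1):
        if kind == "chebyshev":
            p = chebyshev(p, q)      # this map also depends on the iteration index q
        elif kind == "logistic":
            p = logistic(p)
        else:
            p = tent(p)
        out.append(p)
    return out

if __name__ == "__main__":
    print(chaotic_sequence(5, kind="chebyshev"))
    print(chaotic_sequence(5, kind="tent"))
```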
Table 2. Clustering datasets.

No. | Dataset | Number of Features | Number of Samples
1 | Balance Scale | 4 | 625
2 | Blood | 4 | 748
3 | Breast | 30 | 569
4 | CMC | 9 | 1473
5 | Dermatology | 34 | 366
6 | Glass | 9 | 214
7 | Haberman's Survival | 3 | 306
8 | Hepatitis | 19 | 155
9 | Iris | 4 | 150
10 | Libras | 90 | 360
11 | Lung Cancer | 32 | 56
12 | Madelon | 500 | 2600
13 | ORL | 1024 | 400
14 | Seeds | 7 | 210
15 | Speech | 310 | 125
16 | Statlog (Heart) | 13 | 270
17 | Steel | 33 | 1941
18 | Vowel | 3 | 871
19 | Wine | 13 | 178
20 | Wisconsin | 9 | 699
Table 3. Values of initial parameters.

Algorithms | Parameters | Values
ABC [10] | Limit | 5D
 | Population size | 20
 | Number of onlookers | 20
 | Iterations | 100
BA [11] | R | 0.5
 | A | 0.8
 | Population size | 20
 | Iterations | 100
CSA [12] | AP | 0.8
 | Population size | 20
 | Iterations | 100
AEFA [13] | FCheck | 1
 | Population size | 20
 | Iterations | 100
Proposed model | Population size | 20
 | Iterations | 100
Table 4. Results related to the worst, best, and average solutions for the population of different versions of the IAS.

Dataset | Results | IAS-1 | IAS-2 | IAS-3 | IAS-4 | IAS-5 | IAS-6 | IAS-7 | IAS-8 | IAS-9 | IAS-10
Blood | Worst | 4.21E+05 | 8.46E+05 | 8.46E+05 | 4.20E+05 | 8.47E+05 | 8.46E+05 | 8.47E+05 | 8.46E+05 | 8.46E+05 | 8.46E+05
 | Best | 4.10E+05 | 4.10E+05 | 4.12E+05 | 4.10E+05 | 4.18E+05 | 4.13E+05 | 4.12E+05 | 4.15E+05 | 4.20E+05 | 4.13E+05
 | Avg | 4.14E+05 | 4.93E+05 | 4.93E+05 | 4.15E+05 | 5.46E+05 | 5.20E+05 | 5.22E+05 | 6.30E+05 | 6.55E+05 | 5.20E+05
Cancer | Worst | 4.25E+03 | 4.45E+03 | 3.59E+03 | 3.63E+03 | 4.36E+03 | 3.94E+03 | 4.71E+03 | 4.24E+03 | 4.25E+03 | 5.36E+03
 | Best | 3.28E+03 | 3.93E+03 | 3.30E+03 | 3.26E+03 | 3.68E+03 | 3.42E+03 | 3.94E+03 | 3.77E+03 | 3.50E+03 | 3.72E+03
 | Avg | 3.82E+03 | 4.20E+03 | 3.44E+03 | 3.48E+03 | 4.08E+03 | 3.62E+03 | 4.40E+03 | 4.03E+03 | 3.83E+03 | 5.05E+03
CMC | Worst | 9.70E+03 | 9.91E+03 | 1.38E+04 | 1.01E+04 | 1.32E+04 | 1.26E+04 | 1.31E+04 | 1.38E+04 | 1.38E+04 | 1.31E+04
 | Best | 8.08E+03 | 7.79E+03 | 7.40E+03 | 7.33E+03 | 7.80E+03 | 7.60E+03 | 7.31E+03 | 6.93E+03 | 7.66E+03 | 7.15E+03
 | Avg | 9.11E+03 | 8.67E+03 | 1.16E+04 | 8.85E+03 | 1.10E+04 | 1.09E+04 | 1.01E+04 | 1.18E+04 | 1.21E+04 | 1.05E+04
Dermatology | Worst | 4.95E+03 | 3.54E+03 | 1.21E+04 | 4.76E+03 | 4.65E+03 | 4.00E+03 | 7.71E+03 | 4.94E+03 | 5.05E+03 | 4.37E+03
 | Best | 3.05E+03 | 2.75E+03 | 2.83E+03 | 2.87E+03 | 3.31E+03 | 3.03E+03 | 2.96E+03 | 3.08E+03 | 2.93E+03 | 2.90E+03
 | Avg | 3.75E+03 | 3.27E+03 | 1.15E+04 | 3.69E+03 | 3.80E+03 | 3.41E+03 | 3.99E+03 | 3.79E+03 | 3.71E+03 | 3.51E+03
Iris | Worst | 2.29E+02 | 3.03E+02 | 2.97E+02 | 2.38E+02 | 2.75E+02 | 2.88E+02 | 2.86E+02 | 3.03E+02 | 3.03E+02 | 2.84E+02
 | Best | 1.67E+02 | 1.71E+02 | 1.75E+02 | 1.55E+02 | 2.00E+02 | 1.47E+02 | 2.05E+02 | 1.85E+02 | 1.97E+02 | 1.71E+02
 | Avg | 1.97E+02 | 2.55E+02 | 2.51E+02 | 1.96E+02 | 2.43E+02 | 2.39E+02 | 2.57E+02 | 2.66E+02 | 2.57E+02 | 2.29E+02
ORL | Worst | 9.55E+05 | 7.65E+05 | 5.28E+05 | 5.59E+05 | 5.69E+05 | 7.77E+05 | 5.72E+05 | 7.32E+05 | 6.38E+05 | 7.67E+05
 | Best | 8.44E+05 | 7.35E+05 | 5.23E+05 | 5.36E+05 | 5.55E+05 | 7.44E+05 | 5.54E+05 | 6.70E+05 | 5.73E+05 | 7.51E+05
 | Avg | 9.37E+05 | 7.62E+05 | 5.27E+05 | 5.45E+05 | 5.64E+05 | 7.71E+05 | 5.65E+05 | 7.01E+05 | 6.28E+05 | 7.60E+05
Steel | Worst | 3.80E+09 | 4.64E+09 | 4.18E+09 | 3.80E+09 | 4.64E+09 | 4.26E+09 | 4.64E+09 | 4.64E+09 | 4.64E+09 | 4.64E+09
 | Best | 2.40E+06 | 2.42E+06 | 2.48E+06 | 2.36E+06 | 2.55E+06 | 2.48E+06 | 2.51E+06 | 2.55E+06 | 2.37E+06 | 2.44E+06
 | Avg | 3.29E+09 | 3.97E+09 | 3.85E+09 | 3.05E+09 | 3.92E+09 | 3.72E+09 | 3.85E+09 | 3.96E+09 | 3.98E+09 | 3.71E+09
Wine | Worst | 2.26E+04 | 2.90E+04 | 3.51E+04 | 2.27E+04 | 8.38E+04 | 3.83E+04 | 4.28E+04 | 8.38E+04 | 8.38E+04 | 8.38E+04
 | Best | 1.74E+04 | 1.82E+04 | 1.77E+04 | 1.75E+04 | 1.75E+04 | 1.78E+04 | 1.79E+04 | 1.79E+04 | 1.91E+04 | 1.82E+04
 | Avg | 2.01E+04 | 2.18E+04 | 3.40E+04 | 2.00E+04 | 3.31E+04 | 2.58E+04 | 2.52E+04 | 3.64E+04 | 3.83E+04 | 3.35E+04
Balance Scale | Worst | 1.49E+03 | 1.51E+03 | 2.33E+03 | 1.63E+03 | 1.54E+03 | 1.52E+03 | 1.51E+03 | 1.53E+03 | 1.79E+03 | 1.51E+03
 | Best | 1.45E+03 | 1.45E+03 | 1.52E+03 | 1.47E+03 | 1.46E+03 | 1.44E+03 | 1.45E+03 | 1.45E+03 | 1.48E+03 | 1.45E+03
 | Avg | 1.47E+03 | 1.49E+03 | 2.15E+03 | 1.55E+03 | 1.49E+03 | 1.48E+03 | 1.48E+03 | 1.50E+03 | 1.61E+03 | 1.48E+03
Breast | Worst | 2.37E+03 | 2.99E+03 | 2.69E+03 | 2.44E+03 | 3.22E+03 | 2.79E+03 | 3.13E+03 | 2.55E+03 | 2.50E+03 | 2.79E+03
 | Best | 2.22E+03 | 2.69E+03 | 2.58E+03 | 2.29E+03 | 2.84E+03 | 2.53E+03 | 2.68E+03 | 2.43E+03 | 2.43E+03 | 2.47E+03
 | Avg | 2.30E+03 | 2.89E+03 | 2.63E+03 | 2.37E+03 | 3.10E+03 | 2.73E+03 | 2.98E+03 | 2.51E+03 | 2.47E+03 | 2.63E+03
Glass | Worst | 8.46E+02 | 8.86E+02 | 9.65E+02 | 8.76E+02 | 1.19E+03 | 1.15E+03 | 1.19E+03 | 1.19E+03 | 1.20E+03 | 1.20E+03
 | Best | 5.52E+02 | 5.87E+02 | 5.89E+02 | 5.15E+02 | 5.31E+02 | 5.73E+02 | 6.19E+02 | 5.65E+02 | 5.93E+02 | 6.10E+02
 | Avg | 8.13E+02 | 7.72E+02 | 9.35E+02 | 7.44E+02 | 9.74E+02 | 8.87E+02 | 9.86E+02 | 1.02E+03 | 1.09E+03 | 8.79E+02
Haberman | Worst | 3.61E+03 | 4.47E+03 | 4.46E+03 | 3.64E+03 | 4.14E+03 | 4.46E+03 | 5.64E+03 | 4.16E+03 | 4.28E+03 | 4.52E+03
 | Best | 2.70E+03 | 2.84E+03 | 3.07E+03 | 2.73E+03 | 3.00E+03 | 3.14E+03 | 3.21E+03 | 2.78E+03 | 3.01E+03 | 2.79E+03
 | Avg | 3.14E+03 | 3.85E+03 | 3.43E+03 | 3.29E+03 | 3.65E+03 | 3.78E+03 | 3.77E+03 | 3.70E+03 | 3.78E+03 | 3.91E+03
Heart | Worst | 1.97E+04 | 2.72E+04 | 3.33E+04 | 1.97E+04 | 4.22E+04 | 3.46E+04 | 4.17E+04 | 4.22E+04 | 4.15E+04 | 3.62E+04
 | Best | 2.40E+06 | 2.42E+06 | 2.48E+06 | 2.36E+06 | 1.46E+04 | 1.43E+04 | 1.42E+04 | 1.37E+04 | 1.42E+04 | 1.31E+04
 | Avg | 3.29E+09 | 3.97E+09 | 3.85E+09 | 3.05E+09 | 2.43E+04 | 2.18E+04 | 2.75E+04 | 2.71E+04 | 3.03E+04 | 1.99E+04
Hepatitis | Worst | 2.26E+04 | 2.90E+04 | 3.51E+04 | 2.27E+04 | 2.22E+04 | 2.25E+04 | 1.96E+04 | 2.25E+04 | 2.24E+04 | 2.27E+04
 | Best | 1.74E+04 | 1.82E+04 | 1.77E+04 | 1.75E+04 | 1.35E+04 | 1.31E+04 | 1.34E+04 | 1.36E+04 | 1.32E+04 | 1.34E+04
 | Avg | 2.01E+04 | 2.18E+04 | 3.40E+04 | 2.00E+04 | 1.83E+04 | 1.78E+04 | 1.72E+04 | 1.90E+04 | 1.79E+04 | 1.75E+04
Libras | Worst | 1.49E+03 | 1.51E+03 | 2.33E+03 | 1.63E+03 | 1.05E+03 | 9.16E+02 | 9.21E+02 | 6.11E+02 | 7.15E+02 | 7.38E+02
 | Best | 1.45E+03 | 1.45E+03 | 1.52E+03 | 1.47E+03 | 6.89E+02 | 8.72E+02 | 6.69E+02 | 5.78E+02 | 6.88E+02 | 5.97E+02
 | Avg | 1.47E+03 | 1.49E+03 | 2.15E+03 | 1.55E+03 | 8.82E+02 | 8.94E+02 | 8.62E+02 | 5.87E+02 | 7.07E+02 | 6.60E+02
Lung Cancer | Worst | 2.37E+03 | 2.99E+03 | 2.69E+03 | 2.44E+03 | 1.98E+02 | 1.87E+02 | 2.03E+02 | 2.19E+02 | 1.97E+02 | 2.07E+02
 | Best | 2.22E+03 | 2.69E+03 | 2.58E+03 | 2.29E+03 | 1.69E+02 | 1.70E+02 | 1.80E+02 | 1.79E+02 | 1.66E+02 | 1.76E+02
 | Avg | 2.30E+03 | 2.89E+03 | 2.63E+03 | 2.37E+03 | 1.88E+02 | 1.80E+02 | 1.95E+02 | 1.97E+02 | 1.83E+02 | 1.93E+02
Madelon | Worst | 8.46E+02 | 8.86E+02 | 9.65E+02 | 8.76E+02 | 1.95E+06 | 1.83E+06 | 1.86E+06 | 1.82E+06 | 1.84E+06 | 1.82E+06
 | Best | 5.52E+02 | 5.87E+02 | 5.89E+02 | 5.15E+02 | 1.94E+06 | 1.83E+06 | 1.84E+06 | 1.82E+06 | 1.82E+06 | 1.82E+06
 | Avg | 8.13E+02 | 7.72E+02 | 9.35E+02 | 7.44E+02 | 1.95E+06 | 1.83E+06 | 1.85E+06 | 1.82E+06 | 1.83E+06 | 1.82E+06
Seeds | Worst | 3.61E+03 | 4.47E+03 | 4.46E+03 | 3.64E+03 | 7.75E+02 | 8.30E+02 | 8.41E+02 | 7.97E+02 | 8.58E+02 | 7.19E+02
 | Best | 2.70E+03 | 2.84E+03 | 3.07E+03 | 2.73E+03 | 5.29E+02 | 5.15E+02 | 4.93E+02 | 5.29E+02 | 5.28E+02 | 5.33E+02
 | Avg | 3.14E+03 | 3.85E+03 | 3.43E+03 | 3.29E+03 | 6.63E+02 | 6.42E+02 | 6.66E+02 | 6.96E+02 | 7.47E+02 | 6.29E+02
Speech | Worst | 1.97E+04 | 2.72E+04 | 3.33E+04 | 1.97E+04 | 6.58E+12 | 6.58E+12 | 6.52E+12 | 6.58E+12 | 6.58E+12 | 6.55E+12
 | Best | 2.40E+06 | 2.42E+06 | 2.48E+06 | 2.36E+06 | 3.68E+12 | 2.54E+12 | 3.02E+12 | 3.31E+12 | 3.45E+12 | 3.46E+12
 | Avg | 3.29E+09 | 3.97E+09 | 3.85E+09 | 3.05E+09 | 5.40E+12 | 5.06E+12 | 5.09E+12 | 5.03E+12 | 4.42E+12 | 5.07E+12
Vowel | Worst | 2.26E+04 | 2.90E+04 | 3.51E+04 | 2.27E+04 | 5.83E+05 | 4.49E+05 | 5.72E+05 | 5.76E+05 | 6.92E+05 | 4.43E+05
 | Best | 1.74E+04 | 1.82E+04 | 1.77E+04 | 1.75E+04 | 2.45E+05 | 2.50E+05 | 2.60E+05 | 2.32E+05 | 2.37E+05 | 2.23E+05
 | Avg | 2.01E+04 | 2.18E+04 | 3.40E+04 | 2.00E+04 | 4.04E+05 | 3.46E+05 | 3.87E+05 | 4.38E+05 | 4.46E+05 | 3.31E+05
Table 5. Results related to the worst, best, and average solutions for the population of the proposed model and other comparative algorithms.

Dataset | Results | CSA | ABC | BA | AEFA | CIAS
Blood | Worst | 4.10E+05 | 3.90E+06 | 6.01E+05 | 4.88E+06 | 4.41E+05
 | Best | 4.08E+05 | 4.11E+05 | 6.01E+05 | 4.85E+05 | 4.41E+05
 | Avg | 4.09E+05 | 1.13E+06 | 6.01E+05 | 1.99E+06 | 4.41E+05
Cancer | Worst | 4.43E+03 | 9.45E+03 | 5.85E+03 | 3.57E+03 | 2.96E+03
 | Best | 4.09E+03 | 3.57E+03 | 5.81E+03 | 3.56E+03 | 2.96E+03
 | Avg | 4.30E+03 | 5.78E+03 | 5.82E+03 | 3.56E+03 | 2.96E+03
CMC | Worst | 6.47E+03 | 1.05E+04 | 7.72E+03 | 6.74E+03 | 5.53E+03
 | Best | 6.30E+03 | 5.95E+03 | 7.69E+03 | 6.74E+03 | 5.53E+03
 | Avg | 6.35E+03 | 7.75E+03 | 7.70E+03 | 6.74E+03 | 5.53E+03
Dermatology | Worst | 2.97E+03 | 3.51E+03 | 3.08E+03 | 3.14E+03 | 2.25E+03
 | Best | 2.97E+03 | 3.16E+03 | 3.07E+03 | 3.13E+03 | 2.24E+03
 | Avg | 2.97E+03 | 3.35E+03 | 3.07E+03 | 3.14E+03 | 2.25E+03
Iris | Worst | 1.06E+02 | 3.60E+02 | 1.50E+02 | 1.07E+02 | 9.67E+01
 | Best | 1.03E+02 | 1.22E+02 | 1.46E+02 | 1.05E+02 | 9.67E+01
 | Avg | 1.04E+02 | 2.21E+02 | 1.48E+02 | 1.07E+02 | 9.67E+01
ORL | Worst | 5.01E+05 | 7.77E+05 | 6.36E+05 | 7.33E+05 | 5.03E+05
 | Best | 5.00E+05 | 7.68E+05 | 6.36E+05 | 7.26E+05 | 5.03E+05
 | Avg | 5.00E+05 | 7.74E+05 | 6.36E+05 | 7.30E+05 | 5.03E+05
Steel | Worst | 2.99E+09 | 9.93E+09 | 6.82E+09 | 2.98E+10 | 5.81E+09
 | Best | 2.95E+09 | 2.15E+09 | 6.82E+09 | 6.30E+09 | 5.81E+09
 | Avg | 2.97E+09 | 3.40E+09 | 6.82E+09 | 1.85E+10 | 5.81E+09
Wine | Worst | 1.72E+04 | 1.83E+04 | 1.71E+04 | 5.35E+04 | 1.63E+04
 | Best | 1.71E+04 | 1.65E+04 | 1.71E+04 | 1.90E+04 | 1.63E+04
 | Avg | 1.72E+04 | 1.72E+04 | 1.71E+04 | 5.02E+04 | 1.63E+04
Balance Scale | Worst | 1.43E+03 | 1.72E+03 | 1.45E+03 | 1.43E+03 | 1.43E+03
 | Best | 1.43E+03 | 1.44E+03 | 1.44E+03 | 1.43E+03 | 1.43E+03
 | Avg | 1.43E+03 | 1.52E+03 | 1.44E+03 | 1.43E+03 | 1.43E+03
Breast | Worst | 3.43E+03 | 6.08E+03 | 3.05E+03 | 2.36E+03 | 2.02E+03
 | Best | 3.39E+03 | 2.32E+03 | 3.03E+03 | 2.36E+03 | 2.02E+03
 | Avg | 3.41E+03 | 4.34E+03 | 3.04E+03 | 2.36E+03 | 2.02E+03
Glass | Worst | 4.37E+02 | 6.30E+02 | 3.69E+02 | 4.11E+02 | 2.53E+02
 | Best | 3.91E+02 | 3.07E+02 | 3.65E+02 | 4.10E+02 | 2.53E+02
 | Avg | 4.10E+02 | 5.03E+02 | 3.67E+02 | 4.11E+02 | 2.53E+02
Haberman | Worst | 2.62E+03 | 1.11E+04 | 2.94E+03 | 2.59E+03 | 2.57E+03
 | Best | 2.59E+03 | 2.62E+03 | 2.93E+03 | 2.59E+03 | 2.57E+03
 | Avg | 2.61E+03 | 3.90E+03 | 2.93E+03 | 2.59E+03 | 2.57E+03
Heart | Worst | 1.10E+04 | 3.01E+04 | 1.45E+04 | 1.19E+04 | 1.06E+04
 | Best | 1.08E+04 | 1.07E+04 | 1.45E+04 | 1.13E+04 | 1.06E+04
 | Avg | 1.09E+04 | 1.37E+04 | 1.45E+04 | 1.18E+04 | 1.06E+04
Hepatitis | Worst | 1.24E+04 | 1.25E+04 | 1.48E+04 | 1.93E+04 | 1.18E+04
 | Best | 1.20E+04 | 1.18E+04 | 1.48E+04 | 1.48E+04 | 1.18E+04
 | Avg | 1.22E+04 | 1.21E+04 | 1.48E+04 | 1.93E+04 | 1.18E+04
Libras | Worst | 5.87E+02 | 9.16E+02 | 7.34E+02 | 7.78E+02 | 5.41E+02
 | Best | 5.85E+02 | 8.71E+02 | 7.23E+02 | 7.78E+02 | 5.40E+02
 | Avg | 5.86E+02 | 8.92E+02 | 7.26E+02 | 7.78E+02 | 5.41E+02
Lung Cancer | Worst | 1.59E+02 | 1.71E+02 | 1.64E+02 | 1.65E+02 | 1.38E+02
 | Best | 1.58E+02 | 1.60E+02 | 1.63E+02 | 1.65E+02 | 1.38E+02
 | Avg | 1.59E+02 | 1.66E+02 | 1.63E+02 | 1.65E+02 | 1.38E+02
Madelon | Worst | 1.86E+06 | 3.91E+06 | 2.85E+06 | 2.67E+06 | 1.91E+06
 | Best | 1.86E+06 | 3.64E+06 | 2.85E+06 | 2.52E+06 | 1.90E+06
 | Avg | 1.86E+06 | 3.77E+06 | 2.85E+06 | 2.59E+06 | 1.90E+06
Seeds | Worst | 3.77E+02 | 1.04E+03 | 3.69E+02 | 3.68E+02 | 3.12E+02
 | Best | 3.67E+02 | 3.72E+02 | 3.63E+02 | 3.65E+02 | 3.12E+02
 | Avg | 3.71E+02 | 5.29E+02 | 3.64E+02 | 3.66E+02 | 3.12E+02
Speech | Worst | 4.65E+12 | 2.41E+12 | 6.92E+12 | 3.63E+13 | 3.00E+12
 | Best | 3.71E+12 | 2.16E+12 | 6.92E+12 | 7.07E+12 | 3.00E+12
 | Avg | 4.18E+12 | 2.26E+12 | 6.92E+12 | 1.68E+13 | 3.00E+12
Vowel | Worst | 1.71E+05 | 3.73E+05 | 2.55E+05 | 4.16E+05 | 1.62E+05
 | Best | 1.69E+05 | 1.92E+05 | 2.55E+05 | 2.09E+05 | 1.62E+05
 | Avg | 1.70E+05 | 2.58E+05 | 2.55E+05 | 3.27E+05 | 1.62E+05
Table 6. Description of the novel coronavirus 2019 dataset.

No. | Feature Name | Description
1 | Location | The location the patient belongs to
2 | Country | The country the patient belongs to
3 | Gender | The gender of the patient
4 | Age | The age of the patient
5 | vis_wuhan (Yes: 1, No: 0) | Whether the patient visited Wuhan, China
6 | from_wuhan (Yes: 1, No: 0) | Whether the patient is from Wuhan, China
7 | symptom 1 | Fever
8 | symptom 2 | Cough
9 | symptom 3 | Cold
10 | symptom 4 | Fatigue
11 | symptom 5 | Body pain
12 | symptom 6 | Malaise
13 | diff_sym_hos | The number of days between the symptoms being noticed and admission to the hospital
14 | Class | The class of the patient: either death or recovery
Table 7. Comparison of the BIAS and other algorithms based on accuracy.

Models | Iterations | Precision | Recall | F-Measure | Accuracy
BABC | 100 | 91.15 | 91.24 | 91.19 | 91.21
BABC | 200 | 91.48 | 91.62 | 91.55 | 91.95
BBA | 100 | 92.29 | 92.38 | 92.33 | 92.15
BBA | 200 | 92.37 | 92.51 | 92.44 | 92.48
BCSA | 100 | 94.14 | 94.26 | 94.20 | 94.71
BCSA | 200 | 95.25 | 95.38 | 95.31 | 95.32
BAEFA | 100 | 94.06 | 94.19 | 94.09 | 94.36
BAEFA | 200 | 94.52 | 94.63 | 94.57 | 94.79
Proposed Model | 100 | 95.53 | 95.76 | 95.64 | 95.84
Proposed Model | 200 | 96.04 | 96.35 | 96.19 | 96.25
Table 8. Results of the BIAS feature selection.

Features | Precision | Recall | F-Measure | Accuracy
5 | 98.32 | 98.43 | 98.37 | 98.68
5 | 98.41 | 98.47 | 98.44 | 98.41
6 | 98.26 | 98.35 | 98.30 | 98.23
6 | 98.14 | 98.46 | 98.30 | 98.06
7 | 98.45 | 98.55 | 98.50 | 98.31
7 | 97.35 | 97.49 | 97.42 | 97.76
8 | 97.50 | 97.58 | 97.54 | 97.65
8 | 97.14 | 97.30 | 97.22 | 97.52
9 | 97.32 | 97.58 | 97.45 | 97.41
10 | 97.29 | 97.48 | 97.38 | 97.52
10 | 97.06 | 97.19 | 97.12 | 97.13
11 | 96.58 | 96.67 | 96.62 | 96.84
11 | 96.61 | 96.75 | 96.68 | 96.92
12 | 96.35 | 96.42 | 96.38 | 96.56
12 | 96.42 | 96.56 | 96.49 | 96.42
13 | 96.11 | 96.20 | 96.15 | 96.25
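For reference, the scores reported in Tables 7 and 8 follow the usual binary-classification definitions (the COVID-19 class label is death or recovery). The snippet below is a generic sketch of those formulas computed from confusion-matrix counts; the averaging convention and the example counts are illustrative and are not reproduced from the paper.

```python
# Generic sketch of the metrics used in Tables 7 and 8, computed from
# confusion-matrix counts of a binary classifier (illustrative numbers only).

def classification_scores(tp, fp, fn, tn):
    precision = tp / (tp + fp)                      # fraction of predicted positives that are correct
    recall = tp / (tp + fn)                         # fraction of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    accuracy = (tp + tn) / (tp + fp + fn + tn)      # fraction of all predictions that are correct
    return precision, recall, f_measure, accuracy

# Example with made-up counts: 77 TP, 3 FP, 2 FN, 18 TN
print(classification_scores(77, 3, 2, 18))  # -> approximately (0.9625, 0.9747, 0.9686, 0.95)
```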
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
