2.4.2. Diversity Maintenance

One of the problems of evolutionary algorithms is maintaining the diversity of the population [46,56,57]. Eremeev in [58] described the mutation genetic operator as the essential procedure that guarantees population diversity. Along with the convergence to the Pareto optimal set to solve the problem of multi-criteria optimization, the evolutionary algorithm must also support a good distribution of solutions, preferably evenly covering as much of the optimal front as possible [59–62].

Early versions of the NSGA used the well-known fitness-sharing approach to prevent the concentration of solutions in specific areas of the search space and to maintain stable population diversity. However, the sharing parameter σ<sub>share</sub> has a significant effect on the efficiency of maintaining a wide distribution of the population. This parameter determines the degree of fitness redistribution between individuals [55] and is directly related to the distance metric chosen to calculate the measure of proximity between two members of the population. The parameter σ<sub>share</sub> denotes the largest distance within which any two solutions share each other's fitness. The user usually sets this parameter, which entails obvious difficulties in making a reasonable choice.

In the NSGA-II [55], a different approach is used based on the crowded comparison. Its indisputable advantage is the absence of parameters set by the user. The critical components of the approach are the density estimate and the crowded-comparison operator.

To estimate the density of solutions around a chosen solution, two nearest solutions are found on both sides of it along each direction in the criteria space. The distance between them is determined as the difference in their values of the corresponding criterion. The estimate of the density of solutions near the selected point is the average of the calculated distances, called the crowding distance. From the geometric point of view, the density can be estimated by calculating the perimeter of the hyperrectangle formed with the nearest neighboring solutions as vertices.

Figure 1 shows a graphical interpretation of the above approach for the case of our two objective functions (4). The solutions denoted as *i* + 1 and *i* − 1, which are nearest to the *i*th solution and belong to the first front (filled dots), represent the vertices of the outlined rectangle (dotted line). The crowding distance, in this case, can be defined as the average length of the edges of the rectangle.

**Figure 1.** Density estimation of solutions belonging to the front for problem (4).


Calculating the crowding distance requires sorting the individuals in the population according to each objective function in ascending order of the value of this objective function. For individuals with the boundary value of the objective function (maximum or minimum), the crowding distance is assigned equal to infinity. All other intermediate solutions are assigned a distance value equal to the absolute value of the difference between the values of the functions of two neighboring solutions. This calculation continues for all objective functions. The final value of the crowding distance is calculated as the average of the individual values of the distances corresponding to each objective function. Prenormalization of objective functions is recommended.
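The calculation described above can be sketched in Python; this is a minimal illustration of the procedure in the text (not the authors' implementation), with normalization of the objective values built in:

```python
import math

def crowding_distance(front, objectives):
    """Average per-objective crowding distance for one front.
    'front' is a list of solutions; each element of 'objectives'
    maps a solution to its objective value."""
    n = len(front)
    dist = [0.0] * n
    for f in objectives:
        order = sorted(range(n), key=lambda i: f(front[i]))
        f_min, f_max = f(front[order[0]]), f(front[order[-1]])
        # Boundary solutions get an infinite distance so they are always kept.
        dist[order[0]] = dist[order[-1]] = math.inf
        if f_max == f_min:
            continue  # degenerate objective: all values are equal
        for k in range(1, n - 1):
            # Normalized difference between the two neighbouring solutions.
            dist[order[k]] += (f(front[order[k + 1]]) -
                               f(front[order[k - 1]])) / (f_max - f_min)
    m = len(objectives)  # average over the objectives, as in the text
    return [d / m if d != math.inf else d for d in dist]
```

Note that the original NSGA-II sums the per-objective distances; averaging, as described in the text, only rescales the values and does not change the induced ordering.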

After all individuals in the population have been assigned an estimate of the crowding distance, we can compare the solutions in terms of their degree of closeness to other solutions. A smaller value of the crowding distance indicates a higher density of solutions around the selected point. The density estimate is used in the crowded-comparison operator described below.

The crowded-comparison operator (≺*n*) directs the evolutionary process of population transformation towards a fairly uniform distribution along the Pareto front.

We assume that each individual *i* in the population has two attributes:

1. A nondomination rank (*i*<sub>rank</sub>);
2. A crowding distance (*i*<sub>distance</sub>).

Then, the partial ordering is defined as follows. Individual *i* is preferred over individual *j* if the following condition is met:

$$i \prec\_{n} j \Longleftrightarrow (i\_{\rm rank} < j\_{\rm rank}) \bigvee \left( i\_{\rm rank} = j\_{\rm rank} \bigwedge i\_{\rm distance} > j\_{\rm distance} \right). \tag{5}$$

Such ordering means that, between two solutions with different ranks, we prefer the solution with the lower rank (closer to the first, i.e., optimal, front). Otherwise, if both solutions lie on the same front, we prefer the one located in a less crowded area.
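Relation (5) translates directly into a small comparison function; this sketch assumes each individual is represented simply by its (rank, distance) pair:

```python
def crowded_less(i, j):
    """Partial order (5): i is preferred over j if it has a lower rank,
    or the same rank and a larger crowding distance."""
    i_rank, i_dist = i
    j_rank, j_dist = j
    return i_rank < j_rank or (i_rank == j_rank and i_dist > j_dist)
```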

#### 2.4.3. Basic Procedure of the NSGA-II

The basic procedure of the NSGA-II is rather simple. Initially, the parent population *P*0 is randomly created and sorted based on the dominance principle; thus, the greater the fitness of an individual, the lower the value of its rank. Then, the traditional genetic operators are applied: binary tournament selection, crossover, and mutation, creating a child population *Q*0 of a specified size *N*. Since elitism is introduced by comparing the current child population with the parent population, the procedure for the first generation differs from the subsequent ones.

Let the *t*th iteration of the algorithm be executed, at which the parent population *Pt* generated the child population *Qt*. The joint population *Rt* = *Pt* ∪ *Qt* is then sorted according to the dominance principle. Thus, solutions that belong to the first front F1 and are not dominated should have a better chance of moving to the next-generation population, which is ensured by the implementation of the elitism principle. If the first front F1 includes fewer than *N* members, then all members of this front move to the next population *Pt*+1. The remaining members of the population *Pt*+1 are selected from the subsequent fronts in the order of their ranking: the front F2 is included in the new parent population *Pt*+1, then the front F3, and so on. This procedure continues until the inclusion of the next front would lead to an excess of the population size *N*. Let F*l* be the first front that can no longer be included in the new population as a whole. To select the members of the front F*l* that will be included in the next generation, we sort the solutions of this front using the crowded-comparison operator and select the best solutions to fill the population *Pt*+1. Now, the new population *Pt*+1 of size *N* can be used to apply the selection, crossover, and mutation operators. It is important to note that the binary tournament selection operator is still used, but it is now based on the crowded-comparison operator ≺*n*; to use it, in addition to determining the rank, it is necessary to calculate the crowding distance for each member of the population *Pt*+1.
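The transition described above can be sketched as follows. Here `sort_fronts` (non-dominated sorting) and `crowding` (the density estimate) are assumed helper functions, not part of the original pseudocode:

```python
def next_generation(parents, offspring, N, sort_fronts, crowding):
    """One NSGA-II transition: merge P_t and Q_t, sort by dominance,
    fill P_{t+1} front by front, and truncate the last front that does
    not fit by crowding distance."""
    fronts = sort_fronts(parents + offspring)  # [F1, F2, ...], best first
    next_pop = []
    for front in fronts:
        if len(next_pop) + len(front) <= N:
            next_pop.extend(front)         # the whole front fits
        else:
            need = N - len(next_pop)
            d = crowding(front)            # larger distance = less crowded
            order = sorted(range(len(front)), key=lambda i: -d[i])
            next_pop.extend(front[i] for i in order[:need])
            break
    return next_pop
```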

The schematic procedure of the NSGA-II algorithm originally proposed in [55] is shown in Figure 2.

**Figure 2.** Next-generation transition procedure.

Thus, a good distribution of solutions within one front is achieved by using the crowded-comparison procedure, which is also used in the selection of individuals. Since the solutions compete with their crowding distances (a measure of the solution density in a neighborhood), the algorithm does not require any additional parameter determining the size of niches in the search space. The proposed crowding distance is calculated in the objective space, but it can be implemented in the parameter space if necessary.

#### *2.5. Our Approach: An Evolutionary Algorithm for Pattern Generation*

The pattern *Pa* is determined by a baseline observation *a* = (*x*1, *x*2, ..., *xp*) and the values of the control variables *Y* = (*y*1<sup>(*a*)</sup>, ..., *yp*<sup>(*a*)</sup>), which are binary. Thus, the solution to the problem of pattern generation is a set of points in the space of control variables that approximates the Pareto front for a given base observation.

The posed problem of finding logical patterns dictates the representation of solutions as binary strings, rather than as real-valued vectors, as was postulated in the original NSGA-II. The binary representation also determined the list of available crossover and mutation operators, among which uniform crossover and mutation by gene inversion were chosen [63,64].
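For illustration, uniform crossover and gene-inversion (bit-flip) mutation for binary strings can be sketched as follows; the per-gene mutation rate argument is our assumption, since the text does not specify it:

```python
import random

def uniform_crossover(a, b):
    """Each gene of the child is taken from either parent with probability 1/2."""
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def inversion_mutation(chrom, rate):
    """Gene inversion: each bit is flipped independently with probability 'rate'."""
    return [1 - g if random.random() < rate else g for g in chrom]
```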

In addition, the crowding distance requires a different interpretation in the space of the two objectives, the number of covered observations of the base class and the number of covered observations of the other class. First, uniform coverage of the Pareto front is not required, since the preferable area is the one with a smaller coverage of observations of the opposite class. Figure 3 illustrates this point: the points of one Pareto front are colored according to their preference, with white meaning less preferable.

 **Figure 3.** Illustration of a typical first Pareto front for logical patterns.

Taking into account these preconditions, the crowding distance was replaced by one of the heuristic definitions of informativity [3]:

$$I = \sqrt{Cov^+(P\_a)} - \sqrt{Cov^-(P\_a)}.\tag{6}$$

Similar to the crowding distance, the more preferable the individual, the greater its informativity value.
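Criterion (6) is straightforward to compute from the two coverage counts; a minimal sketch:

```python
import math

def informativity(cov_pos, cov_neg):
    """Heuristic informativity (6): rewards covering many base-class
    observations and penalizes covering the other class."""
    return math.sqrt(cov_pos) - math.sqrt(cov_neg)
```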

Special attention is paid to the formation of the initial set of individuals. The usual approach is a uniform discrete distribution over the values 0 and 1. However, given the nature of the problem, equal probabilities of drawing 0 and 1 would mean, on average, fixing half of the features in the initial population, which gives rise to overly selective patterns. Therefore, the discrete distribution of values in the initial population is defined differently: 1 is drawn with probability *p*, and 0 with probability 1 − *p*. The value *p* is a hyperparameter of the genetic algorithm.
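The biased initialization can be sketched as follows (`p` is the hyperparameter from the text; the function name is ours):

```python
import random

def init_population(size, n_genes, p):
    """Each gene is 1 with probability p, otherwise 0, so on average only
    a fraction p of the features is fixed in an initial individual."""
    return [[1 if random.random() < p else 0 for _ in range(n_genes)]
            for _ in range(size)]
```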

Figure 4 shows a diagram of the developed approach to constructing a classifier using the NSGA-II algorithm for pattern generation. The NSGA-II algorithm is run independently for each baseline observation, and the result of each run is a set of first-front patterns. The sets of patterns obtained for all baseline observations are combined into one complete set of patterns, which can be reduced using a selection procedure [2,65]. When recognizing control (or new) observations, the decision about the class is made by balanced voting of the patterns on the observation under consideration [8].

**Figure 4.** Scheme for constructing a classifier using the algorithm NSGA-II.

A detailed description of the pattern generation procedure from Figure 4 is presented in the form of pseudocode (Algorithms 1 and 2).
