**2. Related Works**

In biology, coevolution refers to reciprocal adaptive evolution between two interacting species: an adaptation in one species exerts selective pressure on the other, whose response in turn influences the evolution of the first. At the biological level, this mechanism plays several important roles.

This idea was successfully introduced into computer algorithms, and more and more researchers have begun to pay attention to the performance improvements that coevolution strategies bring to EAs. To cope with increasingly complex problems, Potter et al. incorporated the idea of coevolution into EAs: their work extended the evolutionary paradigm of the time and described an architecture that evolves subcomponents as collections of cooperating species [15]. They then analyzed the robustness of cooperative coevolutionary algorithms (CCEAs), providing a theoretical basis for the effectiveness of coevolutionary strategies [16]. Wiegand et al. also used evolutionary game theoretic (EGT) models to help understand CCEAs and to analyze whether CCEAs are really suitable for optimization tasks [17]. One of these EGT models, the multi-population symmetric game, can be used to analyze and model coevolutionary algorithms. In this context, coevolution tends to decompose the evolving population into several small subpopulations that do not interfere with each other; the individuals are then optimized continuously through cooperation among the subpopulations. The effectiveness of CCEAs has been verified by applying them to complex problems and structures [18,19].

The coevolution strategy of CCEAs can group the population, which makes it suitable for large-scale optimization problems (LSOPs): the dimension of the decision variables in an LSOP is too high for direct optimization, so grouping is currently a good solution. This was also the initial application scenario of CCEAs. Yang et al. observed that traditional CCEAs can only decompose and handle separable LSOPs, but often fail on nonseparable ones. They therefore introduced a stochastic grouping scheme with adaptive weighting into problem decomposition and coevolution, and replaced the traditional evolutionary algorithm with a new differential evolution algorithm. With these improvements, the algorithm can effectively optimize 1000-dimensional nonseparable problems [20]. In addition, a multilevel cooperative coevolution (MLCC) framework [21] was proposed to solve LSOPs. MLCC determines the group size when the problem is decomposed: it constructs a set of problem decomposers based on random grouping strategies with different group sizes, and uses an adaptive mechanism that selects decomposers according to historical performance to self-adapt between different levels.

CCEAs have also been applied to optimization problems in other scenarios. Liu et al. used cooperative coevolution (CC) to improve the speed of evolutionary programming (EP) [22], although their study showed that the time cost grows linearly as the problem dimension increases. CC was also used for global optimization to find globally optimal solutions [23]. Chen et al. proposed a cooperative coevolution with variable interaction learning (CCVIL) framework [24], which initially treats all variables as independent and places them in separate groups, and then progressively merges groups as relationships between them are discovered during the iterations.

In addition to the above optimization problems, many researchers have in recent years begun to apply CC to MaOPs. Tan et al. effectively combined SPEA2 and CC and proposed SPEA2-CC [25]. Experimental comparison showed that the performance of SPEA2-CC was significantly better than that of the original SPEA2 as the number of objectives increased. SPEA2-CC thus provided evidence for the scalability of CC's performance on MaOPs.

Many researchers have combined CC with the preferences of the decision maker to deal with MaOPs, which led to the preference-inspired coevolutionary algorithm (PICEA) [26]. Researchers have shown that PICEA can handle not only MOPs but also MaOPs [27]. Experiments showed that the preference-driven coevolutionary algorithm was superior to several other methods under the hypervolume indicator. One defect of PICEA was the uneven distribution of the obtained solutions on the PF, i.e., poor diversity. To solve this problem, an improved fitness assignment method (PICEA-g) [28] was proposed that takes the density information of the solutions into account. In addition, a preference-inspired coevolutionary algorithm using weight vectors (PICEA-w) [29] was proposed, in which the weights coevolve with the candidate solutions during the search. This coevolution adaptively constructs appropriate weights during optimization and thus effectively leads the candidate solutions toward the PF.

Liang et al. proposed a multi-objective coevolutionary algorithm based on a decomposition method [30], which uses subpopulations to enhance individual objectives. It runs the differential evolution (DE) operator on multiple subpopulations and an external archive to improve each objective and diversify the trade-offs among the archived solutions. Moreover, when an objective is not being optimized, the computing resources assigned to it are reallocated to the other objectives, and the external archive strengthens the trade-offs across all objectives. In addition, the PF has been approximated by parallel subpopulations [31]: the MaOP is first decomposed using uniformly distributed weight vectors, each subpopulation is associated with one weight vector and optimizes the corresponding subproblem, and the elite individuals of the subpopulations are used to produce offspring. This not only enhances the diversity of the population but also accelerates the convergence rate.

There have also been studies using new approaches to further improve CC performance on MaOPs. Shu et al. proposed a preference-inspired coevolutionary algorithm with local principal component analysis (PCA) oriented goal vectors (PICEA-g/LPCA) [32]. PICEA-g/LPCA is a further improvement of PICEA-g; it uses local PCA to extend the ability of PICEA-g and improve convergence. In addition, a coevolutionary particle swarm optimization algorithm with a bottleneck objective learning (BOL) strategy [33] was proposed to meet the convergence and diversity challenges posed by a finite population size. In this algorithm, multiple subpopulations coevolve to maintain diversity, while the BOL strategy improves convergence across all objectives. An elitist learning strategy (ELS) is used to jump out of local PFs, and a juncture learning strategy (JLS) is used to explore areas missing from the PF.

Coevolution strategies have now been applied to many problems. In addition to general MaOPs, there are dynamic interval many-objective optimization problems (IMaOPs) [34,35], large-scale multi-objective optimization problems (LSMOPs) [33,36], and feature selection [37]. Finally, some recent work also uses coevolution or learning techniques [38,39] to deal with MaOPs [40–42].

As described in Section 1, the solution set obtained by Pareto-based MOEAs has a good distribution on the PF, but these methods generally converge slowly, and their performance declines as the number of objectives increases. Non-Pareto MOEAs show good convergence but poor diversity: their solution sets tend to converge to one or a few special regions of the PF, especially when the PF is extremely irregular. Li et al. proposed a bi-criterion evolution (BCE) framework in 2015 [43], which performed well in many-objective optimization. In the BCE framework, two populations evolve simultaneously, one using the Pareto criterion (PC) and the other a non-Pareto criterion (NPC). The aim is to exploit the advantages of both approaches and compensate for their shortcomings. The two parts promote evolution jointly through the exchange of information between the populations: the NPC population leads the PC population to converge, while the PC population makes up for the NPC population's loss of diversity. The two operations included in the framework, population maintenance and individual exploration, are used to preserve good non-dominated individuals and to explore the areas left unexplored by the NPC population, respectively. Although the BCE framework does not use the subpopulation coevolution of CC, the idea of cooperation between the two populations can also be regarded as a form of CC.

Dynamic learning strategy (DLS) takes the evolutionary state of the solution set into account during the algorithm's iterations. It is well known that the initial population of MOEAs is randomly generated without specific requirements: each solution takes a uniformly random value within the domain [*x*min, *x*max]. The equation is as follows:

$$\mathbf{x} = \mathbf{x}\_{\text{min}} + rand \ast (\mathbf{x}\_{\text{max}} - \mathbf{x}\_{\text{min}}) \tag{2}$$

where *rand* is a random number drawn from the uniform distribution on [0, 1]. The convergence of the initial population is therefore very poor; its members are just random points in the solution space. DLS takes this into account: in the initial stage of population evolution, it ensures rapid convergence by devoting more computing resources to the selection of convergence-related solutions. As the iterations proceed, the solutions converge toward the PF, at which point it becomes necessary to keep the solution set diversified. Therefore, as the MaOEA iterates, computational resources gradually shift toward diversity-related solutions to maintain a better distribution of the population on the PF.
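As a minimal illustration, the initialization of Equation (2) can be sketched in Python (the function and variable names are ours, not from the paper):

```python
import random

def init_population(pop_size, x_min, x_max):
    """Initialize a population inside the box [x_min, x_max] per Eq. (2):
    x = x_min + rand * (x_max - x_min), with rand drawn uniformly
    from [0, 1] independently for every dimension."""
    dim = len(x_min)
    return [[x_min[d] + random.random() * (x_max[d] - x_min[d])
             for d in range(dim)]
            for _ in range(pop_size)]
```

Because *rand* is uniform, every point of the box is equally likely, which is why the initial individuals are scattered randomly over the solution space.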

Therefore, an effective combination of BCE and DLS may yield relatively good results, as confirmed by the experimental results in Section 5. This paper takes advantage of the coevolutionary information interaction between the two populations and introduces DLS into the environmental selection of the NPC population to better drive its evolution. The cost value (*CV*) [44] is selected as the indicator. The resulting algorithm is called DL-TPCEA; the detailed algorithm is described in Section 4.

#### **3. The Background of MaOPs**

At present, many single-objective optimization problems in the optimization field have become the focus of research, such as workshop scheduling problems [45–49] and numerical optimization problems [50,51]. Most of these can be solved by classical algorithms and their improved versions, such as the artificial bee colony algorithm (ABC) [47,48,52,53], particle swarm optimization (PSO) [51,54], monarch butterfly optimization (MBO) [55–58], ant colony optimization (ACO) [59,60], the krill herd algorithm (KH) [52,61–64], elephant herding optimization (EHO) [65–67], and other metaheuristic algorithms [68–77]. However, some problems in many-objective optimization cannot be solved by single-objective techniques: because of the conflicts between objectives, all objectives cannot be optimized simultaneously. MaOPs also have different characteristics, which are described in more detail below. The problems currently studied in the field of many-objective optimization are mainly divided into the following categories:

(1) General MaOPs: As mentioned in Equation (1), general MaOPs are problems with *M* conflicting objectives. The overall goal of solving a MaOP is to obtain a solution set that characterizes the PF, but a variety of difficulties arise along the way. In terms of the number of objectives, MaOPs are more difficult to solve than MOPs. Low-dimensional problems are mainly solved by non-dominated sorting, as in NSGA-II [4] and the improved strength Pareto evolutionary algorithm (SPEA2) [78]. Non-dominated sorting relies on Pareto dominance, defined as follows: for a minimization problem, given two vectors *x*1 and *x*2 in Ω, *F*(*x*1) Pareto dominates *F*(*x*2), written *F*(*x*1) ≻ *F*(*x*2), if and only if *fi*(*x*1) ≤ *fi*(*x*2) for each *i* in {1, 2, . . . , *M*} and *fj*(*x*1) < *fj*(*x*2) for at least one *j* in {1, 2, . . . , *M*}. If no point *x* in Ω satisfies *F*(*x*) ≻ *F*(*x\**), then *F*(*x\**) is called a Pareto optimal solution and *x\** a Pareto optimal point; the set of all Pareto optimal solutions is the PF mentioned above, and the set of all Pareto optimal points is called the Pareto set (PS).
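The Pareto dominance relation for minimization can be checked with a few lines of Python (a sketch; the function name is illustrative):

```python
def dominates(f1, f2):
    """Return True if objective vector f1 Pareto-dominates f2 (minimization):
    f1 is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))
```

Note that two vectors can be mutually non-dominating, which is exactly why a whole front of incomparable solutions emerges.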

(2) Large-scale MaOPs: these problems often involve high-dimensional decision variables. In general, MOPs are called large-scale MOPs (LSMOPs) [79] when the decision variable dimension *N* > 100. The performance of MaOEAs decreases as the number of decision variables increases. For example, when a mutation operator is applied to an individual, the probability of producing a good individual after mutation also decreases because of the large dimension of the decision variables. There is a body of research on LSMOPs; at present, most of it is based on classifying the decision variables and handling the classes separately.

Ma et al. proposed a many-objective evolutionary algorithm based on decision variable analysis (MOEA/DVA) [80], which divides the decision variables into convergence-related and diversity-related variables through a decision variable analysis strategy and optimizes the two parts separately, so that both the convergence and the diversity of the population are well maintained. Zhang et al. [81] proposed an evolutionary algorithm based on decision variable clustering for large-scale many-objective optimization problems (LMEA). LMEA uses the *k*-means clustering method, taking the angle between each solution and the convergence direction as the clustering feature, and divides the decision variables into convergence-related and diversity-related variables. LMEA further classifies the variables left unclassified by MOEA/DVA to promote the convergence and diversity of the population. In addition, Chen et al. [1] proposed an evolutionary algorithm based on the covariance matrix adaptation evolution strategy and scalable small subpopulations for large-scale many-objective optimization problems (S3-CMA-ES).

The above work is based on the premise of grouping decision variables to deal with large-scale many-objective optimization problems, which makes a great contribution to large-scale many-objective optimization.

(3) Dynamic MaOPs (DMaOPs): DMaOPs add time (environment) variation to the general MaOPs. It is described as follows:

$$\begin{array}{ll}\text{minimize} & F(\mathbf{x}) = \{f\_1(\mathbf{x}, t), f\_2(\mathbf{x}, t), \dots, f\_M(\mathbf{x}, t)\} \\ \text{subject to} & \mathbf{x} \in X \end{array} \tag{3}$$

where *t* is the time (environment) variable. When the time (environment) changes, the PF of the MaOP also changes; that is, the optimal solution set in the previous state is not necessarily the optimal solution set in the current state. This means that the algorithm is required not only to adapt to the many-objective environment and optimize multiple objectives, but also to respond to the changes brought by time (environment): when the time (environment) changes, the algorithm should react quickly and obtain the optimal solution set for the latest environment.

In the context of DMaOPs, many excellent algorithms have been proposed. Liu et al. proposed a dynamic multi-population particle swarm optimization algorithm based on decomposition and prediction (DP-DMPPSO) [82]. It uses an archive update mechanism based on objective space decomposition and a population prediction mechanism to accelerate convergence, and the results show that the algorithm performs well on DMaOPs. Finally, many other dynamic multi-objective evolutionary algorithms (DMOEAs) use various optimization strategies [83–87] to deal with DMaOPs.

The main purpose of this paper is to solve general MaOPs with high-dimensional objective spaces by coevolving two populations, one using a Pareto-based method and the other a non-Pareto-based method. The two populations exploit each other's advantages and compensate for each other's disadvantages, which is very promising for overcoming the difficulty of optimization in high-dimensional objective spaces. The details are introduced in Section 4.

#### **4. The Framework of DL-TPCEA**

In this part, the specific process of the dynamic learning strategy is introduced first, followed by DL-TPCEA itself. All algorithmic details, such as parameter control and the algorithmic flow, are given.

#### *4.1. Dynamic Learning Strategy*

#### 4.1.1. The Description of DLS

Previous MOEAs generally used a fixed evolution strategy throughout the iterations. For example, NSGA-II uses non-dominated sorting to select the non-dominated solutions in the population to drive convergence, and then uses the crowding distance to choose among the non-dominated solutions to improve the diversity of the population. This method is very time-consuming because of the Pareto sorting, and it tends to converge poorly when the number of objectives is relatively large. DLS, by contrast, makes full use of the fast running speed and good convergence of indicator-based algorithms, while further strengthening diversity to balance the convergence and diversity of the solution set. This paper takes a two-objective problem as an example to illustrate the advantages of DLS over traditional fixed evolutionary strategies.

As shown in Figure 1a, after population initialization the distribution of the individuals in the objective space is very chaotic; in other words, the convergence and diversity of the population are poor. Given the current population, the priority is to get these individuals to converge to the PF as soon as possible, guided by the indicator-based method. For example, in a practical engineering problem, the individuals on the PF are those that minimize the cost. In this case, more computing resources should be allocated to convergence-related operations to achieve rapid convergence of the population to the PF, while a small part of the computational resources is allocated to operations that increase the diversity of the population, to ensure that diversity does not become particularly poor.

After the above operations, the distribution of the individuals in the objective space gradually moves toward the PF, as shown in Figure 1b. However, the convergence level of the whole population is not yet sufficient, so high selection pressure is still needed to promote convergence. As the iterations proceed, the distribution of the individuals becomes close to the PF, as shown in Figure 1c. As introduced in Section 1, indicator-based algorithms converge quickly but easily lose diversity. The example in Figure 1c shows that the individuals approach the PF under the guidance of the indicator, but their positions are biased toward the central region of the PF. At this point, more computing resources need to be shifted toward increasing the diversity of the population, for example by preferring to keep individuals A and B in Figure 1c for the next generation. By changing the computational resource allocation according to the evolutionary state of the population, the individuals can maintain good convergence and diversity. As shown in Figure 1d, the individuals in the resulting solution set are uniformly distributed on the PF.

**Figure 1.** The process of dynamic learning strategy. (**a**) The state after initialization; (**b**) the state at the beginning of evolution; (**c**) the state of late evolution; (**d**) the state at the end of evolution.

#### 4.1.2. The Details of DLS

The above is only a brief description of DLS; the following explains its specific process in detail. First, suppose the population size of the MaOEA is *N*. In each iteration, *N* new individuals are generated by the crossover and mutation operators, at which point the original and newly generated individuals form a combined population, denoted here as *P*2*N*. The next step is to select, through environmental selection, the *N* individuals most conducive to maintaining convergence and diversity as the initial population *Pnew* of the next iteration. These operations are accomplished through DLS.

As shown in Algorithm 1, the 2*N* individuals are first layered by non-dominated sorting (Line 1, Algorithm 1). Here, *FrontNo* records the layer in which each individual resides, and *MaxFNo* is the number of the last non-dominated layer needed, which satisfies:

$$\sum\_{i=1}^{\text{MaxFNo}-1} L\_i \le N \quad \text{and} \quad \sum\_{i=1}^{\text{MaxFNo}} L\_i > N \tag{4}$$

where *Li* represents the number of individuals in the *i*th non-dominated layer (*i* = 1, 2, . . . , *MaxFNo*). The non-dominated individuals in Layers 1 to *MaxFNo* − 1 are first placed into *Pnew* (Line 2, Algorithm 1), and the remaining individuals are then selected from Layer *MaxFNo*.

In 2- or 3-objective problems (MOPs), the population may be layered fairly clearly. This keeps the number of individuals in Layer *MaxFNo* small, which means that few individuals are selected through DLS. However, as the number of objectives increases, so does the proportion of non-dominated individuals in the whole population; as described in Section 1, almost all individuals are non-dominated when the number of objectives exceeds 12. This increases the number of individuals in Layer *MaxFNo*, possibly to the point where all individuals in the population are in Layer *MaxFNo*. This greatly increases the role of DLS and makes it all the more useful for solving MaOPs.
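The layering step and the search for the critical layer *MaxFNo* of Equation (4) can be sketched as follows (a naive O(*MN*²) sketch with illustrative names; real implementations use faster non-dominated sorting):

```python
def dominates(a, b):
    """Pareto dominance for minimization."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated_sort(objs):
    """Assign every objective vector a front number (1 = non-dominated)
    by repeatedly peeling off the current non-dominated layer."""
    front_no = [0] * len(objs)
    remaining = set(range(len(objs)))
    layer = 0
    while remaining:
        layer += 1
        current = [i for i in remaining
                   if not any(dominates(objs[j], objs[i])
                              for j in remaining if j != i)]
        for i in current:
            front_no[i] = layer
        remaining -= set(current)
    return front_no

def critical_layer(front_no, n):
    """Find MaxFNo per Eq. (4): the first layer at which the cumulative
    layer sizes exceed the number n of individuals to select."""
    total = 0
    for layer in range(1, max(front_no) + 1):
        total += front_no.count(layer)
        if total > n:
            return layer
    return max(front_no)
```

With 2*N* individuals and *N* slots, Equation (4) always has a solution, since the cumulative sum over all layers is 2*N* > *N*.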



Next, the values of *Cn* and *Dn* are calculated (Lines 3–4, Algorithm 1), representing the numbers of convergence-related and diversity-related individuals that need to be preserved, respectively. This is the key mechanism by which DLS dynamically allocates computational resources across the algorithm's iterations. *Cn* is calculated as follows:

$$C\_n = \left\lceil R\_{gen} \times \alpha \times \left(1 - \frac{gen}{maxgen}\right) \right\rceil \tag{5}$$

where *gen* represents the current iteration number and *maxgen* the maximum number of iterations. *Rgen* represents the total number of individuals that need to be selected from Layer *MaxFNo* at generation *gen*, and *α* ∈ [0, 1] is a convergence factor that controls the rate of convergence of the population. Experiments show that performance is best when *α* is about 0.9: convergence is fast, yet the algorithm does not fall into local optima. The symbol ⌈·⌉ rounds its argument to the nearest integer greater than or equal to it. The number of diversity-related individuals to be preserved is then calculated as:

$$D\_n = R\_{gen} - C\_n \tag{6}$$
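Equations (5) and (6) can be computed as follows (illustrative names; the defensive clamp of *Cn* to *Rgen* is our addition):

```python
import math

def allocate(r_gen, gen, max_gen, alpha=0.9):
    """Split the R_gen slots at layer MaxFNo into convergence- and
    diversity-related quotas per Eqs. (5)-(6). alpha in [0, 1] is the
    convergence factor (about 0.9 is recommended in the text)."""
    c_n = math.ceil(r_gen * alpha * (1 - gen / max_gen))
    c_n = min(c_n, r_gen)  # defensive: the quota cannot exceed the slots
    d_n = r_gen - c_n
    return c_n, d_n
```

Early on (*gen* ≈ 0), almost all slots go to convergence; as *gen* approaches *maxgen*, the quota shifts entirely to diversity, matching the behaviour described above.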

After *Cn* and *Dn* are calculated, convergence and diversity indicators are computed for the individuals in the population, and *Rgen* individuals in Layer *MaxFNo* are retained according to the rules of DLS. In this paper, the cost value (*CV*) [44] is selected as the convergence-related indicator. Let *F*(*xi*) = (*f*1(*xi*), *f*2(*xi*), . . . , *fM*(*xi*)) be the objective vector of individual *xi* (*i* = 1, 2, . . . , *N*). Then the evaluation of individual *xi* by individual *xj* is as follows:

$$cv\_{ij} = \max\_{m} \{ f\_m(\mathbf{x}\_j) / f\_m(\mathbf{x}\_i) \}, \ m = 1, 2, \dots, M \tag{7}$$

Then the mutual evaluation of each individual in the population is as follows:

$$CV\_i = \min\_{j \neq i} cv\_{ij}, \ j = 1, 2, \dots, N \tag{8}$$

This indicator is not affected by changes in the number of objectives, and its characteristics follow directly from Equations (7) and (8): first, *xi* is a non-dominated individual when *CVi* > 1; second, *xi* is a dominated individual when *CVi* ≤ 1. We can therefore use *CV* as the convergence-related indicator to select individuals in Layer *MaxFNo*: individuals with larger *CV*, i.e., better convergence, are retained.
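The *CV* indicator of Equations (7) and (8) can be sketched as follows (assuming strictly positive objective values; the *eps* guard against division by zero is our addition):

```python
def cost_values(objs, eps=1e-12):
    """Mutual-evaluation convergence indicator of Eqs. (7)-(8) for a
    minimization problem: cv_ij = max_m f_m(x_j)/f_m(x_i), and
    CV_i = min over j != i of cv_ij. CV_i > 1 iff x_i is non-dominated."""
    n, m = len(objs), len(objs[0])
    cv = []
    for i in range(n):
        best = float("inf")
        for j in range(n):
            if j == i:
                continue
            cv_ij = max(objs[j][k] / max(objs[i][k], eps) for k in range(m))
            best = min(best, cv_ij)
        cv.append(best)
    return cv
```

Intuitively, cv_ij stays below 1 only when *xj* is at least as good as *xi* in every objective, so the minimum over *j* drops to 1 or below exactly when some individual dominates *xi*.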

As for the diversity-related indicator, the distance between individuals in the population is generally used as the evaluation criterion; for example, the Euclidean distance is used to calculate the crowding distance in NSGA-II. In this paper, the *Lp*-norm-based distance is used to measure the distance between individuals. It has been experimentally demonstrated that the *Lp*-norm-based distance is more effective than the Euclidean distance, Manhattan distance, etc., especially when dealing with MaOPs [88], and the parameter *p* is recommended to be set to 1/*M*. Therefore, the *Lp*-norm-based distance is selected as the diversity-related indicator in this paper.
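A sketch of the *Lp*-norm-based distance with the recommended fractional exponent *p* = 1/*M* (illustrative function name; for *p* < 1 this is not a true norm, but it still serves as a dissimilarity measure that discriminates better in high-dimensional objective spaces):

```python
def lp_distance(a, b, p):
    """L_p-based distance between objective vectors a and b:
    (sum_m |a_m - b_m|^p)^(1/p). With p = 1/M (fractional, p < 1)
    the triangle inequality no longer holds, but the measure remains
    useful for crowding estimation in many-objective spaces."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
```

With *p* = 1 this reduces to the Manhattan distance, and with *p* = 2 to the Euclidean distance.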

After the two indicators are calculated for the individuals in the population, the individuals in Layer *MaxFNo* are selected and saved into *Pnew* according to *Cn* and *Dn*, until the size of *Pnew* reaches *N*.

#### *4.2. The Framework of BCE*

There are two main populations in BCE, namely the NPC population and the PC population, which evolve using a non-Pareto method and a Pareto method, respectively. For the NPC population, any non-Pareto evolutionary criterion can be used directly; however, when the next generation is produced through competition, the environmental selection needs to select individuals from both the NPC and PC populations (NPC selection). For the PC population, the non-dominated individuals from the NPC and PC populations are retained (PC selection). Since the number of non-dominated solutions is not known in advance, a population maintenance operation eliminates some individuals with poor diversity whenever the number of non-dominated individuals exceeds the predefined threshold *N*.
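The PC selection and population maintenance steps can be sketched as follows (illustrative names; the exact maintenance rule of BCE [43] may differ — here the individual whose nearest neighbour is closest, i.e., the most crowded one, is dropped iteratively):

```python
def pc_selection(candidates, objs, n_max, distance):
    """Keep the non-dominated members of the mixed set `candidates`
    (with objective vectors `objs`); if more than n_max survive,
    truncate the most crowded ones (population maintenance)."""
    dominates = lambda a, b: (all(x <= y for x, y in zip(a, b))
                              and any(x < y for x, y in zip(a, b)))
    # Pareto criterion: keep the individuals no one dominates.
    keep = [i for i in range(len(candidates))
            if not any(dominates(objs[j], objs[i])
                       for j in range(len(candidates)) if j != i)]
    # Population maintenance: drop the individual with the smallest
    # nearest-neighbour distance until the threshold is met.
    while len(keep) > n_max:
        crowd = {i: min(distance(objs[i], objs[j]) for j in keep if j != i)
                 for i in keep}
        keep.remove(min(crowd, key=crowd.get))
    return [candidates[i] for i in keep]
```

Any distance measure can be plugged in for `distance`, e.g. the Euclidean distance or the *Lp*-norm-based distance discussed in Section 4.1.2.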

Because the convergence speed of the NPC population is relatively fast, its individuals can accelerate the convergence of the PC population during PC selection. Because the diversity of the PC population is better, the individual exploration operation can explore areas of the PF left undeveloped by the NPC population, and these individuals are used to enhance the diversity of the NPC population. In this way, the two populations interact with and promote each other, so that the final solution set has both good convergence and good diversity. The final output is the PC population.

#### 4.2.1. PC Selection and NPC Selection

The process of PC selection selects the non-dominated individuals from the mixed set consisting of the PC population and the new individuals produced by both the PC and NPC evolutions. NPC selection is based on the criterion of the NPC evolution: it performs environmental selection on the mixture of the NPC population and the new individuals generated by the PC population. For instance, if the NPC population evolves with an indicator-based algorithm, then NPC selection picks the individuals with better indicator values in the mixed population for the next generation.

For the evolution of the NPC population, some algorithms rely on information from the parents to update an individual, which is not feasible here. Instead, individuals in the PC population are compared with individuals in the NPC population: if a PC individual is better than one or more NPC individuals according to the NPC population's evolutionary criterion, then one of those NPC individuals (chosen at random) is replaced by that PC individual.
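The replacement step just described can be sketched as follows (illustrative names; `indicator` stands for the NPC evolutionary criterion, assumed here to return a score where larger is better):

```python
import random

def npc_replacement(npc_pop, pc_pop, indicator):
    """For each PC individual, find the NPC individuals it beats under
    the NPC criterion `indicator`; if there are any, replace a random
    one of them with the PC individual (sketch with assumed names)."""
    for pc_ind in pc_pop:
        worse = [i for i, npc_ind in enumerate(npc_pop)
                 if indicator(pc_ind) > indicator(npc_ind)]
        if worse:
            npc_pop[random.choice(worse)] = pc_ind
    return npc_pop
```

The random choice among the beaten individuals matches the description above and avoids always displacing the same NPC member.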
