*Article* **Variable Decomposition for Large-Scale Constrained Optimization Problems Using a Grouping Genetic Algorithm**

**Guadalupe Carmona-Arroyo \*, Marcela Quiroz-Castellanos \* and Efrén Mezura-Montes**

Artificial Intelligence Research Institute, Universidad Veracruzana, Campus Sur, Calle Paseo Lote II, Sección Segunda 112, Nuevo Xalapa, Veracruz 91097, Mexico; emezura@uv.mx **\*** Correspondence: gcarmonaarroyo@gmail.com (G.C.-A.); maquiroz@uv.mx (M.Q.-C.)

**Abstract:** Many real-world optimization problems are very difficult, and their optimal solutions cannot be found with traditional methods. Moreover, for some of these problems, the large number of decision variables is a major contributing factor to their complexity; they are known as Large-Scale Optimization Problems, and various strategies have been proposed to deal with them. One of the most popular tools is called Cooperative Co-Evolution, which works through a decomposition of the decision variables into smaller subproblems or variable subgroups, which are optimized separately and cooperate to finally create a complete solution of the original problem. This kind of decomposition can be handled as a combinatorial optimization problem where we want to group variables that interact with each other. In this work, we propose a Grouping Genetic Algorithm to optimize the variable decomposition by reducing their interaction. Although the Cooperative Co-Evolution approach is widely used to deal with unconstrained optimization problems, there are few works related to constrained problems. Therefore, our experiments were performed on a test benchmark of 18 constrained functions with 100, 500, and 1000 variables. The results obtained indicate that a Grouping Genetic Algorithm is an appropriate tool to optimize the variable decomposition for Large-Scale Constrained Optimization Problems, outperforming the decomposition obtained by a state-of-the-art genetic algorithm.

**Keywords:** Grouping Genetic Algorithm; variable decomposition; Large-Scale Constrained Optimization

#### **1. Introduction**

A constrained numerical optimization problem consists of finding the vector $\mathbf{x} \in \mathbb{R}^D$ that minimizes the objective function $Obj(\mathbf{x})$ subject to inequality constraints $g_j(\mathbf{x})$ and equality constraints $h_k(\mathbf{x})$ [1]. This is described by Equation (1).

$$\begin{aligned} \text{minimize} \quad & Obj(\mathbf{x}) \\ \text{subject to} \quad & g_j(\mathbf{x}) \le 0, \ j = 1, \dots, q \\ & h_k(\mathbf{x}) = 0, \ k = 1, \dots, r \end{aligned} \tag{1}$$

where $q$ and $r$ represent the number of inequality and equality constraints, respectively, $\mathbf{x} = (x_1, \dots, x_D)$, and the search space $\mathcal{S}$ is defined by the lower limits $l_i$ and upper limits $u_i$ ($l_i \le x_i \le u_i$), while the feasible region $\mathcal{F} \subset \mathcal{S}$ is the subset of solutions that satisfy the constraints of the problem.

To handle the constrained problem, the constraint violation sum *cvs* [2], is calculated by Equation (2).

$$cvs(\mathbf{x}) = \sum_{j=1}^{q} \max(0, g_j(\mathbf{x})) + \sum_{k=1}^{r} \max(0, |h_k(\mathbf{x})| - \epsilon) \tag{2}$$

**Citation:** Carmona-Arroyo, G.; Quiroz-Castellanos, M.; Mezura-Montes, E. Variable Decomposition for Large-Scale Constrained Optimization Problems Using a Grouping Genetic Algorithm. *Math. Comput. Appl.* **2022**, *27*, 23. https://doi.org/10.3390/mca27020023

Academic Editor: Claudia Schillings

Received: 26 January 2022 Accepted: 1 March 2022 Published: 3 March 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


where $|h_k(\mathbf{x})| - \epsilon$ comes from the transformation of the equality constraints into the inequality constraints $|h_k(\mathbf{x})| - \epsilon \le 0$, with $\epsilon = 1 \times 10^{-4}$.
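The sum in Equation (2) is straightforward to compute. The following sketch assumes, purely for illustration, that the constraints are available as lists of Python callables; the names `ineq_constraints` and `eq_constraints` are hypothetical.

```python
# Sketch of the constraint violation sum cvs(x) from Equation (2), assuming
# constraints are passed as lists of callables (an illustrative API, not the
# authors' implementation).
def cvs(x, ineq_constraints, eq_constraints, eps=1e-4):
    """Sum of violations of g_j(x) <= 0 and |h_k(x)| - eps <= 0."""
    total = sum(max(0.0, g(x)) for g in ineq_constraints)
    total += sum(max(0.0, abs(h(x)) - eps) for h in eq_constraints)
    return total
```

A point that satisfies all constraints yields `cvs(x) == 0`, and any positive value quantifies the total violation.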

According to the specialized literature [3,4], a Large-Scale Optimization Problem consists of 100 or more variables, while benchmark functions for special sessions and competitions in the field include thousands of decision variables. Algorithms that solve Large-Scale Optimization Problems are usually affected by the curse of dimensionality; i.e., these problems become harder to solve as the number of decision variables increases. One of the best-known approaches to deal with these problems is the one proposed by Potter and De Jong, called Cooperative Co-Evolution (CC) [5], which is based on the divide-and-conquer strategy. The CC approach works in three stages: (1) first, the problem is decomposed into subcomponents of lower dimension and complexity; then, (2) each subproblem is optimized separately; and finally, (3) the solutions of the subproblems cooperate to create the solution of the original problem.

Although many approaches to solve Large-Scale Optimization Problems implement the CC framework, the first problem that arises is finding an adequate decomposition into subgroups, since the interaction among the variables must be taken into account to divide the problem. In other words, if two or more variables interact with each other, they must remain in the same subcomponent, while variables that do not interact with any others should form subcomponents with just one variable. The decomposition into subgroups can be evaluated considering definitions of problem separability and partial separability, as explained in Section 3.2. If the interacting variables are not grouped into the same subgroup, CC tends to find a solution that is not the optimum of the original problem but a local optimum introduced by an incorrect problem decomposition [6].

Several strategies have been proposed in the literature to create an adequate decision variable decomposition, ranging from random approaches to strategies that study the interaction among variables to optimize this decomposition. When the original problem is decomposed into subproblems, we aim for the interaction between them to be minimal. For this reason, we can handle the decomposition through optimization strategies, where the objective is to group variables that interact with each other in the same subcomponent.

One of the first works related to the optimization of the variable decomposition for Large-Scale Constrained Problems was proposed by Aguilar-Justo et al. [7], who presented a Genetic Algorithm (GA) to handle the interaction minimization in the subcomponents. This GA and its operators, such as crossover and mutation, work under an integer genetic encoding, which is one of the most popular ways of representing a solution as a chromosome in this type of algorithm.

In this work, we resort to a Grouping Genetic Algorithm (GGA) to solve the decomposition problem, since these algorithms have proven to be some of the best when it comes to combinatorial optimization problems where the optimization of elements in groups is involved [8]. This proposal aims to show the benefits of using a GGA and its group-based representation, for the creation of subcomponents, compared against a genetic algorithm. In addition, to the best of the authors' knowledge, our proposal is the first GGA approach to handle decomposition in Large-Scale Constrained Optimization Problems.

We chose similar main operators and parameters to the genetic algorithm proposed by Aguilar-Justo et al. [7] in order to evaluate the impact of the representation schemes for the decomposition problem and to make a fair comparison of the performance.

Both algorithms were evaluated on a set of 18 test functions proposed by Sayed et al. [3], which are problems with 1, 2, or 3 constraints, each tested with 100, 500, and 1000 variables. Experimental results show that the proposed GGA obtains a more suitable variable decomposition than the GA of Aguilar-Justo et al. [7] for Large-Scale Constrained Optimization Problems, especially where the separation is more complicated, such as in non-separable problems.

The work continues as follows: in the next section, we show related work regarding Decomposition Methods and Grouping Genetic Algorithms. In Section 3, we show our proposed GGA and describe each of its components in detail. Section 4 contains the experiments and results of our algorithm compared to a genetic algorithm and a brief analysis of the performance of the GGA. Finally, in Section 5, we describe the conclusions and future work corresponding to our research.

#### **2. Related Works**

#### *2.1. Decomposition Methods*

According to Ma et al. [6], several variable grouping strategies have been proposed in the specialized literature for variable decomposition to deal with Large-Scale Unconstrained Optimization Problems. They classify them into the following classes: static variable grouping, random variable grouping, variable grouping based on interaction learning, variable grouping based on domain knowledge, overlap and hierarchical variable grouping, and finally, some hybrids of them. In the following paragraphs, we describe the works related to each class.

*Static variable grouping* methods do not rely on any intelligent procedure to create the variable decomposition. Instead, they preliminarily decompose the decision vector into a set of low-dimensional subcomponents and fix the variable grouping during the process of optimization. Among the works that perform static grouping of variables are those of Potter and De Jong [5] and Liu et al. [9], which show good performance on fully separable problems. For non-separable problems, Van den Bergh and Engelbrecht [10] propose a static sequential decomposition. However, this approach depends on choosing an adequate number of subcomponents from the beginning of the strategy, and static decomposition performs poorly on many problems.

For that reason, several authors proposed *random variable grouping* strategies, such as the case of Yang et al. [11], who present random decomposition with a fixed number of subcomponents. Moreover, the same authors propose in [12] a random decomposition with a dynamic number of subcomponents. Omidvar et al. [13,14] improve these random strategies by integrating the probability of interaction between variables in the grouping technique.

In addition, if an algorithm can learn the structure of the problem and decompose it accordingly, the difficulty of solving the problem can be significantly reduced (*variable grouping based on interaction learning*). Many approaches have been proposed to detect variable interactions, and they can be subdivided into those based on perturbation, statistical models, distribution models, approximate models, and linkage adaptation. In some cases, the interaction is captured by perturbing the decision variables and measuring the change in fitness caused by the perturbations, as in the work of Xu et al. [15]. Differential Grouping (DG) and Differential Grouping version 2 from Omidvar et al. [16,17] are also based on perturbation; they are among the most popular decomposition algorithms and have been widely studied, which has led to various improvements, such as recursive decomposition [18–21]. Delta Grouping [14] is classified as a decomposition method based on statistical models, where all variables and the objective functions are treated as random variables: statistical analyses of the variables or objective functions are performed first, and then the variables are grouped. In a distribution model, a set of promising solutions is first used to estimate the variable distributions and variable interactions, which are then used to generate new candidate solutions. The Estimation of Distribution Algorithm (EDA) is representative of this class, as in the work of Sopov [22], where a genetic algorithm is combined with an EDA that collects statistical data from the past search experience to provide the problem decomposition by fixing genes in chromosomes; refs. [23,24] are other representatives of such methods.
As an example of an approximate model, the fitness evaluation of a Large-Scale Continuous Optimization Problem is converted to the evaluation of a simpler, partially separable problem in [25]. Linkage adaptation methods use specially designed evolution operators, representations, and mechanisms to divide variables into groups [26].
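To make the perturbation idea concrete, the following simplified sketch flags two variables as interacting when the fitness change caused by perturbing one of them depends on the value of the other. This is only in the spirit of Differential Grouping, not the published algorithm; the bounds, sampling points, and threshold `eps` are illustrative assumptions.

```python
def interact(f, i, j, D, lb=0.0, ub=1.0, eps=1e-6):
    """Perturbation-based interaction check (simplified sketch): x_i and x_j
    are deemed interacting if the change in f caused by perturbing x_i
    differs when x_j takes a different value."""
    x = [lb] * D
    y1 = f(x)
    x[i] = ub
    y2 = f(x)                      # effect of perturbing x_i at x_j = lb
    x[i] = lb
    x[j] = (lb + ub) / 2
    y3 = f(x)
    x[i] = ub
    y4 = f(x)                      # same perturbation with x_j changed
    delta1 = y2 - y1
    delta2 = y4 - y3
    return abs(delta1 - delta2) > eps
```

For $f(\mathbf{x}) = x_0 x_1 + x_2$, this check reports that $x_0$ and $x_1$ interact while $x_0$ and $x_2$ do not.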

When it comes to *overlap and hierarchical variable grouping*, where shared variables are usually interspersed in the decision vector, more elaborate strategies have to be used. Goh et al. [27,28] suggested assigning each variable to several subcomponents, each of which contains more than one variable, with the subcomponents competing with each other to represent shared variables. Furthermore, Strasser et al. [29] introduce several overlapping grouping strategies, including random overlapping grouping, neighbor overlapping grouping, and centered overlapping grouping, among others.

Regarding the *variable grouping based on domain knowledge*, before CC is implemented to solve specific real-world problems, domain knowledge can be harnessed to reduce the complexity of the problems. The conflicting probability of two flights was used by Guan et al. [30] to learn the variable interaction in solving the flight conflicting avoidance problem, which is one of several examples of real optimization problems where domain knowledge can be a good tool.

All the works mentioned above focus on unconstrained problems. One of the first decomposition methods for solving Large-Scale Constrained Optimization Problems is an extension of the work of Sayed et al. [25]; this new version is known as the Variable Interaction Identification Technique for Constrained Problems (VIIC) [3]. This method can find the interaction between variables in problems of 100, 500, and 1000 dimensions with inequality constraints. Sayed et al. proposed to measure each decomposition of variables by minimizing the absolute difference between the full evaluation and the sum of each evaluated subgroup, based on the definitions of separability and partial separability of a function. The approach was tested on a new benchmark and compared to Random Grouping (RG), and the results showed that VIIC outperformed RG. Later, Aguilar-Justo and Mezura-Montes [31] improved the performance of VIIC to achieve an adequate decomposition for a fixed number of subgroups; they transformed the constrained problem into an unconstrained one and used a neighborhood heuristic to guide the search toward their proposed decomposition and then optimize it (a method called VIICN). After that, they proposed an improvement in which the principles of VIIC and VIICN are used to build a genetic algorithm that performs a dynamic decomposition, called DVIIC, without establishing a fixed number of subcomponents [7]. Recently, Vakhnin and Sopov [32] proposed a CC-based method that increases the size of the groups of variables at the decomposition stage (called iCC), working with a fixed number of subcomponents.

Intending to improve the existing methods for the decomposition of variables, we propose a genetic algorithm with a genetic encoding based on groups, better known as the Grouping Genetic Algorithm, to optimize the variable decomposition. Experimental results demonstrate the benefits of using a group-based encoding scheme for this problem and its advantages over the genetic algorithm with an integer-based encoding scheme (DVIIC [7]).

#### *2.2. Grouping Genetic Algorithms*

As we have mentioned before, the variable decomposition problem is an optimization problem because we search for the best decomposition in terms of variable interaction; that is, given a set $X = \{x_1, x_2, \dots, x_D\}$ of $D$ variables, we want to decompose this set into $m$ disjoint groups so that the variables within each group do not interact with the variables of the other groups. Therefore, we treat our problem as a grouping problem.

According to the literature [8], grouping problems are a type of combinatorial optimization problem where a set $X$ of $D$ items is partitioned into a collection of $m$ mutually disjoint subsets (groups) $G_j$, so that $X = \bigcup_{j=1}^{m} G_j$ and $G_j \cap G_k = \emptyset$ for $j \neq k$. In this way, an algorithm designed to solve a grouping problem seeks the best possible distribution of the $D$ items of the set $X$ into $m$ different groups ($1 \le m \le D$), such that each item is in exactly one group.
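The partition conditions above translate directly into code. A minimal sketch (the function name is ours) that checks whether a grouping of variable indices $0, \dots, D-1$ is a valid partition:

```python
def is_valid_grouping(groups, n_vars):
    """Check that groups form a partition of variable indices 0..n_vars-1:
    no group is empty, and every variable appears in exactly one group."""
    seen = [v for g in groups for v in g]
    return (all(len(g) > 0 for g in groups)
            and len(seen) == n_vars              # no duplicates across groups
            and set(seen) == set(range(n_vars))) # nothing missing
```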

Kashan et al. [33] organized grouping problems into three categories, using as criteria the number of groups, the type of groups, and the dependence on the order of the groups. First, using the number of groups as the criterion, grouping problems can be classified as either constant or variable: if the number of groups required is known, the problem is constant; if it is unknown, the problem is variable. Second, these problems can be divided into identical and non-identical groups, considering the characteristics of their groups. In this classification, if the quality of a solution is modified by exchanging all the items of two groups, the problem belongs to the non-identical grouping class; otherwise, it is part of the identical category. Finally, grouping problems are called order-dependent when the solution quality depends on the groups' order, and grouping problems without this dependency belong to the not order-dependent class. Thus, the decomposition problem is a variable, identical, and not order-dependent grouping problem.

Grouping problems are very difficult to solve. Most of them are NP-hard, which implies there is no algorithm capable of finding an optimal solution for every instance in polynomial time [34]. There are several NP-hard grouping problems, such as the Bin Packing Problem, Job Shop Scheduling Problem, etc. Ramos-Figueroa et al. [8] studied in their work the strategies that work with most problems like the ones mentioned before. They concluded that the best and the most popular strategies to solve such problems are the Grouping Genetic Algorithms or GGAs.

The GGA was designed in 1992 by Falkenauer [35] and is an extension to the traditional GA with the difference of using a group-based solutions representation scheme and variation operators working together with such solution encoding. Ramos-Figueroa et al. [36] remark that the encoding of a grouping problem solution into a chromosome is a key issue for obtaining good GGA performance; the authors also comment on the importance of integrating crossover and mutation operators adapted to work at the group level. They present a survey of different variation operators designed to work with GGAs that use different types of encoding, as well as their advantages to solve grouping problems.

The state-of-the-art [8] indicates that some of the best results when solving NP-hard grouping problems have been obtained by GGAs that combine grouping encoding schemes and special operators adapted to work with these genetic encodings. Moreover, GGAs have been highly studied for grouping problems that have similarities with the variable decomposition problem, which is also due to the exploration and exploitation of the search space that is given by the nature of the elements of evolutionary algorithms [37]. In this work, we propose, to the best of the authors' knowledge, the first GGA for the variable decomposition problem in Large-Scale Constrained Optimization Problems. A comparative study is conducted to evaluate the performance of our Grouping Genetic Algorithm versus the genetic algorithm DVIIC [7] on the decomposition of variables for Large-Scale Constrained Optimization Problems. To promote a fair comparison, we implement similar operators and equivalent parameter settings. The experiments were carried out using 18 test functions each with 100, 500, and 1000 variables. The obtained results allow us to validate the advantages of the group-based encoding over the integer-based encoding.

#### **3. A Grouping Genetic Algorithm for the Variable Decomposition Problem**

The variable decomposition problem can be classified as a grouping problem. We seek to optimize the separation into groups of the decision variables of the Large-Scale Problem; that is, to create the best partition of the decision variables into a collection of *m* mutually disjoint groups so that the variables belonging to each group have no interaction with the variables of another group.

To study the importance of the solution encoding in a genetic algorithm to solve the variable decomposition problem, we decided to develop a GGA with operators and parameters with similar features to the genetic algorithm DVIIC (proposed by Aguilar-Justo et al. [7]) so that the comparison is as fair as possible. The main difference between the two algorithms is the genetic encoding. The proposal of Aguilar-Justo et al. [7] includes an integer-based representation, where a chromosome has a fixed length that is equal to the number of variables, and each gene represents a variable and indicates the group where the variable is set. On the other hand, our GGA includes a group-based representation, where a chromosome can have a variable length, equal to the number of subcomponents, and each gene represents a subcomponent and indicates the variables that belong to this subset.

In Algorithm 1, we show the general steps of the GGA proposed in this work. The precise details are shown in the following subsections. The process begins by generating an initial population *P* of *pop*\_*size* individuals created by the population initialization strategy (Line 1). After that, each of the individuals in the population is evaluated, and the best solution for the population is obtained (Line 2). Then, we iterate through a *max*\_*gen* number of generations or until we find a value equal to zero in the decomposition evaluation. Within this cycle, the individuals to be crossed will be selected, and the offspring will be created through the grouping crossover operator (Lines 4–5). Similarly, the population is updated by the mutation of some individuals in the population (Lines 6–7). Finally, the population is evaluated again to update the population and the best global solution found so far (Lines 8–9).
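The general flow of Algorithm 1 can be sketched as follows, with the population initialization, evaluation, selection, crossover, and mutation supplied as functions. This is a schematic outline of the loop described above under our own naming, not the authors' exact implementation.

```python
def gga(init_pop, evaluate, select, crossover, mutate, pop_size, max_gen):
    """Schematic loop of Algorithm 1: initialize, then iterate selection,
    grouping crossover, mutation, and evaluation until max_gen generations
    or a perfect decomposition (evaluation 0) is found."""
    P = init_pop(pop_size)                        # Line 1: initial population
    fitness = [evaluate(ind) for ind in P]        # Line 2: evaluate
    best = min(zip(fitness, P))                   # lower grps_diff is better
    for _ in range(max_gen):
        if best[0] == 0:                          # perfect decomposition found
            break
        parents = select(P, fitness)              # Lines 4-5: selection and
        P = crossover(parents)                    # grouping crossover
        P = [mutate(ind) for ind in P]            # Lines 6-7: mutation
        fitness = [evaluate(ind) for ind in P]    # Lines 8-9: re-evaluate and
        best = min(best, min(zip(fitness, P)))    # update the global best
    return best[1]
```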


In the following subsections, we detail the components and operators of our GGA.

#### *3.1. Genetic Encoding*

One of the most important decisions to make when implementing a genetic algorithm is the choice of representation for the solutions, since an improper representation can lead to poor performance of the GA. Our GGA works with a group-based representation, which is the main characteristic of GGAs.

Each individual in the population is represented by groups of variables. Figure 1 shows an example of an individual for a problem with 10 variables, numbered from 0 to 9, randomly assigned to four subcomponents or groups. The groups of variables (genes) in this individual are the following: $grps_1 = \{x_3, x_6, x_8\}$, $grps_2 = \{x_1, x_2, x_7\}$, $grps_3 = \{x_0, x_4\}$, and $grps_4 = \{x_5, x_9\}$. Note that the number $V$ of variables in each subcomponent can vary between 1 and $D$; in addition, the number of subcomponents $m$ is between 1 and $D$.
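The example of Figure 1 can be written down directly in both encodings, which makes the contrast with DVIIC concrete; the conversion helper below is ours, for illustration only.

```python
# Group-based chromosome of Figure 1: each gene is a subcomponent,
# stored here as a set of variable indices.
chromosome = [{3, 6, 8}, {1, 2, 7}, {0, 4}, {5, 9}]

def to_integer_encoding(groups, D):
    """Equivalent integer-based view (as in DVIIC): position i holds the
    (1-based) index of the group that contains variable x_i."""
    return [next(k + 1 for k, g in enumerate(groups) if v in g)
            for v in range(D)]

print(to_integer_encoding(chromosome, 10))  # -> [3, 2, 2, 1, 3, 4, 1, 2, 1, 4]
```

Note that the group-based chromosome has as many genes as subcomponents (variable length), while the integer-based one always has $D$ genes.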


**Figure 1.** Group-based chromosome, where each gene represents a subcomponent (set of variables).

#### *3.2. Decomposition Evaluation*

Each individual is evaluated to determine its fitness and to discover which one is the best within the population.

Sayed et al. [3] proposed a decomposition evaluation inspired by the definitions of problem separability [38] and partial separability [39]. The definition of problem separability states that a fully separable problem with $D$ variables can be written as a linear combination of subproblems of the decision variables, where the evaluation of the complete problem, $F(\mathbf{x})$, is the same as the aggregation of the evaluations of the subproblems, $f(x_i)$; that is, $F(\mathbf{x}) = \sum_{i=1}^{D} f(x_i)$. Additionally, a partially separable problem is defined as one with $D$ variables that can be decomposed into $m$ subproblems, where the summation of all subproblems equals the complete problem: $F(\mathbf{x}) = \sum_{k=1}^{m} f_k(x_v)$, $v = [1 + V \times (k - 1), V \times k]$, where $m$ is the number of subproblems and $V$ is the number of variables in the $k$-th subproblem. Sayed et al. proposed to measure each decomposition of variables by minimizing the absolute difference between the full evaluation and the sum of each evaluated subgroup.

Algorithm 2 shows the decomposition evaluation procedure. First (Line 1), we obtain $fit_{all_{c_1}}$ and $fit_{all_{c_2}}$ through the evaluations of Equations (3) and (4), where all the variables take the constant values $c_1$ and $c_2$, respectively. Both evaluations are then added and multiplied by the number of subgroups in the problem ($m$) to obtain $fit_{all_{c_1c_2}}$, and $fit_{grps_{c_1c_2}}$ is initialized to 0 (Lines 2–3). Afterwards, we start a loop from $k = 1$ to the number of subgroups $m$ in the individual (Lines 4–10). Within this loop, we create two arrangements of $D$ variables: in the first, each variable belonging to group $k$ takes the value $c_1$ while the remaining variables take the value $c_2$, and its evaluation gives $fit_{grps_{k,c_1}}$; in the second, the variables in the $k$-th group take the value $c_2$ and the remaining ones the value $c_1$, giving $fit_{grps_{k,c_2}}$ (Lines 5–8), according to Equations (5) and (6). We then calculate $fit_{grps_{k,c_1c_2}}$ as the sum of $fit_{grps_{k,c_1}}$ and $fit_{grps_{k,c_2}}$ (Line 9) and, to end the loop iteration, update $fit_{grps_{c_1c_2}}$ by adding $fit_{grps_{k,c_1c_2}}$ to it. Finally, the evaluation of the decomposition is obtained by calculating the absolute difference $grps_{diff}$ shown in Line 11.


$$fit_{all_{c_1}} = Obj(\mathbf{x}) + cvs(\mathbf{x}), \ x_i = c_1, \ \forall i \in [1, D] \tag{3}$$

$$fit_{all_{c_2}} = Obj(\mathbf{x}) + cvs(\mathbf{x}), \ x_i = c_2, \ \forall i \in [1, D] \tag{4}$$

$$\mathbf{x}_{k,c_1} = \begin{cases} c_1 & \forall x_i \in grps_k \\ c_2 & \text{otherwise} \end{cases} \tag{5}$$

$$\mathbf{x}_{k,c_2} = \begin{cases} c_2 & \forall x_i \in grps_k \\ c_1 & \text{otherwise} \end{cases} \tag{6}$$

To clarify the decomposition evaluation procedure, we present an example below. Let $Obj(\mathbf{x}) + cvs(\mathbf{x}) = f(\mathbf{x}) = x_1x_2 + x_3x_4$ be the problem to decompose, with the decomposition given by $grps_1 = \{x_1\}$, $grps_2 = \{x_2, x_4\}$, and $grps_3 = \{x_3\}$, so $m = 3$. Suppose $c_1 = 1$ and $c_2 = 2$. In the first step, we calculate $fit_{all_{c_1}}$ and $fit_{all_{c_2}}$. According to Equations (3) and (4),

$$fit\_{all\_{c\_1}} = f(x\_i = c\_1) = 1 \ast 1 + 1 \ast 1 = 2$$

$$fit\_{all\_{c\_2}} = f(x\_i = c\_2) = 2 \ast 2 + 2 \ast 2 = 8$$

Then, continuing with step 2,

$$fit_{all_{c_1c_2}} = m \times [fit_{all_{c_1}} + fit_{all_{c_2}}] = 3 \times [2 + 8] = 30$$

In step 3, we initialize $fit_{grps_{c_1c_2}} = 0$. Then, we start the for loop. At this point in the process, we create the arrangement $\mathbf{x}_{k,c_1}$ according to Equation (5) in step 5 and evaluate it in step 6; similarly, the process is performed for $c_2$ in steps 7 and 8. To calculate $fit_{grps_{k,c_1}}$, the variables of the $k$-th group are evaluated at $c_1$ and the rest at $c_2$; for $k = 1$, the group is $grps_1 = \{x_1\}$, so $x_1 = 1$ and $x_2, x_3, x_4 = 2$. A similar calculation is performed for $fit_{grps_{k,c_2}}$, but evaluating the variables of the $k$-th group at $c_2$ and the rest at $c_1$. Therefore, following the steps of the loop,

For $k = 1$, $grps_1 = \{x_1\}$:

$$fit_{grps_{1,c_1}} = 1 \times 2 + 2 \times 2 = 6$$

$$fit_{grps_{1,c_2}} = 2 \times 1 + 1 \times 1 = 3$$

$$fit_{grps_{1,c_1c_2}} = fit_{grps_{1,c_1}} + fit_{grps_{1,c_2}} = 6 + 3 = 9$$

$$fit_{grps_{c_1c_2}} = 0 + 9 = 9$$

For $k = 2$, $grps_2 = \{x_2, x_4\}$:

$$fit_{grps_{2,c_1}} = 2 \times 1 + 2 \times 1 = 4$$

$$fit_{grps_{2,c_2}} = 1 \times 2 + 1 \times 2 = 4$$

$$fit_{grps_{2,c_1c_2}} = 4 + 4 = 8$$

$$fit_{grps_{c_1c_2}} = 9 + 8 = 17$$

For $k = 3$, $grps_3 = \{x_3\}$:

$$fit_{grps_{3,c_1}} = 2 \times 2 + 1 \times 2 = 6$$

$$fit_{grps_{3,c_2}} = 1 \times 1 + 2 \times 1 = 3$$

$$fit_{grps_{3,c_1c_2}} = 6 + 3 = 9$$

$$fit_{grps_{c_1c_2}} = 17 + 9 = 26$$

Finally,

$$\text{grps}\_{diff} = |fit\_{all\_{c\_1c\_2}} - fit\_{\text{grps}\_{c\_1c\_2}}| = |30 - 26| = 4$$
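The whole procedure of Algorithm 2, together with the worked example above, can be condensed into a short sketch (our own code, using 0-based variable indices):

```python
def decomposition_eval(f, groups, D, c1=1.0, c2=2.0):
    """Sketch of Algorithm 2: absolute difference between m times the sum of
    the two all-constant evaluations and the sum of the per-group mixed
    evaluations (grps_diff)."""
    m = len(groups)
    fit_all = m * (f([c1] * D) + f([c2] * D))            # fit_all_c1c2
    fit_grps = 0.0
    for grp in groups:
        x1 = [c1 if i in grp else c2 for i in range(D)]  # Equation (5)
        x2 = [c2 if i in grp else c1 for i in range(D)]  # Equation (6)
        fit_grps += f(x1) + f(x2)
    return abs(fit_all - fit_grps)                       # grps_diff

# Worked example: f(x) = x1*x2 + x3*x4 with grps = {x1}, {x2, x4}, {x3}.
f = lambda x: x[0] * x[1] + x[2] * x[3]
print(decomposition_eval(f, [{0}, {1, 3}, {2}], 4))  # -> 4.0
```

With the correct decomposition $\{x_1, x_2\}, \{x_3, x_4\}$, the same function returns 0, reflecting a perfect separation of the interacting pairs.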

The purpose of the problem decomposition is to create independent subcomponents; that is, to minimize the difference $grps_{diff}$. We adopted an evaluation similar to the one proposed by Aguilar-Justo et al. [7], defined as follows: to maximize the number of subproblems, $grps_{diff}$ is updated so that (1) if the number of subgroups is one, $grps_{diff}$ takes an extremely large value; (2) if $grps_{diff}$ is zero, the decomposition is perfect, and it is rewarded by subtracting the number of subgroups ($m$) of the individual; (3) otherwise, the $grps_{diff}$ value does not change. However, this evaluation function benefits decompositions with a high number of subcomponents; in some cases, a complete decomposition (as many groups as variables) would be presented as optimal when in reality it is not. For this reason, we modified the evaluation function in our algorithm to avoid rewarding a high number of groups. This change is summarized in Equation (7).

$$\text{grps}\_{diff} = \begin{cases} \infty & \text{if } m = 1 \\\\ \text{grps}\_{diff} & \text{otherwise} \end{cases} \tag{7}$$
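As an illustration, the modified evaluation can be sketched in a few lines of Python (a minimal sketch; the function name `evaluate_decomposition` and the use of `math.inf` for the infinite penalty are our own choices):

```python
import math

def evaluate_decomposition(grps_diff, m):
    """Modified decomposition evaluation, a sketch of Equation (7).

    grps_diff: absolute difference between the fitness of the full
               variable set and the sum of the subgroup fitnesses.
    m:         number of subgroups in the individual.
    """
    if m == 1:
        # A single group is no decomposition at all: penalize with infinity.
        return math.inf
    # Unlike the original scheme of Aguilar-Justo et al., a perfect
    # decomposition (grps_diff == 0) is no longer rewarded by subtracting m,
    # so a needlessly fine-grained decomposition gains nothing.
    return grps_diff

# Worked example from the text: fit_all = 30, fit_grps = 26, m = 3
print(evaluate_decomposition(abs(30 - 26), 3))  # 4
```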

#### *3.3. Population Initialization*

The initial population in most GGAs is generated by obtaining random partitions of the elements to group. In our GGA, to create a new chromosome, a random number between 1 and the dimension of the problem (*D*) is generated, i.e., *m* ∈ [1, *D*], which represents the number of subcomponents *m*; then, each variable is randomly assigned to one of these groups. First, we ensure that each group contains at least one variable by shuffling the variables and assigning the first *m* of them to one group each. After that, the remaining variables are randomly assigned to one of the created groups. This scheme mirrors the genetic algorithm DVIIC [7], where a random number determines the number of groups and each variable is randomly assigned to a group (under the integer-based representation).
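The initialization above can be sketched as follows (a hedged sketch; `create_chromosome` is a hypothetical name, and Python's `random` module stands in for whatever generator the implementation uses):

```python
import random

def create_chromosome(D, rng=random):
    """Create a random grouping chromosome for a problem with D variables.

    Draw m in [1, D], guarantee every group receives one variable, then
    assign the remaining variables to random existing groups.
    """
    m = rng.randint(1, D)                  # number of subcomponents
    variables = list(range(D))
    rng.shuffle(variables)
    # The first m shuffled variables seed one group each...
    groups = [[v] for v in variables[:m]]
    # ...and every remaining variable joins a random existing group.
    for v in variables[m:]:
        groups[rng.randrange(m)].append(v)
    return groups

chromosome = create_chromosome(10)
assert sum(len(g) for g in chromosome) == 10  # each variable assigned once
```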

#### *3.4. Grouping Crossover Operator*

After choosing the individuals that are subject to the crossover operator, each pair of these individuals, called parents, creates two new individuals (offspring) through a mating strategy. There are several crossover operators for GGAs; however, for comparison purposes, we chose the two-point crossover operator analogous to the one used in the genetic algorithm DVIIC [7]. This operator works as follows: two crossing points (*a* and *b*) between 1 and the number of genes in the individual minus one (*m* − 1) are selected randomly to define the crossing section of both parents (*P*<sup>1</sup> and *P*2). The first child (*C*1) is generated as a copy of *P*1, injecting into it the groups between the crossing points (*a* and *b*) of *P*2 in place of its own crossing section. Next, the groups copied from *P*<sup>1</sup> that contain duplicated items are identified; these groups are removed and their remaining variables are released (missing variables), together with the elements that were lost when the groups of the crossing section were eliminated and that are not in the inserted groups. It is important to note that the injected groups remain intact. Finally, the missing variables are re-inserted into a random number of new groups (between 1 and the number of missing variables) to form the complete individual. The second child (*C*2) is generated with the same process but exchanging the roles of the parents.

In Figure 2, we can see an example of crossover for two individuals with 10 variables. The crossing points *a* and *b* are marked in step (1); then, in step (2), the section between *a* and *b* of parent *P*<sup>1</sup> is inserted and replaced in the other parent (*P*2) and vice versa. In step (3), we have the free variables that result from the groups eliminated for having repeated variables, such as, in the first child, the free element 8 that was in the group with 3 and 6, which were repeated elements, and the elements 2 and 4 that were lost variables (elements). Finally, in step (4), we have the offspring with the free elements re-inserted.
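The construction of one child can be sketched as follows (a simplified illustration: `grouping_crossover` is a hypothetical name, and for brevity the crossing section is drawn on the donor parent only, whereas the operator described above selects it on both parents):

```python
import random

def grouping_crossover(p1, p2, rng=random):
    """Build one child of the two-point grouping crossover (a sketch)."""
    a = rng.randint(1, len(p2) - 1)
    b = rng.randint(1, len(p2) - 1)
    a, b = min(a, b), max(a, b)
    injected = [list(g) for g in p2[a:b]]        # injected groups stay intact
    in_injected = {v for g in injected for v in g}

    child, missing = [], set()
    for g in p1[:a] + p1[b:]:                    # groups copied from p1
        if any(v in in_injected for v in g):
            # Group holds duplicated items: remove it, release the rest.
            missing.update(v for v in g if v not in in_injected)
        else:
            child.append(list(g))
    # Variables of p1's replaced section that the injection did not bring
    # back are lost elements and must also be re-inserted.
    missing.update(v for g in p1[a:b] for v in g if v not in in_injected)
    child.extend(injected)

    freed = list(missing)
    if freed:
        rng.shuffle(freed)
        k = rng.randint(1, len(freed))           # number of new groups
        new_groups = [[v] for v in freed[:k]]
        for v in freed[k:]:
            new_groups[rng.randrange(k)].append(v)
        child.extend(new_groups)
    return child
```

The second child would be obtained by calling the function with the parents exchanged.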

## *3.5. Grouping Mutation Operator*

The mutation operator used in the genetic algorithm DVIIC [7] is called uniform mutation: once an individual is selected to mutate, one of its genes is randomly selected and moved from the group to which it belongs to another group. To use a similar operator, we chose to implement the group-oriented elimination mutation operator for GGAs. This operator eliminates a random group of the individual. The deleted elements are then re-inserted by adding a random number of groups between 1 and the number of free variables, with the variables randomly assigned to them (similar to how an individual is created). Figure 3 shows an example of the elimination operator. In step (1), the group marked in gray is eliminated; its elements then pass to the free group of elements shown in step (2). Finally, the elements are re-inserted in step (3) with the aforementioned strategy.

**Figure 3.** Elimination operator.
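The elimination operator can be sketched as follows (a hedged illustration; `elimination_mutation` is a hypothetical name):

```python
import random

def elimination_mutation(individual, rng=random):
    """Group-oriented elimination mutation (a sketch of the steps above)."""
    groups = [list(g) for g in individual]
    freed = groups.pop(rng.randrange(len(groups)))  # step (1): eliminate a group
    rng.shuffle(freed)                              # step (2): free its elements
    k = rng.randint(1, len(freed))                  # step (3): re-insert into
    new_groups = [[v] for v in freed[:k]]           # 1..len(freed) new groups
    for v in freed[k:]:
        new_groups[rng.randrange(k)].append(v)
    return groups + new_groups
```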

#### *3.6. Selection and Replacement Strategies*

In a Genetic Algorithm, we have to select the members of the population that will be candidates for crossover and mutation. A selection scheme decides which individuals are allowed to pass on their genes to the next generation, whether through cloning, crossover, or mutation. Selection schemes from the literature are generally classified into three classes: proportional selection, tournament selection, and ranking selection. Usually, selection is carried out according to relative fitness, using the best or random individuals [40,41].

Several strategies have been proposed for the parent selection (individuals for crossover). In our GGA, we use a selection scheme similar to that included in the genetic algorithm DVIIC [7], and we carry out a shuffling of the population; for each pair of parents, a random number between 0 and 1 is created. This number determines if the pair of individuals is subject to crossover. That is, the crossover of both individuals is applied when the number is less than or equal to *pc*.

In the same way, the selection of individuals to mutate has been studied, and there are various selection techniques for mutation. In this case, the selection method for mutation is similar to that of the genetic algorithm DVIIC [7]. Given a mutation probability *pm*, for each individual in the population, a random number between 0 and 1 is generated, and when this number is less than or equal to *pm*, the individual is mutated.
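Both probabilistic selections can be sketched together (a hedged sketch; the function and parameter names are our own, and `crossover` and `mutate` stand for the grouping operators of Sections 3.4 and 3.5):

```python
import random

def select_and_vary(population, pc, pm, crossover, mutate, rng=random):
    """Shuffle the population, cross each consecutive pair with probability
    pc, then mutate each individual with probability pm (a sketch of the
    DVIIC-like reproduction scheme described above)."""
    pop = population[:]
    rng.shuffle(pop)
    for i in range(0, len(pop) - 1, 2):
        if rng.random() <= pc:              # pair selected for crossover
            pop[i], pop[i + 1] = crossover(pop[i], pop[i + 1])
    for i in range(len(pop)):
        if rng.random() <= pm:              # individual selected to mutate
            pop[i] = mutate(pop[i])
    return pop
```

Note that, as described in the next paragraph, the offspring and mutants directly replace the individuals they were derived from.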

In addition to the selection scheme, there must also be a criterion under which the population will be replaced in each generation. Generally, the replacement strategies can be split into three classes: age-based, fitness-based, and random-based (deleting the oldest, worst, or random individuals, respectively) [42]. Similar to the strategy of the genetic algorithm DVIIC [7], in our GGA, after crossover, the offspring replace the parents, and after the mutation, the mutated individuals replace the original ones. Elitism is adopted to always maintain the best solution of the population, replacing the worst individual of the new population.
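The elitist replacement can be sketched as follows (a hedged sketch assuming a minimized fitness, i.e., a lower *grps_diff* is better; the function name is hypothetical):

```python
def apply_elitism(new_population, best_so_far, fitness):
    """Keep the best solution found so far by replacing the worst
    individual of the new population, as described above."""
    worst = max(range(len(new_population)),
                key=lambda i: fitness(new_population[i]))
    if fitness(best_so_far) < fitness(new_population[worst]):
        new_population[worst] = best_so_far
    return new_population
```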

#### **4. Experiments and Results**

In order to study the benefits of using a group-based encoding instead of an integer-based encoding in a genetic algorithm, we compared our proposal with the decomposition strategy proposed by Aguilar-Justo et al. [31]. Therefore, we chose the same set of test functions the authors used. It is the first test set for Large-Scale Constrained Optimization Problems and was proposed by Sayed et al. in 2015 [3]. This test set has different separability complexity degrees, which are described in Table 1, and can be tested over three numbers of variables (100, 500, and 1000). These 18 functions were created by combining 6 objective functions with 1, 2, or 3 constraints. The 6 objective functions are based on 2 problems from the literature that have been used, for example, in the CEC 2008 benchmark problems [4]: the Rosenbrock function, which is multimodal and nonseparable, and the Sphere function, which is unimodal and separable. In addition, Table 2 shows the components of these 18 test functions, that is, the objective function and the constraints that make up each one. The details of the mathematical expression of each function can be consulted in the work of Sayed et al. [3].


**Table 1.** Characteristics of the objective functions and constraints.

We have compared the results of our GGA against the Dynamical Variable Interaction Identification Technique for Constrained Problems (DVIIC), the genetic algorithm proposed by Aguilar-Justo et al. [7] for the decomposition of the 18 test functions. We computed 25 independent runs for each benchmark function under 3 different numbers of variables (100, 500, and 1000). The parameters of our algorithm were set as in the DVIIC work, to compare under equal conditions and perform the same number of function evaluations. These are as follows:


Such a configuration implies that the same number of evaluations is carried out: 100 individuals in each generation for 100 generations, which equals 10,000 evaluations. These experiments were conducted on an Intel(R) Core(TM) i5 CPU at 2.50 GHz, with Python 3.4 and Microsoft Windows 10.

In the following tables, we show the results of the execution of our proposed GGA and the genetic algorithm DVIIC. Both were executed 25 times for each of the 18 functions in each dimension. Furthermore, each table shows the result of the Wilcoxon Rank Sum test for each function (column W). A checkmark (✓) means that there are significant differences in favor of the GGA, while an equality symbol (=) means that there are no significant differences between the two algorithms.
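For reference, such a pairwise comparison can be run with SciPy's rank-sum test; the sample values below are hypothetical placeholders, not the data of Tables 3–5:

```python
# Hypothetical grps_diff values of 25 runs per algorithm (placeholders).
from scipy.stats import ranksums

gga_runs = [0.0] * 23 + [2.0, 4.0]
dviic_runs = [12.0, 8.0, 15.0, 9.0, 11.0] * 5

stat, p_value = ranksums(gga_runs, dviic_runs)
# A p-value below 0.05 marks the difference between samples as significant.
significant = p_value < 0.05
```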


**Table 2.** Components of the 18 test functions.

First, Table 3 contains the results according to the evaluation of the best individual over the 25 runs on the 18 functions under 100 variables. The best, median, and standard deviation registered for the evaluation function (*grps_diff*) value are shown. We can observe that the GGA improves the decomposition evaluation function value in all cases compared to DVIIC. Unlike DVIIC, the GGA reaches the value of 0 as the best result in most cases. Furthermore, the median and standard deviation values obtained by the GGA are smaller in all cases; median and standard deviation values equal to zero indicate that our algorithm found the best value of the evaluation function (*grps_diff* = 0) in all 25 runs for functions 1 to 12. Finally, the Wilcoxon Rank Sum test reveals significant differences in favor of the GGA in all cases.

Second, Table 4 shows the results of both algorithms on the same 18 functions, now with 500 variables. The GGA obtained the smallest best, median, and standard deviation values in most cases when compared against DVIIC. As in Table 3, median and standard deviation values equal to zero indicate that our algorithm found the best value of the evaluation function (*grps_diff* = 0) in all 25 runs for functions 1 to 12. Moreover, for the other test functions, the best, median, and standard deviation values are smaller than those of DVIIC. On the other hand, the Wilcoxon Rank Sum test shows no significant differences between the two algorithms on function 1 and significant differences in favor of the GGA on the remaining 17 functions. According to this test, on functions 2 to 18, the hypothesis that the DVIIC approach is as effective as the proposed GGA is rejected, while *F*<sup>1</sup> is trivial to solve for both algorithms.


**Table 3.** Statistical results in dimension 100. Best results shown in boldface.

**Table 4.** Statistical results in dimension 500. Best results shown in boldface.


Finally, Table 5 contains the results of both algorithms for the 18 functions with 1000 variables. In this experiment we observed that, as with 500 variables, our GGA obtained the smallest best, median, and standard deviation values in most cases. Moreover, the median and standard deviation values show that we obtained zero in the 25 independent runs for the first 12 functions. The Wilcoxon Rank Sum test shows no significant differences between the two algorithms on function 1 and significant differences in favor of the GGA on the remaining 17 functions; as with 500 variables, the test determines that *F*<sup>1</sup> is a trivial case and rejects the hypothesis that the DVIIC approach is as effective as the proposed GGA for the other 17 functions.


**Table 5.** Statistical results in dimension 1000. Best results shown in boldface.

From the previous tables, we observe that our algorithm performs better than DVIIC, obtaining better *grps_diff* values in all cases. An interesting behavior is observed in these experiments: it seems to be more difficult for our algorithm to find the minimum decomposition evaluation in the 18 test functions as the dimension decreases. Best, median, and standard deviation values of zero over the 25 independent runs indicate a stable behavior of our algorithm in every execution of the first 12 functions of the benchmark (in the three experiments). However, these values increase with the complexity of the functions, and functions 16, 17, and 18 do not reach the minimum in any of the experiments.

#### *Analyzing the Performance of the GGA*

Due to the behavior observed in the previous experiments, a detailed study of the algorithm is necessary to improve it in future work. For this reason, we decided to make a brief study of the convergence of our algorithm.

In order to understand the on-line behavior of our algorithm, we carried out some plots of the GGA convergence for the most difficult functions of the benchmark. Figure 4 shows the convergence on functions *F*16, *F*17, and *F*<sup>18</sup> through 100 generations for three dimension values.

**Figure 4.** Convergence plots of 100 generations for functions *F*16, *F*17, and *F*<sup>18</sup> with 100, 500, and 1000 variables.

Three convergence behaviors are shown in each of the graphs within Figure 4. First, we show the convergence of the worst individual in the population, that is, the individual with the highest decomposition evaluation value (red). Then, we show the behavior across the 100 generations of the average decomposition evaluation of the 100 individuals in the population (green). Finally, we show the convergence across the 100 generations of the best individual in the population in terms of its decomposition evaluation value (blue).

Figure 4a–c shows the convergence in the experiment with 100 variables for one of the 25 GGA runs. We can observe similar behavior in the three functions, with decomposition evaluation values below 4.0 × 10<sup>7</sup> in all three cases (best, worst, and average). It is important to note that the best individual presents an early convergence in the three functions and that the decomposition evaluation of the worst individuals remains stable over the generations.

Regarding the convergence of the functions with 500 variables (Figure 4d–f), we can observe that function *F*<sup>18</sup> shows the highest decomposition evaluation values for the three curves (best, worst, and average), unlike the other functions. Similar to the functions with 100 variables, we see that this value does not converge to zero in any of the cases and that the best individual converges quickly. We can also infer, according to the graphs, that several individuals of the population do not converge to the neighborhood of the best solution.

In the case of the functions evaluated with 1000 variables (Figure 4g–i), we see a fast convergence of the best individual in the population. As in the previous graphs, the decomposition evaluation value of the worst individual remains steady, without converging to zero during the 100 generations, while the best value converges quickly.

The convergence behaviors of the best, worst, and average values discussed above for the functions with spliced nonseparable and overlapping variables suggest that the strategies included in the GGA do not lead to better solutions in these cases. The plots in Figure 4 show that not the entire population converges to the neighborhood of the best solution, due to the low selective pressure of the selection and replacement strategies. We also observe that the GGA produces good solutions in the early stages but leads to the premature convergence of the best individual. This behavior can be related to the crossover and mutation operators, which promote the creation of new groups, a strategy that does not seem suitable for nonseparable functions. All these observations indicate that, although our algorithm performs well, it can still be improved by analyzing its components.

#### **5. Conclusions and Future Work**

In this paper, we have proposed a Grouping Genetic Algorithm (GGA) to deal with the decomposition of variables in Large-Scale Constrained Optimization Problems to create subproblems of the original problem and thus reduce the dimension. To evaluate the impact of the representation scheme on the performance of a genetic algorithm, our GGA was designed in a similar way to a state-of-the-art genetic algorithm that works with the decomposition of variables and which includes an integer-based representation. The main difference between the two algorithms was the genetic encoding. The experiments were carried out in a benchmark of 18 functions with different complexity characteristics, and these functions were tested in 100, 500, and 1000 dimensions.

The obtained results confirm that the use of a group-based genetic encoding allows our GGA to obtain good and robust decompositions on test functions with different features and separability complexity degrees, outperforming on all the benchmark functions the results obtained by a genetic algorithm with an integer-based encoding.

We are aware that there are still test functions, those with spliced nonseparable and overlapping variables, that show a high degree of difficulty; for these functions, the strategies included in the GGA do not appear to lead to better solutions. However, the GGA presented in this work does not yet include the state-of-the-art grouping genetic operators.

Future work will consist of studying the parameters of the GGA as well as the effect of each of the methods used in the crossover and mutation operators to identify the best strategies that work together with the grouping encoding scheme and the features of the functions. Furthermore, it is necessary to implement an efficient reproduction technique with a balance in selective pressure and population diversity to avoid the premature convergence of the best individuals and increase the algorithm's performance.

The introduction of a new decomposition method opens up an interesting range of possibilities for future research. Currently, we are working on including our GGA in the decomposition step of two Cooperative Co-Evolution methods that include different strategies for the optimization and cooperation of the subcomponents, with the respective feasibility and computational complexity analysis.

Finally, although the set of test functions analyzed in this work is varied concerning the characteristics of the functions, we plan to explore the proposal on other Large-Scale Constrained Optimization benchmarks.

**Author Contributions:** Conceptualization, G.C.-A., M.Q.-C. and E.M.-M.; methodology, M.Q.-C.; software, G.C.-A.; validation, M.Q.-C. and E.M.-M.; formal analysis, M.Q.-C.; investigation, G.C.-A., M.Q.-C. and E.M.-M.; resources, G.C.-A.; writing—original draft preparation, G.C.-A.; writing—review and editing, G.C.-A., M.Q.-C. and E.M.-M.; visualization, G.C.-A. and M.Q.-C.; supervision, M.Q.-C.; project administration, M.Q.-C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

