*Article* **Usage of Selected Swarm Intelligence Algorithms for Piecewise Linearization**

**Nicole Škorupová 1,\*,†, Petr Raunigr 2,† and Petr Bujok 2,†**

	- 70103 Ostrava, Czech Republic; petr.raunigr@osu.cz (P.R.); petr.bujok@osu.cz (P.B.)

**Abstract:** The paper introduces a new approach to enhance optimization algorithms when solving the piecewise linearization problem of a given function. Eight swarm intelligence algorithms were selected and experimentally compared. The problem is represented by the calculation of the distance between the original function and its estimation by a piecewise linear function. Here, the piecewise linearization of 2D functions is studied. Each of the employed swarm intelligence algorithms is enhanced by a newly proposed automatic detection of the number of piecewise linear parts, which determines the discretization points used to calculate the distance between the original and the piecewise linear function. The original algorithms and their enhanced variants are compared on several examples of piecewise linearization problems. The results show that the enhanced approach performs substantially better, producing very promising approximations of the functions. Moreover, the degree of precision is only slightly decreased by the focus on the speed of the optimization process.

**Keywords:** swarm intelligence algorithms; piecewise linearization; optimization; parameter tuning; approximation; experimental comparison

**MSC:** 68T20

#### **1. Introduction**

Linearization is one of the most powerful methods that deal with nonlinear systems. One of the most important factors in piecewise linearization is the number of linear segments. Piecewise linear functions are often used to approximate nonlinear functions, and the approximation itself is an important tool for many applications. This method can be found in many applications, for example, dynamical systems, nonlinear non-smooth optimization, nonlinear differential equations, fuzzy ordinary differential equations and partial differential equations, petroleum engineering, and medicine [1–5].

Various existing approaches attempt to find a piecewise linear approximation of a given function. Classical mathematical methods based on differentiable nonlinear functions have been introduced, for example, the Newton–Kantorovich iterative method, analytical linearization, forward-difference approximation, or center-difference approximation [6,7]. Other types of classical transforms or approximations, e.g., Laplace, Fourier, or integral transforms, are used for the construction of approximation models [8]. Methods based on fuzzy theory are called fuzzy approximation methods, and the best-known one is the fuzzy transform [9].

In [10], the authors introduced linearization methods that use the large deviation principle, utilizing the Donsker–Varadhan entropy of a Gaussian measure and the relative entropy of two probability measures. In [1], the author presented an easy and general method for constructing and solving linearization problems: a spline algorithm was created to construct the approximant, and the interior point method was used to solve the linearization problem. In [11], the Wiener models were composed of a linear dynamical system

**Citation:** Škorupová, N.; Raunigr, P.; Bujok, P. Usage of Selected Swarm Intelligence Algorithms for Piecewise Linearization. *Mathematics* **2022**, *10*, 808. https://doi.org/10.3390/ math10050808

Academic Editor: Jian Dong

Received: 30 January 2022 Accepted: 28 February 2022 Published: 3 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

together with a nonlinear static part. If the nonlinear part is invertible, the inverse function is approximated by a piecewise linear function estimated using a genetic algorithm and an evolution strategy. The linear dynamic part is estimated by the least squares method.

In [12], the authors introduced a method to find the best piecewise linearization of nonlinear functions based on an optimization problem that is reduced to linear programming. Another algorithm that is used to find the optimal piecewise linearization for a predefined number of linear segments with particle swarm optimization (PSO), without the knowledge of the function and without ideal partitions, is introduced in [13]. Further, the authors of [14] introduced a genetic algorithm-based clustering approach to obtain the minimal piecewise linear approximation applied on nonlinear functions. The technique uses a trade-off between higher approximation accuracy and low complexity of the approximation by the least number of linearized sectors. Another piecewise linearization based on PSO is applied to piecewise area division; the control parameter optimization of the model was introduced in [15]. In [16], an effective algorithm to solve the stochastic resource allocation problem that designs piecewise linear and concave approximations of the recourse function of sample gradient information was introduced. In [17], the authors presented a range of piecewise linear models and algorithms that provided an approximation that fits well in their applications. The models involve piecewise linear functions using a constant maximum number of linear segments, border envelopes, strategies for continuity, and a generalization of the used models for stochastic functions.

Some piecewise linearization problems have already been solved by evolutionary algorithms, where specific kinds of functions or a predefined number of piecewise linear parts are required. In our experiment, the kind of function is not restricted, and the number of linear segments does not need to be predefined. Nevertheless, only 2D functions are approximated in this experiment. Besides evolutionary algorithms, traditional mathematical approaches to piecewise linearization were mentioned in this section; however, these approaches will not be addressed in this paper, and a comparison with our methods is left for future work.

The aim of this paper is to contribute to the problem of piecewise linearization using popular swarm intelligence algorithms with automatic parameter tuning. The linearization of a given nonlinear function is an approximation problem leading to the determination of appropriate points. The goal is to find the distribution of the points that minimizes the distance between the original function and the approximating piecewise linear function. As there is no acceptable analytical solution of this optimization problem, using stochastic (swarm intelligence) algorithms promises sufficient accuracy. Moreover, the selected swarm intelligence algorithms are enhanced by the automatic parameter tuning approach and compared to provide insight into the algorithms' performance.

The rest of the paper is arranged as follows. In Section 2, the basic terms used in this paper are introduced. In Section 3, the swarm intelligence algorithms selected for the comparison are briefly described. In Section 4, the application of the original swarm intelligence algorithms to piecewise linearization, with their parameters, is proposed. In Section 5, the newly proposed automatic parameter tuning of the swarm intelligence algorithms is introduced. A concise discussion of the algorithms' efficiency and precision is presented in Section 6.

#### **2. Preliminaries**

In this paper, we work with a real-valued problem in a continuous search space. We consider the objective function *f*(*x*), where *x* = (*x*1, *x*2, ... , *x*ℓ), ℓ ∈ N, defined on the search domain *X* = [*a*, *b*]. Problems solved in a discrete search space could require some modifications of the presented methods. Throughout this paper, for the simplicity of the demonstrated method, the domain [0, 1] is used, but the problem can easily be transferred to a more general domain.

#### *2.1. Piecewise Linear Function*

Throughout this paper, piecewise linear functions are used; therefore, the piecewise linear function should first be defined.

A *piecewise linear function f* : [0, 1] → [0, 1] given by a finite number of points (*xi*, *yi*) ∈ [0, 1] × [0, 1], *i* = 1, ... , ℓ, is a function *f* : [0, 1] → [0, 1] such that *x*1 = 0, *x*ℓ = 1, *f*(*xi*) = *yi* for each *i* = 1, 2, ... , ℓ, and *f*|[*xi*, *xi*+1] is linear for every *i* = 1, 2, ... , ℓ − 1. The points *xi* are called *turning points*. More precisely,

$$f(x) = \begin{cases} y_1 + (y_2 - y_1) \dfrac{x - x_1}{x_2 - x_1}, & x_1 \le x \le x_2, \\ y_2 + (y_3 - y_2) \dfrac{x - x_2}{x_3 - x_2}, & x_2 \le x \le x_3, \\ \vdots \\ y_{\ell - 1} + (y_\ell - y_{\ell - 1}) \dfrac{x - x_{\ell - 1}}{x_\ell - x_{\ell - 1}}, & x_{\ell - 1} \le x \le x_\ell. \end{cases}$$
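For illustration, the definition above can be evaluated directly in code. The following is a minimal Python sketch, not part of the original paper; the function name and the use of `bisect` to locate the segment are our choices:

```python
import bisect

def piecewise_linear(turning_x, turning_y, x):
    """Evaluate the piecewise linear function given by the turning points
    (turning_x[i], turning_y[i]); turning_x must be sorted, with
    turning_x[0] = 0 and turning_x[-1] = 1."""
    # Locate the segment [x_i, x_{i+1}] that contains x.
    i = bisect.bisect_right(turning_x, x) - 1
    i = max(0, min(i, len(turning_x) - 2))
    x0, x1 = turning_x[i], turning_x[i + 1]
    y0, y1 = turning_y[i], turning_y[i + 1]
    # Linear interpolation on the segment, as in the formula above.
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

For the tent-shaped function with turning points (0, 0), (0.5, 1), (1, 0), this evaluates to 0.5 at *x* = 0.25 and to 1 at the middle turning point.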

#### *2.2. Metrics*

The difference between the original function and the approximating piecewise linear function is calculated with a chosen metric. In this paper, two different metrics are applied to achieve more comprehensive results for the compared methods.

Let **x** = (*x*1, *x*2, ... , *xn*) and **y** = (*y*1, *y*2, ... , *yn*) be vectors in R<sup>*n*</sup>. The *Manhattan metric* is a function *d*<sup>1</sup> : R<sup>*n*</sup> × R<sup>*n*</sup> → R that gives the distance between the vectors **x** and **y** as the sum of the absolute differences of their coordinates. More precisely,

$$d_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|.$$

The metric defined on a vector space and induced by the supremum (uniform) norm, where the distance between two vectors is the largest of the absolute coordinate differences, is called the *maximum metric*. More precisely,

$$d_2(\mathbf{x}, \mathbf{y}) = \max_{i = 1, 2, \dots, n} |x_i - y_i|.$$
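Both metrics are straightforward to compute. The following Python sketch illustrates them; the function names `d1` and `d2` simply mirror the notation above:

```python
def d1(x, y):
    """Manhattan metric: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    """Maximum (Chebyshev) metric: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))
```

For example, for **x** = (0, 0) and **y** = (1, 2), the Manhattan distance is 3, whereas the maximum distance is 2.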

#### **3. Selected Swarm Intelligence Algorithms**

Now, the use of swarm algorithms for searching for the global optimum of an interval map *f* : [0, 1] → [0, 1] will be demonstrated. A population is represented as a finite set of *n* ∈ N randomly chosen points *x* ∈ [0, 1]. The population then moves in the domain according to the strategy of the given algorithm. The moves are mostly driven by stochastic parameters, adapted towards the required solution. These algorithms stop after a certain number of iterations or under some predefined condition. The algorithms were selected based on the popularity of the methods, measured by the frequency of their real applications, and also based on our previous experiments [18,19].

#### *3.1. Swarm Intelligence Algorithms*

Swarm intelligence algorithms are stochastic algorithms from the group of evolutionary algorithms. These algorithms are used for solving global optimization problems, and they model the social behavior of a group of individuals. The inspiration comes mostly from nature, especially from biological systems and processes inspired by biological evolution, such as selection, crossover, and mutation. Typical nature-inspired swarm intelligence systems model ant colonies, bird flocking, hawk hunting, animal herding, fish schooling, and others. Both evolutionary and swarm intelligence algorithms work with a set of interacting candidate solutions, but they differ in the model of interaction between individuals: in swarm intelligence algorithms, the population moves through the domain following the group behavior rules of the modeled species. In this section, the swarm algorithms used in this manuscript are introduced.

#### 3.1.1. Particle Swarm Optimization

A population in the PSO algorithm is composed of particles that move in a predefined search area according to evolutionary processes. In each step, several characteristics are computed and employed to indicate how close the particles are to the solution [20–22].

The population is represented by a finite set of *n* ∈ N randomly placed points **x** ∈ [0, 1] called *particles*. The population is evaluated, and each particle controls its movement in the search area according to its personal best position **p***<sup>i</sup>* (with value *pbesti*), the best neighbour position **p***<sup>g</sup>*, and stochastic parameters (acceleration coefficients Φ1, Φ<sup>2</sup> and constriction factor *χ*). Several variants of the PSO algorithm exist; in this paper, the original PSO algorithm from [21], modified by the constriction factor *χ*, is used. Parameter *χ* does not change during the algorithm's run and has a restrictive impact on the result: when PSO runs without restraining velocities, they can grow to unacceptable levels within a few iterations. The elements *U*Φ1, *U*Φ<sup>2</sup> are random points drawn uniformly from the intervals [0, Φ1] and [0, Φ2], where Φ1, Φ<sup>2</sup> ∈ R. First, *pbesti* is compared with *f*(**x***i*); if *f*(**x***i*) is better, the personal best is updated, i.e., **p***<sup>i</sup>* = **x***<sup>i</sup>* and *pbesti* = *f*(**p***i*). Next, the best neighbour **p***<sup>g</sup>* of the *i*th particle is found: if some *f*(**x***j*) is better than *f*(**p***g*), then **p***<sup>g</sup>* = **x***<sup>j</sup>* and *f*(**p***g*) = *f*(**x***j*). The main part of the calculation consists of computing the velocity and updating the new particle positions, which are given by the following formulas:

$$\mathbf{v}_{i} = \chi\big(\mathbf{v}_{i} + U_{\Phi_1}(\mathbf{p}_{i} - \mathbf{x}_{i}) + U_{\Phi_2}(\mathbf{p}_{g} - \mathbf{x}_{i})\big),$$

$$\mathbf{x}_{i} = \mathbf{x}_{i} + \mathbf{v}_{i}.$$
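For illustration, the two update formulas can be sketched for a one-dimensional particle in Python. This is a minimal sketch, not the authors' implementation; the function name is ours, and the default parameter values (Φ1 = Φ2 = 2.05, *χ* ≈ 0.7298) follow the recommendations discussed later in Section 4.2:

```python
import random

def pso_step(x, v, p_i, p_g, phi1=2.05, phi2=2.05, chi=0.7298):
    """One PSO velocity/position update for a single 1-D particle."""
    u1 = random.uniform(0, phi1)  # U_Phi1: uniform draw from [0, phi1]
    u2 = random.uniform(0, phi2)  # U_Phi2: uniform draw from [0, phi2]
    v_new = chi * (v + u1 * (p_i - x) + u2 * (p_g - x))
    x_new = x + v_new
    return x_new, v_new
```

When the particle already sits at both its personal best and the neighbourhood best, the attraction terms vanish and the velocity is simply damped by *χ*.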

#### 3.1.2. Self-Organizing Migrating Algorithm

A self-organizing migrating algorithm (SOMA) is a simple model of a hunting pack. The individuals move across the domain such that, in each migration round, each individual goes straight towards the leader of the pack, checks the position after each jump, and remembers the best-found position for the next migration round. SOMA has several strategies; we use the 'AllToOne' strategy, where all individuals move towards the best one from the population. Each individual *xi* ∈ [0, 1] is evaluated by the fitness function, and the one with the highest fitness is chosen as the leader for the loop. Then the rest of the individuals jump towards the leader, and each of them is evaluated by the cost function after each jump. SOMA has three numerical parameters defining the way an individual moves behind the leader: *PathLength*, which determines how far behind the leader the individual stops; *StepSize*, the size of each jump; and *PRT*, which determines the direction of movement of the individual towards the leader. The jumping continues until the new position restricted by *PathLength* is reached. The new individual position at the end of each jump is determined by

$$\mathbf{x}_{i}^{ML+1} = \mathbf{x}_{i,start}^{ML} + (\mathbf{x}_{L}^{ML} - \mathbf{x}_{i,start}^{ML}) \cdot StepSize \cdot PRTVector.$$

Then each individual moves toward the best position found on its jumping trajectory. Before an individual jumps towards the leader, a set of random numbers from the interval [0, 1] is generated, and each member of this set is compared with *PRT* ∈ [0, 1]. If the generated random number is smaller than *PRT*, the corresponding *j*th coordinate is taken from the new position (*PRTVectorj* = 1); otherwise, it is taken from the original individual's position (*PRTVectorj* = 0). During the process, each individual attempts to find its best position as well as the best position among all individuals [23].
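A minimal one-dimensional sketch of the AllToOne jumping described above might look as follows. It is illustrative only: the names are ours, a minimization convention is assumed, and in one dimension the per-coordinate *PRTVector* collapses to a single scalar:

```python
import random

def soma_jumps(x_start, x_leader, path_length=3.0, step=0.11,
               prt=0.3, fitness=abs):
    """AllToOne jumping of one individual towards the leader (1-D sketch).
    Returns the best position found along the jump trajectory
    (lower fitness is better)."""
    best = x_start
    t = step
    while t <= path_length:
        # Perturbation mask: move in this coordinate only if rnd < PRT.
        prt_vector = 1.0 if random.random() < prt else 0.0
        candidate = x_start + (x_leader - x_start) * t * prt_vector
        if fitness(candidate) < fitness(best):
            best = candidate  # remember the best place on the trajectory
        t += step
    return best
```

With *PRT* = 1 (no perturbation) and *PathLength* = 1, the individual lands exactly on the leader at the final jump.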

#### 3.1.3. Cuckoo Search

The cuckoo search (CS) algorithm is inspired by the brood parasitism of cuckoos, which lay their eggs in the nests of host birds. The host bird either throws the alien egg out of the nest or abandons the nest and builds a new one; this happens with a fraction *pa* of the nests. The idea of the algorithm is to create new and better solutions (cuckoos) that replace worse solutions in the nests. Each egg represents one solution, and a cuckoo egg represents a new candidate solution. In our case, we use the simplest form of the algorithm, where each nest has one egg.

Cuckoo search uses the Mantegna Lévy flight *Lévy*(*β*), which is given by the equation *step* = *u*/|*v*|<sup>1/*β*</sup>. The parameter *β* is taken from the interval [0.3, 2); *u* and *v* are normally distributed random variables, where *u* has standard deviation *σ*. The main part of the algorithm is the application of Lévy flights and random walks in the equation that generates new solutions:

$$\mathbf{x}_{i}^{t+1} = \mathbf{x}_{i}^{t} + \alpha \, L\acute{e}vy(\beta).$$

The parameter *α* > 0 is the step size, and mostly, the value *α* = 1 can be used. The total number of possible host nests is constant, where the probability that the host bird discovers a cuckoo's egg is *pa* ∈ [0, 1] [24].
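A sketch of the Lévy-flight move is given below. The expression for *σ* is Mantegna's standard formula, which the text above does not spell out; the function names and the clipping of the new solution to [0, 1] are our illustrative choices:

```python
import math
import random

def levy_step(beta=1.5):
    """Mantegna's algorithm: step = u / |v|^(1/beta),
    with u ~ N(0, sigma^2) and v ~ N(0, 1)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta
                * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_move(x, alpha=1.0, beta=1.5):
    """Generate a new candidate solution by a Levy flight,
    kept inside the search domain [0, 1]."""
    return min(1.0, max(0.0, x + alpha * levy_step(beta)))
```

The heavy-tailed Lévy steps occasionally produce long jumps, which is what gives cuckoo search its global exploration ability.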

#### 3.1.4. Firefly Algorithm

The firefly algorithm (FFL) is inspired by the flashing behavior of fireflies that produce light at night. The fireflies are assumed to be unisexual; therefore, they are attracted to each other regardless of sex. A new generation of fireflies is given by a random walk and their mutual attraction. Fireflies communicate by their light intensity, which informs the swarm about features such as species, location, and attractiveness. Between any two fireflies *i* and *j* at positions **x***<sup>i</sup>* and **x***<sup>j</sup>*, the Euclidean distance *rij* is defined. The attractiveness function of a firefly *j* can be selected as any monotonically decreasing function of the distance to the selected firefly, here defined as *β* = *β*<sup>0</sup> · *e*<sup>−*γ*·*rij*²</sup>, where *rij* is the distance, *β*<sup>0</sup> is the initial attractiveness at distance *rij* = 0, and *γ* represents an absorption coefficient characterizing the variation of the attractiveness value. The movement of the *i*th firefly attracted by a brighter firefly *j* with higher attractiveness is given by the equation

$$\mathbf{x}_{i}^{t+1} = \mathbf{x}_{i}^{t} + \beta(\mathbf{x}_{j}^{t} - \mathbf{x}_{i}^{t}) + \alpha(\sigma - 0.5).$$

The first component represents the *i*th firefly position, the second part models the attractiveness of firefly *j*, and the last part is a randomization, with *α* ∈ [0, 1] determined by the problem of interest. The parameter *σ* represents a scaling factor that determines the distance of visibility; mostly, *σ* ∈ [0, 1] drawn from a uniform distribution is used [25].
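The movement equation can be sketched in one dimension as follows (a minimal illustration; the function name and default parameter values are ours):

```python
import math
import random

def firefly_move(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2):
    """Move firefly i towards a brighter firefly j (1-D sketch)."""
    r_ij = abs(x_j - x_i)                         # distance in 1-D
    beta = beta0 * math.exp(-gamma * r_ij ** 2)   # attractiveness decays with distance
    sigma = random.random()                       # uniform scaling factor in [0, 1]
    return x_i + beta * (x_j - x_i) + alpha * (sigma - 0.5)
```

With *γ* = 0 and *α* = 0, the attractiveness stays at *β*0 and the firefly moves straight onto its brighter neighbour.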

#### 3.1.5. Grey Wolf Optimizer

Grey wolf optimizer (GWO) is inspired by wolves living in a pack. In the mathematical model of the wolves' social hierarchy, the best solution is represented by *α*, the second-best solution is *β*, and *δ* represents the third-best solution. The rest of the candidate solutions are in a group *ω*. The optimization in this algorithm is guided by the best three wolves *α*, *β*, *δ*, and the wolves in *ω* follow these three wolves. The model is given by the following equations:

$$\mathbf{d} = |\mathbf{c} \cdot \mathbf{x}_p^{(t)} - \mathbf{x}^{(t)}|$$

and

$$\mathbf{x}^{(t+1)} = \mathbf{x}_p^{(t)} - \mathbf{a} \cdot \mathbf{d},$$

where *t* is the current iteration, variables **a** and **c** are coefficient vectors, **x***<sup>p</sup>* is the prey's position vector, and **x** represents the position vector of a grey wolf. The vectors **a** and **c** are determined as follows: **a** = 2*a* · *r*<sup>1</sup> − *a*, **c** = 2*r*2. Components of *a* are linearly decreased from 2 to 0 over the course of iterations by the formula 2 − 2 · *FES*/*maxFES*, where *maxFES* is the total count of fitness value evaluations, and *r*1, *r*<sup>2</sup> are random values from the interval [0, 1] [26].
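For illustration, one GWO position update can be sketched in one dimension. Averaging the three estimates produced by the *α*, *β*, and *δ* wolves follows the standard GWO formulation, which the text above only implies; all names are ours:

```python
import random

def gwo_step(x, x_alpha, x_beta, x_delta, fes, max_fes):
    """One 1-D GWO position update guided by the three best wolves."""
    a_scalar = 2 - 2 * fes / max_fes  # linearly decreases from 2 to 0
    estimates = []
    for x_p in (x_alpha, x_beta, x_delta):
        r1, r2 = random.random(), random.random()
        a = 2 * a_scalar * r1 - a_scalar   # coefficient a
        c = 2 * r2                          # coefficient c
        d = abs(c * x_p - x)                # distance to the leader wolf
        estimates.append(x_p - a * d)
    # The new position is the average of the three leader-guided estimates.
    return sum(estimates) / 3
```

At the end of a run (*FES* = *maxFES*), the coefficient *a* vanishes and the update converges to the centroid of the three best wolves.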

#### 3.1.6. Artificial Bee Colony

The artificial bee colony (ABC) is inspired by the foraging behavior of honey bees, and it employs three types of bees: employed foragers, onlookers, and scouts. Employed bees and onlookers search for food sources, and each food source is assigned to exactly one employed bee; hence, the number of employed bees equals the number of food sources around the hive. The algorithm randomly places a population of initial vectors, which is iteratively improved. A possible solution is represented by the position of a food source, and the quality (fitness) of the food source corresponds to the objective value of the given problem [27]. The ABC algorithm is quite simple because it uses only three control parameters that should be determined (the size of the population, the scout limit *L*, and the dimension of the problem).

The new solution is given by the following formula

$$\mathbf{v}_{i} = \mathbf{x}_{i} + \phi_{i}(\mathbf{x}_{i} - \mathbf{x}_{k}),$$

where *k* and *j* are randomly selected indices (*k* ≠ *i*) and *φi* is a random number from the range [−1, 1]. Then, the probability value *pi* is computed for each solution **x***<sup>i</sup>* with the help of its fitness value. In the next step, new onlooker solutions **v***i* are produced and evaluated, based on the solutions **x***<sup>i</sup>* selected according to **p***i*.
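The candidate-generation formula can be sketched as follows (a 1-D illustration, so the dimension index *j* disappears; the function name and the clipping to [0, 1] are our choices):

```python
import random

def abc_new_solution(x, population, i):
    """Produce a new candidate food source from solution i (1-D sketch)."""
    # Pick a partner solution k different from i.
    k = random.choice([j for j in range(len(population)) if j != i])
    phi = random.uniform(-1, 1)  # random scaling factor in [-1, 1]
    v = x + phi * (x - population[k])
    return min(1.0, max(0.0, v))  # keep within the search domain [0, 1]
```

The new candidate moves along the line between the current solution and a random partner; if all food sources coincide, the move degenerates to no change.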

#### 3.1.7. Bat-Inspired Algorithm

The inspiration for the bat-inspired algorithm (BIA) comes from the echolocation behavior of microbats, which use varying pulse emission rates and loudness. Each bat flies randomly with a given velocity, and it has a position with a varying frequency or wavelength and loudness. All bats use echolocation; thus, they can sense the distance to prey and distinguish between food and background barriers.

The population of bats is placed randomly, and after that, the bats fly randomly with velocity **v***<sup>i</sup>* to position **x***<sup>i</sup>* with a given frequency *fmin*, varying wavelength *λ*, and loudness *A*0. The bats automatically adjust the wavelength of the emitted pulses, as well as the pulse emission rate *r* ∈ [0, 1]. The loudness can vary, for example, between *A*<sup>0</sup> and a minimum value *Amin*. The frequency *f* lies in a range [*fmin*, *fmax*], which corresponds to the range of wavelengths [*λmin*, *λmax*]. Here, the wavelengths are not used; instead, the frequency varies while the wavelength *λ* is fixed. This is possible because of the relation between *λ* and *f*, where *λ* · *f* is constant. For simplicity, the frequency is set from *f* ∈ [0, *fmax*]. Clearly, higher frequencies give shorter wavelengths and a shorter travelled distance. The pulse rate lies in the interval [0, 1], where 0 denotes no pulses and 1 marks the maximum rate of pulse emission. The new solutions **x***<sup>i</sup>*<sup>*t*</sup> and velocities **v***<sup>i</sup>*<sup>*t*</sup> at the current time step *t* are given by

$$f_i = f_{min} + (f_{max} - f_{min})\beta,$$

$$\mathbf{v}_i^t = \mathbf{v}_i^{t-1} + (\mathbf{x}_i^t - \mathbf{x}_{best}) f_i,$$

$$\mathbf{x}_i^t = \mathbf{x}_i^{t-1} + \mathbf{v}_i^t,$$

where *β* ∈ [0, 1] represents a uniformly distributed random vector, and **x***best* is the current global best position found by comparing all bat solutions. The local search strategy generates a new solution for each bat using the random walk **x***new* = **x***old* + *ε***A**<sup>*t*</sup>, where *ε* ∈ [−1, 1] is a random number and **A**<sup>*t*</sup> is the mean loudness of the whole bat population at the current time step. The loudness *Ai* and the pulse emission rate *ri* are updated as the iterations proceed:

$$A_i^{t+1} = \alpha A_i^t, \qquad r_i^{t+1} = r_i^0 \left[1 - e^{-\gamma t}\right],$$

where *α* and *γ* are constants. For any 0 < *α* < 1 and *γ* > 0, we have *Ai*<sup>*t*</sup> → 0 and *ri*<sup>*t*</sup> → *ri*<sup>0</sup> as *t* → ∞ [28].
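The frequency, velocity, and position updates, together with the loudness/pulse-rate schedule, can be sketched as follows (a 1-D illustration; names and default values such as *fmax* = 2 are ours):

```python
import math
import random

def bat_step(x, v, x_best, f_min=0.0, f_max=2.0):
    """One bat update: frequency, velocity, and position (1-D sketch)."""
    beta = random.random()               # uniform in [0, 1]
    f = f_min + (f_max - f_min) * beta   # bat frequency
    v_new = v + (x - x_best) * f         # velocity pulled towards the best bat
    x_new = x + v_new
    return x_new, v_new

def update_loudness_rate(A, r0, t, alpha=0.9, gamma=0.9):
    """Loudness decays and pulse rate grows as iterations proceed."""
    A_new = alpha * A
    r_new = r0 * (1 - math.exp(-gamma * t))
    return A_new, r_new
```

A bat that already sits on the global best position keeps its velocity unchanged, regardless of the drawn frequency.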

#### 3.1.8. Tree-Seed Algorithm

The tree-seed algorithm (TSA) is based on the relationship between trees and seeds, where seeds gradually grow and new trees are created from them. The search area is represented by the trees' surface, and the tree and seed locations are possible solutions of the optimization problem. The algorithm employs two specific parameters: the total number of trees and the number of produced seeds. The key step is to obtain the location of a seed produced from a tree. The first equation below uses the location of the best tree for the production of the seed, whereas the second employs the locations of two different trees to produce a new seed for the tree:

$$s_{i,j} = t_{i,j} + \alpha_{i,j} \times (b_j - t_{r,j}),$$

$$s_{i,j} = t_{i,j} + \alpha_{i,j} \times (t_{i,j} - t_{r,j}),$$

where *si*,*<sup>j</sup>* is the *j*th dimension of the *i*th seed produced for the *i*th tree, *ti*,*<sup>j</sup>* is the *j*th dimension of the *i*th tree, and *bj* represents the *j*th dimension of the best tree **b**, computed as *b* = min{*f*(*ti*)}; the *j*th dimension of the *r*th tree, *tr*,*<sup>j</sup>*, is selected from the population randomly. The scaling factor *α* is drawn randomly from [−1, 1], and *i* and *r* are different indices.

First, the initial tree locations that give us trial solutions of the optimization problem are designed by using:

$$t_{i,j} = l_{j,min} + r_{i,j}(h_{j,max} - l_{j,min}),$$

where *lj*,*min* represents the lower bound of the search area, *hj*,*max* denotes the upper bound of the search area, and *ri*,*<sup>j</sup>* ∈ [0, 1] is a uniformly distributed random number. The best solution **b** is selected from the population of *n* trees. The number of seeds can be higher than the number of trees [29].
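Seed production can be sketched in one dimension as follows; choosing between the two equations via the search tendency *ST* follows the standard TSA formulation (described with the parameter *ST* in Section 4.2), and all names are illustrative:

```python
import random

def tsa_seed(trees, i, best, st=0.1):
    """Produce one seed for tree i (1-D sketch); st is the search tendency."""
    # Pick another tree r different from i.
    r = random.choice([j for j in range(len(trees)) if j != i])
    alpha = random.uniform(-1, 1)  # scaling factor
    if random.random() < st:
        # Use the location of the best tree (first equation).
        return trees[i] + alpha * (best - trees[r])
    # Use the location of another randomly chosen tree (second equation).
    return trees[i] + alpha * (trees[i] - trees[r])
```

When all trees coincide with the best solution, both branches return the tree's own location, so the population has converged.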

#### 3.1.9. Random Search

Random search (RS) is the simplest stochastic algorithm for global optimization, proposed by Rastrigin in 1963. In every iteration, it generates a new point from the uniform distribution in the search area. Then, the function value of this point is compared with the best point found so far; if the new trial point is better, it replaces the old best point. No learning mechanism or exploitation of knowledge from the previous search is used [30]. This algorithm does not belong to the group of swarm algorithms, but because it can converge quickly to a good solution, it is used as a baseline algorithm for comparison.
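The entire RS procedure fits in a few lines; the following sketch (names and the minimization convention are ours) makes the absence of any learning mechanism explicit:

```python
import random

def random_search(objective, max_fes=1000):
    """Pure random search on [0, 1]: keep the best uniformly sampled point."""
    best_x = random.random()
    best_f = objective(best_x)
    for _ in range(max_fes - 1):
        x = random.random()       # new trial point, independent of history
        fx = objective(x)
        if fx < best_f:           # minimization: keep the better point
            best_x, best_f = x, fx
    return best_x, best_f
```

Every trial point is drawn independently, so the budget of function evaluations is the only thing that improves the result.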

#### **4. Piecewise Linearization Using Swarm Intelligence Algorithms**

Our implementation of the swarm algorithms consists of searching for a linearization *lf* (see the piecewise linear function definition above) of a fixed interval map *f* : [0, 1] → [0, 1]. To assess a candidate solution, the optimization (objective) function is represented by the distance, given by the chosen metric, between *f* and its linearization *lf*. Every possible linearization is represented by a finite number of points (ℓ ∈ N), and every population contains *n* particles (ℓ-dimensional vectors), whose stochastic parameters are adapted.
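The objective function just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid-based sampling of the distance, the grid size, and all names are our assumptions:

```python
def linearization_objective(f, turning, grid_size=200):
    """Distance between f and the piecewise linear interpolant through
    the turning points [(x1, y1), ..., (xl, yl)], with x1 = 0 and xl = 1,
    sampled on a uniform grid (Manhattan-style average of |f - lf|)."""
    def lf(x):
        # Evaluate the piecewise linear function at x.
        for (x0, y0), (x1, y1) in zip(turning, turning[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return turning[-1][1]

    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return sum(abs(f(x) - lf(x)) for x in xs) / grid_size
```

A swarm algorithm then minimizes this value over the coordinates of the turning points; a perfect linearization of an already linear function yields a distance of zero.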

In this section, we introduce the testing functions, provide the setting of the algorithm's parameters that can be used for the problem of linearization, and we look at which algorithm can give us the best results.

#### *4.1. Test Functions*

For testing, we chose continuous functions *f* : [0, 1] → [0, 1], where, for simplicity, we work only in the space [0, 1]. These functions range from basic to complicated ones (see Figure 1) to demonstrate the algorithms' behavior on different levels of difficulty. The functions are given by the following formulas:

$$f\_1(\mathbf{x}) = 4\mathbf{x} - 4\mathbf{x}^2 \tag{1}$$

$$f\_2(x) = \frac{1}{2} (\sin\left(\frac{\frac{3}{2}}{x + \frac{1}{10}}\right) + 1) \tag{2}$$

$$f\_3(\mathbf{x}) = \frac{1}{25} (\sin 20\mathbf{x} + 20\mathbf{x} \cdot \sin 20\mathbf{x} \cdot \cos 20\mathbf{x}) + \frac{1}{2} \tag{3}$$

$$\begin{aligned} f\_4(\mathbf{x}) &= 0.9 + (-1 + \mathbf{x})(0.9 + (-0.16 + (5.4 + (-27 + \mathbf{x})))) \\ &\quad \left(36 + (510 + (-120 - 2560(-0.9 + \mathbf{x}))(-0.1 + \mathbf{x}))\right) \\ &\quad (-0.6 + \mathbf{x}))(-0.2 + \mathbf{x}))(-0.8 + \mathbf{x}))(-0.4 + \mathbf{x})) \mathbf{x}) \end{aligned} \tag{4}$$

$$f\_5(\mathbf{x}) = \sin\left(\frac{\frac{3}{2}}{\mathbf{x} + \frac{13}{200}}\right) + 1\tag{5}$$

$$f\_6(x) = \left(x - \frac{1}{2}\right) \sin\left(\frac{1}{x - \frac{1}{2}}\right) + \frac{1}{2} \tag{6}$$
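For reference, several of the test functions translate directly into code. The sketch below implements *f*1, *f*2, *f*3, and *f*6 as printed above; *f*4 and *f*5 are omitted because their extracted forms are ambiguous, and the removable singularity of *f*6 at *x* = 1/2 is handled explicitly:

```python
import math

def f1(x):
    """Equation (1): inverted parabola."""
    return 4 * x - 4 * x ** 2

def f2(x):
    """Equation (2)."""
    return 0.5 * (math.sin(1.5 / (x + 0.1)) + 1)

def f3(x):
    """Equation (3)."""
    return (math.sin(20 * x)
            + 20 * x * math.sin(20 * x) * math.cos(20 * x)) / 25 + 0.5

def f6(x):
    """Equation (6); the singularity at x = 1/2 is removable with limit 1/2."""
    if x == 0.5:
        return 0.5
    return (x - 0.5) * math.sin(1 / (x - 0.5)) + 0.5
```

These implementations make it easy to verify, for example, that *f*1 attains its maximum value 1 at *x* = 1/2.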

**Figure 1.** Graphs of the functions *f*1, *f*2, *f*3, *f*4, *f*5, and *f*6.

#### *4.2. Parameter Settings of the Chosen Algorithms*

The proper choice of parameters can have a large influence on the optimization performance; therefore, for each algorithm in the comparison, we tested which parameters were suitable for our problem with regard to searching for the best possible linearization. In this subsection, we discuss the setting of parameters for the algorithms introduced in Section 3.1. Each algorithm was run 50 times with a fixed dimension ℓ = 16 and a fixed population size *n* = 25. Each algorithm runs until it finds a solution with a good enough *fitness*\_*value*, but it has to find it before *maxFES* = 10,000 function evaluations. The *fitness\_value* threshold was set to 0.2.

In PSO, the acceleration coefficients Φ1, Φ<sup>2</sup> and the constriction factor *χ* should be set. Parameter Φ<sup>1</sup> controls the importance of the particle's personal best value, whereas the importance of the neighbour best value is controlled by parameter Φ2. The algorithm can be unstable when these parameters are too high because the velocity can grow too fast. The condition Φ = Φ<sup>1</sup> + Φ<sup>2</sup> > 4 should be satisfied, and the authors of the algorithm recommend setting Φ1, Φ<sup>2</sup> to 2.05. Parameter *χ* has a restrictive effect on the result, and it does not change in time. In the original version, PSO uses

$$\chi = \frac{2}{\Phi - 2 + \sqrt{\Phi^2 - 4\Phi}}.$$
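As a quick worked example, the recommended Φ1 = Φ2 = 2.05 gives Φ = 4.1 and *χ* ≈ 0.7298 (the function name is illustrative):

```python
import math

def constriction_factor(phi1=2.05, phi2=2.05):
    """Clerc's constriction factor chi = 2 / (Phi - 2 + sqrt(Phi^2 - 4*Phi)),
    valid for Phi = phi1 + phi2 > 4."""
    phi = phi1 + phi2
    return 2 / (phi - 2 + math.sqrt(phi ** 2 - 4 * phi))
```

This is the damping value used in the PSO velocity update to keep velocities from growing without bound.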

In SOMA, there are a few parameters in which the settings should be considered and tested. Parameter *PathLength* ∈ (1, 5] is a parameter that defines how far an individual stops behind the leader. The *StepSize* is from the interval (0, *PathLength*] and step is from the interval (0, 1]. One of the most sensitive parameters is *PRT* ∈ [0, 1], which represents the perturbation and decides if an individual will travel towards the leader directly or not [23].

In the original version of the cuckoo search algorithm, parameter *β* is taken from the interval [0.3, 2). The step size *α* > 0 is dependent on the scales of the problem, and mostly, the value *α* = 1 is used. The total number of possible host nests is restricted, where the probability that the host bird discovers a cuckoo's egg is *pa* ∈ [0, 1].

In the firefly algorithm, the parameter *β*<sup>0</sup> = 1 is the initial attractiveness at distance *rij* = 0. Parameter *γ* represents an absorption coefficient that characterizes the variation of the attractiveness value of a firefly. If *β*<sup>0</sup> = 0, the algorithm becomes a simple random walk.

All parameters used in the grey wolf optimizer are generated randomly, so no more detailed parameter testing is needed.

The control parameters in the ABC algorithm that should be set are the population size *CS* and the limit for the scout phase, *L* = (*CS*·*D*)/2, where *D* is the dimension of the problem.

In the original version of the bat-inspired algorithm, the values *f*<sub>min</sub> = 0 and *f*<sub>max</sub> = 100 depend on the size of the search area dimension. Each bat is assigned a frequency drawn randomly from the uniform distribution on [*f*<sub>min</sub>, *f*<sub>max</sub>]. For simplicity, *α* = *γ* can be used; in the original version, the author used *α* = *γ* = 0.9, but for our case, we had to perform experimental testing. For each bat individual, different values of loudness and pulse emission rate are recommended, based on randomization.

In the tree-seed algorithm, importance is given to the selection of the equation that produces a new seed location. The control parameter *ST* ∈ [0, 1], called the search tendency, is used for this selection. The seed number of each tree is determined randomly, and it should not be less than 1. The recommended number of randomly generated seeds is between 10% and 25% of the number of trees.
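The frequency assignment and the standard loudness/pulse-rate updates with *α* = *γ* = 0.9 can be sketched as follows (a minimal illustration; the helper names are ours).

```python
import math
import numpy as np

rng = np.random.default_rng(5)

def init_frequencies(n_bats: int, f_min: float = 0.0, f_max: float = 100.0):
    """Each bat gets a frequency drawn uniformly from [f_min, f_max]."""
    return rng.uniform(f_min, f_max, n_bats)

def update_loudness_and_rate(loudness, r0, t, alpha=0.9, gamma=0.9):
    """Standard bat-algorithm updates: loudness decays geometrically,
    while the pulse emission rate grows toward its initial value r0."""
    return alpha * loudness, r0 * (1.0 - math.exp(-gamma * t))

freqs = init_frequencies(25)
```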

For a better overview of the algorithm configuration, the settings of the numerical parameters are summarized in Table 1. The Optimal Interval column presents the intervals containing the achieved acceptable values of the parameters. In the Chosen Value column, the final values of the parameters are presented.


**Table 1.** Parameters setting.

#### *4.3. Examples of Piecewise Linearization*

In this subsection, we demonstrate an example of the linearization of Function *f*<sub>5</sub> by all chosen evolutionary algorithms. To demonstrate how the algorithms work, the graph of the results of each evolutionary algorithm for Function *f*<sub>5</sub> is presented (Figure 2). The input parameters of each evolutionary algorithm are set in accordance with Section 4.2.

**Figure 2.** Each graph represents the best result of a specific algorithm. The total number of points was set to 16 and *maxFES* = 10,000.

#### *4.4. Summary of Results*

In this subsection, we introduce a set of tables showing the results of each evolutionary algorithm. To make the results clearer, we use only four of the testing functions introduced in Section 4.1.

The following tables (Tables 2–5) show the min, max, mean, median, and standard deviation values computed from 50 independent runs for each function. The runs use the same parameter settings as introduced in Section 4.2. Figures 3 and 4 show graphs of the distance convergence means for a selected *maxFES*, where checkpoints are taken at every 100th iteration.


**Table 2.** Comparison of algorithms for Function *f*1.

**Table 3.** Comparison of algorithms for Function *f*3.


**Table 4.** Comparison of algorithms for Function *f*4.


**Table 5.** Comparison of algorithms for Function *f*5.


**Figure 3.** Graphs of distance convergence means for *maxFES* = 2500.

**Figure 4.** Graphs of distance convergence means for *maxFES* = 10,000.

From the results (see Tables 2–5), it is obvious that no single evolutionary algorithm is suitable for the linearization of all functions. The results show that for Function *f*<sub>1</sub>, the best algorithm is CS. On the other hand, *f*<sub>3</sub> and *f*<sub>4</sub> are best handled by the ABC algorithm. In the case of *f*<sub>5</sub>, it is not clear which method performs best.

If we disregard the best values for each tested function, other evolutionary algorithms also deliver sufficient results. This applies mainly to the PSO, SOMA, and TSA algorithms, whose results across all tested functions are similar to the best results. It means that we have some degree of freedom in choosing which evolutionary algorithm to use.

#### **5. Piecewise Linearization Using Swarm Intelligence Algorithms with Automatic Parameters Tuning**

In this section, we introduce an algorithm for the automatic detection of the number of points used for linearization and of the discretization step *discr*\_*step*, which represents the distance between equidistant points. This algorithm tries to find the number of input points and a set of equidistant points given by the discretization step *discr*\_*step* that achieve the best possible solution, evaluated by the fitness function value of the algorithm's output.

The main idea of this algorithm is to run six evaluations of an evolutionary algorithm in every loop iteration using parallel computing, for as long as needed to find an optimal solution. We chose this approach because we need to try several combinations of the number of points and *discr*\_*step* in each iteration.

In the beginning, our algorithm sets the input parameters of the evolutionary algorithm and the default values of 10 points and *discr*\_*step* = 1/100. Then the loop starts, where every iteration consists of setting up six different *discr*\_*step* values and six different numbers of points, and each of the six parallel runs takes one *discr*\_*step* and one number of points. The six combinations, derived from the number of points and the *discr*\_*step*<sub>*i*−1</sub> of the best result from the previous iteration, can be seen in Table 6.
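The per-iteration structure of this loop can be sketched as follows. The six concrete combinations come from Table 6; the perturbations used below are illustrative placeholders only, and `run_ea` stands for one evolutionary-algorithm run returning, among other things, its *final*\_*distance*.

```python
from concurrent.futures import ThreadPoolExecutor

def tuning_iteration(run_ea, best_points, best_step):
    """One iteration: evaluate six (number-of-points, discr_step) variants
    in parallel and keep the run with the smallest final_distance."""
    variants = [                      # placeholders for the Table 6 variants
        (best_points, best_step),
        (best_points + 2, best_step),
        (best_points - 2, best_step),
        (best_points, best_step / 2),
        (best_points, best_step * 2),
        (best_points + 2, best_step / 2),
    ]
    with ThreadPoolExecutor(max_workers=6) as pool:
        results = list(pool.map(lambda v: run_ea(*v), variants))
    return min(results, key=lambda r: r["final_distance"])
```

A process pool (or any other parallel backend) could equally be used; the essential point is that all six variants are evaluated concurrently and only the best one survives into the next iteration.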


**Table 6.** Six variants of the number of points and *discr*\_*step*.

Each of these six parallel runs is processed, and then their results are evaluated. The evaluation computes the linearization from the *final*\_*points* of each of the six results produced by an evolutionary algorithm run, using a fixed *discr*\_*step* = 1/200. Setting the same *discr*\_*step* for all results ensures that all six results are evaluated with the same criterion. The evaluation yields a *final*\_*distance* computed with the Manhattan metric, and the result with the smallest *final*\_*distance* value is kept. The *distance* value serves as the *fitness*\_*value* for all selected evolutionary algorithms. The best result's *discr*\_*step*, *final*\_*distance*, *final*\_*points*, and number of points are saved to be used in the next iteration.
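Under our reading, this evaluation step amounts to summing the absolute deviations between the original function and the piecewise linear interpolant at the equidistant discretization points. The sketch below is ours; the function handle `f` and the interval bounds are assumptions.

```python
import numpy as np

def manhattan_fitness(f, final_points, a, b, discr_step=1 / 200):
    """Sum |f(x) - pw(x)| over equidistant points in [a, b], where pw is
    the piecewise linear function through final_points (sorted (x, y) pairs)."""
    xs = np.arange(a, b + discr_step / 2, discr_step)  # include endpoint b
    px = np.array([p[0] for p in final_points])
    py = np.array([p[1] for p in final_points])
    pw = np.interp(xs, px, py)                          # piecewise linear values
    return float(np.sum(np.abs(f(xs) - pw)))
```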

Next, we examine all linear segments created from *final*\_*points*. This examination consists of creating three equidistant points on each linear segment and calculating the distance of these points from the initial function. We always take the maximum-metric output over all points of all linear segments. Linear segments whose slope is too high and whose maximum-metric value over their equidistant points is under the set threshold are ignored, because such segments tend to produce a high maximum-metric value even when they sufficiently overlap with the corresponding part of the initial function. The maximum over all linear segments indicates whether the linearization is close enough to the real function.
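Our understanding of the per-segment check can be sketched as follows; the slope threshold value and the function handle are assumptions of ours.

```python
import numpy as np

def segment_max_metric(f, p0, p1, slope_threshold=10.0):
    """Maximum |f(x) - segment(x)| over three equidistant inner points of
    the segment p0 -> p1, or None when a steep segment is skipped
    (steep segments inflate the maximum metric even when they overlap
    the function well)."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    if abs(slope) > slope_threshold:
        return None
    xs = np.linspace(x0, x1, 5)[1:4]   # three equidistant inner points
    seg = y0 + slope * (xs - x0)       # segment values at those points
    return float(np.max(np.abs(f(xs) - seg)))
```

Taking the maximum of these values over all kept segments then gives the overall maximum-metric indicator used at the end of each iteration.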

At the end of every iteration, the algorithm checks whether the *fitness*\_*value* and the maximum-metric value are good enough, i.e., whether it should continue with the next iteration.

There is a special case when we set the number of points and the *discr*\_*step* of all six runs to those of the best result from the previous iteration before we process the parallel runs. This special case occurs when the best result from the previous iteration has a good enough *fitness*\_*value* but not a good enough maximum-metric value. The idea is that the previous iteration's best result was almost optimal, but the algorithm placed the final points slightly off. Thus, for the next three iterations, we use all six parallel runs to determine whether the current *discr*\_*step* and number of points are sufficient to reach an optimal solution.

The algorithm also keeps the *final*\_*points* of the best result and provides them to all six evolutionary algorithm runs executed in parallel in the next iteration. This speeds up the search for the optimal result by letting an evolutionary algorithm in the next iteration start from the position where the best result of the previous iteration ended. The only scenario where we do not provide *final*\_*points* is when the special case is triggered, because we do not want to influence the parallel runs with the previous result, which was close but not close enough. The pseudocode of the parameter-tuning approach is given in Algorithm 1.

*5.1. Examples of Tuning Algorithm*

In this subsection, the evolutionary algorithms selected in Section 3 are applied within the algorithm introduced in Section 5 to Function *f*<sub>5</sub>. Two sets of results were obtained, the first for *maxFES* = 2500 and the second for *maxFES* = 10,000. In both sets, the optimal result of each evolutionary algorithm should have a *final*\_*distance*, calculated with the Manhattan distance, equal to or better than 2.0. All evolutionary algorithms use the input parameter values from Section 4.2. Each experiment was executed 50 times in Python 3.8 on a computer with an AMD 2920X CPU, 32 GB DDR4 RAM, and an AMD RX VEGA64 GPU. The run time of each algorithm is measured in seconds.

**Example 1.** *This example consists of the results of selected evolutionary algorithms with a value of maxFES* = 2500*. The graph of results for Function f*<sup>5</sup> *approximation of each algorithm is presented (Figure 5). The best result of each parallel run is saved as a checkpoint result. Then, we take the best run out of all 50 runs and illustrate its checkpoint results in graphs in Figure 6 and Table 7.*

**Figure 5.** Graphs of algorithms with a value of *maxFES* = 2500.

**Figure 6.** *Cont.*

**Figure 6.** Checkpoints of each algorithm automatic parameter tuning with *maxFES* = 2500. Red lines are values of *final*\_*distance* and blue ones show the maximum value of Manhattan distance. Grey dashed lines show values thresholds.

**Table 7.** The best results of evolutionary algorithms with a value of *maxFES* = 2500.


**Example 2.** *The second example consists of results with maxFES* = *10,000. The graph of results for Function f*<sup>5</sup> *approximation of each algorithm is presented (Figure 7). The best result of each parallel run is saved as a checkpoint result. Then, we take the best run out of all 50 runs and illustrate its checkpoint results in graphs in Figure 8 and Table 8.*

**Figure 7.** Graphs of algorithms with a value of *maxFES* = 10,000.

**Figure 8.** Checkpoints of each algorithm automatic parameter tuning with *maxFES* = 10,000. Red lines are values of *final*\_*distance* and blue ones show the maximum value of Manhattan distance. Grey dashed lines show values thresholds.


**Table 8.** The best results of evolutionary algorithms with a value of *maxFES* = 10,000.

#### *5.2. Comparison Results Summary*

The comparison of the algorithms mentioned in this section was done on all testing functions from Section 4.1. For demonstration, and to avoid overwhelming the reader with tables, we chose only four functions (*f*<sub>1</sub>, *f*<sub>3</sub>, *f*<sub>4</sub>, and *f*<sub>5</sub>).

We calculated the arithmetic means of the number of points, *discr*\_*step*, *final*\_*distance*, and *time* (the run time measured in seconds) from 50 runs for each testing function and each evolutionary algorithm. All 50 runs were calculated for *maxFES* = 2500 and *maxFES* = 10,000. The results show that the *maxFES* = 10,000 variant offers almost the same results as the *maxFES* = 2500 variant, or there is a slight decrease in the total number of points, but at the expense of increased time.

The most important value is always *final*\_*distance*, but we cannot decide which algorithm is the best based on this value alone. In general, we need to find a balance between *final*\_*distance* and the time it takes to achieve it, where time is affected by the number of points (the higher, the worse) and by *discr*\_*step* (the lower, the worse). We also have to consider the situation when *final*\_*distance* is good enough only because of a bad linearization: an evolutionary algorithm then needs more iterations to reach an optimal result and, thus, uses more points, which improves *final*\_*distance* but takes more time. In such a case, the good *final*\_*distance* is a consequence of the evolutionary algorithm's inability to create a good linearization. In some cases, it is therefore preferable to favor an evolutionary algorithm with a worse *final*\_*distance* but with a significantly better number of points, *discr*\_*step*, and *time*.

In Table 9, the best three algorithms for the testing functions *f*1, *f*3, *f*4, and *f*<sup>5</sup> are presented.


**Table 9.** The three best evolutionary algorithms for each testing function.

Based on these findings, we decided to show only one set of results of the *maxFES* = 10,000 variant (see Tables 10–14). It is obvious that the best overall evolutionary algorithm to use for linearization is PSO. Except for *f*<sub>5</sub> (variant *maxFES* = 2500), where PSO ended up second best, it was always the best evolutionary algorithm to use. Based on the results, the ABC algorithm can also be considered a suitable evolutionary algorithm for general use in linearization.

**Table 10.** The mean results for Function *f*<sup>1</sup> for 50 runs of evolutionary algorithms with a value of *maxFES* = 2500.


**Table 11.** The mean results for Function *f*<sup>3</sup> for 50 runs of evolutionary algorithms with a value of *maxFES* = 2500.



**Table 12.** The mean results for Function *f*<sup>4</sup> for 50 runs of evolutionary algorithms with a value of *maxFES* = 2500.

**Table 13.** The mean results for Function *f*<sup>5</sup> for 50 runs of evolutionary algorithms with a value of *maxFES* = 2500.


**Table 14.** The mean results for Function *f*<sup>5</sup> for 50 runs of evolutionary algorithms with a value of *maxFES* = 10,000.


#### *5.3. Statistical Results Summary*

In this section, the results of the compared algorithms are statistically assessed. We evaluate whether our proposed automatic parameter-tuning algorithm proves successful.

The mean ranks from the Friedman test [31] can be seen in Table 15. The results of the algorithms without tuning at *maxFES* = 10,000 are labeled without any index. The tuning-algorithm results using *maxFES* = 10,000 are labeled with a superscript asterisk. The tuning-algorithm results with *maxFES* = 2500 are labeled with a superscript asterisk and a subscript letter *s*. The three best-performing algorithms are variants with *maxFES* = 2500 using the tuning algorithm. They also achieved similar results in the Friedman test, which is a good indicator that they are all suitable for the piecewise linearization of functions.


**Table 15.** The mean ranks of all algorithms from the Friedman test.

Table 16 presents the median values for each algorithm and each setting. There are four different settings: O25 and O10 are the original algorithms with *maxFES* = 2500 (O25) and 10,000 (O10). The T25 and T10 settings represent the automatic-parameter-tuning versions with *maxFES* = 2500 (T25) and 10,000 (T10). For O10, the best algorithm is ABC. On the other hand, O25 is best used with PSO. The results of the T10 and T25 settings are not as straightforward, but it is obvious that the PSO algorithm is the overall best choice.

**Table 16.** The median values for each algorithm and each set.


The next level of statistical comparison is provided by the Kruskal–Wallis test [32] (see Table 17). This method gives the same picture as Table 16. The O10 variant is best used together with the ABC algorithm, whereas the O25 variant is best used together with the PSO algorithm. We cannot say which algorithm to use with the T10 and T25 variants because there is no straightforward choice suitable for all tested functions. Overall, the best choice would be the PSO algorithm, which ranks first most often compared to the rest of the algorithms.


**Table 17.** The first, second, third, and the last position of all algorithms and each setting from the Kruskal–Wallis tests.

Finally, the results of the Wilcoxon rank-sum statistical test [33] are presented (see Table 18). The T25 variant was selected as the reference method, and it is compared against the T10 and O10 variants. The symbol '+' is used for significantly better results of the counterpart, the symbol '−' for significantly better results of the reference method, and the symbol '≈' illustrates no significant difference between the algorithms. The comparison of T25 and T10 shows only a few significant differences between the settings, so it is obvious that these two variants score very similarly. Both variants seem to find good results very quickly, and thanks to this, the importance of *maxFES* is not that high. Comparing T25 and O10, there are 15 cases where the O10 variant performs significantly better than the T25 variant (especially for problem *f*<sub>1</sub>), but there are 21 cases in total where the O10 variant is significantly worse than the T25 variant. We can also interpret the results slightly differently: the T25 variant delivers more consistent results across all cases, whereas the O10 variant delivers either very good or very bad results.

The results also show that the non-tuning algorithm suffers from lowering *maxFES* from 10,000 to 2500, which is the reason we did not include the O25 variant in this comparison. On the other hand, the automatic parameter-tuning algorithm is resistant to the length of the optimization process; therefore, it does not matter whether we choose *maxFES* = 2500 or 10,000.


**Table 18.** The median values and significance of all algorithms from the Wilcoxon rank-sum tests.



#### **6. Conclusions**

In this paper, we introduced two ways to solve the optimization of the piecewise linearization of a given function. In the first approach, the usage of optimization algorithms for searching piecewise linearization with a predefined number of piecewise linear parts and discretization points with the calculation of the distance between the original and piecewise linear function is proposed. The second method extends the previous approach by the automatic selection of the number of piecewise linear parts and discretization points.

Based on the experimental part of the paper, the following conclusions can be drawn. Enhancing the swarm-based optimization algorithms with the proposed tuning approach significantly increases the performance of the optimization process. When the algorithms were applied to the piecewise linearization problem with 2500 function evaluations, the variants of PSO and GWO provided sufficient results, with GWO performing substantially better (Table 7). For *maxFES* = 10,000, GWO, PSO, and ABC provide results of similar quality, with ABC being the fastest method (Table 8).

From the results of the optimization problems, it is obvious that the variant of PSO always occupied the best positions (Tables 10–14). Moreover, the variant of ABC provided solutions of acceptable quality (Table 9). Regarding the complexity of the compared algorithms, the variants of PSO and ABC mostly achieved low time demands.

The results of applying the proposed tuning approach illustrate the substantially increased performance of the compared swarm algorithms (Table 15). For eight algorithms out of nine, the better overall performance was achieved by the tuning approach, with the biggest difference achieved by the variant of RS, which provided the second-best results. It is also obvious that several swarm methods (BAT, FFL, GWO) provide worse results than the simple RS method, which generates random solutions.

Comparing the achieved median values, in three out of four optimization problems the best performance is provided by the proposed tuning approach (Table 16). Surprisingly, the best results for the piecewise linearization problem *f*<sub>4</sub> were provided by the RS algorithm with the tuning mechanism.

The main benefit of our tuning approach is the lower time complexity of the optimization process (measured by the number of function evaluations) while still providing sufficient solutions. Using the PSO algorithm as the main swarm intelligence algorithm for solving piecewise linearization problems is, without a doubt, the best choice. This conclusion was clearly demonstrated in Section 5.3.

The proposed tuning approach performs worse than the original swarm intelligence algorithms especially on problem *f*<sub>1</sub>, which motivates further study of the approach's settings. The first step in this research is to generalize the proposed algorithms to functions of higher dimensions and use them for solving other real problems.

The next natural step is to extend this algorithm into higher dimensions and to compare the swarm-based optimization algorithms with the classical mathematical approaches.

**Author Contributions:** Conceptualization, N.Š. and P.R.; methodology, N.Š.; software, P.R.; validation, N.Š., P.R. and P.B.; formal analysis, P.B.; investigation, N.Š.; resources, P.R.; data curation, P.B.; writing—original draft preparation, N.Š., P.R. and P.B.; writing—review and editing, N.Š., P.R. and P.B.; visualization, P.R.; supervision, P.B.; project administration, P.B.; funding acquisition, P.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Department of Informatics and Computers, University of Ostrava, and also by Internal Grant Agency of the University of Ostrava grants, numbers SGS17/PřF-MF/2021 and SGS17/PřF-MF/2022.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data were measured in MATLAB during the experiments.

**Conflicts of Interest:** The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

