**Contents**

Reprinted from: *Math. Comput. Appl.* **2023**, *28*, 12, doi:10.3390/mca28010012

### **Xilu Wang and Yaochu Jin**


### **About the Editors**

### **Carlos Coello**

Carlos Coello received a PhD in Computer Science from Tulane University (USA) in 1996. His research has mainly focused on the design of new multi-objective optimization algorithms based on bio-inspired metaheuristics (e.g., evolutionary algorithms), which is an area he has made pioneering contributions to. He has received several awards, including the National Research Award (in 2007) from the Mexican Academy of Science and the 2012 National Medal of Science in Physics, Mathematics and Natural Sciences from Mexico's presidency, the 2013 IEEE Kiyo Tomiyasu Award, the 2016 The World Academy of Sciences (TWAS) Award in "Engineering Sciences", and the 2021 IEEE Computational Intelligence Society Evolutionary Computation Pioneer Award. Since January 2011, he has been an IEEE Fellow. He is currently the Editor-in-Chief of the IEEE Transactions on Evolutionary Computation. He is Full Professor with distinction (Investigador Cinvestav 3F) at the Computer Science Department of CINVESTAV-IPN in Mexico City, Mexico.

### **Erik Goodman**

Erik Goodman was PI and Director of the BEACON Center for the Study of Evolution in Action, an NSF Center headquartered at Michigan State University, from 2010 to 2018. He was Professor of Electrical and Computer Engineering, of Mechanical Engineering, and of Computer Science and Engineering until he retired in 2022. He co-founded Red Cedar Technology (1999, now part of Siemens) and developed the HEEDS SHERPA commercial design optimization software now widely used in industry. Honors include Michigan Distinguished Professor of the Year, 2009; MSU Distinguished Faculty Award, 2011; Senior Fellow, International Society for Genetic and Evolutionary Computation, 2004; and Founding Chair, ACM SIG on Genetic and Evolutionary Computation (SIGEVO), 2005-2007, with continuing service on its Executive Committee and Advisory Committee.

### **Kaisa Miettinen**

Kaisa Miettinen is Professor of Industrial Optimization at the University of Jyväskylä. Her research interests include theory, methods, applications and software of nonlinear multiobjective optimization including interactive and evolutionary approaches, in particular, different types of hybrid methods. She heads the Research Group on Multiobjective Optimization and is the director of the thematic research area called Decision Analytics, utilizing Causal Models and Multiobjective Optimization (DEMO, www.jyu.fi/demo). She has authored over 200 refereed journal, proceedings, and collection papers, edited 19 proceedings, collections and special issues, and written a monograph on 'Nonlinear Multiobjective Optimization'. She is a member of the Finnish Academy of Science and Letters, Section of Science, and has served as the President of the International Society on Multiple Criteria Decision Making (MCDM). She belongs to the editorial boards of seven international journals and the Steering Committee of Evolutionary Multiobjective Optimization. She has previously worked at IIASA (International Institute for Applied Systems Analysis) in Austria, KTH Royal Institute of Technology in Stockholm, Sweden, and Helsinki School of Economics, Finland. She has received the Georg Cantor Award of the International Society on MCDM for independent inquiries to develop innovative ideas in the theory and methodology of MCDM.

### **Dhish Saxena**

Dhish Saxena is an Associate Professor at the Department of Mechanical and Industrial Engineering and Mehta Family School of Data Science and Artificial Intelligence at the Indian Institute of Technology Roorkee. Prior to joining IIT Roorkee, Dhish worked with Cranfield University and Bath University, UK (2008-12). His research has focused on development of evolutionary multi- and many-objective optimization algorithms; their performance enhancement through machine learning; their termination criterion; decision support based on redundancy determination and preference ranking of objectives and constraints. He is also an Associate Editor for Elsevier's *Swarm and Evolutionary Computation* journal.

### **Oliver Schütze**

Oliver Schütze is a Full Professor at the Cinvestav-IPN in Mexico City, Mexico. His main research interests are numerical and evolutionary optimization. He is the co-author of more than 150 publications, including two monographs, five school textbooks, and ten edited books. He is the recipient of the C. S. Hsu Award 2022. Two of his papers have received the IEEE Transactions on Evolutionary Computation Outstanding Paper Award (in 2010 and 2012). He is the founder of the Numerical and Evolutionary Optimization (NEO) workshop series. He is Editor-in-Chief of the journal *Mathematical and Computational Applications* and is a member of the Editorial Board of the journals *Engineering Optimization, Computational Optimization and Applications, IEEE Transactions on Evolutionary Computation, Research in Control and Optimization*, and *Applied Soft Computing*. He is a member of the Mexican Academy of Sciences and the National System of Researchers (SNI III).

### **Lothar Thiele**

Lothar Thiele joined ETH Zurich, Switzerland, as a full Professor of Computer Engineering in 1994. His research interests include bioinspired optimization techniques; models, methods, and software tools for the design of real-time embedded systems; cyberphysical systems; and embedded software. In 1986, he received the "Dissertation Award" of the Technical University of Munich; in 1987, the "Outstanding Young Author Award" of the IEEE Circuits and Systems Society; in 1988, the Browder J. Thompson Memorial Award of the IEEE; and in 2000-2001, the "IBM Faculty Partnership Award". In 2004, he joined the German Academy of Sciences Leopoldina. In 2005, he was the recipient of the Honorary Blaise Pascal Chair of Leiden University, the Netherlands. Since 2010, he has been a member of the Academia Europaea. In 2013, he joined the National Research Council of the Swiss National Science Foundation SNF. Lothar Thiele received the "EDAA Lifetime Achievement Award" in 2015. Since 2017, he has been Associate Vice President of ETH for Digital Transformation. He was elected IFIP Fellow by the International Federation for Information Processing (IFIP) as part of its first cohort of fellows in 2020. In 2021, he received the IEEE TCRTS Achievement and Leadership Award.

## **Preface to "Evolutionary Multi-objective Optimization: An Honorary Issue Dedicated to Professor Kalyanmoy Deb"**

This volume is a reprint of the Honorary Special Issue dedicated to the 60th birthday of Professor Dr. Kalyanmoy Deb, published in the journal Mathematical and Computational Applications (MCA). Kalyanmoy Deb has been a pioneer and a highly impactful and influential proponent of Evolutionary Multi-objective Optimization (EMO) since 1994. He is currently a Koenig Endowed Chair Professor and University Distinguished Professor in the Department of Electrical and Computer Engineering at Michigan State University, USA, and holds additional appointments in Mechanical Engineering and in Computer Science and Engineering. Professor Deb's research interests are in evolutionary optimization and its application in multi-objective optimization, modeling, machine learning, and multi-objective decision making. He has been a visiting professor at various universities across the world, including IITs in India, Aalto University in Finland, the University of Skövde in Sweden, and Nanyang Technological University in Singapore. He was awarded the IEEE Evolutionary Computation Pioneer Award, the Infosys Prize, the TWAS Prize in Engineering Sciences, the CajAstur Mamdani Prize, the Distinguished Alumni Award from IIT Kharagpur, the Edgeworth-Pareto Award, the Bhatnagar Prize in Engineering Sciences, and the Bessel Research Award from Germany. He is a fellow of IEEE, ASME, and three Indian science and engineering academies.

He has published over 600 research papers, with over 180,000 Google Scholar citations and an h-index of 131. More information about his research contributions can be found at https://www.coin-lab.org. This volume contains one interview, one review paper, and 11 regular papers. We briefly present all of these contributions in the following. The regular papers are organized chronologically by their publication times in MCA.

In Chapter 1, Kalyanmoy Deb gives an interview to the guest editors of the Honorary Special Issue. In this interview, Dr. Deb talks about formation, development and challenges of the Evolutionary Multi-objective Optimization (EMO) community, important points in his career, and issues he has faced in getting his work published.

In Chapter 2, Sinha and Wallenius first review the classic interactive approaches from the field of Multiple Criteria Decision Making (MCDM), followed by the underlying idea and methods in the field of EMO. Next, they consider and discuss several promising MCDM and EMO hybrid approaches that aim to capitalize on the strengths of these two domains. Finally, they conclude with discussions on important behavioral considerations related to the use of such approaches and possible paths of future work.

In Chapter 3, Hernández Castellanos and Schütze propose a new bounded archiver, ArchiveUpdateHD, aiming for Hausdorff approximations of the Pareto front of a multi-objective optimization problem. It is shown that an application of this archiver yields, under certain (mild) assumptions and with probability one after finitely many steps, a Δ<sup>+</sup> approximation of the Pareto front, where Δ<sup>+</sup> is computed by the archiver within the run of the algorithm without any prior knowledge of the Pareto front. Numerical results using ArchiveUpdateHD as an external archiver within state-of-the-art multi-objective evolutionary algorithms (MOEAs) indicate the benefit of the novel strategy.

In Chapter 4, Xing and Sun study a multi-objective optimization problem related to the viscous boundary condition of an elastic rod. For the numerical treatment of this problem they use the algorithm GA-SCN, a hybrid of the multi-objective evolutionary algorithm NSGA-II and cell mapping techniques. Finally, several optimal designs are illustrated and discussed.

In Chapter 5, Pannerselvam et al. use importance sampling to deal with scarce sample based reliability estimation and optimization. More precisely, they propose to approximate the probability density function and the cumulative distribution function using kernel functions and employ these approximations to find the parameters of the importance sampling density to eventually estimate the reliability. The proposed approach is finally tested on several benchmark reliability examples.

In Chapter 6, Nebro et al. investigate the applicability of the evolutionary algorithm NSGA-II to large-scale multi-objective optimization problems. To this end, the authors use the automated algorithmic tuning method irace together with a highly configurable version of NSGA-II available in the jMetal framework. The resulting tuned algorithm is then tested on the continuous ZDT test functions with up to 2<sup>17</sup> = 131,072 decision variables and on a particular binary problem with thousands of bits. Results show that significant improvements can be obtained compared to the original NSGA-II.

In Chapter 7, Smedberg et al. investigate a novel knowledge driven optimization (KDO) approach to speed up the convergence in reconfigurable manufacturing systems (RMS) applications. This approach generates generalized knowledge from previous scenarios, which is then applied to improve the efficiency of the optimization of new scenarios. The proposed approach is then applied to a multi-part flow line RMS. The results demonstrate how a KDO approach leads to convergence rate improvements in a real world RMS case.

In Chapter 8, Ramos Figueroa et al. present a comparative experimental study of different mutation operators for a Grouping Genetic Algorithm (GGA) designed to solve the parallel machine scheduling problem with unrelated machines and makespan minimization. The focus is on identifying the strategies involved in the mutation operations and their adaptation to the characteristics of the studied problem. Experimental results indicate that the performance of the state-of-the-art GGA improves considerably when the original mutation operator is replaced with the new one.

In Chapter 9, Wang et al. extend the recently proposed Hypervolume Newton Method (HVN) to the treatment of constrained multi-objective optimization problems with in principle any number of objectives. This Newton method is defined on the space of (vectorized) fixed cardinality sets of decision space vectors for a given multi-objective problem (MOP) with the aim to maximize the hypervolume indicator. Numerical results are presented of the method both as standalone algorithm as well as local search engine within a multi-objective evolutionary algorithm.

In Chapter 10, Sinisterra Sierra et al. propose an evolutionary multi-objective algorithm based on NSGA-II to guide the mining process in datasets. In particular they consider the dataset composed of 15.5 million records with official data describing the COVID-19 pandemic in Mexico. The proposed algorithm generates, recombines, and evaluates patterns, focusing on recovering promising high quality rules with actionable cause effect relationships among the attributes to identify which groups are more susceptible to disease or what combinations of conditions are necessary to receive certain types of medical care.

In Chapter 11, Wang and Jin propose a novel particle filter based multi-objective optimization algorithm (PF-MOA) by transferring knowledge acquired from the search experience. The key insight adopted here is that, if one can construct a sequence of target distributions that can balance the multiple objectives and make the degree of the balance controllable, one can approximate the Pareto optimal solutions by simulating each target distribution via particle filters. Experimental results on several test functions show that PF-MOA achieves competitive performance compared to state of the art MOEAs on most test instances.

In Chapter 12, Gaspar-Cunha et al. propose a novel machine learning methodology, called FS-OPA, to reduce the number of objectives within a many-objective optimization problem. The new method is first assessed using the DTLZ benchmark problems, where the method shows good performance compared to similar algorithms. Finally, the strength of the method is demonstrated on a difficult real-world application coming from polymer processing.

Finally, in Chapter 13, Biswas and Sharma propose a single-loop multi-objective reliability-based design optimization formulation that approximates reliability analysis using Karush-Kuhn-Tucker (KKT) optimality conditions. Further, chaos control theory is used to avoid convergence issues. Numerical results demonstrate that the proposed method, MORBDO, is highly competitive with double-loop variants of multi-objective differential evolution algorithms.

We warmly thank the authors who contributed to this special issue as well as the reviewers for their constructive comments. We hope the readers enjoy studying these works as much as we enjoyed editing them. Among other things, these works demonstrate that Evolutionary Multi-objective Optimization is still an active and fruitful research field after three decades of its existence.

### **Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele** *Editors*

### **Interview: Kalyanmoy Deb Talks about Formation, Development and Challenges of the EMO Community, Important Positions in His Career, and Issues Faced Getting His Works Published**

**Carlos Coello <sup>1</sup>, Erik Goodman <sup>2</sup>, Kaisa Miettinen <sup>3</sup>, Dhish Saxena <sup>4</sup>, Oliver Schütze <sup>1,\*</sup> and Lothar Thiele <sup>5</sup>**


### **1. Introducing Kalyanmoy Deb**

Kalyanmoy Deb was born in Udaipur, Tripura, the smallest state of India at the time, in 1963. He is the eldest of four siblings. Like him, his brothers are also engineers: one in academia, one in industry, and one a freelancer. Educated in the IIT system in India, he worked for two years in a reputed engineering design company before heading for his graduate studies in the USA. After his return to India, he taught at IIT Kanpur for 20 years. He is currently a University Distinguished Professor and Koenig Endowed Chair Professor in the Department of Electrical and Computer Engineering at Michigan State University, USA. Prof. Deb's research interests are in evolutionary optimization and its application in multi-criterion optimization, modeling, and machine learning. He has been a visiting professor at various universities across the world, including the University of Skövde in Sweden, ETH Zurich in Switzerland, Aalto University in Finland, Nanyang Technological University in Singapore, and a few IITs in India. He was awarded the IEEE Evolutionary Computation Pioneer Award for his pioneering work in EMO, the Infosys Prize, the TWAS Prize in Engineering Sciences, the CajAstur Mamdani Prize, the Distinguished Alumni Award from IIT Kharagpur, the Edgeworth-Pareto Award, the Bhatnagar Prize in Engineering Sciences, and the Bessel Research Award from Germany. He has received an honorary doctorate degree from the University of Jyväskylä, Finland. He is a fellow of ACM, ASME, IEEE, and three Indian science and engineering academies. He has published over 600 research papers. He is married to Debjani Sarkar, who is an academic specialist at Michigan State University. Their son runs a start-up on AI and their daughter works in a reputed company as a marketing manager.

### **2. Introducing Evolutionary Multi-Objective Optimization (EMO)**

**Citation:** Coello, C.; Goodman, E.; Miettinen, K.; Saxena, D.; Schütze, O.; Thiele, L. Interview: Kalyanmoy Deb Talks about Formation, Development and Challenges of the EMO Community, Important Positions in His Career, and Issues Faced Getting His Works Published. *Math. Comput. Appl.* **2023**, *28*, 34. https://doi.org/10.3390/mca28020034

Received: 23 February 2023 Accepted: 24 February 2023 Published: 1 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Multi-objective optimization (MO) problems give rise to not one, but a set of Pareto-optimal solutions, each of which represents a trade-off among the objectives relative to any other solution in the set. Between a pair of such solutions, if one is better on one objective, it must be worse on at least one other objective. Although a single solution is desired as the outcome of a multi-objective optimization task, finding a representative set of Pareto-optimal solutions can be helpful in the process of making a decision. There exist different scalarization-based multi-objective optimization methods that scalarize multiple objectives into a single parameterized one and apply a single-objective optimization method to find the respective optimal solution. Most scalarization techniques ensure that the resulting optimal solution is a Pareto-optimal one, but the scalarization technique must be selected carefully to be able to reach any Pareto-optimal solution.
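The point about choosing the scalarization carefully can be illustrated with a minimal Python sketch. It assumes two objectives that are both minimized and a small, made-up set of candidate objective vectors; the function names (`dominates`, `pareto_set`, `weighted_sum_winners`) and the data are hypothetical and only serve the illustration. The weighted sum is used here as one common scalarization, not as the method of any particular paper in this volume.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_set(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_winners(points: List[Point], steps: int = 101) -> Set[Point]:
    """Minimize w*f1 + (1-w)*f2 for a sweep of weights w in [0, 1]."""
    winners: Set[Point] = set()
    for k in range(steps):
        w = k / (steps - 1)
        winners.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return winners


# Hypothetical objective vectors (f1, f2); (0.6, 0.6) lies on a non-convex
# part of the trade-off curve, (0.8, 0.9) is dominated by (0.6, 0.6).
candidates = [(0.0, 1.0), (1.0, 0.0), (0.6, 0.6), (0.8, 0.9)]

front = pareto_set(candidates)
reached = weighted_sum_winners(candidates)

print("Pareto-optimal:", front)
print("Reached by weighted sums:", sorted(reached))
```

In this toy set, three points are Pareto-optimal, yet no weight makes the weighted sum select (0.6, 0.6), because one of the two extreme points always has a lower weighted value. Other scalarizations, such as those based on a reference point, do not share this limitation, which is why the choice matters.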

Evolutionary multi-objective optimization (EMO) methods work with a population of solutions in every iteration and can find multiple well-diversified solutions simultaneously. Because of their heuristic nature, they cannot usually guarantee Pareto-optimality, but they approximate Pareto-optimal solutions. Early EMO methods could handle two and three objectives well, but newer methods, known as evolutionary many-objective optimization (EMaO) methods, have been demonstrated to handle as many as 15 to 20 objectives. EMO and EMaO methods use an implicitly parallel search process introduced by the evolutionary operators, and a selection mechanism based on partial ordering and diversity preservation. Aided by modular and flexible structures, EMO and EMaO methods are regularly used to solve challenging academic and industrial problems. They have also been commercialized into software packages and released as public-domain codes for use at large. The discovery of a representative set of Pareto-optimal solutions has a number of advantages for users. First, the set of solutions can be analyzed to understand the comprehensive nature of possible variations of objectives and their trade-offs, which can provide useful information for an informed decision-making task of picking a single preferred solution for deployment. Second, knowledge of alternate Pareto-optimal solutions can be used in a platform-based solution philosophy, in which every Pareto-optimal solution can become a potential solution for a different hardware or system platform. Third, an application of machine learning techniques to multiple Pareto-optimal solutions can bring out essential common principles hidden in them. These common principles can reveal valuable insights for constructing optimal solutions for a problem. Fourth, the EMO and EMaO philosophies are increasingly being used to introduce helper objectives in the search process to find optimal solutions for the original objectives faster and with more accuracy.
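The partial-ordering part of such a selection mechanism can be sketched as follows. This is a deliberately naive illustration in Python, assuming minimization of all objectives; it repeatedly peels off the non-dominated members of the remaining population and is not the faster bookkeeping used in published algorithms. The function names and the sample population are hypothetical, and diversity preservation is left out.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Rank a population into fronts of indices; front 0 is the best."""
    remaining = list(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A member belongs to the current front if nothing left dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Hypothetical population of objective vectors (f1, f2).
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (2, 6)]
fronts = non_dominated_sort(population)
print(fronts)
```

A selection operator can then prefer members of earlier fronts, and break ties within a front with a diversity measure so that the population stays spread out along the trade-off.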

EMO and EMaO methods have uniquely utilized evolutionary optimization's (EO) population approach to finding and storing multiple optimal solutions. The match between the MO and EO philosophies could not be any better. MO gives rise to multiple alternate solutions, and EO's population approach provides a platform to find and capture them. For the past three decades, EMO researchers have not only exploited this match to develop efficient MO algorithms, but have also launched various related studies to make EMO a field of study with hundreds of PhD theses, commercial and public-domain software, dedicated conference/seminar series, and a record number of publications. Many new ideas for improving existing algorithms, new areas of applications, and new ways to utilize them for various problem-solving areas are continuously emerging. EMO has undoubtedly become a unique and ubiquitous medium for solving multi-objective problems.

It is perhaps an excellent time to celebrate the moment and recognize every EMO researcher's hard work, passion, and collaborative efforts over the past three decades.

### **3. Interview**

The following is an interview with Prof. Kalyanmoy Deb. The editors' question is stated first, followed by Deb's response.

**1. Kalyan, thank you very much for taking some of your valuable time for this interview that we are doing as part of the Special Issue dedicated to your 60th birthday. The title of this SI is "Evolutionary Multi-objective Optimization" (EMO) which leads us directly to the first questions, since you are a pioneer and highly impactful and influential proponent of EMO since 1994. Can you recall for us your first steps that, looking back, helped in the formation of what we today call the EMO community?**

First of all, I am touched and humbled by your initiative in compiling this Special Issue for the occasion of my 60th birthday. It is a great honor for me. I also take this opportunity to thank all authors and reviewers of the papers published in this Special Issue. My appreciation also goes to the MDPI journal on Mathematical and Computational Applications for publishing this Special Issue.

It has been a long journey, hasn't it! The birth of EMO studies and the start of my academic career as an assistant professor at the Indian Institute of Technology Kanpur (IITK) in India happened almost concurrently. After completing my graduate studies and a short post-doctoral stay in the USA, I returned to India in early 1993 and took the Assistant Professor position at IITK. During my graduate studies (1987–1991), I was fortunate enough to have been exposed to genetic algorithms (GAs), a fascinating concept for solving search and optimization problems using principles of natural selection and genetics, by the Evolutionary Computation pioneer David Goldberg. A 10-line outline of a plausible GA-based multi-objective optimization algorithm in Goldberg's 1989 pioneering book (Addison-Wesley) caught my attention while I took the GA course from Goldberg. Goldberg had observed that a proactive diversity-preserving operator was missing in an earlier attempt, David Schaffer's 1985 vector-evaluated GA (VEGA). Having worked on niche-based GAs in my master's thesis, I immediately realized that a working EMO algorithm built on Goldberg's suggestion was just on the horizon. However, by that time, I was already quite advanced with my PhD topic on the development of messy GAs, variable-length GAs that could solve complex problems, including deceptive problems, which were found difficult to solve by standard GAs. I temporarily put off my interest in multi-objective optimization research and waited until I had my first graduate student, Nidamurthy Srinivas, at IITK, to begin working on Goldberg's suggestion. The use of non-domination sorting (NS) and niche preservation based on a sharing function approach in the GA's selection operator confirmed Goldberg's intuition. Thus came one of the first EMO algorithms: NSGA. We submitted our paper to the Evolutionary Computation journal of MIT Press in 1993 and the paper appeared in print in 1995. In those days, the internet was not that accessible, and soon thereafter I came across two other papers that used Goldberg's idea in slightly different ways and produced two other successful EMO algorithms: the multi-objective GA (MOGA) and the niched Pareto GA (NPGA). Each of these methods showed that a stable population of Pareto-optimal solutions could be found and maintained over successive generations on two-objective problems. In my opinion, these three studies during 1993–1995 initiated the EMO field, although a few other EMO studies that did not use Goldberg's idea literally came soon thereafter.

Of course, a few papers or even one great idea does not often fan out into a successful field of research and application that has lasted for about three decades now. I narrate some of the systematic and chronological developments in which I took a major part. First, more efficient EMO algorithms with fewer tunable parameters and elite preservation appeared. My 2002 NSGA-II paper (IEEE Transactions on Evolutionary Computation (TEVC)) presents one such EMO algorithm, in addition to Zitzler and Thiele's Strength Pareto Evolutionary Algorithm (SPEA) and Knowles and Corne's Pareto-Archived Evolution Strategy (PAES). The simplicity and modularity of these algorithms and the availability of their public-domain codes made EMO accessible to researchers and applicationists within and outside the computer science and engineering communities. These algorithms have helped mature the EMO field and attracted many newcomers. Second, with every new algorithm being proposed, I started to realize the need for a test suite through which algorithms can be tested and compared with each other. I found a mechanism in my 1999 test problem construction paper (MIT Press's ECJ) by which existing challenging single-objective test problems can be channeled to construct similarly difficult test problems for multi-objective optimization. That study led to a collaboration with Eckart Zitzler and Lothar Thiele to formulate the two-objective Zitzler-Deb-Thiele (ZDT) test suite with discernible Pareto-optimal fronts. Although largely conquered, ZDT problems are still used as the first problems on which to test a new algorithm. Third, with the existence of efficient algorithms to apply to challenging test problems, researchers proposed various performance metrics to measure the convergence and diversity of obtained solutions. In my opinion, this three-pronged development of "Algorithm, Test Suite, Performance Metric" allowed more researchers to introduce new ideas and industries to venture into solving their problems for multiple objectives. All these activities started the EMO revolution, and there was no stopping it.

**2. The growing interest in EMO even led to, among other things, a new conference series dedicated to this topic, called Evolutionary Multi-Criterion Optimization. Its first edition was held in Zurich (Switzerland) in 2001, and it has since been held biennially and very successfully until now. Can you tell us a bit about the evolutionary history of this event series?**

The opportunity offered by EMO algorithms to solve problems for multiple objectives attracted many bright PhD students. Journals started to accept EMO papers and major evolutionary computation (EC) conferences accepted EMO-related papers in their regular tracks. It became clear to everyone that EC's population approach provided a unique niche for solving multi-objective problems, and EMO was being flagged as a success story of EC. To push the EMO activities further and to let everyone know about others' work closely, I realized that a dedicated conference on EMO was the need of the time. It was December of 1999 and I was on a flight from Delhi to Zurich to examine the PhD thesis of Eckart Zitzler. I pondered how nice it would be to hold the first EMO conference in Zurich. I expressed my thoughts to Eckart and Lothar, and before I realized it, Lothar was on the phone to find an available date for ETH's auditorium to host the proposed conference. The first international conference on EMO was held in March of 2001 at ETH Zurich with about 50 papers presented. Springer agreed to publish the proceedings under its Lecture Notes in Computer Science (LNCS) series. We prepared the conference for about 60 participants, but 90+ participants attended. If I recall correctly, we had to order extra proceedings from Springer and post them later to many participants. The conference was a huge success, and three proposals for hosting the second one were received. To involve more key EMO researchers in the decision-making of future EMO conference events, Eckart, Lothar and I decided to form an EMO Steering Committee with a total of seven members, which has recently been extended to 11 members. The steering committee decided to host the conference every two years and adopted a couple of practices from the very first EMO conference: (i) there will be no parallel sessions, so everyone is in the same room for all presentations, thereby giving every paper wide attention, and (ii) there will be Multi-Criterion Decision-Making (MCDM) events within EMO conferences. Since 2001, the EMO conference series has been held every odd year.

**3. For the treatment of multi-objective optimization problems you mainly use evolutionary techniques. However, you have always promoted the use of mathematical programming and multi-criterion decision-making (MCDM) techniques within EMO, which has had a significant impact on the formation of the community. Could you comment on that?**

I consider myself a problem solver rather than particularly an EC or an optimization researcher. I strongly believe that a successful researcher should always acquire a good knowledge of the fundamentals associated with the topic before starting to work on it. This not only provides a deeper understanding of the topic for making any fundamental changes, but it also paves the way to know other contemporary approaches as possible alternatives. As you have correctly pointed out, mathematical programming and MCDM fields are two related and contemporary fields which deal with multi-objective problem-solving. While I understand that it is not easy and always uncomfortable to go out of one's own comfort zone and mingle with people in a different field to understand their trades, the trouble is worth taking for two reasons. First, it allows one to evaluate one's methods with other competing methods, and the process can eventually motivate developing hybrid methods. Second, it helps to propagate one's methods to the other contemporary fields.

It was evident from the beginning that multi-objective problem-solving tasks should end up or involve somehow a decision-making activity in arriving at a single preferred solution. I was fortunate to be invited to attend a few MCDM events in 1999 and the years following thereafter, and I came to know the existence of an MCDM field which had been addressing multi-objective problem solving since the early seventies. While they were mainly interested in scalarizing multiple objectives into a single one and in involving a decision-maker directly to provide preference information to move to new scalarized problems iteratively, I realized that EMO studies could definitely benefit by working with MCDM researchers. EMO's ability to find multiple representative near-Pareto solutions can be combined with MCDM-based preference incorporation ideas to make the whole EMO-MCDM approach holistic. To create this merger, I planned a few events.

First, at the EMO-2001 conference, we invited two prominent MCDM researchers: Kaisa Miettinen and Ralph Steuer, both being authors of popular MCDM books, to give a tutorial and a keynote speech on MCDM topics for EMO researchers to be aware of. This tradition has been followed in a number of future EMO conferences.

Second, in 2004, during my Bessel Research Prize visit to the University of Karlsruhe, Germany, I joined hands with my hosts Juergen Branke and Hartmut Schmeck, along with the above-mentioned MCDM researchers, to propose a Dagstuhl Seminar at Schloss Dagstuhl, Saarbrucken, Germany by inviting 30 EMO and 30 MCDM researchers. It was the first time these two groups met and openly exchanged ideas with each other. Of course, EMO being about 20 years younger than MCDM in terms of its inception, EMO researchers strikingly found that many of their ideas were already proposed by their elder counterparts. However, the seminar provided a breeding ground for the two groups to plan future collaborative studies. I must say that MCDM researchers were also exposed to the EMO philosophy and the later publication records of some of the leading MCDM researchers clearly support my assertion. The success of the first Dagstuhl seminar motivated us to repeat it at regular intervals. The epitome of the merger was the publication of an edited book (under Springer's LNCS series, edited by four founding organizers) in which most chapters were jointly written by EMO and MCDM authors.

Third, I was invited to visit Helsinki School of Economics (now Aalto University School of Business) as a Finland Distinguished Professor for two years and to collaborate with Kaisa Miettinen, Jyrki Wallenius and Pekka Korhonen – three stalwarts in the MCDM community. With this collaboration, I had a better appreciation of the MCDM philosophy and met other prominent MCDM researchers who regularly visited the university. I began to combine EMO and MCDM methods, a process which resulted in reference-point-based NSGA-II, reference-direction-based NSGA-II, light-beam-search-based EMO, progressively interactive EMO, and others which also combined EMO with MCDM methods to find a single preferred Pareto-optimal solution at the end.

Fourth, during my Helsinki visit, I also worked with Jyrki and others to make EMO an area topic for the Journal of Multi-Criterion Decision Analysis (Wiley) and served as an area editor from 2009 until 2018.

Fifth, in 2008, I worked with the International Society on MCDM to establish an EMO track within their bi-annual MCDM conferences and reciprocated the same, with the advice of the EMO steering committee, by instituting an MCDM track within EMO conferences soon thereafter. I am happy that these practices are still being continued.

My quest for fundamental understanding has helped me tremendously in evaluating EC's scope as an optimization algorithm compared to mathematical optimization literature, although I must admit that I do not have the adequate mathematical background to understand all of their detailed theoretical intricacies. However, I have been fortunate to have a few colleagues in mathematical optimization and operations research areas with whom I have not only pursued some fundamental convergence studies, but also co-taught multi-objective optimization courses, exposing students to both mathematical and computational worlds of optimization. Using variational principles, we were able to estimate a Karush-Kuhn-Tucker Proximity Measure (KKTPM) for any feasible or infeasible solution from the KKT-based Pareto-optimal set without actually knowing the location of the Pareto-optimal set. Although the KKTPM measure requires computation of derivatives of objective and constraint functions, the idea brought in useful EMO operators aiding guaranteed convergence to EMO studies.

To reiterate the importance of associated knowledge around a field, the next example is illustrative. I was exposed to a resource allocation problem from an industry which involved about 50,000 integer decision variables. While it was a linear programming problem, the integer restriction of variables made all the difference between a fast and guaranteed solution methodology for the real-parameter version of the problem and an exponentially worse algorithm for its integer version. Well-known operations research software packages could not find the optimal solution for 2000 or more variables. We developed a customized EC-based procedure that recombined two or more solutions meaningfully in the context of the problem and used local adjustments to try to make infeasible solutions feasible. The procedure found near-optimal solutions (within a maximum of 0.03% deviation from the true optimum) not only for 2000 or even 50,000 variables, but also for a staggering one-billion-variable version of the problem, in polynomial computational time. I believe more such defining contributions are possible and are worth pursuing, but this will require a good understanding of the associated literature and of the strengths and weaknesses of various alternative methodologies.

### **4. How do you see the current development of the EMO community?**

I am absolutely certain that the EMO field is in good hands. I am happy that a simple idea on the use of a population-based optimization method to find multiple Pareto-optimal solutions simultaneously has survived almost three decades, provided EMO researchers with plenty of opportunities to formulate new research ideas, been extended to solve various types of problems, and helped merge multiple fields together.

Looking at the recent publications, a major thrust in EMO research today is clearly in the area of evolutionary many-objective optimization (EMaO), which focuses on addressing four or more objectives. While several efficient EMaO algorithms are in place based on reference vectors, the idea is interesting enough to be pursued further.

Another current development in EMO is in the use of machine learning (ML) methods for enhancing performance of EMO and EMaO algorithms. In the past two decades, ML has experienced a surge of activities, mainly due to the availability of data and the need for finding intelligence from data. Evolving a population of solutions and their objective/constraint values within an EMO algorithm can also be seen as a series of evolving data. ML methods can mine the data to reveal interesting search patterns and directions, which in turn can help make EMO methods faster and more reliable. Various such efforts in utilizing ML to improve EMO are underway. On a different note, EMO researchers should also find ways to utilize EC and EMO algorithms for enhancing ML's performance to make EMO an integral part of the current ML revolution.

Surrogate-assisted EMO is another area which is getting significant attention in its own right. Optimizing for a budget of solution evaluations will keep EMO applications practically viable.

Challenging test-problem development for benchmarking EMO algorithms should always be a constant thrust of EMO researchers. I am happy to see the original ZDT and DTLZ-based philosophies are being constantly extended to create more challenging test problems and EMO algorithms are improving consistently as a result.

### **5. What do you think are the most important challenges EMO has to face in the future?**

EMO and EMaO algorithms are now quite capable of addressing different kinds of multi- and many-objective problems, although further improvements are always necessary. They have performed well on challenging test problems and some small-sized engineering problems, but their real test will come when they are extensively applied to large-scale real-world problems. Industries are slowly but surely embracing EMO algorithms for solving two- and three-objective problems mostly (thanks to the use of dedicated commercial software and public-domain codes on EMO!) and it may be a while before they move to addressing more objectives. In the meantime, EMO researchers should advance the current practices as well.

First, more representative problems from real-world problems need to be identified and used to test our best EMaO algorithms for their working. In this direction, a direct collaboration with commercial software companies and researchers in application industries would be helpful.

Second, many-objective problems demand an easy and insightful visualization technique to understand trade-offs among Pareto-optimal solutions. There is a lack of a suitable visualization technique for understanding trade-offs, feasible search spaces, Pareto boundaries, etc., conveniently. Let us accept that the standard parallel coordinate plot (PCP) or radial visualization (RadViz) or scatter plots do not cut it. We may get influenced by high-dimensional data analysis literature for a clue here, but let us understand that our data have a special property – they possess a trade-off among the dimensions, in which generic data analysis folks may not be particularly interested. Hence, EMO researchers may have to find a solution for many-objective Pareto-optimal data visualization themselves.

Third, I strongly believe optimization algorithms must be customized for specific problem classes to make them more efficient both in terms of computational time and solution accuracy. While ML methods can be of help here (as alluded to before), practical use of EMO algorithms must be accompanied by an interactive platform which enables monitoring and aiding in the solution process by real users during the optimization process. Users' many years of experience on the problem can be utilized to customize an algorithm on the fly. Optimization algorithms discover useful variable interactions and patterns through their iterations, and a user's interaction can be made more fruitful if such discoveries can be shared with the user for their feedback on the relevance of the discoveries. Preference-related feedback can also be integrated here for multi-objective problems. We should soon see more such interactive EMO platforms being developed.

In most EMO and EMaO studies, we have focused on developing selection mechanisms for handling multiple objectives and have not spent much time on creation mechanisms for finding new and effective solutions. Unless new and diverse solutions are created by EMO's generation process, the multi-objective selection operator cannot do much. We should start focusing on hybrid genetic and local search methods and focus on creating more solutions directly in places where there is a lack of non-dominated solutions in the current population.

EMO has matured enough now to be applied to address large-scale societal and industrial problems. Problems affecting societies, such as climate change, obesity, forest management, agricultural management problems involving water, energy and food, and others, involve many conflicting objectives in terms of operating and installation costs, environmental effects, sustainability issues, etc., having numerous variables that can be adjusted with time and having constraints which must be satisfied to make a solution implementable. Finding a few alternative Pareto-optimal solutions by EMO algorithms customized to such problems can provide policy-makers with a new and transparent solution approach. Industrial problems such as supply-chain management, large manufacturing system operation, and integrated multi-level design tasks are other areas.

EMO algorithms, like single-objective EC methods, are stochastic and cannot ever have a theoretical convergence proof for any arbitrary problem, as supported by the no-free-lunch theorem. However, an EMO algorithm's population approach and its recombination operator help establish an implicit parallel search, which makes the EMO algorithm unique and different from other optimization methods. Collectively, we should find and focus on addressing problems that are difficult to solve by existing point-based methods, but for which a clever design of an EMO method can help find acceptable solutions.

**6. During your career, you have held numerous important positions. You have already mentioned your times in Dortmund and the ETH Zurich as visiting professor. Your main affiliation has been at the IIT Kanpur in India. After 15 years of service you decided to take a position in Helsinki (Finland). What was your main motivation for that?**

I started my professional academic career at IIT Kanpur in India in 1993, when GAs were then mostly unheard of and their practice was questioned in engineering departments. I kept working on some key issues needed to popularize EC and make EC an effective tool for search and optimization in practice. I am happy that a few of these contributions have become popular over the years, including my parameter-less constraint handling approach, real-parameter recombination (SBX) and polynomial mutation operators, multi-objective optimization algorithm (NSGA series), multi-objective test problem construction, two textbooks on optimization, and others. I had the good fortune to have extremely dedicated students with excellent programming skills to help me execute these studies.

From time to time I realized that I needed to get feedback and have real discussions with experts in the field. I took a few opportunities that came my way to visit and interact with key EC experts: University of Dortmund with the Humboldt Fellowship from Alexander von Humboldt Foundation, Germany during 1998–1999, ETH Zurich with visiting professorship in 2001, University of Karlsruhe with Bessel Research Prize Award from Alexander von Humboldt Foundation, Germany in 2003, Nanyang Technological University, Singapore with A\* project visit in 2006, Helsinki School of Economics with Finland Distinguished Professorship from the Academy of Finland during 2007–2009, and a number of bilateral project visits between India and European countries. These extended visits not only put me on the right track, but also exposed my work to experts in the field. Although such frequent visits came at the expense of relocating my family, I would recommend to young and isolated researchers to embrace such research visits as opportunities, rather than a disadvantage. I thank my family for their sacrifice and adjustments which I sincerely hope have given them better exposure and made them better individuals.

### **7. The next—and until now last—major change came in 2011 where you moved to East Lansing (USA) to become Professor and Koenig Endowed Chair at Michigan State University, which definitely came with new challenges for you and your family.**

Genetic algorithms research started in Michigan in the early sixties. Michigan State University (MSU) is one of the few universities in the USA which has traditionally had a strong focus on the evolutionary computation field. The BEACON Center for the Study of Evolution in Action, funded by the National Science Foundation (NSF) at MSU and led by Prof. Erik Goodman, enabled a major research collaboration opportunity in various aspects of evolution. When an endowed chair faculty position was offered to me at MSU, I did not have any second thoughts. Thus far, I have had the opportunity to work with several MSU colleagues from various disciplines, visiting researchers from various countries, and automobile and chemical industries in Michigan, which has brought greater fulfillment to my research career. The move also provided great educational opportunities to my children at a critical time of their careers.

**8. Finally, we come to another topic that might be very interesting, in particular, for younger scholars. We recall a Keynote Talk of yours where you presented a new evolutionary algorithm for a particular resource allocation problem. While the results were amazing, you mentioned that you have faced major issues to get the related paper published. Many readers might assume that publication of a paper that contains such great results and that comes from a renowned researcher like you should just be a formality. Apparently, this is not always the case. Could you comment on that?**

Most researchers may have faced such incidents in their careers. Since you mentioned it, let me address it to hopefully make a remark on the current paper review system in our field. For what I thought was a great EC-application study, which showcased an EC-based solution methodology to solve a billion-variable resource allocation problem (never done before), editors and reviewers of a leading EC journal suggested that I 'compare' my approach with a few recommended existing EC methods. Upon a survey, I found that the suggested EC methods addressed completely different kinds of problems having only 500 to 1000 variables. It was obvious that these methods were generic and would not have worked on a specific problem class involving a million to a billion integer variables. We developed a customized EC algorithm for solving such large-scale problems and our purpose was to demonstrate that the population-based approach with customized recombination and mutation operators was a better answer to this type of exa-scale optimization problem than the standard point-based structured algorithms. I really wanted the paper to appear in an EC-based journal so we, as a community, could celebrate and propagate EC techniques with such defining studies. Anyway, the paper was eventually published in a non-EC journal, after I withdrew the paper from the EC journal.

With this experience and from a few other recent reviews on my papers, I am increasingly convinced that most of our current reviewers expect that every article, to be published, must fall into certain patterns. A paper should have a new idea, but no matter how small or incremental the idea is, it must be compared with many existing algorithms, it must produce page-long tables presenting comparative results, and it must end by citing papers from most renowned authors in the field. Such a mindset of reviewers is harmful for the field in the long run. While there is a need for comparative studies, there is also a need for new and direction-providing papers, addressing bigger issues of the field, providing first-time ideas which cannot be compared with anything from the past, and defining applications that will keep EC alive and meaningful to practitioners. Let us be more inclusive and open-minded.

### **9. NSGA-III is one of the most cited and most widely used multi-objective evolutionary algorithms. Rumors say that it was also not easy for you to get the two initial works on this algorithm published. Is this true?**

It is true that the NSGA-III paper was rejected at first. Apparently, the paper exceeded the strict maximum two-time review policy restriction. Apparently, we failed to follow the suggestion of a reviewer to remove one of the three application problems, as the reviewer thought the paper was too long. I blame it on the lack of patience everyone has these days to pick signals from noise, but it is disturbing to think how many such trivial but harsh decisions are ruining the fate of important studies. I am glad that the decision was overturned eventually and the paper made its way to see the light of day, enriching the journal and EMO community and receiving significant attention to date.

Another not-so-fortunate outcome occurred with the Deb-Thiele-Laumanns-Zitzler (DTLZ) scalable test suite development paper, which never appeared in a journal due to its rejection, but its book-chapter version is probably one of the most highly-cited EMO articles today. I am sure everyone has such examples to cite, but we should all collectively plan for ways and means to reduce such unfortunate events. If editors and reviewers could envision the possible future impact of such important studies, these studies would not only help the field but also enhance the citation profile of our journals and conferences.

### **10. Finally, do you have a message for the authors out there that are struggling to get their research published?**

I actually have messages for authors, reviewers and editors. I believe as an author of any work, we should first be "satisfied" and "happy" with our work. If the author is not happy about its content, how can the author convince reviewers or readers to pay attention to it? Thus, my message for authors is to keep improving your work until you think you have tried enough to bring the work to a logical conclusion and in your opinion the work contributes to advance the field. Then, look for a journal/conference which is most suitable for the work. If you are a budding researcher, I understand that you need a good "quantity" of papers, so work on as many ideas as you can, collaborate with as many researchers as you can, and publish. However, once in a while, take a break, and think big and look at your field from 10,000 feet above and identify areas that need deeper attention. Work on these challenging ideas and see if you can make a crack. These works will give you fame, inspire you, and keep you alive.

As to the reviewers, my message is to have a bit of patience. Every article, to be conceived, worked on, and written, needs a lot of effort, taking many months to years, which every one of us has experienced. Treat others' papers the same way you would expect your articles to be treated. Here is an idea! Instead of assuming that the article you are reviewing is a reject to start with and looking for positive aspects to decide if you would accept the paper, think the other way. Assume every article is an accept to start with and then evaluate to see if it has enough new messages/results for it to be an accept or reject. Know that every author expects some constructive comments, particularly when the paper is rejected. If you are rejecting the article, please provide enough feedback so that authors find directions to modify it. As a reviewer, always know that you are in some sense in charge of what should be published and what should not be. You need to elevate yourself to decide the article's contribution to the overall growth and advancement of the field. You are a key component in this endeavor and everyone in the EMO community thanks you profusely for your time and efforts.

In my opinion, editors of journals and proceedings are the most influential persons in a field, indirectly controlling the focus of the field. They should not be intermediaries who simply count the number of accepts and rejects to decide the fate of a submission. They are the leaders of the field. They can judge a paper on their own very well and should be courageous enough to change a reviewer's comments and decisions if they think otherwise. Let every stakeholder in the review system (authors, editors and reviewers) care only about our field, its overall advancement and acceptance to contemporary other fields, rather than any other matter.

We have come a long way with all-round and well-grounded activities. Let us all together make the EMO research and application unbiased, top-notch, rewarding, and enjoyable. Let us all feel proud to be a part of the EMO revolution.

**Acknowledgments:** The opinions of Kalyanmoy Deb presented in this article are solely his own and do not reflect the views of anyone else, including his current and past employers and collaborators. His frank opinion and criticism, if any, are meant for no specific individual or entity, and the sole purpose of their mention here is to improve the general quality of the state of the art of the EMO field.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Review* **MCDM, EMO and Hybrid Approaches: Tutorial and Review**

**Ankur Sinha <sup>1,</sup>\* and Jyrki Wallenius <sup>2</sup>**


**Abstract:** Most of the practical applications that require optimization often involve multiple objectives. These objectives, when conflicting in nature, pose both optimization as well as decision-making challenges. An optimization procedure for such a multi-objective problem requires computing (computer-based search) and decision making to identify the most preferred solution. Researchers and practitioners working in various domains have integrated computing and decision-making tasks in several ways, giving rise to a variety of algorithms to handle multi-objective optimization problems. For instance, an *a priori* approach requires formulating (or eliciting) a decision maker's value function and then performing a one-shot optimization of the value function, whereas an *a posteriori* decision-making approach requires a large number of diverse Pareto-optimal solutions to be available before a final decision is made. Alternatively, an *interactive* approach involves interactions with the decision maker to guide the search towards better solutions (or the most preferred solution). In our tutorial and survey paper, we first review the fundamental concepts of multi-objective optimization. Second, we discuss the classic interactive approaches from the field of Multi-Criteria Decision Making (MCDM), followed by the underlying idea and methods in the field of Evolutionary Multi-Objective Optimization (EMO). Third, we consider several promising MCDM and EMO hybrid approaches that aim to capitalize on the strengths of the two domains. We conclude with discussions on important behavioral considerations related to the use of such approaches and future work.

**Keywords:** evolutionary multi-objective optimization; multi-criteria decision making; interactive optimization

### **1. Introduction**

Multi-Criteria Decision Making (MCDM) as a scientific field is some 60 years old. Its roots are in Goal Programming [1] and Multi-Attribute Utility Theory (MAUT) [2]. A subsequently popular subfield, interactive man–machine multi-objective optimization, developed greatly during the 1970s. The common frameworks used a discrete set of choices and a mathematical programming problem formulation (optimization) to solve multi-objective problems. With the interactive approaches, phases of decision making and computing would alternate. The aim was to converge towards the most preferred solution on the Pareto-optimal frontier.

**Citation:** Sinha, A.; Wallenius, J. MCDM, EMO and Hybrid Approaches: Tutorial and Review. *Math. Comput. Appl.* **2022**, *27*, 112. https://doi.org/10.3390/mca27060112

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 31 October 2022 Accepted: 14 December 2022 Published: 19 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Independently from MCDM, the Evolutionary Multi-Objective Optimization (EMO) approaches started developing during the 1980s [3]. Many of the EMO scholars had an engineering or a computer science background. EMO algorithms [4,5] have been applied to problems with multiple objectives for the task of finding a well-representative set of Pareto-optimal solutions. These methods [6,7] have been successful in solving a wide variety of problems with two or three objectives. However, these methodologies are criticized for their excessive computational expense, and they often tend to suffer while solving problems with objectives higher than three [8,9]. The major hindrances in handling a higher number of objectives relate to stagnation in search, increased dimensionality of Pareto-optimal front, large computational cost, and difficulty in visualization of the objective space. These difficulties are inherent to optimization problems having a large number of objectives and are not easy to eliminate; rather, procedures to handle such difficulties need to be explored. EMO methods that are better equipped at handling a larger number of objectives are being continuously explored [10–12]. Some of these approaches aim for solutions that are near Pareto-optimal and provide a discretized and diverse representation of the high-dimensional frontier for many-objective (i.e., more than two or three objectives) problems. However, the level of discretization for an accurate and well-represented many-objective frontier would require a very large number of points. Even if a fine-grained discretization is achieved with a large number of points, the decision-making challenges still remain.
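As a rough illustration of how quickly a uniformly discretized frontier grows with the number of objectives, the sketch below counts the points of a simplex lattice with a fixed number of divisions per objective axis. The lattice scheme, the function name `simplex_lattice_size`, and the choice of 12 divisions are assumptions made here for illustration only; the text does not prescribe a particular discretization.

```python
from math import comb


def simplex_lattice_size(num_objectives: int, divisions: int) -> int:
    """Count points on a uniform simplex lattice (hypothetical helper).

    Each point has non-negative coordinates that are multiples of
    1/divisions and sum to 1, so the count is C(p + H - 1, H) for
    p objectives and H divisions.
    """
    return comb(num_objectives + divisions - 1, divisions)


# Same resolution (12 divisions), increasing number of objectives.
for p in (2, 3, 5, 10):
    print(p, simplex_lattice_size(p, 12))
# prints: 2 13, 3 91, 5 1820, 10 293930 (one pair per line)
```

At a fixed resolution the count rises from 13 points for two objectives to nearly 300,000 for ten, which is one way to see why a fine-grained representation of a many-objective frontier becomes impractical.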

The areas of MCDM and EMO were solving similar problems; therefore, the researchers working in these domains decided to pursue active collaboration through formal channels such as common conferences and seminars. As a result, Branke, Deb, Miettinen, and Słowiński organized the first Dagstuhl seminar [13] in 2004 to allow collaboration between the two communities. This led to researchers combining ideas from MCDM to EMO and vice-versa. Since then, the Dagstuhl seminar has been organized every few years to enhance the collaboration and flow of ideas from one research community to the other. In this article, we evaluate the classic studies in MCDM and EMO and also the hybrid approaches that have been proposed for handling many-objective problems. Some of the review papers that talk about interactive multi-objective optimization are [14,15]. This article takes a tutorial-cum-review approach to discuss the classic ideas published in the areas of MCDM, EMO, and their intersection and is structured as follows. In Section 2, we cover the theoretical concepts on optimization and decision making that arise in the multi-objective literature. This is followed by Section 3, where we discuss how search and decision making can be integrated together in various ways to find the most preferred point for the decision maker (DM). Thereafter, we discuss the classic MCDM (Section 4), EMO (Section 5), and hybrid (Section 6) approaches that have been discussed in the literature over the past few decades. We conclude the article in Section 7 with discussions on behavioral considerations and future work.

### **2. Multi-Objective Optimization**

Multi-objective optimization [4,16–18] involves two or more conflicting objectives that are supposed to be simultaneously optimized subject to a given set of constraints. These problems arise in various fields of science, engineering, economics, and mathematics and have been widely studied in the literature. However, modern applications keep posing challenges with an increasing level of complexity. The complexity depends on a number of factors, such as number of objectives, number of decision variables, type of decision variables (continuous, discrete), number of constraints, and functional form of the functions in the optimization problems (linear, convex, non-convex, non-differentiable, etc.) that may lead to non-separability and multi-modality. While many of the above difficulties are common to single-objective optimization as well, multi-objective optimization poses additional challenges as such problems do not have a single solution which would simultaneously maximize/minimize each of the objectives; instead, there is a set of solutions from which a rational DM should choose. These solutions are called Pareto-optimal solutions. Choosing the most preferred solution from the set of Pareto-optimal solutions requires an additional step of decision making, which is often subjective and not straightforward to model. The challenges posed by multi-objective optimization often include inability to generate a complete ordering of points and requirement of maintaining a pool of non-dominated points. A feasible point in multi-objective optimization is considered to be non-dominated within a set when there does not exist any other feasible point that is better than the former in terms of some objective and is not worse than the former in terms of other objectives. The concept is discussed in detail in Section 2.1. 
Difficulty in representation and visualization of the solutions in objective space, especially while working with many objectives, makes decision making difficult, and therefore requires preference learning while searching for the point most preferred by the DM. Below, we describe a general multi-objective problem (*p* ≥ 2):

$$\begin{array}{ll}\text{Maximize} & \mathbf{f}(\mathbf{x}) = \left(f_1(\mathbf{x}), \dots, f_p(\mathbf{x})\right) \\ \text{subject to} & \mathbf{g}(\mathbf{x}) \ge \mathbf{0}, \; \mathbf{h}(\mathbf{x}) = \mathbf{0} \\ & \mathbf{x}^{(L)} \le \mathbf{x} \le \mathbf{x}^{(U)} \end{array} \tag{1}$$

In the above formulation, $\mathbf{x} = (x_1, x_2, \dots, x_n)$ is the $n$-dimensional decision variable vector, which represents a point in the decision space. A search is expected to be performed within the constrained region of the decision space that is determined by the inequality constraints ($\mathbf{g}(\mathbf{x}) \ge \mathbf{0}$), equality constraints ($\mathbf{h}(\mathbf{x}) = \mathbf{0}$), and box constraints ($\mathbf{x}^{(L)} \le \mathbf{x} \le \mathbf{x}^{(U)}$). We refer to the set of solutions which are feasible with respect to the constraints and are *non-dominated* with respect to all feasible solutions as *Pareto-optimal* solutions. Among the Pareto-optimal solutions, the solution that is the most preferred by the DM will be referred to as the *most preferred solution*. We provide formal definitions for these terms in the next sections.

Note that the objective vector $\mathbf{f}(\mathbf{x})$ is the image of the decision vector $\mathbf{x}$ under the objective function $\mathbf{f}$. In a single-objective optimization ($p = 1$) problem, the feasible set is completely ordered according to the objective function $\mathbf{f}(\mathbf{x}) = f_1(\mathbf{x})$, such that for any two solutions $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ in the decision space, either $f_1(\mathbf{x}^{(1)}) \ge f_1(\mathbf{x}^{(2)})$ or $f_1(\mathbf{x}^{(2)}) \ge f_1(\mathbf{x}^{(1)})$. Therefore, for two solutions in the objective space, there are two possibilities with respect to the $\ge$ relation. However, when several objectives ($p \ge 2$) are involved, the feasible set is not necessarily completely ordered but partially ordered. In multi-objective problems, for any two objective vectors, $\mathbf{f}(\mathbf{x}^{(1)})$ and $\mathbf{f}(\mathbf{x}^{(2)})$, the relations $=$, $>$, and $\ge$ can be extended as follows:


While comparing the multi-objective scenario with the single-objective case, we find that for two solutions in the objective space there are three possibilities with respect to the $\ge$ relation. These possibilities are: $\mathbf{f}(\mathbf{x}^{(1)}) \ge \mathbf{f}(\mathbf{x}^{(2)})$, $\mathbf{f}(\mathbf{x}^{(2)}) \ge \mathbf{f}(\mathbf{x}^{(1)})$, or $\mathbf{f}(\mathbf{x}^{(1)}) \ngeq \mathbf{f}(\mathbf{x}^{(2)}) \wedge \mathbf{f}(\mathbf{x}^{(2)}) \ngeq \mathbf{f}(\mathbf{x}^{(1)})$. If either of the first two possibilities holds, the solutions can be ranked or ordered independent of any preference information (or a DM). On the other hand, if neither of the first two possibilities holds, the solutions cannot be ranked or ordered without incorporating preference information (or involving a DM). By analogy with the above discussion, the relations $<$ and $\le$ can be extended in a similar way.

### *2.1. Dominance Concept*

Based on the binary relations established for two vectors in the previous section, the following dominance concept [16] can be defined:


In the case of weak dominance, it is common to drop the word *weak* and refer to it simply as *dominance*, which is why we use the word *weak* in brackets. Dominance of $\mathbf{x}^{(1)}$ over $\mathbf{x}^{(2)}$ essentially means that no component of $\mathbf{f}(\mathbf{x}^{(1)})$ is less than the corresponding component of $\mathbf{f}(\mathbf{x}^{(2)})$, and at least one component of $\mathbf{f}(\mathbf{x}^{(1)})$ is greater than the corresponding component of $\mathbf{f}(\mathbf{x}^{(2)})$. The above dominance concept is also explained in Figure 1 for a two-objective maximization case. In Figure 1, two shaded regions are shown in reference to point A. The shaded region in the north-east corner (excluding the lines) is the region which strongly dominates point A, the shaded region in the south-west corner (excluding the lines) is strongly dominated by point A, and the non-shaded region is the non-dominated region. Therefore, point A strongly dominates point B, and points A, E, and D are non-dominated with respect to each other. Note that point A weakly dominates point C. From here on, we speak only of dominance and omit the word *weak*.

**Figure 1.** Dominance concept for a maximization problem where A dominates B and C; A, D, and E are non-dominated.

Many of the existing evolutionary multi-objective optimization algorithms use the dominance principle to converge towards the Pareto-optimal set of solutions. The concept allows us to partially order two decision vectors based on the corresponding objective vectors in the absence of any preference information. The algorithms which operate with a sparse set of solutions in the decision space and the corresponding images in the objective space usually give priority to a solution which dominates another solution. The solution which is not dominated with respect to any other solution in the sparse set is referred to as a *non-dominated solution* within that set.
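As a minimal illustration of the dominance checks described above, the following Python sketch compares objective vectors under maximization. The function names and the sample points are assumptions made for this example only, chosen to mirror the kind of situation shown in Figure 1; they are not taken from any particular EMO implementation.

```python
from typing import Sequence


def strongly_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if every component of a is strictly greater than that of b."""
    return all(x > y for x, y in zip(a, b))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """(Weak) dominance: a is nowhere worse than b and better in at least one objective."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def mutually_non_dominated(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if neither vector dominates the other."""
    return not dominates(a, b) and not dominates(b, a)


# Hypothetical objective vectors (f1, f2), both to be maximized.
A, B, C, D, E = (3, 3), (1, 2), (3, 1), (5, 1), (1, 5)

print(strongly_dominates(A, B))      # A is better than B in both objectives
print(dominates(A, C))               # A ties C in f1 and beats it in f2
print(mutually_non_dominated(A, D))  # each of A and D wins in one objective
```

Pairs for which `mutually_non_dominated` holds are exactly those that cannot be ordered without preference information from a DM.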

In the case of a discrete set of solutions, the subset whose solutions are not dominated by any solution in the discrete set is referred to as the *non-dominated set* within the discrete set. When the set in consideration is the entire search space, the resulting non-dominated set is referred to as the *Pareto-optimal set*, and the frontier formed by these points is referred to as the *Pareto-optimal front*. To formally define a Pareto-optimal front, consider a set $\mathbf{X}$ which constitutes the entire decision space with solutions $\mathbf{x} \in \mathbf{X}$. The subset $\mathbf{X}^* \subset \mathbf{X}$, containing solutions $\mathbf{x}^*$ which are not dominated by any $\mathbf{x}$ in the entire decision space, forms the Pareto-optimal set, and its image in the objective space forms the Pareto-optimal front.

The concepts of a Pareto-optimal front and a non-dominated set are illustrated in Figure 2. The shaded region in the figure represents $\{\mathbf{f}(\mathbf{x}) : \mathbf{x} \in \mathbf{X}\}$. It is the image in the objective space of the entire feasible region in the decision space. The bold curve represents the Pareto-optimal front for a maximization problem. Mathematically, this curve is $\{\mathbf{f}(\mathbf{x}^*) : \mathbf{x}^* \in \mathbf{X}^*\}$, which contains all the optimal points for the two-objective optimization problem. A number of points are also plotted in the figure, which constitute a discrete set. Among this set of points, the points connected by broken lines are the points which are not dominated by any point in the discrete set. Therefore, these points constitute a non-dominated set within the discrete set. The other points which do not belong to the non-dominated set are dominated by at least one of the points in the non-dominated set.
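The non-dominated set within a discrete set can be extracted with a simple pairwise filter. The sketch below is a naive quadratic-time version under maximization; the function names and the sample points are hypothetical and serve only to illustrate the definition.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: nowhere worse, and strictly better in at least one objective."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_set(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep every point that is not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical discrete set of objective vectors (f1, f2), both maximized.
points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1), (3, 1)]
front = non_dominated_set(points)
print(front)
```

Every point left out of `front` is dominated by at least one member of `front`, which matches the description of the discrete set in Figure 2.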

In the field of MCDM, a Pareto-optimal point $\mathbf{f}(\mathbf{x}^*)$ in the objective space is often referred to as a non-dominated point, as it is not dominated by any feasible point in the objective space. The corresponding decision vector $\mathbf{x}^*$ is referred to as an *efficient point*. Similarly, if $\mathbf{f}(\mathbf{z})$ is a dominated point in the objective space, then $\mathbf{z}$ would be referred to as an *inefficient point* in the decision space. In other words, a point is efficient if and only if it is the inverse image of a non-dominated objective vector, and it is inefficient if and only if it is an inverse image of a dominated objective vector.

**Figure 2.** Non-dominated set from a discrete set of points and a Pareto-optimal front that dominates the entire search space.

### *2.2. Decision Making*

Even though there are multiple potentially optimal solutions to a multi-objective problem, there is often just a single solution which is of interest to the DM, which is termed the *most preferred solution*. Search and decision making are the two tasks [19] involved in handling any multi-objective problem. Search requires an intensive exploration in the decision space to get close to the Pareto-optimal solutions; decision making, on the other hand, is required to provide preference information over the available non-dominated solutions in pursuit of the most preferred solution.

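In a decision-making context, the solutions can be compared and ordered based on the preference information, though there can be situations where strict preference of one solution over the other is not obtained, and the ordering is partial. For instance, consider two vectors, $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$, in the decision space, having their images, $\mathbf{f}(\mathbf{x}^{(1)})$ and $\mathbf{f}(\mathbf{x}^{(2)})$, in the objective space. A preference structure can be defined using three binary relations $\succ$, $\sim$, and $\parallel$. The meanings of the binary relations are provided below:

• $\mathbf{x}^{(1)} \succ \mathbf{x}^{(2)} \Leftrightarrow \mathbf{x}^{(1)}$ is preferred over $\mathbf{x}^{(2)}$.

• $\mathbf{x}^{(1)} \sim \mathbf{x}^{(2)} \Leftrightarrow \mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ are equally preferred.

• $\mathbf{x}^{(1)} \parallel \mathbf{x}^{(2)} \Leftrightarrow \mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ are incomparable.

where the preference relation, $\succ$, is asymmetric, the indifference relation, $\sim$, is reflexive and symmetric, and the incomparability relation, $\parallel$, is irreflexive and symmetric. A weak preference relation can be established as $\succeq \; = \; \succ \cup \sim$ such that

• $\mathbf{x}^{(1)} \succeq \mathbf{x}^{(2)} \Leftrightarrow \mathbf{x}^{(1)}$ is either preferred over $\mathbf{x}^{(2)}$ or they are equally preferred.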

As already mentioned, preference can easily be established for pairs where one solution dominates the other. However, for pairs which are non-dominated with respect to each other, a DM's input is required to establish a preference. The following is the inference for preference choice which can be drawn from dominance:

• If $\mathbf{x}^{(1)}$ dominates $\mathbf{x}^{(2)}$ $\Rightarrow$ $\mathbf{x}^{(1)} \succ \mathbf{x}^{(2)}$.

It is common to emulate a DM with a value function, $V(f_1(\mathbf{x}), \dots, f_p(\mathbf{x}))$, which is scalar in nature and assigns a value or a measure of satisfaction to each of the solutions. For two solutions, $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$:
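Under the usual reading of a value function, which we state here as an assumption, the solution with the larger value of $V$ is preferred and equal values imply indifference. The sketch below emulates a DM with a hypothetical linear value function; the weights, the function names, and the sample objective vectors are illustrative only.

```python
from typing import Callable, Sequence


def linear_value(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Build a hypothetical linear value function V(f) = sum_i w_i * f_i."""
    return lambda f: sum(w * fi for w, fi in zip(weights, f))


def compare(f_a: Sequence[float], f_b: Sequence[float],
            V: Callable[[Sequence[float]], float]) -> str:
    """Return how the emulated DM ranks two objective vectors."""
    v_a, v_b = V(f_a), V(f_b)
    if v_a > v_b:
        return "first preferred"
    if v_a < v_b:
        return "second preferred"
    return "indifferent"


V = linear_value((0.5, 0.5))       # assumed equal importance of both objectives
print(compare((5, 2), (2, 4), V))  # values 3.5 vs 3.0
print(compare((4, 2), (2, 4), V))  # values 3.0 vs 3.0
```

Because a scalar value function always yields comparable numbers, an emulated DM of this kind never reports incomparability; that simplification is part of the assumption.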


### *2.3. Preference Eliciting and Modeling*

There are several ways of eliciting preference information from the DM that can be used to create a preference model to be incorporated in the search process. Some of the approaches are listed below:


After the preferences are obtained from the DM, there are various ways in which the information is incorporated in the search process. For instance, value functions could be generated based on the preferences expressed by the DM. Methods differ based on the kind of value function, i.e., linear or non-linear, that is chosen to model preference information. While some methods generate a single maximally discriminating value function fitting the preference information, others generate multiple value functions fitting the same preference information. Scalarizing functions (for example, see [20]), weighted sum of objectives (similar to a linear value function), and the *ε*-constraint method [21] are other approaches to convert the multi-objective problem into a single-objective problem that aligns with the DM's preferences. Sometimes, the dominance principle is modified to search in a region that better fits the preferences of the DM.
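To make the conversion to a single-objective problem concrete, the following sketch applies a weighted sum and an ε-constraint formulation to a small finite set of candidate objective vectors under maximization. Working on a finite set instead of a continuous feasible region is a simplifying assumption, and the function names, weights, and bounds are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def weighted_sum_choice(points: List[Point], weights: Sequence[float]) -> Point:
    """Pick the candidate maximizing sum_i w_i * f_i."""
    return max(points, key=lambda f: sum(w * fi for w, fi in zip(weights, f)))


def epsilon_constraint_choice(points: List[Point], primary: int,
                              eps: Dict[int, float]) -> Optional[Point]:
    """Maximize objective `primary` subject to f_i >= eps[i] for the bounded objectives."""
    feasible = [f for f in points if all(f[i] >= bound for i, bound in eps.items())]
    return max(feasible, key=lambda f: f[primary]) if feasible else None


candidates = [(1, 5), (2, 4), (3, 3), (4, 1)]           # hypothetical (f1, f2) values
print(weighted_sum_choice(candidates, (0.7, 0.3)))      # emphasis on f1
print(weighted_sum_choice(candidates, (0.3, 0.7)))      # emphasis on f2
print(epsilon_constraint_choice(candidates, 0, {1: 3}))  # maximize f1 with f2 >= 3
```

Changing the weights or the bounds moves the selected point along the non-dominated set, which is how these formulations encode the DM's preferences.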

There are other very interesting approaches to modeling preferences in MCDM. Two such approaches are outranking relations and rule-based models. Outranking methods were developed by B. Roy in the late 1960s, originating from criticism of utility theory in solving practical problems (see [22,23]). An outranking relation is a binary relation. It is based on the ideas of concordance and discordance. "Loosely speaking", alternative **x** outranks **y** if there are enough arguments (attributes favoring **x** over **y**) to declare that **x** is at least as good as **y** while there is no essential reason to refute this statement. Decision rules are expressions of the form "if, then" [24]. Procedures for generating decision rules use an inductive learning principle. The authors distinguish three types of rules: certain, possible, and approximate. Certain rules are generated from lower approximations of unions of classes; possible rules are generated from upper approximations of unions of classes; and approximate rules are generated from boundary regions. To structure the data prior to the induction of rules, the authors suggest using the Dominance-based Rough Set Approach (DRSA) [25]. As an illustrative example, the authors consider the problem of evaluating high school students based on performance in some of the subjects using "if, then" rules. Multi-criteria classification and sorting are frequently considered problems in rule-based preference modeling, although the ideas can be extended for the problem of identifying the most preferred alternative. Both outranking relations and rule-based preference modeling were originally developed for the problem of choosing among discrete (known) alternatives, and not the mathematical programming or EMO context. Hence, we do not extensively cover them in our survey and tutorial. There are some exceptions, though.
For example, the Light Beam search approach (which is based on utilizing outranking relations) was developed for solving multi-objective mathematical programming problems [26]. In a later section, we illustrate how the Light Beam approach is used in an EMO context.

In the later part of the paper, we discuss approaches that elicit and model the preferences of the DM in different ways while searching for the most preferred point.

### **3. Incorporating Decision Maker's Preferences**

Searching and decision making can be combined in various ways to generate procedures which can be classified into three broad categories.

### *3.1. A Priori Approach*

In this approach, the DM's preferences are elicited before the start of the algorithm, then the optimization algorithm is executed by incorporating the preference information, and the most preferred solution is identified. Figure 3 shows the process followed to arrive at the most preferred solution. This approach has been common among MCDM practitioners, who realized the complexities involved in decision making for such problems. Their approach to the problem is to ask the DM simple questions before starting the search process.

**Figure 3.** *A priori* approach.

After eliciting information from the DM, the multi-objective problem is usually converted into a single-objective problem. One of the early approaches, that is, MAUT [2], used the initial information from the DM to construct a utility function which reduced the problem to a single-objective optimization problem. Scalarizing functions (for example, [20]) are also commonly used by the researchers in this field to convert a multi-objective problem into a single-objective problem.

Since information is elicited towards the beginning, the solution obtained after executing the algorithm may not be close to the most preferred solution. Moreover, the DM's preferences might be different for solutions close to the Pareto-optimal front, and the initial inputs taken from them may not conform to it. Therefore, relying on this approach, it may be difficult to get close to the actual solution which meets the requirements of the DM. The approach is also highly error-prone, as even slight deviations in providing preference information at the beginning may lead to entirely different solutions. Such errors are common because the DM often cannot reliably express preferences without knowing the solution space or without a precise understanding of their own preferences at the beginning of the preference elicitation process. To avoid the errors due to deviations, researchers in the EMO field used the approach in a slightly modified way. They produced multiple solutions in the region of interest to the DM (often close to the Pareto-optimal front) [27–30], and then elicited the DM's preferences. We discuss this approach next.

### *3.2. A Posteriori Approach*

In this approach, after a set of (approximate) Pareto-optimal solutions is obtained using an optimization algorithm, decision making is performed to find the most preferred solution. Figure 4 shows the process followed to arrive at the final solution which is most preferred by the DM. This approach is based on the assumption that a complete knowledge of all the alternatives helps in making better decisions. The research in the field of evolutionary multi-objective optimization has been directed along this approach, where the aim is to produce all the possible alternatives for the DM to make a choice. Until relatively recently, the community has largely ignored decision-making aspects and has been striving towards producing all the possible optimal solutions.

**Figure 4.** *A posteriori* approach.

There are enormous difficulties in finding the entire Pareto-optimal front for a many-objective problem. Even if it is assumed that an algorithm can approximate the Pareto-optimal front for a high-objective problem with a huge set of points, the herculean task of choosing the best point from the set still remains. For two and three objectives where the solutions in the objective space could be represented geometrically, making decisions might be easy (though even such an instance could be, in reality, a difficult task for a DM). Imagine a multi-objective problem with more than three objectives for which an evolutionary multi-objective algorithm is able to produce the entire front. The front is approximated with a large number of points and high accuracy. Since a graphical representation is not possible for the Pareto-points, how is a DM going to choose the most preferred point? There are of course decision aids available, but the limited accuracy with which the final choice could be made using these aids questions the purpose of producing the entire front with high accuracy. Binary comparisons can be a solution to choose the best point out of a set, but this can only be utilized if the points are very few in number. Therefore, offering the entire set of Pareto-points should not be considered as a complete solution to the problem. However, the difficulties related to decision making have been realized by EMO researchers only after copious research has already gone towards producing the entire Pareto-front for many-objective problems. Most of the EMO algorithms [6,7,10–12,31–33] that aim to produce the entire Pareto-optimal front would lie in this category.

### *3.3. Interactive Approach*

In this approach, the DM interacts with the optimization algorithm and has multiple opportunities to provide preference information to the algorithm. The interaction between the DM and the optimization algorithm continues until a solution acceptable to the DM is obtained. The process is represented in Figure 5. Based on the type of interaction of the DM with the optimization algorithm, this approach is often implemented in two ways.

**Figure 5.** Interactive approach.

The first approach involves elicitation of preference information and execution of the optimization algorithm to obtain one or many Pareto-optimal solutions. If a solution acceptable to the DM is obtained, the process is terminated; otherwise, the process is restarted and continued until a satisfactory solution is found. In this approach, the progression towards the most preferred solution may take place on the Pareto-optimal frontier. MCDM researchers following an interactive approach usually elicit preference information and find a solution conforming to the inputs given by the DM. They iterate this process until a satisfactory solution is obtained. For example, when using a scalarization function, multiple reference points (or starting points) could be provided by the DM. Once a reference point is available, the computer provides a projection of that point on the Pareto-optimal frontier. This process converts the problem into a single-objective optimization problem and produces one of the Pareto-optimal points as the solution. If the point finally produced is not to the liking of the DM, the search is continued with new reference points and projections. This process is continued until a solution acceptable to the DM is obtained. The iterations of a simple algorithm using this approach are shown in Figure 6. The figure shows that a DM is able to find a satisfactory solution in three iterations.

**Figure 6.** Interaction after a run.

EMO researchers have taken cues from their MCDM counterparts, wherein they used the powerful evolutionary search tool to produce multiple solutions in the region of interest to the DM or generate a small part of the Pareto-front which the DM finds interesting. This is a similar approach where interactions happen before and after a complete run of the EMO. The algorithm produces multiple solutions in a particular region or multiple regions of the Pareto-optimal front in a single run. Once the solutions are produced, another decision-making task is performed, and the solution to the liking of the DM is chosen. If none of the solutions are acceptable to the DM, the process of elicitation and search is repeated until a satisfactory solution is found. Some examples of evolutionary procedures which have used this approach are [34,35].

The second approach involves elicitation of preference information periodically from a DM while the optimization algorithm is progressing towards the Pareto-optimal frontier. In this approach, preference information is taken at the intermediate steps of the search algorithm, and the algorithm proceeds towards the most preferred point. This is an effective integration of the search and decision-making process, as both work simultaneously towards the exploration of the solution. Such an integration avoids multiple optimization runs and is therefore preferable for problems that are computationally expensive. It also allows the DM to better understand the consequences of their actions, as they can immediately see how the convergence direction changes. Some previous works which have been conducted in a similar vein in the MCDM field are [36,37], and in the EMO field, are [38–44]. The iterations of an algorithm that uses this approach, commonly referred to as a progressively interactive approach, are shown in Figure 7. The DM is presented with a set of points and is expected to choose one of the points to start the search. The choice of the DM gives clues to the search algorithm about the search direction, and the algorithm progresses towards the most preferred solution. The DM may change their preference structure as the search progresses, and the algorithm is able to adapt to such changes.

**Figure 7.** Progressive interaction during the run.

### **4. MCDM Interactive Techniques**

Linear Programming (LP) was rather popular in large Western companies in the 1960s and 1970s, as well as in Gosplan (central government agency) for government-level planning in the Soviet Union. To address the need to solve multi-objective LPs, Charnes and Cooper developed Goal Programming [1] in the late 1950s and coined the name in the early 1960s. In Goal Programming, the DM is asked to specify aspiration levels in terms of objectives. The algorithm then finds a feasible solution that minimizes the weighted deviations from the aspiration levels. The original version of Goal Programming was for solving multiple-objective LPs. Goal Programming was not an interactive approach, and there was no option to update the aspiration levels. In multi-objective linear programs, the concept of an optimum was being replaced by a "compromise" or a "non-dominated solution".
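The idea of minimizing weighted deviations from aspiration levels can be sketched on a finite set of candidate objective vectors. This is only an illustration: the original Goal Programming formulation is an LP with deviation variables, whereas the code below simply enumerates hypothetical candidates, and the names, goals, and weights are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def weighted_deviation(f: Sequence[float], goals: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Sum of weighted absolute deviations from the aspiration levels."""
    return sum(w * abs(g - fi) for w, g, fi in zip(weights, goals, f))


def goal_programming_choice(points: List[Point], goals: Sequence[float],
                            weights: Sequence[float]) -> Point:
    """Pick the candidate with the smallest weighted deviation from the goals."""
    return min(points, key=lambda f: weighted_deviation(f, goals, weights))


candidates = [(1, 5), (2, 4), (3, 3), (4, 1)]  # hypothetical (f1, f2) values
goals = (3, 4)                                 # aspiration levels stated by the DM
weights = (1, 2)                               # missing the f2 goal is penalized twice as much
print(goal_programming_choice(candidates, goals, weights))
```

Since the goals are fixed before the search, a different answer can only be obtained by rerunning the procedure with new aspiration levels.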

With simultaneous advances in computer technology (teletypes accessing mainframe computers), the idea of interactively or progressively solving multi-objective optimization problems was proposed in the early 1970s. In the interactive approach:


We review the following classic interactive multi-objective optimization methods, which all represented the state of the art at the time:


### *4.1. STEP Method (Benayoun et al., 1971) [45]*

The ancestor of the STEP method [45] was the Progressive Orientation Procedure (POP) by Benayoun and Tergny [48]. In the POP method, a subset of efficient extreme points is computed and presented to the DM for her evaluation. The DM can either choose the most preferred solution, or choose an attractive subset, and so forth. The STEP method was one of the first truly interactive approaches for solving multi-objective LPs. In this man–model symbiosis, phases of computation alternate with phases of decision. The process allows the DM to learn to recognize good solutions and the relative importance of the objectives.

In the STEP method, each objective is optimized one at a time to obtain the ideal point of the problem. For a maximization problem, the components of the ideal point describe the upper bounds of the individual objectives for the points corresponding to the Pareto-optimal front. Similarly, the nadir point (not used in the STEP method) is defined as the lower bounds of the individual objectives for the points corresponding to the Pareto-optimal front. Denote the ideal point as $\mathbf{M} = (M_1, M_2, \dots, M_p)$. At each iteration, the following LP problem is solved to obtain the feasible compromise solution $\mathbf{x}^{(k)}$ ($k$ is the iteration counter), which is nearest in the *minimax* sense to $\mathbf{M}$:

$$\begin{array}{ll}\text{Minimize} & q\\ \text{subject to} & q \ge (M_i - f_i(\mathbf{x})) \lambda_i \;\forall\, i \in \{1, \dots, p\}\\ & \mathbf{x} \in B^k\\ & q \ge 0 \end{array} \tag{2}$$

where $B^k$ is the feasible region at iteration $k$, $f_i(\mathbf{x})$ is the function for the $i$th objective at decision $\mathbf{x}$, and $\lambda_i$ are the normalized weights (not specified by the DM). At the decision phase, the objective function vector associated with the compromise solution $\mathbf{x}^{(k)}$ is presented to the DM. Next, the DM must choose the objectives $f_{i^*}$ (if any), where $i^* \subset \{1, \dots, p\}$, which they would be willing to worsen to allow an improvement in the unsatisfactory ones. Then, the DM must specify the maximal amount of relaxation $\Delta f_j$ in each of these objectives. At the next iteration, the feasible region is modified as $B^{k+1} = \{\mathbf{x} : f_j(\mathbf{x}) \ge f_j(\mathbf{x}^{(k)}) - \Delta f_j \;\forall\, j \in i^*, \; f_i(\mathbf{x}) \ge f_i(\mathbf{x}^{(k)}) \;\forall\, i \notin i^*\}$. The weights of the objectives to be relaxed are set to 0, and the next calculation phase is performed. The process is terminated as soon as the DM has found a satisfactory solution. The solutions at termination are not necessarily always non-dominated, but with modifications, they can all be made non-dominated. Note that the *minimax* operation corresponds to minimizing the Chebycheff norm.
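One calculation phase and one decision phase of this scheme can be imitated on a finite set of candidate objective vectors. The sketch below is a simplification under stated assumptions: it enumerates hypothetical candidates instead of solving the LP in (2), and the weights are supplied by hand instead of being derived from a payoff table. All names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def ideal_point(points: List[Point]) -> Point:
    """Best attainable value of each objective taken separately (maximization)."""
    return tuple(max(f[i] for f in points) for i in range(len(points[0])))


def compromise(points: List[Point], ideal: Sequence[float],
               weights: Sequence[float]) -> Point:
    """Candidate nearest to the ideal point in the weighted minimax sense."""
    return min(points, key=lambda f: max(w * (m - fi)
                                         for w, m, fi in zip(weights, ideal, f)))


def restrict(points: List[Point], current: Point, relax: Dict[int, float]) -> List[Point]:
    """Relaxed objectives may drop by at most relax[j]; the others may not worsen."""
    return [f for f in points
            if all(f[i] >= current[i] - relax.get(i, 0.0) for i in range(len(current)))]


candidates = [(1, 6), (2, 5), (3, 3), (4, 1)]   # hypothetical (f1, f2) values
M = ideal_point(candidates)
first = compromise(candidates, M, (0.5, 0.5))   # calculation phase
region = restrict(candidates, first, {1: 2.0})  # DM lets f2 drop by up to 2
second = compromise(region, M, (1.0, 0.0))      # weight of the relaxed objective set to 0
print(M, first, second)
```

The second compromise improves the first objective at the cost of the second one, within the relaxation the emulated DM allowed.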

### *4.2. GDF Algorithm (Geoffrion, Dyer, and Feinberg, 1972) [36]*

In Geoffrion, Dyer, and Feinberg's algorithm [36], the problem is formulated as follows:

$$\begin{array}{ll}\text{Maximize} & U(f_1(\mathbf{x}), \dots, f_p(\mathbf{x})) \\ \text{subject to} & \mathbf{x} \in \mathbf{X} \end{array} \tag{3}$$

where $\mathbf{X}$ is the feasible set (convex and compact), $f_i$ are objective functions of the decision vector $\mathbf{x}$, and $U$ is the overall utility (or value) function defined over the values of the objectives, assumed to be concave (under maximization) and differentiable. Everything else, except for $U$, is assumed to be explicitly known. $U$, however, is only assumed to be implicitly known. (If $U$ were explicitly known, the problem would be an ordinary non-linear program.)

The GDF algorithm uses a modification of the Frank–Wolfe [49] algorithm from 1956. Note that the Frank–Wolfe algorithm is a steepest ascent algorithm. Two problems alternate: the direction-finding problem and the step-size problem. If we ignore for the moment that *U* is not explicitly known, the algorithm progresses as follows:


$$\begin{array}{ll}\text{Maximize} & \nabla_{\mathbf{x}} U(f_1(\mathbf{x}), \dots, f_p(\mathbf{x})) \cdot \mathbf{y} \\ \text{subject to} & \mathbf{y} \in \mathbf{X}. \end{array} \tag{4}$$


$$\begin{array}{ll}\text{Maximize} & U(f_1(\mathbf{x}^{(k)} + t\mathbf{d}^{(k)}), \dots, f_p(\mathbf{x}^{(k)} + t\mathbf{d}^{(k)})) \\ \text{subject to} & 0 \le t \le 1. \end{array} \tag{5}$$

5. Set $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + t^{k}\mathbf{d}^{(k)}$, $k = k + 1$, and return to the direction-finding problem. The theoretical termination criterion is satisfied if $\mathbf{x}^{(k)}$ and $\mathbf{x}^{(k+1)}$ are equal.
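The alternation between the direction-finding problem and the step-size problem can be sketched for the case in which $U$ is explicitly known. The code below makes several assumptions for the sake of a runnable example: the feasible set is a polytope given by its vertices (so the linear direction-finding problem is solved by enumerating vertices), the direction is taken as $\mathbf{d} = \mathbf{y} - \mathbf{x}$, the gradient is approximated by finite differences, and the step size is found by a grid search. The utility function, objectives, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def numeric_gradient(func: Callable[[List[float]], float], x: Sequence[float],
                     h: float = 1e-6) -> List[float]:
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


def frank_wolfe(utility, objectives, vertices: List[Tuple[float, ...]],
                x0: Sequence[float], max_iter: int = 50, grid: int = 100,
                tol: float = 1e-9) -> List[float]:
    composite = lambda x: utility([f(x) for f in objectives])
    x = list(x0)
    for _ in range(max_iter):
        grad = numeric_gradient(composite, x)
        # Direction finding: a linear objective over a polytope peaks at a vertex.
        y = max(vertices, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        d = [yi - xi for yi, xi in zip(y, x)]
        # Step size: grid search over t in [0, 1].
        t = max((i / grid for i in range(grid + 1)),
                key=lambda s: composite([xi + s * di for xi, di in zip(x, d)]))
        x_new = [xi + t * di for xi, di in zip(x, d)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            break  # x^(k+1) equals x^(k): termination criterion
        x = x_new
    return x


# Hypothetical problem: f1 = x1, f2 = x2 on the simplex x1 + x2 <= 1, x >= 0.
utility = lambda f: math.log(1 + f[0]) + math.log(1 + f[1])  # concave, increasing
objectives = [lambda x: x[0], lambda x: x[1]]
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
solution = frank_wolfe(utility, objectives, vertices, x0=(0.0, 0.0))
print(solution)
```

In the interactive setting, both the gradient evaluation and the grid search would be replaced by questions to the DM, since $U$ is not available in closed form.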

Now, assume that we do not know $U$. The gradient of $U$ can be replaced with the weighted sum of the gradients of the objectives $f_i$ with respect to $\mathbf{x}$, using weights $w_i^k$:

$$\begin{array}{ll}\text{Maximize} & \sum_{i=1}^{p} w_{i}^{k} \nabla_{\mathbf{x}} f_{i}(\mathbf{x}^{(k)}) \cdot \mathbf{y} \\ \text{subject to} & \mathbf{y} \in \mathbf{X} \end{array} \tag{6}$$

where we define $w_i^k = \frac{\partial U / \partial f_i^{(k)}}{\partial U / \partial f_j^{(k)}}$, $i = 1, \dots, p$, with $f_j$ being arbitrarily chosen as the reference criterion. The weights reflect the DM's tradeoff between $f_j$ and $f_i$ (at the current point), and must be elicited from the DM. We determine what change $\Delta f_j$ in the reference criterion exactly compensates for a change $\Delta f_i$: $w_i^k = -\frac{\Delta f_j}{\Delta f_i}$. This is the Marginal Rate of Substitution (MRS) between the objectives.
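As a small worked example of the elicitation, suppose the DM states, for each non-reference objective, which change in the reference criterion exactly compensates a given change in that objective. The sketch below turns such answers into weights; the function name, the data layout, and the numbers are hypothetical.

```python
from typing import Dict, Tuple


def mrs_weights(tradeoffs: Dict[int, Tuple[float, float]],
                reference: int) -> Dict[int, float]:
    """Compute w_i = -delta_f_j / delta_f_i from the DM's compensating changes.

    tradeoffs[i] = (delta_f_i, delta_f_j), where j is the reference criterion.
    The reference criterion itself gets weight 1.
    """
    weights = {reference: 1.0}
    for i, (delta_f_i, delta_f_j) in tradeoffs.items():
        weights[i] = -delta_f_j / delta_f_i
    return weights


# The DM says: gaining 1 unit of objective 1 is worth giving up 2 units of objective 0.
weights = mrs_weights({1: (1.0, -2.0)}, reference=0)
print(weights)
```

With these weights, objective 1 counts twice as much as the reference criterion in the direction-finding problem at the current point.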

The step-size problem must be solved directly by the DM. In early work, the computer would tabulate the values of the objectives at selected intervals and let the DM choose from this numerical display their most preferred solution.

### *4.3. ZW Method (Zionts and Wallenius, 1976) [37]*

The Zionts–Wallenius method [37] is a simple-to-use multi-objective "simplex method", which companies could easily adopt for relatively large-scale problems. The authors initially made the assumption that the DM's underlying (implicit) value function would be linear (in terms of the objectives). LP theory suggests that the optimal solution would be a non-dominated extreme point solution. Hence, it would be sufficient to operate with efficient extreme point solutions. The authors first developed a *naïve* approach, which starts with an efficient extreme point and asks the DM about neighboring extreme points: Do you prefer any of the neighboring points to the current point? If yes, the DM is moved to one of the preferred neighbors and the method continues. If not, the optimal solution (or most preferred solution) is assumed to be found. The problem with the *naïve* approach is that the convergence was very slow for even moderately large problems. Therefore, a more elaborate approach had to be thought through to make the algorithm more efficient.

In the elaborate approach, the process starts by assuming some arbitrary (positive) weights for the objectives. If no other information exists, one may start with equal weights. The method uses the current set of weights to generate a non-dominated solution, and then asks the DM whether any of the "efficient" neighboring solutions (or the trade-offs given by a unit movement in their direction) are preferred to the current solution. If not, the most preferred solution has been found; otherwise, the process continues. Note that the trade-offs can be obtained from the simplex table corresponding to the objective function rows and the non-basic variable columns.

The following so-called "*λ*-problem" tells how the weights are updated based on the DM's yes/no answers:

$$\begin{array}{ll}\text{Maximize} & \epsilon\\ \text{subject to} & \sum_{i=1}^{p}\lambda_{i}\mathbf{x}_{i}^{(r)} - \epsilon \geq \sum_{i=1}^{p}\lambda_{i}\mathbf{x}_{i}^{(s)} \; \forall \; \mathbf{x}^{(r)} \in \mathbf{X}_{r}, \mathbf{x}^{(s)} \in \mathbf{X}_{s} \\ & \sum_{i=1}^{p}\lambda_{i} = 1\\ & \lambda_{i} > 0, \; i \in \{1, \ldots, p\} \end{array} \tag{7}$$

The sets **X**<sub>*r*</sub> and **X**<sub>*s*</sub> contain points where every element in **X**<sub>*r*</sub> is preferred to every element in **X**<sub>*s*</sub>, i.e., **x**<sup>(*r*)</sup> ≻ **x**<sup>(*s*)</sup> ∀ *r*, *s*. The updated weights are used to generate an improved non-dominated extreme point solution and the process is repeated. The process terminates when none of the neighboring extreme point solutions is preferred to the current solution, which is assumed to be the optimal solution. Note that, in this approach, it is not necessary to ask the DM about all neighboring extreme point solutions, but only the *efficient* ones. The algorithm was tested for moderately sized LP problems with 3–4 objectives.

### *4.4. Reference Point Method (Wierzbicki, 1980) [20]*

The Reference Point method [20] asks the DM to provide aspiration levels for the objectives. The aspiration point is then projected to the non-dominated frontier. Note that it does not matter whether the aspiration point provided by the DM is feasible or not. In the projection, Wierzbicki used the so-called Achievement Scalarizing Function (ASF), which was minimized as:

$$\begin{array}{ll}\text{Min} & \text{Max}^p\_{i=1}(\frac{\underline{g}\_i - f\_i(\mathbf{x})}{w\_i}) + \rho \Sigma^p\_{i=1} \frac{\underline{g}\_i - f\_i(\mathbf{x})}{w\_i} \\ \text{subject to} & \mathbf{x} \in \mathbf{X} \end{array} \tag{8}$$

where *w*<sub>*i*</sub> > 0 is a set of weights, *ρ* is a small number, and *g*<sub>*i*</sub> is the aspiration level for the *i*-th objective. Note that when *ρ* = 0, the indifference contours being optimized are orthogonal (90 degree angle); when *ρ* > 0, the indifference contours being optimized form an angle between 90 and 180 degrees. Once the non-dominated projection of the aspiration levels is found, the method asks the DM to update the aspiration levels. The method stops when the DM is satisfied with the solution. In contrast with the GDF and ZW methods, no assumptions about *U* are made.
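
As a minimal sketch of the projection in Equation (8), the ASF can be evaluated and minimized by enumeration. The sketch assumes that a finite list of candidate objective vectors stands in for the feasible set **X** and that all objectives are maximized. The names `asf` and `project_reference_point`, the sample data, and the value of `rho` are illustrative assumptions, not part of the original method.

```python
def asf(f, g, w, rho=1e-4):
    """Achievement scalarizing function of Eq. (8) for one objective vector f."""
    terms = [(gi - fi) / wi for fi, gi, wi in zip(f, g, w)]
    return max(terms) + rho * sum(terms)


def project_reference_point(candidates, g, w, rho=1e-4):
    """Return the candidate objective vector that minimizes the ASF."""
    return min(candidates, key=lambda f: asf(f, g, w, rho))


# Hypothetical two-objective alternatives (both objectives maximized).
candidates = [(1.0, 9.0), (4.0, 7.0), (6.0, 5.0), (8.0, 2.0), (3.0, 3.0)]
weights = (1.0, 1.0)

# An unattainable aspiration point and an attainable (dominated) one.
best_from_infeasible = project_reference_point(candidates, (7.0, 7.0), weights)
best_from_feasible = project_reference_point(candidates, (2.0, 2.0), weights)
```

In this toy data set both aspiration points are projected onto the same non-dominated alternative, which illustrates why the feasibility of the aspiration point does not matter.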

### *4.5. Reference Direction Approach (Korhonen and Laakso, 1986) [46]*

Instead of projecting a single reference point using Wierzbicki's ASF, Korhonen and Laakso [46] suggested projecting multiple directions to the efficient frontier. The projection was determined by solving the following parametric program:

$$\begin{array}{ll}\text{Min} & \epsilon \\ \text{subject to} & f\_i(\mathbf{x}) + \epsilon w\_i \ge q\_i + td\_i, \ i \in 1, \ldots, p \\ & \mathbf{x} \in \mathbf{X} \end{array} \tag{9}$$

where *w*<sub>*i*</sub> > 0 is a set of weights, *q*<sub>*i*</sub> is any vector in the criterion space, and *d*<sub>*i*</sub> = *g*<sub>*i*</sub> − *q*<sub>*i*</sub> is a reference direction, with *g*<sub>*i*</sub> being an aspiration level or a reference goal in the spirit of Wierzbicki's reference point approach. When the parameter *t* in the above problem is varied from zero to infinity, an efficient curve emanating from point **q** is obtained.
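
The parametric program (9) can be illustrated with a small enumeration sketch. It assumes a finite set of candidate objective vectors (all maximized) in place of **X**, so that for each *t* the smallest feasible ε of a candidate is the largest weighted shortfall from the moving target **q** + *t***d**. The function name `reference_direction_path` and the data are hypothetical.

```python
def reference_direction_path(candidates, q, g, w, t_values):
    """For each t, pick the candidate with the smallest epsilon in problem (9)."""
    d = [gi - qi for gi, qi in zip(g, q)]  # reference direction d = g - q
    path = []
    for t in t_values:
        target = [qi + t * di for qi, di in zip(q, d)]

        def epsilon(f, target=target):
            # Smallest epsilon with f_i + epsilon * w_i >= target_i for all i.
            return max((ti - fi) / wi for ti, fi, wi in zip(target, f, w))

        path.append(min(candidates, key=epsilon))
    return path


candidates = [(1.0, 9.0), (4.0, 7.0), (6.0, 5.0), (8.0, 2.0)]
path = reference_direction_path(
    candidates, q=(4.0, 7.0), g=(9.0, 4.0), w=(1.0, 1.0), t_values=[0.0, 0.5, 1.0]
)
```

With these numbers the path moves from the starting point towards solutions that favor the first objective as *t* grows, which is the behavior the DM explores along the projected direction.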

The interface of the reference direction method is similar to the GDF method. When the DM has identified the most preferred solution along the projection, then they are asked to revise their aspiration point, and the process is repeated. An extension and application of the reference direction approach on multi-objective quadratic linear programming can be found in [50].

### *4.6. Pareto Race (Korhonen and Wallenius, 1988) [47]*

Pareto Race [47] is a visual, dynamic search procedure for exploring the non-dominated frontier of a multi-objective LP problem. It is based on the idea of projecting reference directions on the efficient frontier. However, no aspiration levels are elicited from the DM. Instead, if the DM wants to improve the value of a certain objective, they press the number key (one or more times, depending on the relative desired improvement in that objective) of the corresponding objective.

There is an analogy to driving an automobile (on the efficient frontier). The user sees the objective function values on a display in numeric form and as bar graphs as they travel along the non-dominated frontier. Keyboard controls include an accelerator, gears, brakes, and a steering mechanism. Technically, two parameters are used to control the motion: the reference direction (direction) and step size (speed). Figure 8 shows the interactive dashboard used in the Pareto Race approach.

**Figure 8.** Pareto Race interface.

### **5. EMO Introduction and History**

An evolutionary algorithm is a general population-based optimization algorithm which uses a mechanism inspired by biological evolution, i.e., selection, crossover, mutation, and replacement. The common underlying idea behind an evolutionary technique is that, for a given population of individuals, the environmental pressure causes natural selection, which leads to a rise in fitness of the population. A comprehensive discussion of the principles of an evolutionary algorithm can be found in [51–55]. In contrast to classical algorithms, which iterate from one solution point to the other until termination, an evolutionary algorithm works with a population of solution points. Each iteration of an evolutionary algorithm results in an update of the previous population by eliminating inferior solution points and including the superior ones. In the terminology of evolutionary algorithms, an iteration is commonly referred to as a generation and a solution point as an individual. A pseudo-code for a general genetic algorithm, which is a type of evolutionary algorithm, is provided below:

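Step 1: Create a random initial population (i.e., a set of solution points in the decision space).

Step 2: Evaluate the individuals in the population and assign fitness.

Step 3: Repeat the following substeps for each generation until a termination criterion is met: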

Substep 1: Select the fitter individuals (referred to as parents) from the population for reproduction (i.e., producing new solution points through genetic operators of crossover and mutation).

Substep 2: Produce new individuals (referred to as offspring) through crossover and mutation operators.

Substep 3: Evaluate the new individuals and assign fitness.

Substep 4: Replace the low-fitness individuals in the population with high-fitness individuals that may have been generated through crossover and mutation.

Step 4: Report the highest fitness individual as the output.
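
The pseudo-code above can be sketched as a short program. The example below is only one possible instantiation under stated assumptions: a single real-valued decision variable, a hypothetical fitness function with its maximum at *x* = 3, binary tournament selection, arithmetic crossover, Gaussian mutation, and replacement by keeping the best individuals from parents and offspring. None of these specific operator choices come from the text.

```python
import random


def fitness(x):
    """Hypothetical single-objective fitness to maximize (optimum at x = 3)."""
    return -(x - 3.0) ** 2


def run_ga(pop_size=20, generations=40, lower=0.0, upper=10.0, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population in the decision space.
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]
    history = []  # best fitness after each generation

    def tournament():
        # Substep 1: the fitter of two randomly drawn individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            # Substep 2: arithmetic crossover followed by Gaussian mutation.
            alpha = rng.random()
            child = alpha * p1 + (1.0 - alpha) * p2 + rng.gauss(0.0, 0.3)
            offspring.append(min(upper, max(lower, child)))
        # Substeps 3 and 4: evaluate everyone and keep the pop_size fittest.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
        history.append(fitness(population[0]))
    # Step 4: report the highest-fitness individual.
    return population[0], history


best, history = run_ga()
```

Because the replacement step keeps the fittest individuals from the combined pool, the best fitness recorded in `history` never decreases from one generation to the next.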

Along with the pseudo-code presented above, a flowchart for a general evolutionary algorithm is presented in Figure 9. An evolutionary algorithm begins by generating a pool of individuals, called the population, by randomly creating points in the search space. Each individual in the population is evaluated on the objective(s) and constraints (if any) and is assigned a fitness. For instance, while solving a single-objective maximization problem, a solution point with a higher function value is better than a solution point with a lower function value when both solutions are feasible. Therefore, in such cases, the individual with the higher function value is assigned a higher fitness. When comparing two infeasible solutions, the solution with the smaller constraint violation is often assigned a higher fitness than the solution with the larger constraint violation. In the presence of multiple constraints, the constraint violation for a particular point is defined as the sum of the violations of those constraints that the point does not satisfy. When comparing a feasible solution against an infeasible solution, the feasible solution is often assigned the higher fitness. There can, of course, be other ways to assign fitness. For an unconstrained maximization problem, the function value itself can be treated as the fitness value, and for an unconstrained minimization problem, the negative of the function value may serve as the fitness. In all such cases, the algorithm searches for a higher-fitness solution.

In a multi-objective context, the requirement is to produce a set of solutions that approximate the Pareto-optimal front. Fitness assignment based on constraint violation can be performed in the multi-objective case in the same manner as in the single-objective case. Moreover, a feasible solution point which dominates another feasible solution point can be assigned a higher fitness. However, fitness assignment for two solutions that are non-dominated with respect to each other is tricky. In such cases, algorithms often consider a measure of diversity or crowdedness [4] in the objective space to assign fitness and prefer one solution over the other. The measure for crowdedness prefers solutions that are isolated over solutions that are in crowded regions to enhance diversity in the population and to obtain a "well-spread" set of solutions approximating the Pareto-optimal front. A multi-objective evolutionary procedure, therefore, assigns fitness to each of the solution points based on their superiority over other solution points in terms of constraints, dominance, and diversity in the objective space. Different algorithms use different quality functions to assign fitness to an individual in a population.

Once an initial population is generated and the fitness is assigned, a few of the better candidates from the population are chosen as parents. Crossover and mutation are performed to generate new solutions. Crossover is an operator applied to two or more selected individuals and results in one or more new individuals. Mutation is applied to a single individual and results in one new individual. Executing crossover and mutation leads to offspring that compete, based on their fitness, with the individuals in the population for a place in the next generation. An iteration of this process often leads to a rise in the average fitness of the population and, over iterations, helps the algorithm converge towards the optimum in a single-objective case and towards the Pareto-optimal front in a multi-objective case.
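
To make the roles of dominance and crowdedness concrete, the following sketch filters the non-dominated points of a small set and then scores them with a crowding-distance measure of the kind used in NSGA-II-type algorithms. It assumes that all objectives are maximized and that all points are feasible; the function names and the data are illustrative only.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points):
    """Keep the points that no other point dominates (order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def crowding_distance(front):
    """Larger values mean more isolated points; boundary points get infinity."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for k in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for r in range(1, n - 1):
            gap = front[order[r + 1]][k] - front[order[r - 1]][k]
            dist[order[r]] += gap / (hi - lo)
    return dist


points = [(1.0, 9.0), (4.0, 7.0), (5.0, 6.5), (8.0, 2.0), (3.0, 3.0)]
front = non_dominated(points)
distances = crowding_distance(front)
```

Here the point (3, 3) is removed because it is dominated, and among the interior points of the front the one with the larger crowding distance would be preferred when diversity is used as a tie-breaker.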

**Figure 9.** A flowchart for a general evolutionary algorithm.

Using the described evolutionary framework, a number of algorithms have been developed which successfully solve a variety of optimization problems. Their strength is particularly observable in handling two- to three-objective optimization problems and generating the entire Pareto-front. The aim of an EMO algorithm is to produce solutions which are (ideally) Pareto-optimal and uniformly distributed over the entire Pareto-front, so that a complete representation is provided. In the domain of EMO algorithms, these aims are commonly referred to as convergence and diversity. Figure 10 shows the working of a typical EMO algorithm that starts with a random initial population and aims to converge to the efficient frontier with a diverse set of solutions. The researchers in the EMO community have so far regarded an *a posteriori* approach to be an ideal approach, where a representative set of Pareto-optimal solutions is found and then a DM is invited to select the most preferred point. The assertion is that only a DM who is well-informed is in a position to make the right decision. A common belief is that decision making should be based on complete knowledge of the available alternatives; current research in the field of EMO algorithms has taken inspiration from this belief. Though the belief is true to a certain extent, there are inherent difficulties associated with producing the entire set of alternatives and performing decision making thereafter, which often render the approach ineffective.

**Figure 10.** The working of a general evolutionary multi-objective optimization (EMO) algorithm.

The EMO approaches can be divided into three broad categories based on the idea that they use to achieve convergence and diversity: Pareto-based approaches, indicator-based approaches, and decomposition-based approaches.


While Pareto-based approaches have been popular for solving two- or three-objective test problems, their efficiency deteriorates on problems with a higher number of objectives. Many of these methods are based on the approach of non-dominated sorting of the population as the primary driver. In problems with a large number of objectives, most of the solutions generated by these approaches are non-dominated in the comparison set, which slows progress towards the Pareto-frontier. Indicator-based approaches attempt to optimize a particular indicator that accounts for both convergence and diversity, but they did not become popular because of the high computational cost of computing the indicator metric (for example, Hypervolume or Inverted Generational Distance) in many-objective problems. Despite these issues, both Pareto-based and indicator-based methods still hold promise, as non-dominated sorting in Pareto-based approaches is one of the fundamental ideas for partial ordering that cannot be ignored; similarly, faster computation of indicator metrics would make the indicator-based approaches competitive.

An alternative to Pareto-based approaches and indicator-based approaches are decomposition-based approaches, which have been effective in handling a larger number of objectives by decomposing the original problem into a set of subproblems, either multiple single-objective problems or multiple simplified multi-objective problems. These multiple problems are solved simultaneously in a collaborative manner and lead to better convergence and diversity, as the convergence is guaranteed by ensuring that each subproblem is properly optimized, and diversity is guaranteed by implicitly distributing the subproblems in an even manner. Interestingly, the decomposition-based methods utilize MCDM approaches while decomposing the multi-objective problems into subproblems. For instance, a distributed set of reference directions from the ideal point (or sometimes from the nadir point) towards the Pareto-front would lead to a well-distributed set of Pareto-optimal solutions if the front is uniform in shape. The methods that rely on decomposition solve these subproblems in a parallel manner and differ mostly on the basis of how the subproblems are created, how information between subproblems is shared during the generations, and how the subproblems may adapt during intermediate generations. However, note that if one considers a 10-objective problem, with a discretization of 10 along each objective, one would need 10<sup>10</sup> points to approximate the frontier. Moreover, even if the points are produced by a computationally efficient algorithm, the decision-making challenge still remains. If the Pareto-front is not found with sufficient discretization, the DM may expect the method to explore additional solutions. For the purpose of evaluating the EMO approaches, a large body of literature exists on test problem toolkits [62–65] and performance assessment metrics [66–70] that allow the developers to compare the performance of various algorithms.
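
The idea of evenly distributed subproblems can be sketched with a simplex-lattice set of weight vectors, each defining one scalar subproblem. The lattice design and the weighted Tchebycheff scalarization used below are common choices in decomposition-based EMO but are assumptions here, as are all function names. For *p* objectives and *h* divisions the lattice contains C(*h* + *p* − 1, *p* − 1) vectors, and `grid_size` reproduces the 10<sup>10</sup> count mentioned above.

```python
from itertools import combinations
from math import comb


def simplex_lattice(p, h):
    """All weight vectors with components k/h (k a non-negative integer) summing to 1."""
    weights = []
    # Stars and bars: choose p - 1 bar positions among h + p - 1 slots.
    for bars in combinations(range(h + p - 1), p - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(h + p - 2 - prev)
        weights.append(tuple(k / h for k in parts))
    return weights


def tchebycheff(f, weight, ideal):
    """Weighted Tchebycheff distance of objective vector f from the ideal point."""
    return max(w * abs(z - fi) for w, z, fi in zip(weight, ideal, f))


def grid_size(objectives, levels):
    """Number of points in a full grid with `levels` values per objective."""
    return levels ** objectives


weights = simplex_lattice(p=3, h=4)
```

Each weight vector in `weights` would define one subproblem, for instance minimizing `tchebycheff(f, weight, ideal)` over the feasible solutions.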

### **6. Hybrid Methods**

In this section, we focus on hybrid approaches that incorporate decision making within EMO. As already highlighted, the aim of the EMO algorithms is to find a diverse set of solutions close to the Pareto-optimal front, and the DM is then expected to choose the most preferred point from the objective space. However, approximating the entire Pareto-optimal front with a set of points is not always easy and may not serve the purpose, especially in the context of problems with a large number of objectives. To alleviate these problems associated with *a posteriori* EMO approaches, some EMO researchers, taking cues from their MCDM counterparts, have attempted an *a priori* approach, where a small set of Pareto-optimal points in the region of interest to the DM is targeted. As the region of interest becomes smaller, certain problems associated with the high dimensionality of the problem in the objective space are alleviated. Greenwood et al. [30] used an evolutionary approach to optimize a linear value function obtained from the DM through ranking of a few alternatives. In this method, the preference information is employed before optimization, and therefore this qualifies as an *a priori* method. Other studies in this direction are the cone-dominance-based EMO [71], biased-niching-based EMO [27], the light beam approach based EMO [35], and reference-point-based EMO approaches [28,29].

In [71], the authors modify the dominance principle based on interactions with the DM. For every pair of objectives, the DM specifies maximally acceptable trade-offs, i.e., what an improvement of one unit in one objective (say *f*<sub>1</sub>) is worth in terms of degradation of another objective (say *f*<sub>2</sub>). If the degradation is worth at most *a*<sub>12</sub> in *f*<sub>2</sub> when *f*<sub>1</sub> improves by unity, and at most *a*<sub>21</sub> in *f*<sub>1</sub> when *f*<sub>2</sub> improves by unity, then the dominance relation between two solutions **x** and *y* (**x** dominating *y*) is modified as follows, with a strict inequality in at least one case:

$$(f\_1(\mathbf{x}) + a\_{12}f\_2(\mathbf{x}) \le f\_1(y) + a\_{12}f\_2(y)) \land (a\_{21}f\_1(\mathbf{x}) + f\_2(\mathbf{x}) \le a\_{21}f\_1(y) + f\_2(y))$$

Incorporating the above principle in an EMO is straightforward, as one can simply replace objectives *f*<sub>1</sub> and *f*<sub>2</sub> with Ω<sub>1</sub> and Ω<sub>2</sub>, respectively, where Ω<sub>1</sub> and Ω<sub>2</sub> are defined below, and solve the problem with the standard dominance principle.

$$\begin{aligned} \Omega\_1(\mathbf{x}) &= f\_1(\mathbf{x}) + a\_{12} f\_2(\mathbf{x}) \\ \Omega\_2(\mathbf{x}) &= a\_{21} f\_1(\mathbf{x}) + f\_2(\mathbf{x}) \end{aligned}$$

The approach can be incorporated in any EMO and does not lead to any increase in complexity.
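
A minimal sketch of this substitution is given below. It assumes two objectives that are both minimized, in line with the "≤" inequalities above; the function names and the trade-off values are hypothetical.

```python
def dominates_min(a, b):
    """Standard dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def transform(f, a12, a21):
    """Map (f1, f2) to (Omega1, Omega2) using the DM's acceptable trade-offs."""
    f1, f2 = f
    return (f1 + a12 * f2, a21 * f1 + f2)


def guided_dominates(fx, fy, a12, a21):
    """Modified dominance: standard dominance applied to the transformed objectives."""
    return dominates_min(transform(fx, a12, a21), transform(fy, a12, a21))


fx, fy = (1.0, 4.0), (2.0, 3.5)  # mutually non-dominated under standard dominance
standard = dominates_min(fx, fy)
guided = guided_dominates(fx, fy, a12=0.5, a21=0.5)
```

In this example the two points are incomparable under the standard relation, but the first dominates the second once the trade-off information is used, so the set of non-dominated solutions shrinks towards the region the DM cares about.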

Figures 11 and 12 indicate the working of the light beam approach based EMO [35] and the reference-point-based EMO [28,29] approaches, respectively. In their study, Jaszkiewicz and Branke [40] showed that it is difficult for an EMO algorithm alone to find a good spread of solutions in five- to ten-objective problems, and when solutions around the most preferred point are targeted, the hybrid approaches are able to find satisfactory solutions.

**Figure 11.** A light beam approach integrated within an EMO that finds a crowded set of points close to the Pareto-frontier based on the aspirations of the DM.

**Figure 12.** Projection of a feasible and infeasible reference point on the Pareto-optimal frontier within an EMO.

Preference-based EMO algorithms can differ from each other based on the following aspects:


Apart from a priori and a posteriori approaches, a more seamless and effective way to incorporate DMs' preferences in the EMO would be to collect and incorporate preferences at the intermediate generations of the EMO algorithm to guide the search towards the most preferred point. Such an approach is commonly referred to as a progressively interactive EMO approach. We discuss, in detail, some of the progressively interactive techniques studied in the literature.

### *6.1. Phelps and Köksalan (2003) [38]*

Phelps and Köksalan [38] presented one of the first hybrid approaches, where they optimized a linearly weighted utility function during the iterations of an evolutionary algorithm. The decision maker makes a number of binary comparisons that lead to the weights of the utility function. For a given parameter *t* and ideal point *f*<sub>*k*</sub><sup>∗</sup> = Max<sub>*x*∈*X*</sub>{*f*<sub>*k*</sub>(*x*)}, the authors solve the following optimization problem to obtain the weights *w*<sub>*k*</sub>, *k* = 1, . . . , *p*.

$$\begin{array}{ll}\text{Max} & \epsilon \\ \text{s.t.} & \sum\_{k=1}^{p} w\_k = 1 \\ & \sum\_{k=1}^{p} w\_k \left(f\_k^\* - f\_k(\mathbf{x}^{(i)})\right)^t \le \sum\_{k=1}^{p} w\_k \left(f\_k^\* - f\_k(\mathbf{x}^{(j)})\right)^t - \epsilon \quad \forall \; \mathbf{x}^{(i)} \succ \mathbf{x}^{(j)} \\ & w\_k \ge \epsilon \quad \forall \; k = 1, \ldots, p \end{array}$$

The above problem is an LP that leads to **w**<sup>∗</sup>, which is used in calculating the fitness of each point using the following utility function:

$$\mathcal{U}(\mathbf{x}) = -\sum\_{k=1}^{p} w\_k^\* (f\_k^\* - f\_k(\mathbf{x}))^{t}$$

The preference from the DM is taken initially or during the execution of the algorithm to modify the fitness function. The authors considered linear utility functions in their study.
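
The utility function and the role of ε can be illustrated without an LP solver. The sketch below evaluates the utility for a fixed weight vector and computes the largest ε for which that vector satisfies the constraints of the weight-finding problem; solving the LP itself would search over all weight vectors. The function names, the ideal point, and the preference pairs are hypothetical.

```python
def utility(f, weights, ideal, t=1):
    """Negative weighted distance (raised to the power t) from the ideal point."""
    return -sum(w * (z - fk) ** t for w, z, fk in zip(weights, ideal, f))


def margin(weights, preferences, ideal, t=1):
    """Largest epsilon that the given weights allow.

    Each pair (better, worse) requires utility(better) - utility(worse) >= epsilon,
    and every weight must also be at least epsilon. A negative result means the
    weights contradict at least one stated preference.
    """
    gaps = [
        utility(better, weights, ideal, t) - utility(worse, weights, ideal, t)
        for better, worse in preferences
    ]
    return min(gaps + list(weights))


ideal = (10.0, 10.0)
# The DM states that (6, 5) is preferred to both (8, 2) and (1, 9).
preferences = [((6.0, 5.0), (8.0, 2.0)), ((6.0, 5.0), (1.0, 9.0))]

balanced = margin((0.5, 0.5), preferences, ideal)
skewed = margin((0.9, 0.1), preferences, ideal)
```

Equal weights are compatible with both comparisons, whereas weights that strongly favor the first objective violate the first comparison and would be rejected.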

To incorporate the properties of an implicit quasi-concave utility function into an EMO, Fowler et al. [39] developed an interactive EMO approach using convex preference cones. They used feasibility, dominance, and preference cones to order the population members and used that information for fitness calculation. They tested their algorithm on multidimensional (up to four dimensions) knapsack problems using a similar interactive genetic algorithm framework to that of Phelps and Köksalan [38]. Jaszkiewicz [40] constructed an achievement scalarizing function using random weights. The random weights are preferred if the scalarizing function generated conforms to the preference information provided by the DM. The EMO search is then guided by the scalarizing function generated with random weights.

### *6.2. Branke, Greco, Słowiński, and Zielniewicz (2009) [41]*

Branke et al. [41] implemented the GRIP [72] methodology, in which the DM-provided pairwise information is used to find all possible compatible additive value functions (not necessarily linear). A preference-based dominance relationship and a preference-based diversity preserving operator are used in an EMO to find new solutions for the next few generations. In their approach, the DM makes pairwise comparisons after every few generations in order to develop the preference structure. It is also possible for the DM to specify the intensities of preference. The authors use robust ordinal regression on information obtained through interaction with the DM to determine the set of all compatible value functions. Thereafter, the EMO procedure performs a parallel search for all non-dominated solutions that are preferred with respect to the compatible value functions. The authors demonstrated their procedure on a two-objective test problem. The study was later extended to solve up to five-objective test problems in [73]. The study takes a robustness approach to avoid arbitrary selection of a value function, which makes it different from most of the other studies that determine the single most discriminating value function. The use of preference information in a robust manner in EMO is a significant contribution of this study. Other recent studies that use a set of instances of the preference model compatible with the DM's preference information are [74,75]. These studies generate multiple instances of the preference model using Monte Carlo simulation and utilize the instances as search directions in a decomposition-based EMO approach.

### *6.3. Deb, Sinha, Korhonen, and Wallenius (2010) [42]*

Based on earlier interactive MCDM approaches [76,77], this paper proposes a preference-based EMO to guide a DM to the most preferred solution by creating non-linear value functions in the intermediate generations of the algorithm. The approach accepts preference information in the form of complete or partial ranking, i.e., the DM may prefer one solution over the other or the DM may be indifferent between two solutions. The authors do not consider the situation where the DM is unable to compare two solutions. Through an extensive computational study on two- to five-objective problems, the authors evaluated the performance of their approach when the DM interacts less or more frequently with the EMO approach, as well as the impact on the quality of the solution produced when the DM provides erroneous preference information. The approach utilized the approximated value function in an innovative manner by partitioning the objective space into two areas using the value function. They also utilized the value function for performing local search and termination of the method.

In this paper, the authors fit a polynomial value function with the following structure for two objectives.

$$\begin{array}{ll} V(f\_1, f\_2) = (f\_1 + k\_1 f\_2 + l\_1)(f\_2 + k\_2 f\_1 + l\_2) \\ \text{where} \quad f\_1, f\_2 & \text{are the objective functions} \\ \text{and} \quad k\_1, k\_2, l\_1, l\_2 & \text{are the value function parameters} \end{array}$$

For a higher number of objectives, they use a higher-order polynomial function of the following kind:

$$V(\mathbf{f}) = \prod\_{i=1}^{p} \left( \sum\_{j=1}^{p} \left[ k\_{ij} f\_j + k\_{i(p+1)} \right] \right)^2$$

where ∑<sub>*j*=1</sub><sup>*p*</sup> *k*<sub>*ij*</sub> = 1 for all *i*, and *k*<sub>*ij*</sub> ≥ 0 for *j* ≤ *p* and for all *i*. The value function is fitted by solving the following optimization problem with respect to the value function parameters when preference information is available:

$$\begin{array}{ll}\text{Maximize} & \epsilon \\ \text{subject to} & V \text{ is non-negative at every point } \mathbf{x}^{(i)} \\ & V \text{ is strictly increasing at every point } \mathbf{x}^{(i)} \\ & V(\mathbf{x}^{(i)}) - V(\mathbf{x}^{(j)}) \ge \epsilon, \text{ for all } (i, j) \text{ pairs satisfying } \mathbf{x}^{(i)} \succ \mathbf{x}^{(j)} \\ & V(\mathbf{x}^{(i)}) - V(\mathbf{x}^{(j)}) \le \delta\_V, \text{ for all } (i, j) \text{ pairs satisfying } \mathbf{x}^{(i)} \sim \mathbf{x}^{(j)} \end{array}$$

A look into the above optimization problem reveals that it attempts to find a value function for which the minimum difference in the value function values between the ordered pairs of points is maximum. At the same time, it also ensures that the difference in the value function values for a pair of indifferent points is smaller than a threshold that is proposed to be *δ*<sub>*V*</sub> = 0.1*ε*. Figures 13 and 14 show how the preference structure is captured using the value function when points have a complete or a partial order, respectively. An extension of this study suggested a generalized polynomial value function [78]; however, any attempt to fit a very complex value function to user preference is not always advisable. Unless there are errors or conflicts in preference information, the preference structure in a region can often be captured using relatively simple value functions.

**Figure 13.** Value function fitting when the points are ordered.

**Figure 14.** Value function fitting when the points are partially ordered.
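
As a rough illustration of the two-objective value function of Section 6.3, the sketch below evaluates the product form for a fixed, hand-picked parameter set and reports the separation achieved between ordered pairs together with a check of the indifference threshold (one tenth of that separation). It does not solve the fitting problem and does not verify the non-negativity and monotonicity conditions. The parameter values, the data, the function names, and the use of the absolute difference for indifferent pairs are all assumptions.

```python
def value(f, k1, k2, l1, l2):
    """Two-objective product-form value function V(f1, f2)."""
    f1, f2 = f
    return (f1 + k1 * f2 + l1) * (f2 + k2 * f1 + l2)


def check_fit(params, strict_pairs, indifferent_pairs, ratio=0.1):
    """Return the smallest value gap over ordered pairs and the indifference check."""
    def v(f):
        return value(f, *params)

    eps = min(v(better) - v(worse) for better, worse in strict_pairs)
    delta_v = ratio * eps  # threshold for indifferent pairs
    indifference_ok = all(abs(v(a) - v(b)) <= delta_v for a, b in indifferent_pairs)
    return eps, indifference_ok


params = (0.5, 0.5, 0.0, 0.0)  # hypothetical k1, k2, l1, l2
strict_pairs = [((6.0, 5.0), (8.0, 2.0)), ((4.0, 7.0), (1.0, 9.0))]
indifferent_pairs = [((6.0, 5.0), (4.0, 7.0))]

eps, indifference_ok = check_fit(params, strict_pairs, indifferent_pairs)
```

A fitting procedure would search over the parameters to make the returned separation as large as possible while keeping the indifference check satisfied.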

### *6.4. Sinha, Deb, Korhonen, and Wallenius (2014) [43]*

In this study, Sinha et al. [43] generate the most preferred solution on the Pareto-optimal frontier within a fixed budget of decision-making calls. Most of the earlier hybrid approaches did not assume that the DM will be available for providing preferences only a fixed number of times. In fact, in most of the procedures, there is no control on the number of DM calls, or the DM calls are not utilized effectively. The assumption in most of the interactive approaches is that the DM would be available for as many interactions as desired by the method until a satisfactory solution is found; however, this is not a wise assumption to make. The approach discussed in this section attempts to address this concern by solving the problem in a fixed number of interactions with the DM. The study also deviated from constructing value functions and, instead, suggested constructing polyhedral cones heuristically to guide the EMO. The authors tested their approach on two- to five-objective test problems and studied the impact of increasing or decreasing the budget of DM calls on the performance of the algorithm in getting close to the most preferred point.

The algorithm requires the ideal point at start. Once the ideal point is known, the initial random population is created, and the point in the initial population closest to the ideal point is chosen. Let the distance be denoted as *DI*. This distance *DI* is divided into certain equal parts (say *dI*) based on the budget of DM calls available. Thereafter, the EMO run starts, and preference from the DM is elicited only after a progress of *dI* has been made. During the progress, the algorithm stores all non-dominated solutions produced in an archive set, and preference from the DM is taken in terms of the most preferred point from the archive set.

The method heuristically constructs a polyhedral cone using the most preferred solution suggested by the DM from the archive set and the end points along each of the objectives. For a *p*-objective problem, a polyhedral cone is formed using *p* + 1 points. Figures 15 and 16 show the construction of cones in two and three dimensions. Once the polyhedral cone is determined, it provides an idea for a search direction. The normal unit vectors (*V*<sub>*i*</sub>) of all the *p* hyperplanes can be summed up to obtain a heuristic search direction (*W* = ∑<sub>*i*=1</sub><sup>*p*</sup> *V*<sub>*i*</sub>), which is used in the algorithm for the purpose of local search.

As an extension, a mathematically driven preference-cone-based approach was later proposed in [79], where the user's preferences were assumed to follow an unknown quasiconcave and increasing value function. In addition to considering the preference cones as a tool for eliminating non-preferred solutions, the authors also presented how the cones could be leveraged in approximating the steepest ascent direction to guide an evolutionary algorithm. A merit function is proposed that the authors use for fitness calculations in the algorithm. In addition to test problems, a mixed-integer facility location problem was solved in the later study.

**Figure 15.** Polyhedral cone in 2 dimensions.

**Figure 16.** Polyhedral cone in 3 dimensions.

### **7. Interaction Styles, Behavioral Considerations, and Future Work**

Given that preferences can be elicited in a number of ways, developers of a specific method normally think that the interaction style embedded in their approach is the best. However, we need more research to answer what kind of cognitive load is caused by the interaction style (preference elicitation) and which interaction style leads closer to the true most preferred solution. An interesting approach is to use neuro-physiological measurement instruments to measure the DM's cognitive and emotional load. Scholars in the 1970s pioneered the idea of interactively solving multi-objective problems, which was remarkable considering the state of the art of computer technology in the early to mid 1970s. Neither personal computers nor computer graphics capabilities were available before the early 1980s. Scholars had to access the mainframe computer via teletypes (time sharing). Their contribution was not the development of the concept of an efficient or Pareto-optimal solution; it was Pareto who introduced the idea of non-dominance. However, the scholars of that period made it possible to explore or move around the non-dominated frontier in an effective way. Problems studied were largely limited to the LP framework (convex and compact feasible sets).

With computation becoming faster, newer applications arising in practice, and numerical or computational techniques for more general classes of problems being developed, researchers started looking beyond LPs. However, the decision-making difficulties did not receive the attention they deserved. For instance, there has been only limited effort in trying to solve multi-objective problems in a fixed number of interactions with the DM [43,80]. The decision-making calls are often assumed to be an unlimited resource, with the expectation that the DM is available for a large number of interactions.

The termination criterion for methods involving human–machine interaction is a challenge. Optimization methods may terminate based on a gradient-based criterion, a Karush–Kuhn–Tucker-based criterion, or an improvement-based criterion. However, with the DM interacting with the method, it is difficult to terminate the process, as one does not know in advance the proximity of the current best solution for the DM to the true most preferred solution. Effort is also required on visualization techniques to reduce the burden on the DM during the decision-making process. Many of the visualization techniques focus on commonly used descriptive approaches, such as scatter plots, bar charts, value plots, etc. However, very few of the techniques offer an immersive experience to the DM, where the DM can easily navigate in the search space, understand trade-offs and possible improvements, and then make a decision. A comprehensive review of visualization-based approaches can be found in [81,82].

Other challenges in the decision-making context which have led to difficulties in preference modeling are as follows:


Significant effort is still required towards developing decision-making and search techniques that are robust to the above-mentioned issues.

**Author Contributions:** Authors A.S. and J.W. have contributed equally in conceptualization, investigation and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **A Bounded Archiver for Hausdorff Approximations of the Pareto Front for Multi-Objective Evolutionary Algorithms**

**Carlos Ignacio Hernández Castellanos 1,\*,† and Oliver Schütze 2,\*,†**


**Abstract:** Multi-objective evolutionary algorithms (MOEAs) have been successfully applied for the numerical treatment of multi-objective optimization problems (MOPs) during the last three decades. One important task within MOEAs is the archiving (or selection) of the computed candidate solutions, since one can expect that an MOP has infinitely many solutions. We present and analyze in this work ArchiveUpdateHD, which is a bounded archiver that aims for Hausdorff approximations of the Pareto front. We show that the sequence of archives generated by ArchiveUpdateHD yields under certain (mild) assumptions with a probability of one after finitely many steps a Δ<sup>+</sup>-approximation of the Pareto front, where the value Δ<sup>+</sup> is computed by the archiver within the run of the algorithm without any prior knowledge of the Pareto front. The knowledge of this value is of great importance for the decision maker, since it is a measure for the "completeness" of the Pareto front approximation. Numerical results on several well-known academic test problems as well as the usage of ArchiveUpdateHD as an external archiver within three state-of-the-art MOEAs indicate the benefit of the novel strategy.

**Keywords:** evolutionary multi-objective optimization; archiving; convergence

### **1. Introduction**

This work is dedicated to the 60th birthday of Professor Kalyanmoy Deb, a pioneer and highly impactful and influential proponent of Evolutionary Multi-Objective Optimization (EMO) for the last three decades. In particular, the seminal work *Combining Convergence and Diversity in Evolutionary Multiobjective Optimization* by Marco Laumanns, Lothar Thiele, Kalyanmoy Deb, and Eckart Zitzler [1] has been a motivation of the second author to consider the challenging and fruitful field of archiving in EMO.

Multi-objective optimization problems (MOPs), i.e., problems where several conflicting objectives have to be optimized concurrently, naturally arise in many real-world applications (e.g., [2–8]). While one can expect *one* optimal solution if "only" one objective is being considered, the solution set of an MOP (the so-called Pareto set, respectively, its image, the Pareto front) typically forms at least locally a manifold of a certain dimension [9]. One important task in multi-objective optimization (MOO) is hence to identify a "suitable" finite size approximation of these solution sets. Multi-objective evolutionary algorithms (MOEAs) represent an important class of algorithms for the numerical treatment of such problems. MOEAs have caught the interest of researchers and practitioners due to their global nature and robustness and since they require only minimal assumptions on the model (e.g., [2,10]). The process to select a subset of the candidate solutions generated by the MOEA is called *selection* or *archiving*. Existing archiving/selection strategies can be roughly divided into two classes (see subsequent section for more details): (i) mechanisms that maintain sets whose cardinalities are equal to or do not exceed a certain pre-defined cardinality (which we will call bounded archivers in the following), and (ii) archivers that

**Citation:** Hernández Castellanos, C.I.; Schütze, O. A Bounded Archiver for Hausdorff Approximations of the Pareto Front for Multi-Objective Evolutionary Algorithms. *Math. Comput. Appl.* **2022**, *27*, 48. https://doi.org/10.3390/mca27030048

Academic Editors: Sebastian Peitz and Gianluigi Rozza

Received: 30 April 2022 Accepted: 28 May 2022 Published: 1 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

are based on the concept of *ε*-dominance. Such archivers generate sequences of archives *A<sub>i</sub>* with monotonic behavior, i.e., no deterioration or cyclic behavior can be observed during the run of an algorithm. Furthermore, for *i* → ∞, these archives yield certain limit approximation qualities that can be adjusted a priori, mainly by choosing the values of *ε* ∈ ℝ<sup>*k*</sup>, which comes rather naturally at least if the MOP arises from a real-world application. On the other hand, the magnitudes of the final archives are entirely determined by *ε* and some other design parameters, which are set a priori and are supposed to remain fixed during the computation, and by the size of the Pareto front, which is of course unknown a priori. It has turned out that most EMO researchers prefer to have a fixed number of elements in the archives, e.g., for the sake of a better comparison to other methods but also to avoid the necessity of storing an unexpectedly large number of candidate solutions. By construction, the latter problem does not arise for bounded archives. For most strategies from class (i), however, no theoretical analyses such as convergence properties are known. For many distance-based methods, it is further known that cyclic behavior and deterioration can occur during the run of the algorithm. It is hence fair to say that these methods do not tap the full potential, since any MOEA using such a strategy will not converge regardless of the regions they explore during the run of the algorithm. An exception is the bounded archive proposed in [11], which yields under certain (mild) assumptions and with a probability of one an *ε*-Pareto set in the limit, where *ε* ∈ ℝ is the smallest possible value with respect to the bound of the archives.

In this paper, we propose a bounded archiver that is based on distance, dominance and *ε*-dominance, offers quasi-monotonic behavior, and yields approximation qualities in the limit. More precisely, ArchiveUpdateHD aims for Hausdorff approximations of the Pareto front (i.e., evenly spread solutions along the Pareto front). Under certain (mild) assumptions on the generation process, it will be shown that the Hausdorff distances of the images of the archives *F*(*A<sub>i</sub>*) and the Pareto front *F*(*P<sub>Q</sub>*) are bounded by a value Δ<sup>+</sup>, which is computed by the archiver during the run of the algorithm. Numerical experiments show that this value indeed represents a good approximation of the actual Hausdorff distance (while a better strategy is proposed for bi-objective problems). Two design parameters are adjusted adaptively during the run of the algorithm (one being the value of *ε* for the *ε*-dominance). Since these values will become stationary during the search process, one can expect monotonic behavior from a certain stage of the search process onwards. The knowledge of the Hausdorff distance of *F*(*A<sub>i</sub>*) and *F*(*P<sub>Q</sub>*) is important information for the decision maker (DM), since it represents the maximal error in the approximation. If not needed (i.e., depending on the chosen initial design parameters), the magnitudes of the archives will not reach the pre-defined size *N*. Otherwise, the value Δ<sup>+</sup> computed by the archiver is an important piece of information, since it tells the DM if the approximation is "complete enough" or not. In the latter case, the computation may have to be repeated using an increased value of *N*. A preliminary version of this work can be found in [12], which is restricted to bi-objective problems and contains fewer empirical results.

The rest of this document is structured as follows: Section 2 briefly summarizes the background that is required for the understanding of the sequel and presents the related work. In Section 3, the new archiver ArchiveUpdateHD is discussed and analyzed, first for bi-objective problems and after that for the general number of objectives. In Section 4, some numerical results and comparisons are presented and discussed. Finally, in Section 5, conclusions are drawn, and possible paths for future research are mentioned.

### **2. Background and Related Work**

Here, we consider continuous multi-objective optimization problems (MOPs) that can be expressed as follows:

$$\min\_{\mathbf{x}\in Q} F(\mathbf{x}).\tag{MOP}$$

The map *F* is defined by the individual objective functions *fi*, i.e.,

$$F: Q \to \mathbb{R}^k, \qquad F(\mathbf{x}) = (f\_1(\mathbf{x}), \dots, f\_k(\mathbf{x}))^T,\tag{1}$$

where each *f<sub>i</sub>* : *Q* → ℝ, *i* = 1, ... , *k*, is assumed to be continuous. We stress, however, that the archiver presented below can also be applied to discrete problems. *Q* is the domain or feasible set of the problem, which is typically expressed by equality and inequality constraints. We assume *Q* to be compact (i.e., closed and bounded). If *k* = 2 objectives are considered, the problem is also termed a bi-objective problem (BOP).

In order to define optimality in multi-objective optimization, the concept of dominance can be used: for two vectors *x*, *y* ∈ ℝ<sup>*k*</sup>, we say that *x* is less than *y* (*x* &lt;<sub>*p*</sub> *y*) if *x<sub>i</sub>* &lt; *y<sub>i</sub>* for all *i* ∈ {1, ... , *k*}, and analogously for the relation ≤<sub>*p*</sub>. We say that *y* ∈ *Q* is *dominated* by *x* ∈ *Q* (*x* ≺ *y*) with respect to (MOP) if *F*(*x*) ≤<sub>*p*</sub> *F*(*y*) and *F*(*x*) ≠ *F*(*y*); otherwise, we say that *y* is *non-dominated* by *x*. Finally, *x*<sup>∗</sup> ∈ *Q* is called Pareto optimal or simply optimal with respect to (MOP) if there exists no *y* ∈ *Q* that dominates *x*<sup>∗</sup>. The Pareto set *P<sub>Q</sub>* is the set of all optimal solutions with respect to (MOP), and its image *F*(*P<sub>Q</sub>*) is called the Pareto front. One can expect that both the Pareto set and the Pareto front form at least locally objects of dimension *k* − 1, where *k* is the number of objectives considered in the MOP [9].
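The dominance relation can be illustrated with a few lines of Python. This is only a minimal sketch for minimization problems that works directly on objective vectors *F*(*x*); the function names `less_eq_p`, `dominates`, and `nondominated` are our own and do not come from the paper.

```python
def less_eq_p(u, v):
    """Componentwise relation u <=_p v."""
    return all(ui <= vi for ui, vi in zip(u, v))


def dominates(fx, fy):
    """True if the objective vector fx dominates fy (minimization)."""
    return less_eq_p(fx, fy) and tuple(fx) != tuple(fy)


def nondominated(images):
    """Keep only those objective vectors that no other vector dominates."""
    pts = [tuple(p) for p in images]
    return [p for p in pts if not any(dominates(q, p) for q in pts)]
```

For a finite set of candidate images, `nondominated` returns the mutually non-dominated subset, which is the finite analogue of the Pareto front restricted to that set.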

For the convergence analysis, we will consider a very general class of algorithms, called Generic Stochastic Search Algorithm (GSSA), first considered by Laumanns et al. [1]. An algorithm of this class consists of a process to generate new candidate solutions together with an update strategy. Algorithm 1 shows the pseudocode of GSSA.

### **Algorithm 1** Generic Stochastic Search Algorithm

1: *P*<sub>0</sub> ⊂ *Q* drawn at random
2: *A*<sub>0</sub> = *ArchiveUpdate*(*P*<sub>0</sub>, ∅)
3: **for** *j* = 0, 1, 2, . . . **do**
4:   *P*<sub>*j*+1</sub> = *Generate*(*P<sub>j</sub>*)
5:   *A*<sub>*j*+1</sub> = *ArchiveUpdate*(*P*<sub>*j*+1</sub>, *A<sub>j</sub>*)
6: **end for**
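The structure of Algorithm 1 can be sketched in Python as follows. Everything problem-specific here is an assumption made for illustration only: the toy bi-objective problem `F` on *Q* = [−1, 3], the uniform random generator, and the simple unbounded archiver that keeps all non-dominated candidates. None of these is the method analyzed in this paper; the sketch only shows how a generator and an archiver interact in GSSA.

```python
import random


def F(x):
    """Toy bi-objective problem on Q = [-1, 3] (illustrative assumption)."""
    return (x * x, (x - 2.0) ** 2)


def dominates(fa, fb):
    return all(a <= b for a, b in zip(fa, fb)) and fa != fb


def generate(population, rng, lo=-1.0, hi=3.0):
    """Hypothetical generator: fresh uniform samples from Q."""
    return [rng.uniform(lo, hi) for _ in population]


def archive_update(population, archive):
    """Hypothetical unbounded archiver: keep all non-dominated candidates."""
    candidates = archive + population
    return [x for x in candidates
            if not any(dominates(F(y), F(x)) for y in candidates)]


def gssa(pop_size=5, iterations=20, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 3.0) for _ in range(pop_size)]  # P_0
    archive = archive_update(population, [])                        # A_0
    for _ in range(iterations):
        population = generate(population, rng)                      # P_{j+1}
        archive = archive_update(population, archive)               # A_{j+1}
    return archive
```

Because the archiver is passed all candidates and the previous archive only, it can be exchanged for a bounded archiver without touching the loop, which is the separation that the convergence analysis relies on.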

In the following, we define the Hausdorff distance *d<sub>H</sub>* and the averaged Hausdorff distance Δ<sub>*p*</sub>, which we will use to assess the approximation qualities of the obtained Pareto front approximations (toward the actual Pareto fronts).

**Definition 1.** *Let u* ∈ ℝ<sup>*n*</sup> *and A*, *B* ⊂ ℝ<sup>*n*</sup>*. The semi-distance dist*(·, ·) *and the* Hausdorff distance *d<sub>H</sub>*(·, ·) *are defined as follows:*


**Definition 2** ([13])**.** *Let A*, *B* ⊂ ℝ<sup>*n*</sup> *be finite sets. The value*

$$\Delta\_p(A, B) = \max(GD\_p(A, B), IGD\_p(A, B)),\tag{2}$$

*where*

$$\begin{aligned} GD\_p(A, B) &= \left( \frac{1}{|A|} \sum\_{a \in A} dist(a, B)^p \right)^{1/p} \\ IGD\_p(A, B) &= \left( \frac{1}{|B|} \sum\_{b \in B} dist(b, A)^p \right)^{1/p} \end{aligned} \tag{3}$$

*and p* ∈ ℕ*, is called the* averaged Hausdorff distance *between A and B.*
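For finite sets, both distances are easy to compute. The following sketch assumes the standard definitions of the semi-distance, namely dist(*u*, *A*) = min over *v* ∈ *A* of ‖*u* − *v*‖ and dist(*B*, *A*) = max over *u* ∈ *B* of dist(*u*, *A*), together with *d<sub>H</sub>*(*A*, *B*) = max(dist(*A*, *B*), dist(*B*, *A*)); the itemized formulas of Definition 1 are not reproduced above, so this is our assumption. The Euclidean norm is used, and all function names are ours.

```python
import math


def dist_point_set(u, B):
    """dist(u, B): distance from the point u to the finite set B."""
    return min(math.dist(u, b) for b in B)


def dist_set_set(A, B):
    """dist(A, B): largest distance from a point of A to the set B."""
    return max(dist_point_set(a, B) for a in A)


def hausdorff(A, B):
    """d_H(A, B) for finite sets (assumed standard definition)."""
    return max(dist_set_set(A, B), dist_set_set(B, A))


def gd_p(A, B, p=2):
    """GD_p(A, B) as in Equation (3)."""
    return (sum(dist_point_set(a, B) ** p for a in A) / len(A)) ** (1.0 / p)


def igd_p(A, B, p=2):
    """IGD_p(A, B) as in Equation (3)."""
    return gd_p(B, A, p)


def delta_p(A, B, p=2):
    """Averaged Hausdorff distance as in Equation (2)."""
    return max(gd_p(A, B, p), igd_p(A, B, p))
```

For example, with `A = [(0, 0), (1, 0)]` and `B = [(0, 0), (1, 0), (1, 1)]`, the single outlier (1, 1) fully determines `hausdorff(A, B)`, whereas `delta_p(A, B, 1)` averages its contribution over the three elements of `B`.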

We further define some objects that specify certain approximation qualities of Pareto front approximations. All of these objects are based on the concept of *ε*-dominance, which we will define first.

**Definition 3** (*ε*-dominance)**.** *Let ε* = (*ε*<sub>1</sub>, ... , *ε<sub>k</sub>*)<sup>*T*</sup> ∈ ℝ<sup>*k*</sup><sub>+</sub> *and x*, *y* ∈ ℝ<sup>*n*</sup>*. x is said to ε-dominate y (in short: x* ≺<sub>*ε*</sub> *y) with respect to (MOP) if*

$$F(\mathbf{x}) - \epsilon \le\_p F(y) \qquad \text{and} \qquad F(\mathbf{x}) - \epsilon \ne F(y). \tag{4}$$

**Definition 4** (*ε*-(approximate) Pareto front, [1])**.** *Let ε* ∈ ℝ<sup>*k*</sup><sub>+</sub> *and A* ⊂ ℝ<sup>*n*</sup>*.*

*(a) F*(*A*) *is called an ε*-approximate Pareto front *of (MOP) if every point x* ∈ *Q is ε-dominated by at least one a* ∈ *A, i.e.,*

$$
\forall \mathbf{x} \in Q \;:\; \exists a \in A \;:\; \quad a \prec\_{\epsilon} \mathbf{x}.\tag{5}
$$

*(b) F*(*A*) *is called an ε*-Pareto front *if F*(*A*) *is an ε-approximate Pareto front and if every point a* ∈ *A is a Pareto point of (MOP).*
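Definitions 3 and 4(a) translate directly into code. The sketch below works on objective vectors and, as a simplifying assumption, checks condition (5) only against a finite sample of images of *Q* rather than against all of *Q*; the names `eps_dominates` and `is_eps_approximate` are ours.

```python
def eps_dominates(fx, fy, eps):
    """True if F(x) - eps <=_p F(y) and F(x) - eps != F(y), cf. Equation (4)."""
    shifted = tuple(a - e for a, e in zip(fx, eps))
    return (all(s <= b for s, b in zip(shifted, fy))
            and shifted != tuple(fy))


def is_eps_approximate(archive_images, sample_images, eps):
    """Check condition (5) on a finite sample of images of Q (assumption)."""
    return all(any(eps_dominates(fa, fx, eps) for fa in archive_images)
               for fx in sample_images)
```

On the linear front *f*<sub>2</sub> = 1 − *f*<sub>1</sub>, for instance, the three images (0, 1), (0.5, 0.5), (1, 0) pass this check for *ε* = (0.3, 0.3)<sup>*T*</sup> but fail it for *ε* = (0.1, 0.1)<sup>*T*</sup>, since the front point (0.3, 0.7) is then no longer *ε*-dominated.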

**Definition 5** (Δ-tight *ε*-(approximate) Pareto front, [14])**.** *Let ε* ∈ ℝ<sup>*k*</sup><sub>+</sub> *and A* ⊂ ℝ<sup>*n*</sup>*.*

*(a) F*(*A*) *is called a* Δ-tight *ε*-approximate Pareto front *of (MOP) if F*(*A*) *is an ε-approximate Pareto front of (MOP) and if in addition*

$$\text{dist}(F(P\_Q), F(A)) \le \Delta. \tag{6}$$

*(b) F*(*A*) *is called a* Δ-tight *ε*-Pareto front *if F*(*A*) *is an ε-Pareto front of (MOP) and if in addition*

$$d\_H(F(P\_Q), F(A)) \le \Delta. \tag{7}$$

The archiver we propose in this work, ArchiveUpdateHD, aims for Δ-tight *ε*-Pareto fronts for particular values of Δ and *ε*. The sole usage of *ε*-dominance for the Pareto front approximations may lead to gaps, in particular when parts of the front are flat. The Δ-tight *ε*-(approximate) Pareto fronts also take into account the distance of the Pareto front toward the candidate set, leading to better approximations in the Hausdorff sense (see Figure 1).

**Figure 1.** Gaps in the approximation can occur when *ε*-dominance is used exclusively in the selection/archiving of the candidate solutions (**left**). Δ-tight *ε*-(approximate) Pareto fronts also consider the distance of the Pareto front toward the archive (**right**).

Since one can expect infinitely many solutions for a continuous MOP, it is inevitable in (continuous) evolutionary multi-objective optimization (EMO) that not all promising solutions can be kept during the run of an algorithm. Instead, one has to select a subset of candidate solutions in each iteration so that this sequence eventually leads to a "suitable" representation of the Pareto set/front of the given problem. This process is typically called "selection" within MOEAs and "archiving" if an external set of candidate solutions (archive) is maintained during the run of a MOEA (though of course both terms can be used interchangeably).

Three main classes of MOEAs exist: (a) dominance-based [15–18], (b) decomposition-based [19–25], and (c) indicator-based [26–30] algorithms. The selection strategies for MOEAs of class (b) or (c) are rather straightforward: the selection in a decomposition-based MOEA is done implicitly by the chosen scalarizing functions, and the selection in an indicator-based MOEA is typically handled via considering the indicator contributions. On the one hand, these two approaches come with a monotonic behavior of the sequence of approximations (i.e., no deterioration can occur). On the other hand, these selection strategies do not guarantee convergence toward the best approximation (e.g., [29,31]). The selection of the first dominance-based MOEAs is based on non-dominated sorting in combination with niching techniques (e.g., [32–34]). Due to missing elite preservation, none of these methods converge in the mathematical sense. Later MOEAs such as SPEA [35], PAES [18], SPEA-II [16], and NSGA-II [15] include such elite preservation, leading to much better overall performance. However, no convergence properties (again, in the mathematical sense) are known for these algorithms either. Rudolph [36–39] and Hanne [40–43] have studied convergence properties of MOEA frameworks. These studies are mainly concerned with the convergence of individuals of the populations toward the Pareto set/front, while the magnitudes and the distributions of the resulting populations are not considered.

Archiving strategies with bounded archive size based on adaptive grid selection have been considered in [44–46]. Bounded archivers in particular for hypervolume approximations have been proposed in [47,48]. Both archivers yield monotonic behavior in the approximation qualities of the obtained sequence of archives.

Laumanns et al. considered the class of algorithms GSSA as described above [1,49], which allows one to focus on the archiver under certain (mild) assumptions on the generator. In both studies, archivers were considered that aim for several *ε*-approximations of the Pareto front, where finitely many iterations were considered. Later, further archivers based on *ε*-dominance have been proposed, using the framework of GSSA to perform the convergence analysis [14,50–54].

In [11], Laumanns and Zenklusen propose two bounded archivers that use adaptive schemes to obtain *ε*-approximations of the Pareto front. Another adaptive archiving strategy is proposed in [55] that utilizes a particular discretization of the objective space of the given problem. A strategy that is based on the convex hull of individual minima in order to increase diversity of the solutions is proposed in [56].

Recently, the use of external archives has become more popular [57–62] in particular for the treatment of real-world applications where function evaluations are expensive, and where it is hence advisable to maintain all promising candidate solutions. Consequently, most of these archivers are unbounded [63–67]. For the treatment of in particular MOPs with many objectives—also called many objective problems—MOEAs have been proposed that utilize *two* archives, one aiming for convergence and one aiming for diversity [68,69].

### **3. ArchiveUpdateHD**

We will in this section propose and discuss the novel archiver ArchiveUpdateHD. Since the considerations of the distances as well as the Hausdorff approximations can be done more accurately for *k* = 2—where we can assume the Pareto front to locally form a curve, and hence, the elements of the approximations can be arranged via a sorting in objective space—we first address the bi-objective case and will afterwards consider the archiver for problems with *k* > 2.

### *3.1. The Bi-Objective Case*

The pseudocode of ArchiveUpdateHD for bi-objective problems is shown in Algorithm 2. This archiver aims for approximations of the Pareto front of a given BOP in the Hausdorff sense (i.e., for solutions that are evenly spread along the Pareto front). The archiver is based (i) on the distances among the candidate solutions (lines 18–36 of Algorithm 2), (ii) "classical" dominance or elite preservation (lines 5 and 9) as well as (iii) the concept of *ε*-dominance (line 5). The archiver can roughly be divided into two parts: an acceptance strategy to decide if an incoming candidate solution *p* should be considered (line 5), and a pruning technique (mainly lines 18–36, but also lines 11–14) which is applied if the size of the archive has exceeded a predefined budget *N* of archive entries.

In the following, we will describe ArchiveUpdateHD as in Algorithm 2 in more detail. This algorithm contains several elements that have to be incorporated in order to guarantee convergence. After the convergence analysis (Theorem 1), we will discuss more practical realizations of the algorithm.

In line 5, it is decided if a candidate solution *p* should be (at least temporarily) added to the existing archive *A*. This is the case if (a) none of the entries *a* ∈ *A* Θ*ε*-dominates *p* (Θ ∈ (0, 1) being a safety factor needed to guarantee convergence, see below for practical realizations), or if (b) none of the entries *a* ∈ *A* dominates *p* and for none of the entries *a* ∈ *A* the distance ‖*F*(*a*) − *F*(*p*)‖ is less than or equal to ΘΔ. Throughout this work, ‖·‖ denotes the Euclidean norm. We stress that this acceptance strategy is identical to the one of the archiver ArchiveUpdateTight2 [14], which we will need for the upcoming convergence analysis.
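The acceptance test described above can be sketched as follows. This is an illustrative reading of conditions (a) and (b), not the authors' implementation: it works on objective vectors, uses *ε* = (Δ, ..., Δ)<sup>*T*</sup> and the Euclidean norm, and the function name `accept` as well as the default value of `theta` are our assumptions.

```python
import math


def dominates(fa, fb):
    return all(a <= b for a, b in zip(fa, fb)) and tuple(fa) != tuple(fb)


def eps_dominates(fa, fb, eps):
    shifted = tuple(a - e for a, e in zip(fa, eps))
    return all(s <= b for s, b in zip(shifted, fb)) and shifted != tuple(fb)


def accept(fp, archive_images, delta, theta=0.9):
    """Acceptance test for the image fp of a candidate p (sketch)."""
    theta_eps = [theta * delta] * len(fp)
    # (a) no archive entry (Theta * eps)-dominates p
    cond_a = not any(eps_dominates(fa, fp, theta_eps) for fa in archive_images)
    # (b) no archive entry dominates p, and p is farther than
    #     Theta * Delta from every archive entry in objective space
    cond_b = (not any(dominates(fa, fp) for fa in archive_images)
              and all(math.dist(fa, fp) > theta * delta
                      for fa in archive_images))
    return cond_a or cond_b
```

Condition (b) is what allows the archive to fill gaps: a candidate that is already Θ*ε*-dominated, and hence would be rejected by (a) alone, is still accepted if it is non-dominated and sufficiently far from all current entries.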

If the candidate solution *p* is accepted, it will be added to *A*. Next, all other entries *a* ∈ *A* dominated by *p* will be discarded (lines 8–10). Hence, all archives generated by ArchiveUpdateHD only contain mutually non-dominated elements (elite preservation). If the distance ‖*F*(*p*) − *F*(*a*)‖ is larger than Δ for any of these dominated archive entries *a*, a "reset" is executed for Δ and *ε*: Δ<sub>*min*</sub> is set to *κ*Δ<sub>*min*</sub> (where *κ* > 1 is another safety factor). Next, Δ and *ε* are updated using this new minimal value. The idea behind this reset is as follows: if *p* ≺ *a* and the distance of *F*(*a*) and *F*(*p*) is larger than Δ, then *p* and *a* could be located in different connected components of the set of (local) solutions of the bi-objective problem. Since the values of both Δ and *ε* are determined by the length of the (known) Pareto front, their values have to be set back, since a "jump" to a new connected component may lead to a new length. See Figure 2 for a hypothetical scenario. The value of Δ<sub>*min*</sub> has to be (slightly) increased in each reset in order to avoid the possibility of cyclic behavior in the sequence of archives (which, in fact, has not been observed in our computations).

If |*A*| exceeds the predefined magnitude *N*, it is decided in lines 18–36 which of the elements of *A* has to be discarded (pruning). For *k* = 2 objectives, we can order all the entries of the archives (e.g., as done here, in ascending order with respect to objective *f*<sub>1</sub>). Then, the vector *d* ∈ ℝ<sup>*N*</sup> of distances can be simply computed via:

$$d\_i := \|F(a\_{i+1}) - F(a\_i)\|, \quad i = 1, \ldots, N. \tag{8}$$

For an index *m* chosen from arg min *d*, either *a<sub>m</sub>* or *a*<sub>*m*+1</sub> is then removed from *A*, which is done in lines 23–33. The aim of ArchiveUpdateHD is to maintain good approximations of the end points of the Pareto front. Accordingly, *a*<sub>2</sub>, respectively, *a<sub>N</sub>*, are always discarded instead of *a*<sub>1</sub> and *a*<sub>*N*+1</sub>, respectively (lines 23–26). The rationale behind the selection in lines 28–33 is to keep the archive of size *N* with the most evenly distributed elements.
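The pruning step for *k* = 2 can be sketched as follows. The sketch operates on the *N* + 1 objective vectors of an overfull archive, uses 0-based indices (so `pts[0]` corresponds to *a*<sub>1</sub>), assumes *N* ≥ 2, and picks the first minimizer of *d* when there are ties; the function name `prune` and these choices are our assumptions.

```python
import math


def prune(archive_images, delta):
    """Remove one of N + 1 bi-objective images; return (images, new delta)."""
    pts = sorted(archive_images)            # ascending with respect to f_1
    n = len(pts) - 1                        # N, the archive budget
    if n < 2:
        raise ValueError("sketch assumes a budget of N >= 2")
    delta = (n + 1) / n * delta             # Delta is increased when pruning
    # d_i = ||F(a_{i+1}) - F(a_i)||, cf. Equation (8)
    d = [math.dist(pts[i + 1], pts[i]) for i in range(n)]
    m = min(range(n), key=d.__getitem__)    # first index of a smallest gap
    if m == 0:
        drop = 1                            # keep the end point a_1
    elif m == n - 1:
        drop = n - 1                        # keep the end point a_{N+1}
    else:
        d_l = math.dist(pts[m + 1], pts[m - 1])  # gap left if a_m is removed
        d_r = math.dist(pts[m + 2], pts[m])      # gap left if a_{m+1} is removed
        drop = m if d_l < d_r else m + 1
    return pts[:drop] + pts[drop + 1:], delta
```

In the interior case, the entry whose removal leaves the smaller new gap is discarded, which matches the stated aim of keeping the remaining *N* entries as evenly distributed as possible.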

### **Algorithm 2** ArchiveUpdateHD

**Require:** Problem (MOP), where *k* = 2, *P*: current population, *A*<sub>0</sub>: current archive, Δ<sub>0</sub> > 0: current value of Δ, Δ<sub>*min*</sub>: minimal value of Δ, Θ ∈ (0, 1), *κ* > 1: safety factors, *N*: upper bound for archive size

**Ensure:** updated archive *A*, updated values for Δ, Δ<sub>*min*</sub>, and *ε*

1: *A* := *A*<sub>0</sub>
2: Δ := Δ<sub>0</sub>
3: *ε* := (Δ, ..., Δ)<sup>*T*</sup>
4: **for all** *p* ∈ *P* **do**
5:   **if** ∄*a* ∈ *A* : *a* ≺<sub>Θ*ε*</sub> *p*, or ∄*a* ∈ *A* : *a* ≺ *p* and ∀*a* ∈ *A* : ‖*F*(*a*) − *F*(*p*)‖ > ΘΔ **then**
6:     *A* := *A* ∪ {*p*}
7:   **end if**
8:   **for all** *a* ∈ *A* **do**
9:     **if** *p* ≺ *a* **then**
10:       *A* := *A* ∪ {*p*} \ {*a*}
11:       **if** ‖*F*(*p*) − *F*(*a*)‖<sub>∞</sub> > Δ **then** (reset Δ and *ε*)
12:         Δ<sub>*min*</sub> := *κ*Δ<sub>*min*</sub>
13:         Δ := Δ<sub>*min*</sub>
14:         *ε* := (Δ, ..., Δ)<sup>*T*</sup>
15:       **end if**
16:     **end if**
17:   **end for**
18:   **if** |*A*| = *N* + 1 **then** (apply pruning)
19:     Δ := ((*N* + 1)/*N*) Δ
20:     *ε* := ((*N* + 1)/*N*) *ε*
21:     sort *A* (e.g., according to *f*<sub>1</sub>)
22:     compute *d* ∈ ℝ<sup>*N*</sup> as in (8)
23:     choose *m* ∈ arg min *d*
24:     **if** *m* = 1 **then**
25:       *A* := *A* \ {*a*<sub>2</sub>} (remove 2nd entry)
26:     **else if** *m* = *N* **then**
27:       *A* := *A* \ {*a<sub>N</sub>*} (remove 2nd but last entry)
28:     **else**
29:       *d<sub>l</sub>* := ‖*F*(*a*<sub>*m*+1</sub>) − *F*(*a*<sub>*m*−1</sub>)‖
30:       *d<sub>r</sub>* := ‖*F*(*a*<sub>*m*+2</sub>) − *F*(*a<sub>m</sub>*)‖
31:       **if** *d<sub>l</sub>* &lt; *d<sub>r</sub>* **then**
32:         *A* := *A* \ {*a<sub>m</sub>*}
33:       **else**
34:         *A* := *A* \ {*a*<sub>*m*+1</sub>}
35:       **end if**
36:     **end if**
37:   **end if**
38: **end for**
39: **return** {*A*, *ε*, Δ<sub>*min*</sub>, Δ}

**Figure 2.** A hypothetical scenario that can happen for multi-modal problems: first, a front that is only locally optimal is detected by the search process and approximated by the archiver. If later, a candidate *p* is computed such that *F*(*p*) lies on a "better" front, the current values of Δ and *ε* may not be adequate any more to suitably approximate this front.

In the following, we investigate the limit behavior of ArchiveUpdateHD.

**Theorem 1.** *Let (MOP) be given and Q* ⊂ ℝ<sup>*n*</sup> *be compact, and let there be no weak Pareto points in Q* \ *P<sub>Q</sub>. Furthermore, let F be continuous and injective, and*

$$\forall \mathbf{x} \in Q \; and \; \forall \delta > 0: \quad P(\exists l \in \mathbb{N} \; : \; P\_l \cap B\_\delta(\mathbf{x}) \cap Q \neq \emptyset) = 1. \tag{9}$$

*Then, an application of Algorithm 1, where ArchiveUpdateHD (Algorithm 2) is used to update the archive, leads to a sequence of archives A<sub>l</sub>*, *l* ∈ ℕ*, where the following holds:*

*(a) There exists an l*<sub>1</sub> ∈ ℕ *and* Δ<sup>+</sup> > 0 *such that*

$$\Delta\_l = \Delta^+, \quad \forall l \ge l\_1, \quad \text{with probability one.}$$


$$\lim\_{l \to \infty} \operatorname{dist}(A\_l, P\_Q) = 0, \quad \text{with probability one.}$$

*(d) There exists an l*<sub>3</sub> ∈ ℕ *such that*

$$d\_H(F(A\_l), F(P\_Q)) \le \Delta^+, \quad \forall l \ge l\_3, \quad \text{with probability one.}$$

**Proof.** We first show that during the run of the algorithm, only finitely many changes of the value of Δ (and hence also of *ε*) can occur. Since *F* is continuous and the domain *Q* ⊂ ℝ<sup>*n*</sup> is compact, the image *F*(*Q*) is also compact and hence, in particular, bounded. ArchiveUpdateHD changes the value of Δ in two cases: if (i) a reset of Δ and *ε* is executed (line 12) or if (ii) the pruning technique is applied (line 19). In case of (i), the value of Δ<sub>*min*</sub> is increased by a constant factor *κ* > 1. The value of Δ after the *i*-th reset is hence equal to or larger than *κ<sup>i</sup>*Δ<sup>0</sup><sub>*min*</sub>, where Δ<sup>0</sup><sub>*min*</sub> > 0 denotes the value of Δ<sub>*min*</sub> at the start of the algorithm. A reset is applied if the distance of the image of the candidate solution *p* to the image of an archive element *a* is larger than the current value of Δ (line 11). Since *F*(*P<sub>Q</sub>*) is bounded, only a finite number of such resets can be applied during the run of the algorithm.
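The finiteness of the resets can also be made quantitative. The following is only a sketch under two assumptions of ours: that Δ ≥ Δ<sub>*min*</sub> holds at the start (it is then preserved, since a reset sets Δ := Δ<sub>*min*</sub> and pruning only increases Δ), and that *D* denotes the diameter of the bounded set *F*(*Q*) in the infinity norm (our notation). If *i* resets have already been executed, a further reset triggered by a pair *p*, *a* requires

$$D := \max\_{\mathbf{x}, \mathbf{y} \in Q} \|F(\mathbf{x}) - F(\mathbf{y})\|\_\infty \;\ge\; \|F(p) - F(a)\|\_\infty \;>\; \Delta \;\ge\; \kappa^i \Delta^0\_{min},$$

and hence

$$i < \log\_\kappa \frac{D}{\Delta^0\_{min}}.$$

The number of resets is therefore bounded by a constant that depends only on *κ*, Δ<sup>0</sup><sub>*min*</sub>, and the extent of *F*(*Q*).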

Case (ii) happens if the magnitude of the current archive is *N* + 1. New candidate solutions *p* are added to the archive in lines 5 and 6 and lines 9 and 10. Lines 9 and 10 describe a dominance replacement which does not increase the magnitude of the archive. Hence, such replacements do not lead to an application of the pruning. A candidate *p* can be further added to the current archive *A* if one of the following statements is true (line 5):

$$\begin{aligned} &1.\ \nexists a \in A : a \prec\_{\Theta\epsilon} p, \text{ or} \\ &2.\ \nexists a \in A : a \prec p \text{ and } \forall a \in A : \|F(a) - F(p)\| > \Theta \Delta. \end{aligned} \tag{10}$$

Since *F*(*Q*) is bounded, there exists for every *a* ∈ *Q* a (large enough) Δ<sub>*a*</sub> > 0 so that *a* ≺<sub>Θ*ε*</sub> *p*, where *ε* = (Δ<sub>*a*</sub>, ... , Δ<sub>*a*</sub>)<sup>*T*</sup>. Similarly, ‖*F*(*a*) − *F*(*p*)‖ &lt; ΘΔ<sub>*a*</sub> if Δ<sub>*a*</sub> is large enough. Since in each pruning step, the value of Δ is increased by the factor of (*N* + 1)/*N* and since only finitely many resets are executed, only finitely many prunings can be applied during the run of the algorithm.

Note that ArchiveUpdateHD differs from ArchiveUpdateTight2 in two parts: the reset strategy (lines 11–15) and the pruning technique (lines 18–37), and that both these parts come with a change of the values of Δ and *ε*. In other words, ArchiveUpdateHD is identical to ArchiveUpdateTight2 as long as no change in Δ and *ε* occurs. For this case, we can hence apply the theoretical results on ArchiveUpdateTight2 for ArchiveUpdateHD. Now, consider a fixed value of Δ (and hence also *ε*). During the run, it can either be the case that (i) all magnitudes of *A<sub>l</sub>* are less than or equal to *N* (i.e., no pruning is applied), or that (ii) this magnitude is *N* + 1 at one point, leading to an application of the pruning technique. In case (i), we can use Theorem 7.4 of [54] on ArchiveUpdateTight2: there exists with a probability of one an *l̄* ∈ ℕ such that the sets *F*(*A<sub>l</sub>*) form a Δ-tight *ε*-approximate Pareto front for all *l* ≥ *l̄*. Note that once *F*(*A<sub>l</sub>*) forms such an object, no more resets can occur: assume there exists a candidate solution *p* that dominates an element *a* ∈ *A<sub>l</sub>*, and where ‖*F*(*p*) − *F*(*a*)‖<sub>∞</sub> > Δ. The latter means that

$$\max\_{i=1,\ldots,k} \left( f\_i(a) - f\_i(p) \right) > \Delta \tag{11}$$

which in turn means that *a* does not *ε*-approximate *p*, which is a contradiction to the assumption on *A<sub>l</sub>*. In case of (ii), the value of Δ is simply not large enough for the *N*-element archive to form a Δ-tight *ε*-approximate Pareto front. Again, by Theorem 7.4 of [54], there exists in this case with a probability of one a finite iteration number where the magnitude will exceed *N*. As discussed above, the pruning can only be applied finitely many times during the run of the algorithm. Hence, the value of Δ will, with a probability of one, stay fixed from one iteration onwards, which proves part (a).

Parts (b) and (c) follow from Theorem 7.4 of [54] and part (a), and finally, part (d) follows from parts (b) and (c) and the definition of the Hausdorff distance.


*can of course instead use* Δ = *-* = (Δ1, ... , Δ*k*)*<sup>T</sup> using different values* Δ*i. In that case, the following modifications have to be done: (i) the last condition in line 5 has to be replaced by*

$$\exists a \in A \;:\; |f\_i(a) - f\_i(p)| \le \Delta\_{i\prime} \quad i = 1, \dots, k.$$

*Furthermore, (ii) the condition for the reset in line 11 has to be replaced by*

$$\exists i \in \{1, \ldots, k\} \;:\; |f\_i(p) - f\_i(a)| > \Delta\_i.$$

*(e) The value of* Δ *computed throughout the algorithm yields an approximation quality of the archivers in the Hausdorff sense. The theoretical upper bound of the final value* Δ<sup>+</sup> *is twice the value of the actual Hausdorff approximation as the following discussion shows (refer to Figure 3): assume we are given a linear front with slope* −1*, and we are given a budget of N* = 2 *elements (the discussion is analog for general N). The ideal archive as computed by ArchiveUpdateHD is in this case A* = {*a*1, *a*2}*, where the ai's are the end points of the Pareto set. Assume we have F*(*a*1)=(0, 1)*<sup>T</sup> and F*(*a*2)=(1, 0)*T; then, the Hausdorff distance of the Pareto front and A is* 1/2 *determined by the point ym* = (1/2, 1/2)*T. Given this archive, for any value* Δ < 1 *and assuming that F*(*Q*) *is large enough, there exists a candidate p such that p is not dominated by a*<sup>1</sup> *or a*<sup>2</sup> *and that F*(*ai*) − *F*(*p*) > Δ*, i* = 1, 2*. Hence, p will be added to the archiver—and later on discarded (lines 23–26). The latter leads to an increase of* Δ*.*

*On the one hand, one natural strategy would be to take ½Δ<sup>+</sup> as a Hausdorff approximation of the Pareto front, in particular since most Pareto fronts have at least one element where the slope of the tangent space is −1. On the other hand, the use of ε-dominance prevents the images F(a), a ∈ A, from being perfectly evenly distributed along the Pareto front, so that ½Δ<sup>+</sup> is not that accurate for some problems. In fact, this factor of two can only be observed for linear fronts, while Δ<sup>+</sup> already yields a good approximation in general (see, e.g., the subsequent results for MOPs with more than two objectives). However, we have observed that the following estimation gives even better approximations of the Hausdorff distances: given A = {a<sub>1</sub>, ..., a<sub>N</sub>}, which is sorted (e.g., according to objective f<sub>1</sub>), the current Hausdorff approximation h is computed as follows:*

$$\begin{split} \tilde{d}\_i &:= \begin{cases} \|F(a\_{i+1}) - F(a\_i)\|, & \text{if } \|F(a\_{i+1}) - F(a\_i)\| \le 2\Delta \\ 0, & \text{else} \end{cases}, \quad i = 1, \ldots, N - 1, \\ h &:= \frac{1}{2} \max\_{i=1, \ldots, N-1} \tilde{d}\_i. \end{split} \tag{12}$$

*Note that the distance is set to 0 if the distance between two neighboring candidate solutions is larger than 2Δ, which has been done to take into account approximations of Pareto fronts that fall into several connected components.*

*(f) Several norms are used within the algorithm. While one is in principle free in the choice of the norms (except in line 11, see the above proof), we suggest taking the infinity norm in line 5 in order to reduce the issue mentioned in the previous part, and the 2-norm in lines 28 and 29 in order to obtain a (slightly) better distribution of the entries along the Pareto front.*
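The estimate *h* from part (e) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the archive is represented directly by the images *F*(*a*) as tuples, Δ is a scalar, and the 2-norm is used for the neighbor distances (the text leaves the choice of norm open). The function name `hausdorff_estimate` is hypothetical.

```python
import math

def hausdorff_estimate(front_images, delta):
    """Estimate h: half the largest neighbor gap that does not exceed 2*delta.

    front_images: list of (f1, f2) images F(a) of the archive entries.
    delta: current scalar value of Delta (assumption: one value for all objectives).
    """
    # Sort the images along the first objective, as suggested in the text.
    pts = sorted(front_images)
    gaps = []
    for left, right in zip(pts, pts[1:]):
        gap = math.dist(left, right)  # 2-norm; the choice of norm is an assumption
        # Gaps larger than 2*delta are treated as jumps between connected components.
        gaps.append(gap if gap <= 2.0 * delta else 0.0)
    return 0.5 * max(gaps) if gaps else 0.0

# Hypothetical archive on a front with two components: the large middle gap is ignored.
images = [(0.0, 1.0), (0.1, 0.9), (0.9, 0.1), (1.0, 0.0)]
h = hausdorff_estimate(images, delta=0.2)
```

In this example the gap between (0.1, 0.9) and (0.9, 0.1) exceeds 2Δ = 0.4 and is therefore set to zero, so *h* is determined by the two short gaps only.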

Algorithm 3 shows the modifications of ArchiveUpdateHD discussed above, which have been used for the calculations presented in this work. Hereby, Δ<sub>*min*</sub> ∈ ℝ<sup>*k*</sup><sub>+</sub> denotes the vector of minimal elements for each entry Δ<sub>*i*</sub>.

**Figure 3.** Linear Pareto front with slope −1. If, for *N* = 2, the archive is given by *A* = {*a*<sub>1</sub>, *a*<sub>2</sub>} such that *F*(*a*<sub>1</sub>) and *F*(*a*<sub>2</sub>) are the end points of the Pareto front, then the Hausdorff distance of *A* and the Pareto front is given by *h* = ‖*F*(*a*<sub>1</sub>) − *y<sub>m</sub>*‖<sub>∞</sub>, where *y<sub>m</sub>* is the arithmetic mean of *F*(*a*<sub>1</sub>) and *F*(*a*<sub>2</sub>). For Δ < 2*h*, there may exist candidate solutions *p* that will be considered by the archive (line 5 of Algorithm 1) but discarded in the same step (lines 23 to 26 of Algorithm 1), leading to an increase of Δ.
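The value *h* = 1/2 for the situation of Figure 3 can be checked numerically. The following sketch samples the linear front *f*<sub>2</sub> = 1 − *f*<sub>1</sub> on a grid (the sampling density and the helper names are assumptions made for this illustration) and computes the Hausdorff distance to the two end points in the infinity norm.

```python
def dist_inf(u, v):
    """Infinity-norm distance between two objective vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

def hausdorff_inf(set_a, set_b):
    """Hausdorff distance between two finite sets, using the infinity norm."""
    def one_sided(xs, ys):
        # Largest distance from a point of xs to its nearest neighbor in ys.
        return max(min(dist_inf(x, y) for y in ys) for x in xs)
    return max(one_sided(set_a, set_b), one_sided(set_b, set_a))

# Linear front with slope -1, sampled at 1001 points (discretization is an assumption).
front = [(i / 1000, 1 - i / 1000) for i in range(1001)]
# Images of the two-element archive: the end points of the front.
archive_images = [(0.0, 1.0), (1.0, 0.0)]

d_h = hausdorff_inf(archive_images, front)
```

The maximum is attained at the midpoint (1/2, 1/2), which has distance 1/2 to both end points, in agreement with the caption.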

**Remark 2.** *For the performance assessment of MOEAs, it is typically advisable to use the averaged Hausdorff distance Δ<sub>p</sub> instead of the Hausdorff distance d<sub>H</sub>. The main reason for this is that MOEAs may compute a few outliers, in particular if the MOP contains weakly dominated solutions that are not optimal (also called dominance-resistant solutions [72]). On the other hand, we stress that Δ<sub>p</sub>, as opposed to d<sub>H</sub>, is not a metric in the mathematical sense, since the triangle inequality does not hold. We refer, e.g., to [13,38,73,74] for more discussion on this matter.*

*In the following, we discuss one possibility to obtain an approximation of the value of Δ<sub>p</sub> from a given archive A. To this end, we first investigate the value of Δ<sub>p</sub> if the elements of A are perfectly located along a linear connected Pareto front (if N is large enough, we can expect that this approximation works fine for any connected Pareto front). That is, all a<sub>i</sub> are optimal. Furthermore, if A is sorted, F(a<sub>1</sub>) and F(a<sub>N</sub>) are the end points of the Pareto front, and the distance of two consecutive elements F(a<sub>i</sub>) and F(a<sub>i+1</sub>) is given by 2h (leading to d<sub>H</sub> = h). Since all the a<sub>i</sub> are optimal, the Δ<sub>p</sub> value is hence given by the value of IGD<sub>p</sub>, which can be computed as follows:*

$$\begin{split} \Delta\_p(F(A), F(P\_Q)) &= IGD\_p(F(A), F(P\_Q)) = \left( \frac{1}{F(a\_N) - F(a\_1)} \int\_{F(a\_1)}^{F(a\_N)} \operatorname{dist}(t, F(A))^p dt \right)^{\frac{1}{p}} \\ &= \left( \frac{1}{(N-1)2h} 2(N-1) \int\_0^h t^p dt \right)^{\frac{1}{p}} \\ &= \left( \frac{1}{h} \left[ \frac{1}{p+1} t^{p+1} \right]\_0^h \right)^{\frac{1}{p}} = \left( \frac{1}{h} \frac{1}{p+1} h^{p+1} \right)^{\frac{1}{p}} \\ &= \sqrt[p]{\frac{1}{p+1}} \cdot h \end{split} \tag{13}$$

*Hereby, we have used the formulation of IGD<sub>p</sub> for continuous Pareto fronts as discussed in [73]. It remains to compute h. Since the assumption that all the images of the a<sub>i</sub> are evenly spread is an idealization, we cannot simply take ½‖F(a<sub>i+1</sub>) − F(a<sub>i</sub>)‖ for an arbitrary index i ∈ {1, ..., N − 1}. Instead, it makes sense to use the average of these distances:*

$$h \approx \frac{\sum\_{i=1}^{N-1} \tilde{d}\_i}{2m},\tag{14}$$

*where d̃<sub>i</sub> is as in (12) and m denotes the number of elements d̃<sub>i</sub> that are not equal to zero. This leads to the approximation d<sub>p</sub> of the averaged Hausdorff distance Δ<sub>p</sub> of the Pareto front by a given archive A:*

$$d\_p := \sqrt[p]{\frac{1}{p+1}} \cdot \frac{\sum\_{i=1}^{N-1} \tilde{d}\_i}{2m}. \tag{15}$$
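The approximation in (14) and (15) can be sketched as follows. As before, this is a minimal illustration and not the authors' code: the archive is given by its images as tuples, Δ is a scalar, the 2-norm is assumed for the neighbor distances, and the function name `averaged_hausdorff_estimate` is hypothetical.

```python
import math

def averaged_hausdorff_estimate(front_images, delta, p=2):
    """Approximate Delta_p via d_p = (1/(p+1))**(1/p) * (sum of kept gaps) / (2*m)."""
    pts = sorted(front_images)  # sort along the first objective
    gaps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
    # Keep only the nonzero gaps that do not exceed 2*delta; m is their number.
    kept = [g for g in gaps if 0.0 < g <= 2.0 * delta]
    if not kept:
        return 0.0
    h_avg = sum(kept) / (2.0 * len(kept))          # averaged half gap, cf. (14)
    return (1.0 / (p + 1)) ** (1.0 / p) * h_avg    # scaling factor from (13), cf. (15)

# Five evenly spaced images on the linear front f2 = 1 - f1 (hypothetical data).
images = [(i / 4, 1 - i / 4) for i in range(5)]
d2 = averaged_hausdorff_estimate(images, delta=0.5, p=2)
```

For *p* = 2 the factor in front of the averaged half gap is 1/√3 ≈ 0.577, so the estimate of Δ<sub>2</sub> is always smaller than the corresponding estimate of *d<sub>H</sub>*.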

In order to obtain a first impression of the effect of the archiver, we apply it to several test problems. More precisely, we use ArchiveUpdateHD together with a generator that simply chooses candidate solutions uniformly at random from the domain of the problem. As test problems, we use CONV (convex front), DENT ([75], convex-concave front), RUD1 and RUD2 (disconnected fronts), LINEAR (linear front) and RUD3 (convex front). The first five test problems are uni-modal, while RUD3 has eight local fronts in addition to the Pareto front. RUD3 is taken from [76], and RUD1 and RUD2 are straightforward modifications of RUD3 to obtain the given Pareto front shapes.

Figure 4 shows the final approximations of the fronts using *N* = 30 for the archive size and initial values of Δ small enough so that this threshold is reached for all problems. As can be seen, evenly distributed solutions along the Pareto fronts have been obtained in all cases. Figure 5 shows the actual Hausdorff and averaged Hausdorff values of the computed archives in each step for one run of the algorithm (*d<sub>H</sub>* and Δ<sub>2</sub>, i.e., *p* = 2 has been used for the averaged Hausdorff distance), together with their approximations *h* and *d*<sub>2</sub>. For all problems, the archiver is capable of quickly determining a good approximation of both *d<sub>H</sub>* and Δ<sub>2</sub> during the run of the algorithm. Tables A1 and A2 show the approximation qualities averaged over 30 independent runs, which support the observations from Figure 5. Figure 6 shows the evolution of the value of Δ during one run of the algorithm for DENT and RUD3. For the uni-modal problem DENT, the value of Δ is essentially increasing monotonically (i.e., not counting the first few iteration steps), while for the multi-modal problem RUD3, more than 10 resets occur. Nevertheless, in both cases, a final value Δ<sup>+</sup> is reached, which is in accordance with Theorem 1.

Figure A1 shows the box collections

$$\mathcal{C}(A\_f) := \bigcup\_{a \in A\_f} B\_{\Delta\_f}(F(a))\tag{16}$$

of the final archives *A<sub>f</sub>* and the final value Δ<sub>*f*</sub> for the test problems, where *B*<sub>Δ</sub>(*x*) denotes the Δ-ball around *x* using the maximum norm. The figure indicates that the Hausdorff distance of *F*(*A<sub>f</sub>*) and the respective Pareto fronts is indeed less than or equal to Δ<sub>*f*</sub> for all problems.
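The covering argument behind (16) can be turned into a simple check when a sampled reference front is available. The sketch below is only illustrative: the function name `covered_by_boxes` and the test data are hypothetical, Δ<sub>*f*</sub> is passed as a vector with one entry per objective, and only one direction of the Hausdorff distance is tested (every reference point lies in some box). The other direction holds if the archive images themselves lie on the front.

```python
def covered_by_boxes(archive_images, delta, reference_front):
    """Return True if every reference point lies in a max-norm box around some F(a).

    archive_images: list of objective vectors F(a), a in A_f.
    delta: per-objective box half-widths (Delta_f).
    reference_front: sampled points of the Pareto front (assumed to be given).
    """
    def in_box(center, y):
        return all(abs(c - v) <= d for c, v, d in zip(center, y, delta))
    return all(any(in_box(c, y) for c in archive_images) for y in reference_front)

# Hypothetical data: linear front sampled at 101 points, archive with 5 evenly spread images.
front = [(i / 100, 1 - i / 100) for i in range(101)]
archive_images = [(j / 4, 1 - j / 4) for j in range(5)]

wide = covered_by_boxes(archive_images, (0.125, 0.125), front)   # boxes touch each other
narrow = covered_by_boxes(archive_images, (0.05, 0.05), front)   # gaps between boxes remain
```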

**Algorithm 3** {*A*, Δ, *h*} := *ArchiveUpdateHD*(*P*, *A*<sub>0</sub>, Δ<sub>0</sub>, *N*)

**Require:** Problem (MOP), where *k* = 2, *P*: current population, *A*<sub>0</sub>: current archive, Δ<sub>0</sub> ∈ ℝ<sup>*k*</sup><sub>+</sub>: current values of Δ, *N*: upper bound for archive size

**Ensure:** updated archive *A*, updated values for Δ, Hausdorff approximation *h*

1: *A* := *A*<sub>0</sub>
2: Δ := Δ<sub>0</sub>
3: *ε* := Δ
4: **for all** *p* ∈ *P* **do**
5: **if** ∄*a* ∈ *A* : *a* ≺<sub>*ε*</sub> *p*, or ∄*a* ∈ *A* : *a* ≺ *p* and ∄*a* ∈ *A* : |*f<sub>i</sub>*(*a*) − *f<sub>i</sub>*(*p*)| ≤ Δ<sub>*i*</sub>, *i* = 1, ..., *k* **then**
6: *A* := *A* ∪ {*p*}
7: **end if**
8: **for all** *a* ∈ *A* **do**
9: **if** *p* ≺ *a* **then**
10: *A* := *A* ∪ {*p*} ∖ {*a*}
11: **if** ∃*i* ∈ {1, ..., *k*} : *f<sub>i</sub>*(*a*) − *f<sub>i</sub>*(*p*) > Δ<sub>*i*</sub> **then** ▷ reset Δ and *ε*
12: Δ := Δ<sub>*min*</sub>
13: *ε* := Δ
14: **end if**
15: **end if**
16: **end for**
17: **if** |*A*| = *N* + 1 **then** ▷ apply pruning
18: Δ := ((*N* + 1)/*N*) Δ
19: *ε* := ((*N* + 1)/*N*) *ε*
20: sort *A* (e.g., according to *f*<sub>1</sub>)
21: compute *d* ∈ ℝ<sup>*N*</sup> as in (8)
22: choose *m* ∈ arg min *d*
23: **if** *m* = 1 **then**
24: *A* := *A* ∖ {*a*<sub>2</sub>} ▷ remove 2nd entry
25: **else if** *m* = *N* **then**
26: *A* := *A* ∖ {*a<sub>N</sub>*} ▷ remove 2nd but last entry
27: **else**
28: *d<sub>l</sub>* := ‖*F*(*a*<sub>*m*+1</sub>) − *F*(*a*<sub>*m*−1</sub>)‖<sub>2</sub>
29: *d<sub>r</sub>* := ‖*F*(*a*<sub>*m*+2</sub>) − *F*(*a<sub>m</sub>*)‖<sub>2</sub>
30: **if** *d<sub>l</sub>* < *d<sub>r</sub>* **then**
31: *A* := *A* ∖ {*a<sub>m</sub>*}
32: **else**
33: *A* := *A* ∖ {*a*<sub>*m*+1</sub>}
34: **end if**
35: **end if**
36: **end if**
37: **end for**
38: sort *A* (e.g., according to *f*<sub>1</sub>) ▷ compute Hausdorff approximation
39: compute *d̃<sub>i</sub>*, *i* = 1, ..., |*A*| − 1 as in (12)
40: *h* := ½ max<sub>*i*=1,...,|*A*|−1</sub> *d̃<sub>i</sub>*
41: **return** {*A*, Δ, *h*}

**Figure 4.** Numerical results of ArchiveUpdateHD on six BOPs with different shapes of the Pareto fronts. For the sake of clarity, we omitted the fronts that already become apparent by the approximations.

**Figure 5.** Hausdorff and averaged Hausdorff distances (*d<sub>H</sub>* and Δ<sub>2</sub>, respectively) of the archives obtained by ArchiveUpdateHD in one single run for six bi-objective problems (see Figure 4), together with their approximations *h* and *d*<sub>2</sub>. *d<sub>H</sub>* is plotted black solid, *h* is black dashed, Δ<sub>2</sub> is blue solid, and *d*<sub>2</sub> is blue dashed.

**Figure 6.** Evolution of the value of Δ for one run of the algorithm on DENT and RUD3.

### *3.2. The General Case*

Next, we consider the archiver for MOPs with more than two objectives. Algorithm 4 shows the pseudocode of ArchiveUpdateHD for such problems. The archiver is essentially identical to the one for BOPs; however, it comes with two modifications: the first one is needed since one cannot expect the Pareto front to form a one-dimensional object any more, and the second one prevents too many unnecessary resets during the run of the algorithm.

1. The distances can no longer be sorted as in (8). Instead, one has to consider the distances

$$d\_{i,j} = \|F(a\_i) - F(a\_j)\|, \quad i, j = 1, \dots, |A|, \quad j > i,\tag{17}$$

for a given archive *A*. Furthermore, more sophisticated considerations of the distances as, e.g., in lines 28 and 29 of Algorithm 3 are no longer possible. Instead, we have chosen to first compute

$$d\_{i\_m, j\_m} \in \arg\min\_{\substack{i,j = 1, \dots, N + 1 \\ j > i}} d\_{i,j} \tag{18}$$

and then to remove *a<sub>l</sub>* from the archive, where *l* is chosen randomly from {*i<sub>m</sub>*, *j<sub>m</sub>*}. As in the bi-objective case, an exception can of course be made for the best found solutions for each objective value.


2. The reset of Δ (line 11 of Algorithm 4) is only performed if

$$f\_i(a) - f\_i(p) > \Delta\_i, \quad i = 1, \dots, k.$$

That is, the improvement is larger than Δ*<sup>i</sup>* for *all* objectives. It has been observed that if one only asks for an improvement in one objective (as done for the bi-objective case), too many resets are performed in particular for MOPs that contain a "flat" region of the Pareto front.

Note that none of these changes affects the statements made in Theorem 1. Hence, the statements of Theorem 1 also hold if Algorithm 4 is used for MOPs with *k* > 2 objectives. We stress that this algorithm can of course also be used for the treatment of BOPs; however, in that case, Algorithm 3 seems to be better suited, since both distance considerations and Hausdorff approximation are more sophisticated.

Figure 7 shows an application of Algorithm 4 on the test function DTLZ2 with three objectives (concave and connected Pareto front) for *N* = 300 and *N* = 500. The evolution of the approximated value Δ of the Hausdorff distance *d<sub>H</sub>*(*F*(*A*), *F*(*P<sub>Q</sub>*)) together with the real value can be found in Figure 8. Hereby, we have used ArchiveUpdateHD as the external archiver of NSGA-II. The same result could have been obtained using randomly chosen test points within the domain *Q*, albeit with a much larger number of test points. Figures 9 and 10 show the respective results for DTLZ7, whose Pareto front is disconnected and convex-concave. In all cases, the archiver is capable of finding evenly spread solutions along the Pareto front, and the value of Δ is quite close to the actual Hausdorff distance after only some iterations. In order to suitably handle weakly optimal solutions, we have used the approach we describe in the following remark.

**Remark 3.** *It is known that distance-based archiving/selection for MOPs that contain weakly optimal solutions that are not optimal (dominance-resistant solutions) may lead to unsatisfactory results, since candidates may be included in the archive that are far away from the Pareto front. In [77], it has been suggested to consider the modified objectives*

$$\bar{f}\_i(\mathbf{x}) = (1 - \alpha)f\_i(\mathbf{x}) + \frac{\alpha}{k} \sum\_{j=1}^k f\_j(\mathbf{x}), \quad i = 1, \dots, k,\tag{19}$$

*where α > 0 is "small", instead of the original objectives f<sub>i</sub>, i = 1, ..., k. We have adopted this approach for the treatment of the ZDT and DTLZ functions in this work, using α = 0.02.*
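The modification in (19) amounts to blending each objective with the mean of all objectives. The following sketch assumes that the normalizing constant in (19) is the number of objectives *k*, so that the second term is *α* times the mean objective value; the function name `modified_objectives` is hypothetical.

```python
def modified_objectives(f_values, alpha=0.02):
    """Blend each objective with the mean of all objectives, cf. (19).

    f_values: objective values f_1(x), ..., f_k(x) of one candidate.
    alpha: small positive weight (0.02 is the value used in the text).
    Assumption: the sum in (19) is divided by the number of objectives k.
    """
    k = len(f_values)
    mean = sum(f_values) / k
    return [(1 - alpha) * f + alpha * mean for f in f_values]

# Two candidates that agree in f1 but differ in f2 (hypothetical values).
f_a = modified_objectives([0.0, 1.0])
f_b = modified_objectives([0.0, 2.0])
```

In the original objectives, the second candidate is not worse in *f*<sub>1</sub>, whereas after the modification it is strictly worse in both components. This is the effect that keeps dominance-resistant candidates out of the archive.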

**Algorithm 4** {*A*, Δ} := *ArchiveUpdateHD*(*P*, *A*<sub>0</sub>, Δ<sub>0</sub>, *N*)

**Require:** Problem (MOP), *P*: current population, *A*<sub>0</sub>: current archive, Δ<sub>0</sub> ∈ ℝ<sup>*k*</sup><sub>+</sub>: current value of Δ, *N*: upper bound for archive size

**Ensure:** updated archive *A*, updated value of Δ

1: *A* := *A*<sub>0</sub>
2: Δ := Δ<sub>0</sub>
3: *ε* := Δ
4: **for all** *p* ∈ *P* **do**
5: **if** ∄*a* ∈ *A* : *a* ≺<sub>*ε*</sub> *p*, or ∄*a* ∈ *A* : *a* ≺ *p* and ∄*a* ∈ *A* : |*f<sub>i</sub>*(*a*) − *f<sub>i</sub>*(*p*)| ≤ Δ<sub>*i*</sub>, *i* = 1, ..., *k* **then**
6: *A* := *A* ∪ {*p*}
7: **end if**
8: **for all** *a* ∈ *A* **do**
9: **if** *p* ≺ *a* **then**
10: *A* := *A* ∪ {*p*} ∖ {*a*}
11: **if** *f<sub>i</sub>*(*a*) − *f<sub>i</sub>*(*p*) > Δ<sub>*i*</sub>, *i* = 1, ..., *k*, **then** ▷ reset Δ and *ε*
12: Δ := Δ<sub>*min*</sub>
13: *ε* := Δ
14: **end if**
15: **end if**
16: **end for**
17: **if** |*A*| = *N* + 1 **then** ▷ apply pruning
18: Δ := ((*N* + 1)/*N*) Δ
19: *ε* := ((*N* + 1)/*N*) *ε*
20: compute *d*<sub>*i*,*j*</sub> as in (17)
21: choose *d*<sub>*i<sub>m</sub>*,*j<sub>m</sub>*</sub> ∈ arg min<sub>*i*,*j*=1,...,*N*+1, *j*>*i*</sub> *d*<sub>*i*,*j*</sub>
22: choose *l* randomly from {*i<sub>m</sub>*, *j<sub>m</sub>*}
23: *A* := *A* ∖ {*a<sub>l</sub>*}
24: **end if**
25: **end for**
26: **return** {*A*, Δ}
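The steps of Algorithm 4 can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the text: the archive stores objective vectors *F*(*a*) directly (as tuples) instead of decision vectors, all objectives are minimized, *ε*-dominance is read as the additive variant (*a* *ε*-dominates *p* if *F*(*a*) − *ε* dominates *F*(*p*)), the 2-norm is used in (17), and Δ<sub>*min*</sub> is passed as an explicit argument. All function names are hypothetical.

```python
import math
import random

def dominates(u, v):
    """Pareto dominance for minimization: u <= v in all entries and u != v."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def eps_dominates(u, v, eps):
    """Assumed additive epsilon-dominance: u - eps dominates v."""
    return dominates(tuple(a - e for a, e in zip(u, eps)), v)

def within_box(u, v, delta):
    """True if |u_i - v_i| <= delta_i for all objectives."""
    return all(abs(a - b) <= d for a, b, d in zip(u, v, delta))

def archive_update_hd(population, archive0, delta0, n_max, delta_min, rng=random):
    archive = [tuple(a) for a in archive0]
    delta = list(delta0)
    eps = list(delta)
    for p in map(tuple, population):
        # Acceptance test (line 5).
        accept = (not any(eps_dominates(a, p, eps) for a in archive)
                  or (not any(dominates(a, p) for a in archive)
                      and not any(within_box(a, p, delta) for a in archive)))
        if accept and p not in archive:
            archive.append(p)
        # Dominance replacement and possible reset (lines 8 to 16).
        for a in list(archive):
            if dominates(p, a):
                archive.remove(a)
                if p not in archive:
                    archive.append(p)
                if all(fa - fp > d for fa, fp, d in zip(a, p, delta)):
                    delta = list(delta_min)
                    eps = list(delta)
        # Pruning (lines 17 to 24): enlarge Delta, drop one entry of the closest pair.
        if len(archive) == n_max + 1:
            factor = (n_max + 1) / n_max
            delta = [factor * d for d in delta]
            eps = [factor * e for e in eps]
            pairs = [(i, j) for i in range(len(archive))
                     for j in range(i + 1, len(archive))]
            i_m, j_m = min(pairs, key=lambda ij: math.dist(archive[ij[0]], archive[ij[1]]))
            archive.pop(rng.choice((i_m, j_m)))
    return archive, delta
```

A possible call, with a hypothetical population of 50 mutually non-dominated points on a linear front and a budget of *N* = 5, is `archive_update_hd(points, [], (0.01, 0.01), 5, (0.01, 0.01))`. Since pruning is triggered whenever the archive holds *N* + 1 entries, the returned archive never contains more than *N* elements.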

**Figure 7.** Results of ArchiveUpdateHD on DTLZ2 for different values of *N*.

**Figure 8.** Real (blue) and approximated (red) Hausdorff distances during one run of the algorithm for DTLZ2.

**Figure 9.** Results of ArchiveUpdateHD on DTLZ7 for different values of *N*.

**Figure 10.** Real (blue) and approximated (red) Hausdorff distances during one run of the algorithm for DTLZ7.

**Remark 4.** *We finally stress that the archive A only reaches the magnitude N if Δ (and hence ε) is chosen "small enough", which does not represent a drawback in our opinion. In real-world applications, the values of Δ have a physical meaning. As a hypothetical example, consider that one objective in the design of a car is its maximal speed (e.g., f<sub>1</sub> = s<sub>max</sub>), and the decision maker considers two cars to have different maximal speeds if s<sub>max</sub> differs by at least 10 km/h. In this case, Δ<sub>1</sub> = 10 is a suitable choice for ArchiveUpdateHD. Hence, depending on these values and the size of the Pareto front, it may happen that fewer than N elements are needed to suitably represent the solution set. In turn, if Δ<sup>+</sup> is (significantly) larger than the target values, this gives a hint to the decision maker that N has to be increased and that the computation has to be repeated in order to obtain a "complete" approximation. Figure 11 shows two results of ArchiveUpdateHD on CONV for two different starting values of Δ.*

**Figure 11.** Results of ArchiveUpdateHD on CONV for two different initial values of Δ using *N* = 30. For Δ<sup>0</sup> = 0.01, the final archive contains 30 elements, while there are only 28 elements for Δ<sup>0</sup> = 0.05. The solutions on the left are more evenly spread along the Pareto front due to distance considerations in the pruning technique. For the solution on the right, no pruning technique has been applied during the run of the algorithm.

### **4. Numerical Results**

In this section, we show some more numerical results to further demonstrate the advantage of the proposed archiver. As base MOEAs, we have chosen the state-of-the-art algorithms NSGA-II (dominance based), MOEA/D (decomposition based), and SMS-EMOA (indicator based). We have used the implementations of the algorithms as well as the reference fronts provided by PlatEMO [78]. For the sake of a fair comparison, we will in the following equip these MOEAs with ArchiveUpdateHD as an external archiver, where the upper bound *N* is chosen equal to the population size. For each run of an algorithm, we have fed the archiver with exactly the same candidate solutions as the respective base MOEA.

Motivated by Theorem 1 and by the discussion made in Remark 2, we will primarily use Δ<sub>*p*</sub> (*p* = 2) for the performance assessment of the MOEA results. However, we will also use the Hypervolume indicator [79], leading to some surprising results.
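For finite sets, the indicator Δ<sub>*p*</sub> can be evaluated with a few lines of code. The sketch below uses the common definition from the literature on the averaged Hausdorff distance, namely the maximum of the two power-mean distances GD<sub>*p*</sub> and IGD<sub>*p*</sub>; this definition is not restated in this section and is taken as an assumption here, as are the 2-norm and the function names.

```python
import math

def _power_mean_distance(xs, ys, p):
    """p-th power mean of the distances from each x in xs to its nearest y in ys."""
    total = sum(min(math.dist(x, y) for y in ys) ** p for x in xs)
    return (total / len(xs)) ** (1.0 / p)

def averaged_hausdorff(approx, reference, p=2):
    """Delta_p for finite sets: max of GD_p (approx -> reference) and IGD_p (reverse)."""
    gd = _power_mean_distance(approx, reference, p)
    igd = _power_mean_distance(reference, approx, p)
    return max(gd, igd)

# Hypothetical example: two archive images against a three-point reference front.
approx = [(0.0, 1.0), (1.0, 0.0)]
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
delta_2 = averaged_hausdorff(approx, reference, p=2)
```

Here GD<sub>2</sub> is zero because both archive images lie on the reference front, so Δ<sub>2</sub> equals IGD<sub>2</sub>, which averages the single missed point over all three reference points.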

We first make a comparison with NSGA-II to investigate possible cyclic behavior during the run of an algorithm. It is known that distance-based selection/archiving strategies may lead to such cyclic behavior, which means that from a certain stage of the search, no more improvements can be expected (and in particular no convergence). The selection strategy of NSGA-II is mainly distance based (since from a certain point on, all individuals of the population are mutually non-dominated). For this comparison, we have set both population size and *N* to 50 and have run NSGA-II for 1000 generations, using *P<sub>c</sub>* = 1 and *η<sub>c</sub>* = 20 for SBX, and *P<sub>m</sub>* = 1/*N* and *η<sub>m</sub>* = 20 for polynomial mutation. For ArchiveUpdateHD, we have chosen the initial value *ε*<sub>0</sub> small enough so that the archive size |*A*| = *N* was reached for all problems. In Figure 12, typical evolutions of the values of the approximation qualities Δ<sub>*p*</sub> are shown over time for six selected BOPs (similar plots are obtained for all problems considered in this study). Hereby, "NSGA-II" stands for the population of the MOEA, and "NSGA-II-A" stands for the respective archive that was fed with the same candidate solutions as NSGA-II. While NSGA-II reveals clear cyclic behavior in all cases, this is not the case for NSGA-II-A. The latter is due to the acceptance strategy of ArchiveUpdateHD that is based on *ε*-dominance. As discussed above, the value of Δ (and hence also of *ε*) will become large enough during the run of the algorithm so that only dominance replacements will occur, which, however, cannot lead to cyclic behavior.

Apart from the "quasi-monotonic" behavior, one can also observe that the Δ<sub>2</sub> values of NSGA-II-A are for all test problems significantly lower than the ones of NSGA-II.

**Figure 12.** Approximation qualities of the Pareto fronts (measured by Δ2) during one run of the algorithm for NSGA-II (blue) and the archives NSGA-II-A (black) for six selected BOPs.

Next, we investigate the performance of ArchiveUpdateHD as an external archive for the three MOEAs using an extended set of test functions. We first consider bi-objective problems. For this, we have chosen the ZDT problems [80], where we have used the modified objectives as expressed in (19) using *α* = 0.02 to handle weakly Pareto optimal solutions that are not optimal. In addition to these six test problems, we have taken another four BOPs, which were selected due to the shapes of their fronts: LIN ([81], linear front), CONV (convex front), as well as DENT and SSW [82], which both have convex–concave Pareto fronts. The boxplots of the results are shown in Figures 13 and 14, based on 30 independent runs, each using 1000 generations and a population size of 50. The results of the Wilcoxon rank-sum test are shown in Table 1. In the following, we will compare the results of the base algorithms against the respective solutions that use ArchiveUpdateHD.

**Figure 13.** Boxplots for the obtained results for the ZDT test functions.

**Figure 14.** Boxplots for the obtained results for DENT, SSW, CONV and LIN.

The performance of the external archives is better in 10 out of 10 cases for NSGA-II, in 9 out of 10 cases for MOEA/D, and in 8 out of 10 cases for SMS-EMOA. ArchiveUpdateHD loses against MOEA/D and SMS-EMOA on test problem LIN, which has a linear and thus the most regular possible Pareto front. Both MOEA/D and SMS-EMOA are able to compute perfect solutions for this problem. Such perfect approximations cannot be expected from ArchiveUpdateHD due to its acceptance strategy. While this strategy is responsible for suppressing any cyclic behavior, it also prevents the solutions, even those of the limit archive, from being perfectly evenly spread along the Pareto front. For more complex Pareto fronts, however, the situation changes. Figures A2 and A3 show the average results obtained by the different methods on problems DENT and ZDT3, respectively. For DENT, the use of ArchiveUpdateHD leads in all three cases to significantly better Pareto front approximations. The situation is similar for ZDT3, although the improvements are smaller, since all three base MOEAs can already detect very good approximations.

**Table 1.** Comparison (wins/ties/losses) of the results of the base MOEAs against their archive equipped variants on the bi-objective test problems. The Wilcoxon rank-sum test has been used for statistical significance, where *p*-value < 0.05.
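The win/tie/loss bookkeeping behind such a table can be sketched as follows. The code implements the two-sided Wilcoxon rank-sum test with the normal approximation and without tie correction in pure Python; whether the authors used this exact variant is not stated, so this is an assumption. The function names and the sample data are hypothetical, and the comparison assumes an indicator for which smaller values are better (such as Δ<sub>*p*</sub> or *d<sub>H</sub>*, but not the hypervolume).

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1, ..., j+1
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    s = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    expected = n1 * (n1 + n2 + 1) / 2
    z = (s - expected) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

def compare(base, archived, alpha=0.05):
    """Return 1 if the archived variant wins, 0 for a tie, -1 if it loses."""
    z, p = rank_sum_test(archived, base)
    if p >= alpha:
        return 0
    return 1 if z < 0 else -1  # negative z: archived values tend to be smaller

# Hypothetical indicator values from 30 runs each.
base_runs = [0.10 + 0.001 * i for i in range(30)]
archive_runs = [0.05 + 0.001 * i for i in range(30)]
outcome = compare(base_runs, archive_runs)
```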


In a next step, we investigate the effect of ArchiveUpdateHD on several test problems with *k* = 3 objectives. For this, we have chosen the seven DTLZ test problems, the test functions IDTLZ1 and IDTLZ2 [21] with "inverted" fronts, and MaF1 to MaF5 [83]. Table 2 shows the results of the Wilcoxon rank-sum test for these 14 test problems using the indicators Δ<sub>*p*</sub>, HV, as well as the classical Hausdorff distance *d<sub>H</sub>*. Figure A4 shows the boxplots for all algorithms and test problems for Δ<sub>*p*</sub>, Figures A5 and A6 show the results of the algorithms on IDTLZ1 and MaF2, respectively, and Figure A7 shows the selected behaviors of the Hausdorff approximations. For the latter, we have taken the median runs with respect to Δ<sub>*p*</sub>. As can be seen from Table 2, the use of ArchiveUpdateHD as the external archiver is highly beneficial in almost all cases. More precisely, starting with Δ<sub>*p*</sub>, NSGA-II-A is better than NSGA-II in 12 out of 14 cases, and it is only (slightly) beaten on DTLZ5 and DTLZ6, which is likely owed to the degeneracy of the Pareto fronts of these two test problems (which has to be investigated in more detail in the future). MOEAD-A is superior to MOEAD in 10 out of 14 cases with one tie (DTLZ4) and 3 losses (DTLZ1–3), which is due to the regular structure of these Pareto fronts where MOEA/D can hardly be beaten. Finally, SMS-EMOA-A yields better results in all of the 14 cases. The situation is quite similar when considering the other two performance indicators. While this was expected for *d<sub>H</sub>*, the results are surprising for HV: note that SMS-EMOA-A also outperforms SMS-EMOA on all of the 14 test functions when considering the hypervolume indicator.

Finally, Figures A8 and A9 show evolutions of the obtained Hausdorff distances of NSGA-II-A for the same test problems but now using *k* = 4 and *k* = 5 objectives. The results already give evidence that the values of Δ obtained by ArchiveUpdateHD yield satisfying approximations of the actual values of *d<sub>H</sub>* also for problems with more objectives. The only exceptions are DTLZ5 and DTLZ6 (both for four and five objectives) as well as MaF4 for *k* = 5. By Theorem 1, we know that the runs of the algorithms have simply not been long enough, while it is in turn unclear how long these runs should have been. While these results are satisfying, more investigation has to be done, in particular for the treatment of many-objective problems, which we leave for future study.


**Table 2.** Comparison (wins (1) / ties (0) / losses (−1)) of the results of the base MOEAs against their archive equipped variants on the 14 three-objective test problems. The Wilcoxon rank-sum test has been used for statistical significance, where *p*-value < 0.05.

### **5. Conclusions and Future Work**

In this paper, we have presented and analyzed the archiving strategy ArchiveUpdateHD for use within set-based stochastic search algorithms such as multi-objective evolutionary algorithms (MOEAs) for the treatment of multi-objective optimization problems (MOPs). ArchiveUpdateHD is a bounded archiver that is based on distance dominance, *ε*-dominance and the distances among the candidate solutions and that aims for evenly spread solutions along the Pareto front of a given MOP. We have shown that, under certain (mild) conditions on the process to generate candidate solutions, the images *F*(*A<sub>i</sub>*) of the sequence of archives *A<sub>i</sub>* generated by this archiver form with probability one a Δ<sup>+</sup>-approximation of the Pareto front in the Hausdorff sense, and that all entries of *A<sub>i</sub>* converge to Pareto optimal solutions with probability one for *i* → ∞. Furthermore, the value Δ<sup>+</sup> is computed by ArchiveUpdateHD during the run of the algorithm (without any prior knowledge of the Pareto front). Since this value represents the maximal error in the representation, it is of great value for the decision maker (DM). In particular, if the magnitude of the archives reaches the pre-defined value *N*, the value of Δ<sup>+</sup> indicates whether the approximation is "complete enough" or not. Empirical studies on several benchmark test problems have shown the benefit of the novel strategy, among others, that the obtained value Δ<sup>+</sup> gives a good approximation of the actual Hausdorff distance. For bi-objective problems, we have presented an alternative way to compute this value, which can even be considered to be tight from the practical point of view. Finally, we have used ArchiveUpdateHD as the external archiver for three state-of-the-art MOEAs (NSGA-II, MOEA/D, and SMS-EMOA), indicating that it is capable of significantly improving the overall performance of these algorithms.

One important next step which we will leave for future work is to use the mechanisms behind ArchiveUpdateHD as the selection strategy within an MOEA. If an external archiver is used, two archives (instead of only one) have to be maintained, leading to an additional overhead, which could be avoided. It is hence intended to utilize ArchiveUpdateHD to design a new class of MOEAs that aims for (averaged) Hausdorff approximations of the Pareto fronts (as, e.g., done in [84–86]).

**Author Contributions:** C.I.H.C. and O.S. have contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The codes will be made publicly available after acceptance.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A**

**Table A1.** Hausdorff distances *dH* and approximations *h* computed by ArchiveUpdateHD for the six bi-objective problems.


**Table A2.** Averaged Hausdorff distances Δ<sup>2</sup> and approximations *d*<sup>2</sup> computed by ArchiveUpdateHD for the six bi-objective problems.


**Figure A1.** The box coverings *C*(*A<sub>f</sub>*) for the final archives indicate that the Hausdorff distance between *F*(*A<sub>f</sub>*) and the Pareto fronts is less than the final value Δ<sub>*f*</sub> computed by ArchiveUpdateHD for all test problems.

**Figure A2.** Numerical results of the different algorithms and archiving/selection strategies on DENT.

**Figure A3.** Numerical results of the different algorithms and archiving/selection strategies on ZDT3.

**Figure A4.** Boxplots for the considered three-objective test functions.

**Figure A5.** Numerical results of the different algorithms and archiving/selection strategies on IDTLZ1 for *k* = 3.

**Figure A6.** Numerical results of the different algorithms and archiving/selection strategies on MaF2 for *k* = 3.

**Figure A7.** Evolution of the Hausdorff distances *dH*(*F*(*A*), *F*(*PQ*)) of NSGA-II-A and the computed approximations Δ for several test functions, using *k* = 3 objectives.

**Figure A8.** Evolution of the Hausdorff distances *dH*(*F*(*A*), *F*(*PQ*)) of NSGA-II-A and the computed approximations Δ for several test functions, using *k* = 4 objectives.

**Figure A9.** Evolution of the Hausdorff distances *dH*(*F*(*A*), *F*(*PQ*)) of NSGA-II-A and the computed approximations Δ for several test functions, using *k* = 5 objectives.

### **References**


### *Article* **Multi-Objective Optimization of an Elastic Rod with Viscous Termination**

**Siyuan Xing <sup>1</sup> and Jian-Qiao Sun <sup>2,∗</sup>**


**Abstract:** In this paper, we study the multi-objective optimization of the viscous boundary condition of an elastic rod using a hybrid method combining a genetic algorithm and simple cell mapping (GA-SCM). The method proceeds with the NSGAII algorithm to seek a rough Pareto set, followed by a local recovery process based on one-step simple cell mapping to complete the branch of the Pareto set. To accelerate computation, the rod response under impulsive loading is calculated with a particular solution method that provides accurate structural responses with less computational effort. The Pareto set and Pareto front of a case study are obtained with the GA-SCM hybrid method. Optimal designs of each objective function are illustrated through numerical simulations.

**Keywords:** multi-objective optimization; genetic algorithm; simple cell mapping; rod vibration; mass–damper–spring termination; impulse response

### **Citation:** Xing, S.; Sun, J.-Q. Multi-Objective Optimization of an Elastic Rod with Viscous Termination. *Math. Comput. Appl.* **2022**, *27*, 94. https://doi.org/10.3390/mca27060094

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 7 September 2022 Accepted: 12 November 2022 Published: 15 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

### **1. Introduction**

Structures with viscous boundaries have been applied to diverse areas for vibration reduction [1], sound absorption [2], and boundary control [3]. One recent example is the railway bridge design for high-speed trains where the soil interacting with the bridge has been modeled as mass–damper–spring terminations of the structure [4]. The best design of structures has always been the pursuit of engineers. The optimal structural design must usually accommodate multiple objectives such as the settling time of vibrations, the response amplitude, and the shaping of the frequency response, leading to multi-objective optimization problems (MOPs). This paper presents a study of the multi-objective optimal design of a one-dimensional elastic rod with a mass–damper–spring termination.

The multi-objective nature of the optimization problem leads to a set of optimal solutions called the Pareto set, making set-oriented methods such as simple cell mapping (SCM) [5] suitable for solving such problems. The cell mapping method was initially developed by Hsu [6] for investigating the global behavior of nonlinear dynamical systems, then extended by Sun and his coworkers [7–9] for MOPs. The method seeks optimal solutions by constructing cell mappings based on the local dominance relation of cells in the discretized design space until the optimal solutions are achieved. Although the method is effective for low-dimensional problems, it suffers from the curse of dimensionality for high-dimensional problems because the searching space grows exponentially with the increase of the dimensions.

For MOPs with relatively high dimensions, evolutionary algorithms such as the genetic algorithm (GA) [10], immune algorithm [11], particle swarm optimization (PSO) [12], and ant colony optimization [13] are the mainstream methods. Evolutionary algorithms are stochastic methods that mimic the biological evolutionary process using evolution laws defined based on the Pareto dominance of fitness functions. Such methods can escape local optima and rapidly discover the domains containing the solutions. However, the results of evolutionary algorithms can be sensitive to the selection of the hyperparameters.

Recently, Sun and colleagues [5,14] proposed a hybrid method that incorporates NSGAII and simple cell mapping (SCM). The method begins with NSGAII to generate a rough set from several generations such that the domains containing optimal solutions can be outlined. Using the rough set, SCM performs a local recovery method to complete the branches of the Pareto set through iterative refinement of the design space. With the power of NSGAII, the searching domain of the simple cell mapping method has been substantially reduced, making it possible to apply SCM for high-dimensional problems. On the other hand, the SCM method can complement the GA since obtaining outlined optimal domains using the GA is not very sensitive to the selection of the hyperparameters and is much easier than obtaining detailed Pareto optimal solutions using the GA. This can reduce the burden of parameter tuning with the GA. This paper will present a new case study of MOPs by the hybrid GA-SCM method. For more discussions on the advantage of the GA-SCM method and a comparison with different methods, the reader is referred to [5] and the references therein.

To accelerate the MOP algorithms for structural design, a fast and accurate solver that can predict structural response under external loading is needed. Traditional methods such as the finite-element method for calculating structural response can result in considerable computational load. However, obtaining such a solver for structures with viscous terminations is not an easy task. This is because viscous boundary conditions lead to non-self-adjoint boundary value problems that cannot be solved by the traditional method of eigenvalue expansion. To address this issue, several analytical methods have been developed. Hull et al. [15] presented a method that applies modal expansion in the augmented spatial interval where orthogonal eigenmodes exist. Jayachandran and Sun [16] transformed the problem into a self-adjoint boundary value problem in Hilbert space. Oliveto et al. [17] proposed a complex modal expansion method, which requires formulating new orthogonality conditions. Jovannovic [18] formulated the steady-state solution in the form of Fourier series in the state space by reconstructing the differential operator of the equations of motion. Recently, Xing and Sun [19] applied a particular solution method to study the impulsive response of a 1D elastic rod subject to a mass–damper–spring termination.

In this study, we will continue the effort in [19] to optimize the viscous termination of a 1D elastic rod under impulsive loading using the GA-SCM method. The solution of this problem has many potential applications in structural and acoustic design. The dynamic response of the rod will be predicted by the particular solution method. Firstly, we will define the multi-objective optimization problem, followed by the introduction of the GA-SCM hybrid method. Then, we will formulate the impulse response of the structural problem using the particular solution method and introduce the multi-objective functions for the structural optimization problem. We will demonstrate the effectiveness of the GA-SCM method through a case study.

### **2. Multi-Objective Optimization**

A continuous multi-objective optimization problem (MOP) can be defined as

$$\begin{aligned} \min\_{\mathbf{x} \in \mathbb{R}^n} \mathbf{F}(\mathbf{x}),\\ \text{with } g\_i(\mathbf{x}) \le 0, \ i = 1, \dots, l, \\ h\_j(\mathbf{x}) = 0, \ j = 1, \dots, m, \end{aligned} \tag{1}$$

where **x** is a variable of the design space and *g<sub>i</sub>* and *h<sub>j</sub>* are the design constraints. **F** is a map composed of objective functions *f<sub>i</sub>* (*i* = 1, 2, . . . , *k*), i.e.,

$$\mathbf{F}(\mathbf{x}) = \{f\_1(\mathbf{x}), \dots, f\_k(\mathbf{x})\},\tag{2}$$

where *f<sub>i</sub>* : *Q* → ℝ. Herein, *Q* is the feasible set represented by

$$Q = \{ \mathbf{x} \in \mathbb{R}^n \mid g\_i(\mathbf{x}) \le 0, \ i = 1, \dots, l, \ \text{and } h\_j(\mathbf{x}) = 0, \ j = 1, \dots, m \}. \tag{3}$$

The optimal solution of the multi-objective problem is defined in the sense of Pareto optimality, which requires the introduction of the following definitions.

**Definition 1** (Dominance relation [5])**.**

*(a) A vector* **y** ∈ *Q is called strictly dominated (or simply dominated) by a vector* **x** ∈ *Q (***x** ≺ **y***) if*

**F**(**x**) ≤*<sub>p</sub>* **F**(**y**) and **F**(**x**) ≠ **F**(**y**),

*where* ≤*<sub>p</sub> is an elementwise less-than-or-equal-to relation.*

*(b) A vector* **y** ∈ *Q is called weakly dominated by a vector* **x** ∈ *Q (***x** ⪯ **y***) if* **F**(**x**) ≤*<sub>p</sub>* **F**(**y**)*.*

The dominance relation defines the "good" solution in the sense of Pareto optimality. Because the relation is only a partial order, two solutions that each outperform the other in at least one objective are mutually non-dominated and are considered equally "good", which generally leads to many optimal solutions. To define the sets of optimal solutions and their objective functions, we introduce the Pareto set and Pareto front.

**Definition 2** (Pareto point, Pareto set, Pareto front [5])**.**


$$\mathcal{P} = \mathcal{P}\_{Q} := \{ \mathbf{x} \in Q \, : \, \mathbf{x} \text{ is a Pareto point of } (1) \}. \tag{4}$$

*(e) The image* **F**(P) *of* P *is called the Pareto front.*
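For a finite set of candidate designs, Definitions 1 and 2 translate directly into a brute-force filter. The following is a minimal sketch, not the implementation used in this study; the function names and the toy bi-objective problem `toy_F` are assumptions introduced only for illustration.

```python
def weakly_dominates(fx, fy):
    """F(x) <=_p F(y): x is no worse than y in every objective."""
    return all(a <= b for a, b in zip(fx, fy))


def dominates(fx, fy):
    """x dominates y: F(x) <=_p F(y) and F(x) != F(y)."""
    return weakly_dominates(fx, fy) and tuple(fx) != tuple(fy)


def pareto_filter(points, F):
    """Return the non-dominated subset of `points` and its image under F."""
    values = [tuple(F(x)) for x in points]
    pareto_set, pareto_front = [], []
    for i, (x, fx) in enumerate(zip(points, values)):
        dominated = any(dominates(fy, fx)
                        for j, fy in enumerate(values) if j != i)
        if not dominated:
            pareto_set.append(x)
            pareto_front.append(fx)
    return pareto_set, pareto_front


# Hypothetical toy problem: f1 = x^2 and f2 = (x - 2)^2 conflict on [0, 2].
def toy_F(x):
    return (x * x, (x - 2.0) ** 2)


candidates = [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0]
pareto_set, pareto_front = pareto_filter(candidates, toy_F)
print(pareto_set)  # candidates inside [0, 2] survive the filter
```

The pairwise check costs O(N²) objective comparisons, which is acceptable for a final dominance check on a few thousand cells but not as a search strategy by itself.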

### **3. GA-SCM Hybrid Method**

We apply a hybrid method combining genetic algorithms (GAs) and cell mapping methods [14] to solve an MOP with multi-objective performance indices to be defined in Section 4. The hybrid method is initiated with a genetic algorithm (NSGAII) to generate a rough Pareto set in the design space, which is then used by a cell-mapping-based recovery method to seek a complete branch of the Pareto set through iterative refinement of the cellular space of the design parameters, which will be defined in Section 5. The pseudo code of the GA-SCM method is listed in Algorithm 1. The pseudo code for recovering the Pareto optimal solution is listed in Algorithm 2.

As shown in Algorithm 2, the recovery process first discretizes the design space and then iterates through the elements of the rough Pareto set from the GA or the previous cell partition, performing a one-step simple cell mapping to search for local Pareto points. If a cell is mapped to itself (i.e., a local sink is found), the cell is pushed into the candidate set, and an operator then gathers nearby solutions into the set to be visited (*S*<sub>tovisit</sub>) as long as they dominate some elements in the Pareto set P<sub>*s*</sub>. Otherwise, the destination cell of the cell mapping is pushed to *S*<sub>tovisit</sub>. The same iterative procedure is then performed on the set *S*<sub>tovisit</sub> until no new cells can be brought into *S*<sub>tovisit</sub>. Finally, a dominance check is carried out to remove dominated points from the Pareto set. More detail on the method can be found in [5].

### **Algorithm 1** GA-SCM algorithm.

**Input:** Design space *Q*, cell space partition *N*, refinement partition *sub*, GA population size *n*, objective functions **F**, refinement number *k*

**Output:** Pareto set P<sub>*s*</sub>, Pareto front P<sub>*f*</sub>


### **Algorithm 2** SCM-based recovering algorithm.

**Input:** Rough Pareto set P<sub>*s*</sub>, rough Pareto front P<sub>*f*</sub>, objective functions **F**, cell space partition *N*, design space *Q*, max iteration *n*

**Output:** Pareto set P<sub>*s*</sub>, Pareto front P<sub>*f*</sub> (under the cell space partition *N*)


11: *S*<sub>tovisit</sub> ← *S*<sub>tovisit</sub> ∪ {**x** | **x** ∈ *neighbor*(*q*) and **x** ≺ **y** where **y** ∈ P<sub>*s*</sub>} {collecting neighbors that dominate some element(s) in P<sub>*s*</sub>}

12: **end if**

13: **end for**

14: *S*<sub>visiting</sub> ← *S*<sub>tovisit</sub>

15: P<sub>*s*</sub> ← P<sub>*s*</sub> ∪ *S<sub>c</sub>*, P<sub>*f*</sub> ← P<sub>*f*</sub> ∪ **F**(*S<sub>c</sub>*)

16: **end while**

17: P<sub>*s*</sub>, P<sub>*f*</sub> ← dominance check(P<sub>*s*</sub>, P<sub>*f*</sub>)

The detail of the one-step simple cell mapping algorithm is listed in Algorithm 3. The method finds the local optimal solution by checking the dominance relation between a cell and its neighbor. The optimal solution is defined as the most distant cell that dominates the source cell.


### **Algorithm 3** One-step simple cell mapping.

**Input:** Objective functions **F**, cell *C<sub>s</sub>*, visited cell set *S*<sub>visited</sub>

**Output:** Destination cell *C<sub>d</sub>*, visited cell set *S*<sub>visited</sub>

1: *S*<sub>nbr</sub> ← *neighbor*(*C<sub>s</sub>*)

2: **for** *N* in *S*<sub>nbr</sub> **do**

3: **if** *N* ≺ *C<sub>s</sub>* **and** constraints satisfied **then** {**F**(*N*) can be fetched from the visited set directly if *N* is visited.}

4: Store *N*

5: *S*<sub>visited</sub> ← *S*<sub>visited</sub> ∪ {*N*}

6: **end if**

7: **end for**

8: *C<sub>d</sub>* ← arg max ‖*q<sub>s</sub>* − *q*<sub>nbr</sub>‖<sub>2</sub> {*q<sub>s</sub>* and *q*<sub>nbr</sub> are the cell centers of *C<sub>s</sub>* and *S*<sub>nbr</sub>}
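The one-step mapping can be sketched in a few lines of Python. This is an illustrative reading of Algorithm 3 under several assumptions: cells are indexed by integer tuples on a uniform grid, the neighborhood includes diagonal cells, objectives are evaluated at cell centers, every evaluated neighbor is cached (not only the dominating ones), a cell with no dominating neighbor maps to itself, and design constraints are omitted. All names, including the toy objective `toy_F`, are hypothetical.

```python
import itertools
import math


def dominates(fa, fb):
    return all(a <= b for a, b in zip(fa, fb)) and tuple(fa) != tuple(fb)


def cell_center(cell, lower, size):
    return tuple(lo + (i + 0.5) * h for i, lo, h in zip(cell, lower, size))


def neighbors(cell, shape):
    """Adjacent cells (including diagonals) that lie inside the grid."""
    result = []
    for offset in itertools.product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            nb = tuple(i + d for i, d in zip(cell, offset))
            if all(0 <= i < n for i, n in zip(nb, shape)):
                result.append(nb)
    return result


def one_step_scm(cell, F, lower, size, shape, visited):
    """Map `cell` to its most distant dominating neighbor, or to itself."""
    def value(c):
        if c not in visited:  # reuse objective values of visited cells
            visited[c] = tuple(F(cell_center(c, lower, size)))
        return visited[c]

    f_src = value(cell)
    q_src = cell_center(cell, lower, size)
    better = [nb for nb in neighbors(cell, shape)
              if dominates(value(nb), f_src)]
    if not better:
        return cell  # local sink: no neighbor dominates the source cell
    return max(better,
               key=lambda nb: math.dist(q_src, cell_center(nb, lower, size)))


# Hypothetical objectives, both minimized at the center of cell (2, 2).
def toy_F(x):
    dx, dy = x[0] - 2.5, x[1] - 2.5
    return (dx * dx + dy * dy, abs(dx) + abs(dy))


visited = {}
grid = dict(lower=(0.0, 0.0), size=(1.0, 1.0), shape=(5, 5))
step1 = one_step_scm((0, 0), toy_F, visited=visited, **grid)
step2 = one_step_scm(step1, toy_F, visited=visited, **grid)
step3 = one_step_scm(step2, toy_F, visited=visited, **grid)
print(step1, step2, step3)
```

Repeated application moves the cell along a path of dominating neighbors until a self-mapped cell is reached, which is the local sink that the recovery process collects as a candidate.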

Given the numerical computation of the impulse response of the rod is the most time-consuming subroutine in this problem, we record all visited cells using a dictionary structure, whose key is the cell index and whose values consist of the multi-objective functions. This way, the algorithm can search for the values in the dictionary with a time complexity *O*(1), eliminating the repeated computation for cells that have been visited. In addition, the key of a dictionary is unique. Pushing a visited cell to the dictionary will automatically replace the repeated one. Therefore, our implementation, different from that in [14], does not require combining the repeated cells in the visited set.
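The dictionary-based bookkeeping described above can be sketched as a thin wrapper around the objective evaluation. The class name, the counter, and the stand-in function `toy_objectives` are assumptions for illustration; in the actual problem the wrapped function would be the impulse-response solver.

```python
class CachedObjectives:
    """Evaluate F at cell centers, storing results keyed by cell index."""

    def __init__(self, F, lower, size):
        self.F = F
        self.lower = lower
        self.size = size
        self.visited = {}     # cell index (tuple of ints) -> objective values
        self.evaluations = 0  # number of actual calls to the expensive solver

    def center(self, cell):
        return tuple(lo + (i + 0.5) * h
                     for i, lo, h in zip(cell, self.lower, self.size))

    def __call__(self, cell):
        if cell not in self.visited:  # average O(1) hash lookup
            self.evaluations += 1
            self.visited[cell] = tuple(self.F(self.center(cell)))
        return self.visited[cell]


# Hypothetical cheap stand-in for the rod response solver.
def toy_objectives(x):
    m, c, k = x
    return (m / c, 1.0 / k, c / k)


cache = CachedObjectives(toy_objectives,
                         lower=(0.1, 1.0, 10.0), size=(0.19, 0.25, 0.5))
first = cache((3, 7, 11))
second = cache((3, 7, 11))  # served from the dictionary, no new evaluation
cache((4, 7, 11))
print(cache.evaluations, len(cache.visited))
```

Because dictionary keys are unique, revisiting a cell never creates a duplicate entry, so no separate merging step is needed.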

### **4. Multi-Objective Optimization of Mass–Damper–Spring Termination**

### *4.1. Impulse Response*

The one-dimensional elastic rod with a mass–damper–spring termination is shown in Figure 1. An impact loading *f*(*t*) = *f*<sub>0</sub>*δ*(*t*) is applied to its free end. Young's modulus, the cross-section area, and the length of the rod are denoted by *E*, *A*, and *L*, respectively. We split the total response *u*(*x*, *t*) into the sum of rigid-body and elastic responses such that

$$
u(x,t) = u\_r(x,t) + u\_e(x,t), \text{ with } 0 \le x \le L, \ t \ge 0, \tag{5}
$$

where *u<sub>r</sub>* is the rigid-body response and *u<sub>e</sub>* is the elastic response. From [19], the equations of motion of the system in Figure 1 are in the form

$$\begin{aligned} &\rho A L \ddot{u}\_{r} + M \ddot{u}\_{r} + c \dot{u}\_{r} + k u\_{r} \\ &\quad + \rho A \int\_{0}^{L} \frac{\partial^{2} u\_{e}(x, t)}{\partial t^{2}} dx + M \ddot{u}\_{e}(L, t) + c \dot{u}\_{e}(L, t) + k u\_{e}(L, t) = 0, \end{aligned} \tag{6}$$

$$c\_{p}^{2} \frac{\partial^{2} u\_{e}}{\partial x^{2}} = \ddot{u}\_{r} + \frac{\partial^{2} u\_{e}}{\partial t^{2}}, \tag{7}$$

where *ρ* is the density and *c<sub>p</sub>* = √(*E*/*ρ*) is the speed of the longitudinal stress wave. The corresponding boundary conditions are

$$EA \frac{\partial u\_{e}}{\partial x}(0,t) = 0, \tag{8}$$

$$\begin{split} EA \frac{\partial u\_{e}}{\partial x}(L,t) &= -M\left[\ddot{u}\_{r} + \frac{\partial^{2}u\_{e}}{\partial t^{2}}(L,t)\right] \\ &\quad - c\left[\dot{u}\_{r} + \frac{\partial u\_{e}}{\partial t}(L,t)\right] - k\left[u\_{r} + u\_{e}(L,t)\right]. \end{split} \tag{9}$$

**Figure 1.** A uniform elastic rod with a mass–damper–spring termination. An impact loading *f*(*t*) is applied to the free end. The material coordinate system is fixed to the free end of the rod.

The non-homogeneous boundary condition of Equation (9) leads to a non-orthogonal eigenvalue problem. We attack this problem using a particular solution method, which expresses the elastic motion *u<sub>e</sub>*(*x*, *t*) in the form

$$
u\_{e}(x,t) = u\_{h}(x,t) + u\_{p}(x,t), \tag{10}
$$

where *uh*(*x*, *t*) is the homogeneous solution with free–free boundary conditions such that

$$u\_h(x,t) = \sum\_{i=1}^{n} \phi\_i(x) y\_i(t),\tag{11}$$

where

$$\int\_{0}^{L} \phi\_{i}(x) \phi\_{j}(x) dx = \delta\_{ij}, \tag{12}$$

and *u<sub>p</sub>*(*x*, *t*) is the particular solution such that

$$
u\_p(x, t) = \left(\frac{x}{L}\right)^2 \alpha(t). \tag{13}
$$

Substitution of Equations (10)–(13) into Equations (5)–(9) yields a state space form [19]

$$
\dot{\mathbf{Z}} = \mathbf{A} \mathbf{Z},
\tag{14}
$$

where

$$\mathbf{Z} = [\mathbf{z}(t); \dot{\mathbf{z}}(t)],\tag{15}$$

$$\mathbf{A} = \begin{bmatrix} \mathbf{0} & \mathbf{I} \\ -\mathbf{M}^{-1}\mathbf{K} & -\mathbf{M}^{-1}\mathbf{C} \end{bmatrix}, \tag{16}$$

$$\mathbf{z}(t) = [u\_r(t), \alpha(t), y\_1(t), y\_2(t), \dots, y\_n(t)].\tag{17}$$

The formal solution of Equation (14) reads

$$\mathbf{Z}(t) = e^{\mathbf{A}t} \mathbf{Z}\_{0}, \tag{18}$$

where **Z**<sub>0</sub> is the initial condition generated from the impulsive input (see Appendix A). The numerical error analysis of the method was performed in [19]. We incorporate this method into the GA-SCM method to optimize the termination of the structure.
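To illustrate how Equation (18) is evaluated, the sketch below applies the matrix exponential to a single-degree-of-freedom mass–damper–spring system, which is a hypothetical stand-in for the rod model (the matrices **M**, **C**, and **K** of the rod are given in [19] and are not reproduced here). The parameter values are assumed, and the matrix exponential is computed with a plain scaling-and-squaring Taylor series so that the example needs only the standard library; a library routine such as `scipy.linalg.expm` would normally be preferred.

```python
import math


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def mat_exp(A, terms=20):
    """exp(A) by scaling-and-squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(v) for v in row) for row in A)
    squarings = max(0, math.ceil(math.log2(norm))) if norm > 0 else 0
    scaled = [[v / 2 ** squarings for v in row] for row in A]
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    result, term = identity, identity
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def response(A, Z0, t):
    """Z(t) = exp(A t) Z0."""
    E = mat_exp([[a * t for a in row] for row in A])
    return [sum(e * z for e, z in zip(row, Z0)) for row in E]


# Assumed single-DOF parameters; the impulse f0 sets the initial velocity f0/m.
m, c, k, f0 = 1.0, 0.4, 4.0, 1.0
A = [[0.0, 1.0], [-k / m, -c / m]]
Z0 = [0.0, f0 / m]
t = 1.5
displacement, velocity = response(A, Z0, t)

# Closed-form impulse response of the underdamped oscillator for comparison.
omega_n = math.sqrt(k / m)
zeta = c / (2.0 * math.sqrt(k * m))
omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)
exact = f0 / (m * omega_d) * math.exp(-zeta * omega_n * t) * math.sin(omega_d * t)
print(displacement, exact)
```

The same pattern, with **A** assembled from the rod matrices and **Z**<sub>0</sub> taken from Appendix A, gives the response at any time without time stepping.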

### *4.2. Objective Functions*

We define the multi-objective performance indices of terminal response as

$$\mathbf{F} = (t\_s^{e\_3}, |u(L)|\_{\max}, 1/\delta), \tag{19}$$

where *t*<sub>*s*</sub><sup>*e*<sub>3</sub></sup> is the settling time of the third elastic mode, |*u*(*L*)|<sub>max</sub> is the maximal absolute displacement at termination, and *δ* is the log decrement of the strain response at termination.

*t*<sub>*s*</sub><sup>*e*<sub>3</sub></sup> is an indirect indicator for the settling time of the rod response. The reason for using it is twofold. First, the settling time of higher modes produced by the model cannot properly capture the physical phenomenon that the response of high-frequency modes usually decays more rapidly than that of low-frequency modes. Second, identifying the settling time of the total response from the numerical simulation could lead to extensive computational load. Therefore, the settling time of the third elastic mode is used and defined in the form

$$t\_s^{e\_3} = \frac{4}{|\text{Real}(\lambda\_{e\_3})|},\tag{20}$$

where *e*<sub>3</sub> stands for the third elastic mode and *λ*<sub>*e*<sub>3</sub></sub> is the corresponding eigenvalue of the system in Equation (14). The selection of the third mode is based on trial and error.
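Equation (20) needs only the real part of one eigenvalue. The snippet below shows the calculation for a single-degree-of-freedom mass–damper–spring system with assumed parameters, used here as a hypothetical stand-in for the third elastic mode of the rod. The factor 4 corresponds to the common criterion that the envelope has decayed to about 2% of its initial value, since exp(−4) ≈ 0.018.

```python
import cmath
import math


def settling_time(eigenvalue):
    """Settling time 4 / |Re(lambda)| of a decaying mode."""
    return 4.0 / abs(eigenvalue.real)


# Assumed parameters: m*x'' + c*x' + k*x = 0 has eigenvalues
# lambda = (-c +/- sqrt(c^2 - 4*m*k)) / (2*m).
m, c, k = 1.0, 0.4, 4.0
eigenvalue = (-c + cmath.sqrt(c * c - 4.0 * m * k)) / (2.0 * m)
t_s = settling_time(eigenvalue)

# Fraction of the initial envelope remaining at t = t_s.
remaining = math.exp(eigenvalue.real * t_s)
print(t_s, remaining)
```

Because only an eigenvalue is required, this index is far cheaper than detecting the settling time from a simulated time history.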

*δ* is also an indirect indicator to estimate the decay of the impact wave. After the impact load is applied, an impulsive wave will be produced at the left terminal and a response wave due to the rigid-body motion will be generated at the right terminal. The two waves will propagate along the rod and be reflected at both ends. Although the strain response is the superposition of two waves, the impact wave dominates the response when it is propagated to the right terminal for the first few times. We define *δ* in the form

$$\delta = \frac{1}{n-1} \log \frac{|u\_x(t\_1, L)|}{|u\_x(t\_n, L)|},\tag{21}$$

where *t*<sup>1</sup> and *tn* represent the first and *n*-th time when the impulse wave is propagated to the right end, respectively. The larger *δ* is, the more the impact wave is suppressed. We let *n* = 3 in this study.
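As a worked instance of Equation (21), the sketch below computes *δ* from two strain magnitudes with *n* = 3. The numerical values are synthetic and the function name is hypothetical; the logarithm is assumed to be the natural logarithm, as is usual for a log decrement.

```python
import math


def log_decrement(strain_first, strain_nth, n):
    """delta = log(|u_x(t_1, L)| / |u_x(t_n, L)|) / (n - 1)."""
    return math.log(abs(strain_first) / abs(strain_nth)) / (n - 1)


# Synthetic strain values at the first and third arrivals of the impact wave.
strain_t1 = -1.0
strain_t3 = 0.25
delta = log_decrement(strain_t1, strain_t3, n=3)
objective = 1.0 / delta  # third entry of F in Equation (19)
print(delta, objective)
```

A wave that grows between arrivals gives *δ* < 0, which is why the case study imposes the constraint *δ* > 0.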

### **5. A Case Study**

We considered an elastic rod with Young's modulus *E* = 10, density *ρ* = 10, length *L* = 2, cross-section area *A* = 0.1, and excitation force magnitude *f*<sup>0</sup> = 1.0. The design space was chosen as

$$Q = \{ \mathbf{x} | \mathbf{x} \in [0.1, 2.0] \times [1.0, 6.0] \times [10, 20] \},\tag{22}$$

subject to a constraint

$$
\delta > 0,\tag{23}
$$

where **x** is the tuple (*m*, *c*, *k*). We calculated the first 15 s rod response under the impact loading through the numerical integration of Equation (14), because the max displacement appears quickly after impact, and the impact wave dominates the terminal response when it is propagated at the right end during this time period. Thirty elastic modes were adopted, which, based on our observation, are sufficient to approximate the values of performance indices within the design space.

We first discovered a rough Pareto set using the NSGAII algorithm with a population size of 1000, 10 generations, and a mutation rate of 0.05. Other configurations of NSGAII can be seen in Table 1. With the numerical predictor, the NSGAII algorithm was completed in 66 s on a desktop with an Intel Core i7 CPU, producing a rough Pareto set as the input to the SCM method. In the SCM method, the *m* − *c* − *k* design space was discretized into a 10 × 20 × 20 cellular grid, as shown in Table 2. The elements of the Pareto set are the cells in the design space. The local search and recovery algorithm was performed twice, the first time with the initial grid and the second time with the refined grid, which subdivides each cell of the initial grid into three along every dimension. We stopped the program after the refinement because the desired resolution of 0.06 × 0.08 × 0.166 in the parameter space was achieved. The computational time was 36 s with the initial grid and 2000 s with the refined grid.
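The grid bookkeeping behind these numbers can be checked with a short calculation. The helper names below are hypothetical; the bounds, the 10 × 20 × 20 partition, and the subdivision factor of three are taken from the text.

```python
import math

lower = (0.1, 1.0, 10.0)    # lower bounds of (m, c, k)
upper = (2.0, 6.0, 20.0)    # upper bounds of (m, c, k)
partition = (10, 20, 20)    # initial cellular grid
sub = 3                     # subdivision factor per dimension


def cell_sizes(lower, upper, partition):
    return tuple((hi - lo) / n for lo, hi, n in zip(lower, upper, partition))


def cell_index(x, lower, size, partition):
    """Index of the cell containing x (upper boundary goes to the last cell)."""
    return tuple(min(int((xi - lo) / h), n - 1)
                 for xi, lo, h, n in zip(x, lower, size, partition))


coarse = cell_sizes(lower, upper, partition)
refined_partition = tuple(n * sub for n in partition)
fine = cell_sizes(lower, upper, refined_partition)

n_coarse = math.prod(partition)
n_fine = math.prod(refined_partition)
example_cell = cell_index((1.0, 3.0, 15.0), lower, coarse, partition)

print(coarse, fine)
print(n_coarse, n_fine, example_cell)
```

The refined cell sizes are about 0.063 × 0.083 × 0.167, consistent with the resolution quoted above, and the refined grid contains 27 times as many cells as the initial one.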

**Table 1.** Configuration of NSGAII.


**Table 2.** Configuration of SCM.


There are 5392 cells in the Pareto set. The Pareto set and front of the mass–damper–spring termination are presented in Figure 2. Generally, either larger stiffness or larger damping leads to a better design. The majority of the optimal designs are achieved with either moderate or small mass. The Pareto front can be divided into three regions, labeled in Figure 2b. Region 1 minimizes displacement at the cost of long settling time and moderate damping performance. Region 2 balances the performance of three objective functions. Region 3 achieves premium damping performance at the expense of large displacement and moderate settling time.

**Figure 2.** The Pareto set and front of the *m* − *c* − *k* termination design of the elastic rod. (**a**) Pareto set. (**b**) Pareto front. Design parameters *m* ∈ [0.1, 2.0], *c* ∈ [1.0, 6.0], *k* ∈ [10.0, 20.0]. The labels "1", "2", and "3" indicate the regions where optimal terminal displacement, balanced performance of objective functions, and optimal damping performance are achieved.

The optimal designs of each performance index are presented in Figures 3–5. The corresponding design parameters, as well as performance indices are listed in Table 3.

**Table 3.** Design parameters and performance indices of optimal designs in Figures 3–5.


**Figure 3.** The optimal design of settling time. The corresponding (**a**) terminal displacement and (**b**) strain responses of the rod. The response is computed with *N* = 30.

### *5.1. Optimal Design: Minimal Settling Time*

Figure 3 shows the optimal design of the settling time. The settling time of the total response approximates 1200 s. While the performance index of the settling time is significantly smaller than this number, it still correctly reflects the trend of the settling time change in comparison to other designs such as those in Figures 4 and 5. The large mass in this design can increase the portion of energy transmitted to the mass after impact, which can be more effectively dissipated through the heavily damped boundary condition.

### *5.2. Optimal Design: Maximal Decay of Impact Wave*

The time response of the optimal design maximizing the decay of the impact wave is presented in Figure 4. The impact wave propagates to the right end when *t* = 2, 6, 10 .... The suppression of the impact wave is evident. However, this is at the cost of at least a five-times longer settling time and a slight increase of the maximal displacement. When compared to the other two designs, this design considerably reduces the damping coefficient. This could be attributed to the velocity change of the mounted mass in response to the impact wave hitting the terminal. Such a change will immediately alter the viscous force produced by the damper, which in turn can lead to higher strain at the terminal. A small damping coefficient can reduce the magnitude of the reflected impact wave.
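The arrival times quoted above follow from the wave speed and the rod length of the case study. The sketch below assumes that the impact wave starts at the free end at *t* = 0, travels at the constant speed *c<sub>p</sub>* = √(*E*/*ρ*), and returns to the right end after each round trip of length 2*L*; the variable names are illustrative.

```python
import math

E, rho, L = 10.0, 10.0, 2.0   # case-study values from Section 5
c_p = math.sqrt(E / rho)      # longitudinal wave speed

# j-th arrival at the right end: one traverse plus (j - 1) round trips.
arrivals = [(2 * j - 1) * L / c_p for j in range(1, 4)]
print(c_p, arrivals)
```

With *n* = 3 in Equation (21), the first and third arrivals are therefore evaluated at *t* = 2 and *t* = 10, both inside the 15 s simulation window.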

**Figure 4.** The optimal design of the decay of the impact wave. The corresponding (**a**) terminal displacement and (**b**) strain response of the rod. The response is calculated with *N* = 30.

### *5.3. Optimal Design: Minimal Peak Displacement at Termination*

The optimal design of terminal peak displacement in Figure 5 has the same stiffness as the design in Figure 4, but much smaller mass and larger damping. This makes sense because the terminal displacement is identical to the displacement of the mounted mass. Using small inertia and large stiffness and damping, one can effectively reduce the maximal terminal displacement. However, smaller inertia also leads to less energy distributed to the mass. Because the energy can only be dissipated through the damper attached to the mass, this choice can also significantly amplify the settling time.

**Figure 5.** The optimal design of the maximal terminal displacement. The corresponding (**a**) terminal displacement and (**b**) strain responses of the rod. The response is calculated with *N* = 30.

### **6. Conclusions**

In this paper, a multi-objective optimization problem of the terminal response of an elastic rod with a viscous boundary condition was formulated. The terminal response of the rod was predicted through a computationally effective and accurate particular solution method. The Pareto set and front of the MOP were obtained with the GA-SCM hybrid method. The proposed objective functions can effectively capture the dynamic response of the structure. The optimal design strategies were presented and analyzed. The amount of energy distributed to the terminal mass after impact was significant for the optimization of the terminal design.

The computational load of this work was due to the repeated computations of the impulse response with different parameter sets. Although the solver adopted in this paper can be computationally more effective and accurate than finite-element methods, it still requires a sufficient number of modes to capture the non-smooth impulsive response when highly accurate results are desired. The computational load can be further reduced using a surrogate model (metamodel) [20]. One future direction is to use neural operators such as DeepONet [21] to approximate the impulsive response, with the neural operator trained using data from the adopted solver.

**Author Contributions:** Conceptualization, methodology, and supervision, J.-Q.S.; software, formal analysis, investigation, and writing—original draft preparation, S.X.; writing—review and editing, J.-Q.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The first author is grateful for the release-time support from the Donald E. Bently Center for Engineering Innovation at California Polytechnic State University.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A**

From [19], the initial conditions of Equation (18) are

$$
\rho A L \dot{u}\_{r0} + M \dot{u}\_{r0} = f\_{0}, \tag{A1}
$$

$$
\dot{u}\_{r0} + \sum\_{i=1}^{n} \phi\_i(0)\dot{y}\_{i0} = \frac{f\_0}{\rho A}, \tag{A2}
$$

$$
u\_{r0} + \sum\_{i=1}^{n} y\_{i0} + \left(\frac{x}{L}\right)^{m} \alpha\_{0} = 0, \quad 0 \le x \le L, \tag{A3}
$$

$$\sum\_{i=1}^{n} \phi\_i(x)\dot{y}\_{i0} + \left(\frac{x}{L}\right)^m \dot{\alpha}\_0 = 0, \quad 0 < x \le L, \tag{A4}$$

where *u*<sub>*r*0</sub> = *u<sub>r</sub>*(0), *α*<sub>0</sub> = *α*(0), and *y*<sub>*i*0</sub> = *y<sub>i</sub>*(0). Equation (A1) leads to

$$\dot{u}\_{r0} = f\_0/(\rho AL + M). \tag{A5}$$

By uniformly sampling spatial points on the rod and applying the least-mean-squares method, the initial conditions of the particular solution and response of elastic modes can be obtained in the form

$$\dot{\mathbf{y}}\_0 = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{F},\tag{A6}$$

where **ẏ**<sub>0</sub> = [*α̇*<sub>0</sub>, *ẏ*<sub>10</sub>, ···, *ẏ*<sub>*n*0</sub>], **F** = [*f*<sub>0</sub>/(*ρA*) − *u̇*<sub>*r*0</sub>, 0, 0, ..., 0]<sup>*T*</sup> and

$$
\Phi = \begin{bmatrix}
0 & \phi\_1(0) & \phi\_2(0) & \cdots & \phi\_n(0) \\
(\mathbf{x}\_1/L)^m & \phi\_1(\mathbf{x}\_1) & \phi\_2(\mathbf{x}\_1) & \cdots & \phi\_n(\mathbf{x}\_1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
(L/L)^m & \phi\_1(L) & \phi\_2(L) & \cdots & \phi\_n(L)
\end{bmatrix}.\tag{A7}$$

### **References**


### *Article* **Scarce Sample-Based Reliability Estimation and Optimization Using Importance Sampling**

**Kiran Pannerselvam, Deepanshu Yadav and Palaniappan Ramu \***

Advanced Design, Optimization and Probabilistic Techniques (ADOPT) Laboratory, Department of Engineering Design, Indian Institute of Technology Madras, Chennai 600036, India

**\*** Correspondence: palramu@iitm.ac.in; Tel.: +91-4422574738; Fax: +91-4422574732

**Abstract:** Importance sampling is a variance reduction technique that is used to improve the efficiency of Monte Carlo estimation. Importance sampling uses the trick of sampling from a distribution, which is located around the zone of interest of the primary distribution thereby reducing the number of realizations required for an estimate. In the context of reliability-based structural design, the limit state is usually separable and is of the form Capacity (*C*)–Response (*R*). The zone of interest for importance sampling is observed to be the region where these distributions overlap each other. However, often the distribution information of *C* and *R* themselves are not known, and one has only scarce realizations of them. In this work, we propose approximating the probability density function and the cumulative distribution function using kernel functions and employ these approximations to find the parameters of the importance sampling density (ISD) to eventually estimate the reliability. In the proposed approach, in addition to ISD parameters, the approximations also played a critical role in affecting the accuracy of the probability estimates. We assume an ISD which follows a normal distribution whose mean is defined by the most probable point (MPP) of failure, and the standard deviation is empirically chosen such that most of the importance sample realizations lie within the means of *R* and *C*. Since the probability estimate depends on the approximation, which in turn depends on the underlying samples, we use bootstrap to quantify the variation associated with the low failure probability estimate. The method is investigated with different tailed distributions of *R* and *C*. Based on the observations, a modified Hill estimator is utilized to address scenarios with heavy-tailed distributions where the distribution approximations perform poorly. 
The proposed approach is tested on benchmark reliability examples and along with surrogate modeling techniques is implemented on four reliability-based design optimization examples of which one is a multiobjective optimization problem.

**Keywords:** reliability; importance sampling; scarce data; surrogate; RBDO; MOO

**Citation:** Pannerselvam, K.; Yadav, D.; Ramu, P. Scarce Sample-Based Reliability Estimation and Optimization Using Importance Sampling. *Math. Comput. Appl.* **2022**, *27*, 99. https://doi.org/10.3390/mca27060099

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 15 September 2022 Accepted: 18 November 2022 Published: 22 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

### **1. Introduction**

Reliability-based design optimization (RBDO) is a design approach that is used to generate reliable designs by accounting for uncertainties in the system variables such as material properties, loading conditions and geometry. Probabilistic approaches use probability distributions to model the uncertainties. In such approaches, the reliability of the system is measured as the probability of failure to satisfy a performance criterion. Mathematically, this involves calculating the hyper-volume of a multi-dimensional probability density function (PDF) under the failure region. This calculation becomes infeasible when the number of dimensions is large. Even in low dimensions, this calculation could become intractable due to complicated geometry of the failure region [1]. Attractive alternatives are analytical methods or sampling-based approaches.

Analytical methods, such as first-order and second-order reliability methods (FORM and SORM), transform the probability distributions to standard normal space and use linear or quadratic approximations of the performance function to estimate the failure probability at the most probable point (MPP) of failure [2–4]. In essence, analytical methods can estimate failure probability within a reasonable number of evaluations for linear or slightly non-linear performance functions. However, if the failure boundary is highly non-linear, analytical approaches are likely to lead to erroneous estimation. Additionally, other factors, such as island failure regions and multiple failure modes, limit their performance [5,6].

Sampling-based approaches such as Monte Carlo simulation (MCS) are highly effective for complex failure boundary problems and multi-modal failure problems [7]. In high-reliability applications where the failure probability is very low, MCS requires a very large number of model evaluations to obtain an accurate estimate. Most RBDO applications involve evaluating high-fidelity models which are computationally expensive, thus rendering MCS prohibitive for reliability estimation.

The computational burden of RBDO can be reduced by using surrogate-based methods, wherein surrogate models that are cheaper to evaluate are constructed using limited high-fidelity model evaluations. Uncertainty from the random variables is then propagated through these surrogate models to obtain reliability estimates. Various surrogate modeling approaches, such as polynomial response surface (PRS), radial basis function (RBF), support vector machine (SVM) and Kriging, among others, have been adopted for reliability estimation [8]. Li and Xiu [9] proposed using cheaper-to-evaluate surrogates away from the limit state and high-fidelity model evaluations close to the limit state to improve accuracy and reduce cost. Dai et al. [10] proposed an SVM-based radial basis function to approximate the limit state function, which is then used to estimate the failure probability. Dubourg et al. [11] used an error measure derived from Kriging to refine the surrogate during subset-simulation-based reliability estimation. Reliability estimation using surrogates may carry forward the bias from the surrogate approximation [12]. Surrogate modeling can also be used to build approximation models for reliability metrics instead of the limit state function. Foschi et al. [13] used a combination of response surface, FORM and importance sampling to perform reliability estimation. Qu and Haftka [14] compared the accuracy of surrogates of different reliability metrics, such as failure probability, reliability index and probabilistic sufficiency factor (PSF), during RBDO. They concluded that an inverse measure such as PSF, which operates in the performance space, performs better than a classical reliability metric. A survey of various surrogate-based RBDO frameworks is provided in [15]. The choice between a surrogate for the reliability estimate and a surrogate for the limit state is a trade-off based on the number of design variables and the number of random variables, as well as the cost of building the surrogate and the cost of the reliability estimate [16].

Employing variance reduction techniques to improve the efficiency and accuracy of Monte Carlo estimations is another way to alleviate the computational burden of RBDO. Several variance reduction techniques, such as importance sampling (IS) [17–19], subset simulation (SS) [1,20] and separable Monte Carlo (SMC) [21,22], are used to improve the accuracy of failure probability estimates while reducing the required number of high-fidelity model evaluations.

Importance sampling improves the accuracy of the estimate by drawing more often those realizations of the input random variables that have a greater impact on the estimate. To do so, an alternate sampling density known as the importance sampling density (ISD) is chosen, which enables sampling the important values more frequently. This introduces a bias in the estimator, which is corrected by weighting the sample realizations. A good choice of ISD improves accuracy, whereas an incorrect choice of ISD could lead to spurious estimates. Hence, the choice of ISD is critical, and several approaches have been explored to find the optimal ISD. Melchers [17] applied importance sampling for assessing the reliability of parallel and series structural systems, where a multi-normal PDF centered at the MPP was chosen as the ISD. Other choices that have been studied include the original distribution shifted to the MPP, and a multivariate normal distribution located at the MPP whose covariance matrix ranges from that of the original distribution to only the diagonal elements of the original covariance matrix [23].

In cross-entropy-based methods [24], ISD is found by minimizing the Kullback–Leibler (KL) divergence between theoretical optimal ISD and a chosen family of distributions. Kurtz and Song [25] used a Gaussian mixture to obtain a non-parametric multi-modal PDF for ISD. It was observed that the coefficient of variation of the failure probability estimate converged as the number of Gaussian densities in the mixture increased. Cao and Choe [26] used an expectation-maximization (EM) algorithm to obtain a Gaussian mixture as the near optimal ISD. Cross-entropy information criterion (CIC) was used to select the number of Gaussian densities in the mixture. Geyer et al. [27] proposed a modified version of EM algorithm for updating the Gaussian-mixture-based ISD. For selecting the number of distributions in the mixture, density-based spatial clustering of applications with noise (DBSCAN) algorithm was used. In cross-entropy-based IS, one still requires the joint PDF of the original random variables to compute the near optimal ISD.

Kernel-based IS is another way to obtain the alternate sampling density wherein a kernel sampling density is constructed instead of choosing from a family of distributions [28,29]. Au and Beck [30] proposed Markov Chain Monte Carlo (MCMC) to distribute samples asymptotically according to the optimal ISD, and subsequently a kernel sampling density is constructed from these samples. Dai et al. [31] proposed a wavelet density estimation technique to construct the ISD from the MCMC samples. Botev et al. [32] proposed combining MCMC and IS to address the issues of biased sampling estimators that result from MCMC. Various adaptive importance sampling procedures have gained popularity recently. Dalbey and Swiler [33] proposed a Gaussian-process-based adaptive importance sampling where a Gaussian process surrogate is used to identify the likely regions of failure to adaptively improve the sampling density estimate. Zhao et al. [34] constructed a Kriging surrogate from the initial MCMC samples which is improved using an active learning process, and subsequently the surrogate is used for limit state evaluations for adaptive improvement of ISD. Wang and Song [35] used cross-entropy-based adaptive importance sampling for high-dimensional reliability analysis. Here, ISD is obtained by minimizing the KL divergence between a von Mises–Fisher mixture model and near optimal ISD.

In all the literature discussed above, the determination of an appropriate ISD is considered to be the main challenge when employing IS. The probabilistic distributions and the corresponding parameters of the original random variables, however, are assumed to be known [22,36], which is not necessarily always true. Here, we propose a framework to employ the importance sampling method for a separable failure boundary of the form Capacity (*C*)–Response (*R*) using only scarce realizations of *R* and *C*, where no information about their distributions is known. We make use of tail-modeling techniques to approximate the cumulative distribution function (CDF) of the capacity and response. These approximations are used to locate a Gaussian ISD whose standard deviation is empirically chosen. Sample realizations drawn from the ISD are used to compute the failure probability. During this computation, kernel density estimates of the PDF of response are utilized along with the CDF approximation of capacity. Furthermore, bootstrap samples are generated from the original scarce samples of *R* and *C*. The proposed framework is applied to the bootstrap samples to obtain the confidence bounds for the reliability estimate. Reliability estimates obtained from scarce samples using the proposed approach are used to construct a surrogate of the reliability index, which is used for constraint evaluation during RBDO.

The rest of the paper is organized as follows. Section 2 presents the theoretical background of importance sampling in the context of reliability estimation. In Section 3, the methodology is discussed, and in later sections we present the results from the proposed method when applied to test cases and RBDO applications including a multi-objective optimization (MOO) application. Appendices A and B are used to provide short descriptions of kernel density estimation (KDE) and the third-order polynomial normal transformation (TPNT) technique.

### **2. Reliability Estimation Using Importance Sampling for Separable Limit States**

In structural engineering, limit states are useful in prescribing performance requirements of a design. Thus, a limit state decomposes the design space of a system into safe and failure regions. Violation of a limit state is considered to be failure, and reliability is a measure of the probability of such violations. For most structural problems, the limit state can be expressed as the difference of Capacity (*C*) and Response (*R*), as presented in Equation (1), where *R* and *C* are functions of independent sets of random variables. This is referred to as a separable limit state [21,22,37], and we consider such limit states in this work.

$$G(C, R) = C - R \tag{1}$$

The system is said to fail when *C* ≤ *R* and to be safe when *C* > *R*. When either capacity or response or both are functions of random variables, a probabilistic measure such as the probability of failure is used, as presented in Equation (2)

$$p\_f = \iint\_{G \le 0} f\_{CR}(c, r) \, dc \, dr \tag{2}$$

where *f<sub>CR</sub>* is the joint probability density function (PDF), and *G* ≤ 0 is the failure region. In the case of a separable limit state, the joint PDF of capacity and response can be decomposed into the product of the marginal PDFs of capacity and response, as presented in Equation (3)

$$p\_f = \int\_{-\infty}^{\infty} \int\_{-\infty}^{r} f\_C(c) f\_R(r) \, dc \, dr \tag{3}$$

$$p\_f = \int\_{-\infty}^{\infty} F\_C(x) f\_R(x) \, dx \tag{4}$$

where *f<sub>C</sub>*(*c*) and *f<sub>R</sub>*(*r*) are the marginal density functions of *C* and *R*, respectively. Equation (3) can also be written in a single integral form as presented in Equation (4). Here, *F<sub>C</sub>*(*x*) is the cumulative distribution function (CDF) of capacity. For low failure probabilities of order 10<sup>−4</sup>, to obtain a Monte Carlo estimate with 10% coefficient of variation, the sample size required is 100 × 1/*p<sub>f</sub>* = 10<sup>6</sup>. Generating 10<sup>6</sup> instances of expensive computer models is not feasible. To reduce such computational burden, importance sampling is used as in Equation (5).
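The sample-size figure quoted above follows from the coefficient of variation of a binomial proportion. The short sketch below reproduces that arithmetic; the function name `mc_samples_for_cov` is a hypothetical helper introduced here for illustration only.

```python
import math


def mc_samples_for_cov(p_f: float, cov: float) -> int:
    """Samples needed so a crude Monte Carlo estimate of p_f reaches the target CoV.

    The estimator is a binomial proportion, so CoV = sqrt((1 - p_f) / (N * p_f)),
    which gives N = (1 - p_f) / (cov^2 * p_f), roughly 1 / (cov^2 * p_f) for small p_f.
    """
    return math.ceil((1.0 - p_f) / (cov**2 * p_f))


# Failure probability of order 1e-4 with a 10% CoV target: about one million samples.
n_required = mc_samples_for_cov(1e-4, 0.10)
print(n_required)
```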

$$p\_{f\_{IS}} = \int\_{-\infty}^{\infty} \frac{F\_C(x) f\_R(x)}{h\_x(x)} h\_x(x) \, dx \tag{5}$$

where *h<sub>x</sub>*(*x*) is the alternate sampling density, and *p<sub>f<sub>IS</sub></sub>* is the failure probability computed using the importance sampling approach. It is the expectation of the ratio *F<sub>C</sub>*(*x*)*f<sub>R</sub>*(*x*)/*h<sub>x</sub>*(*x*) computed with respect to the ISD, *h<sub>x</sub>*. Consequently, if *x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>N</sub>* is an independent identically distributed (i.i.d.) random sample from *h<sub>x</sub>*, then the expectation has an unbiased estimator:

$$\widehat{p\_{f\_{IS}}} = \frac{1}{N} \sum\_{i=1}^{N} \frac{F\_C(x\_i) f\_R(x\_i)}{h\_x(x\_i)}, \quad x\_i \sim h\_x \tag{6}$$

Alternatively, if the marginal distribution of the response (*f<sub>R</sub>*(*r*)) is integrated in Equation (3), then the importance sampling estimator of *p<sub>f</sub>* would be expressed as:

$$\widehat{p\_{f\_{IS}}} = \frac{1}{N} \sum\_{i=1}^{N} \frac{(1 - F\_R(x\_i)) f\_C(x\_i)}{h\_x(x\_i)}, \quad x\_i \sim h\_x \tag{7}$$
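As a minimal sketch of the estimator built from *F<sub>C</sub>* and *f<sub>R</sub>*, the example below averages the weighted integrand over draws from a Gaussian ISD. The normal distributions for *R* and *C*, their parameters, and the ISD parameters are assumptions chosen only so that a closed-form failure probability is available for comparison; they are not taken from the paper.

```python
import random
from statistics import NormalDist


def is_failure_probability(resp, cap, isd, n, seed=0):
    """Average of F_C(x) * f_R(x) / h(x) over n draws x ~ h, where h is the ISD."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(isd.mean, isd.stdev)
        total += cap.cdf(x) * resp.pdf(x) / isd.pdf(x)
    return total / n


# Hypothetical distributions, used purely for illustration.
resp = NormalDist(mu=10.0, sigma=1.0)  # response R
cap = NormalDist(mu=15.0, sigma=1.0)   # capacity C
isd = NormalDist(mu=12.5, sigma=2.5)   # Gaussian ISD placed between the two means

p_is = is_failure_probability(resp, cap, isd, n=20_000)

# For independent normal R and C, C - R is normal, so P(C - R <= 0) is known exactly.
p_exact = NormalDist().cdf(
    -(cap.mean - resp.mean) / (cap.variance + resp.variance) ** 0.5
)
print(f"IS estimate: {p_is:.3e}, closed form: {p_exact:.3e}")
```

The ISD here is deliberately wider than the region where the integrand is large, so the weights stay bounded and the estimator variance remains finite.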

As mentioned in Section 1, in most of the literature, importance sampling is used when the distributions of *R* and *C* are known, and it becomes computationally expensive to draw samples in the region where tails of *R* and *C* overlap. Figure 1 presents a schematic of such a scenario where the limit state is of the separable form. Here, the region of interest lies in the tails of the distributions of *R* and *C*, and it contains the most probable point (MPP) of failure. Hence, it is only logical to locate the ISD at the MPP [3,17,22].

**Figure 1.** Basic *C*–*R* problem: a schematic of importance sampling.

Figure 1 depicts a Gaussian ISD for which MPP serves as the mean of the distribution and is usually found by solving a constrained optimization problem. Once the mean of the ISD is determined, the spread of the distribution needs to be chosen. In order to have a finite variance for the importance sampling estimator, the ISD should not have a lighter tail than the original distribution [38]. In the literature [22,23], using the same co-variance matrix and sometimes using strictly the diagonal elements of the co-variance matrix of the original distribution is suggested. This has been shown to work for a range of possible limit states. However, sometimes a designer does not necessarily know the original distributions of input variables but has only a few realizations of *R* and *C* either through expensive computer simulations or physical experiments. Generally, response is considered to be the source of randomness since it is a measured quantity. However, capacity can also be a random variable. For instance, an example of capacity is yield strength which is a measured quantity. We know that it is random and hence modeled using probability distributions [39,40]. Other examples of capacity that are considered to be random include: member capacity under seismic loading [41], maximum flow capacity in measuring hydraulic reliability of water distribution system [42], and material fatigue properties [43,44].

In this work, we first characterize the distributions of *R* and *C* from limited samples and use the concept of importance sampling to estimate low failure probabilities. A Gaussian distribution is chosen as the ISD which is fully described by its two parameters, the mean and variance. Section 3 describes the procedure employed to identify the parameters of ISD. In the proposed framework, to demonstrate the methodology, distribution information of original input random variables is used only to generate the initial limited samples. However, the distribution information is not utilized anywhere afterwards.

### **3. Identifying Parameters of Gaussian ISD**

Figure 2 presents an example scenario [23] of a separable limit state where the response and capacity follow normal distributions. It can be observed that the integrand *F<sub>C</sub>f<sub>R</sub>* of Equation (4) is maximum at the point *x* ≈ 11.55, near the point of intersection between the PDF of response and the CDF of capacity, i.e., where *f<sub>R</sub>*(*x*) = *F<sub>C</sub>*(*x*). In importance sampling, the MPP is used to locate the ISD to maximize sampling of the failure probability content. Hence, it is only logical to define this point of intersection as the MPP and use it as the mean of the Gaussian ISD.

**Figure 2.** Probabilistic view of MPP for a separable limit state scenario.

This point can be anywhere within the bounds of capacity and response. In high-reliability scenarios, it is located at the tails of one of the distributions or both, so we need to extrapolate into the tails of *R* and *C*. The available PDF approximation methods capture the central part of the distribution better than the tails, whereas the main focus of the tail modeling techniques applied for CDF estimation is on the tails of the distribution. Hence, we can reduce the errors in finding the MPP by using the intersection of 1 − *F<sub>R</sub>* and *F<sub>C</sub>* instead of the intersection point of the curves *f<sub>R</sub>* and *F<sub>C</sub>*. Here, 1 − *F<sub>R</sub>* is the complementary CDF of *R*. It is to be noted that the suitability of such a modification has been tested only for a uni-modal response. For a multi-modal response, using a single MPP will capture only part of the failure region. However, the approach can be extended by using a mixture of Gaussian distributions together with a summed failure probability integral, as described in [45]. Algorithm 1 presents the bracketing procedure followed in the current work to find the MPP, while the same process is visually presented in Figure 3.

In Steps 3 and 6 of Algorithm 1, it is stated that the CDF approximations of capacity (*F̂<sub>C</sub>*) and response (*F̂<sub>R</sub>*) are obtained through a TPNT technique (details provided in Appendix B); however, it is an independent block in the proposed approach and hence can be replaced by any other suitable technique at the user's discretion. In effect, the MPP finding problem reduces to finding the zeroes of the function *F̂<sub>C</sub>*(*x*) − (1 − *F̂<sub>R</sub>*(*x*)). Hence, root-finding algorithms can also be applied to find the MPP. However, it is advised to use derivative-free approaches to avoid numerical issues.
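The root-finding view of the MPP search can be sketched with a plain bisection, which is derivative-free. This is not a reproduction of Algorithm 1: the function name `find_isd_mean` is hypothetical, and exact normal CDFs stand in for the TPNT approximations, which are outside the scope of this sketch.

```python
from statistics import NormalDist


def find_isd_mean(cdf_cap, cdf_resp, lo, hi, tol=1e-9, max_iter=200):
    """Bisection for the root of F_C(x) - (1 - F_R(x)) on the bracket [lo, hi].

    F_C is non-decreasing and 1 - F_R is non-increasing, so their difference is
    non-decreasing and changes sign at most once inside the bracket.
    """
    def gap(x):
        return cdf_cap(x) - (1.0 - cdf_resp(x))

    if gap(lo) > 0.0 or gap(hi) < 0.0:
        raise ValueError("bracket does not enclose the crossing point")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Stand-in CDFs (assumed normal here); any CDF approximation could be passed instead.
resp = NormalDist(mu=10.0, sigma=1.0)
cap = NormalDist(mu=15.0, sigma=1.0)
mu_h = find_isd_mean(cap.cdf, resp.cdf, lo=resp.mean, hi=cap.mean)
print(mu_h)
```

With equal spreads for *R* and *C*, symmetry places the crossing midway between the two means, which gives a quick sanity check on the bracketing logic.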

### **Algorithm 1** Finding *μ<sub>h</sub>*.


**Figure 3.** Procedure for identifying the mean of the Gaussian ISD (*μ<sub>h</sub>*). (**a**) Scarce samples of *R* and *C*. (**b**) Finding point A using response sample *r<sub>i</sub>* as per Step 4 in Algorithm 1. (**c**) Finding point B using capacity sample *c<sub>i</sub>* as per Step 7 in Algorithm 1. (**d**) *μ<sub>h</sub>* identified between points A and B as per Step 9 in Algorithm 1.

In very low failure probability estimation, the region corresponding to the estimation is quite small, and it has been observed that aspects such as non-uniqueness of MPP and non-linearity of the limit states have little effect on the estimation [5]. Similarly, we find that errors in the MPP estimation have lesser effect on the reliability estimation compared to a poor choice of standard deviation.

The standard deviation of the ISD determines the importance ascribed to the region around the MPP during sampling. The region of sampling itself could be bounded by the supports of the capacity and response distributions. Different measures of spread could be applied based on the available knowledge. In this work, a 10% coefficient of variation (CoV) was used to calculate the standard deviation of the ISD to investigate the effect. Such a measure was found to be suitable for scarce samples of normally distributed capacity and response. Different measures of spread were investigated while testing the formulation on samples simulated from other distributions. The spread parameter obtained through Equation (8) was found to be appropriate, as such a measure allows one to restrict 68% of the importance sample realizations within the means of capacity and response.

$$
\sigma\_h = \frac{\bar{x}\_C - \bar{x}\_R}{2} \tag{8}
$$

where *x̄<sub>C</sub>* is the sample mean of capacity, and *x̄<sub>R</sub>* is the sample mean of the response.

### **4. Estimation of Reliability and Its Confidence Bounds**

Using the now defined ISD, sample points are drawn at which the PDF of response and the CDF of capacity are evaluated for the computation of failure probability as per Equation (6). However, instead of the actual values, the approximations *F̂<sub>C</sub>*(*x<sub>i</sub>*) and *f̂<sub>R</sub>*(*x<sub>i</sub>*) are used during the computation, as presented in Equation (9).

$$\widehat{p\_f} = \frac{1}{N} \sum\_{i=1}^{N} \frac{\widehat{F\_C}(x\_i)\widehat{f\_R}(x\_i)}{h\_x(x\_i)}, \quad x\_i \sim h\_x = \mathcal{N}(\mu\_h, \sigma\_h^2) \tag{9}$$

As mentioned earlier, TPNT is used to approximate the CDF (*F̂<sub>C</sub>*), whereas for the PDF approximation (*f̂<sub>R</sub>*) an adaptive kernel density estimation method is employed. Both of these methods are distribution-free; however, other suitable methods of approximation [46,47] can also be used. Regardless of the accuracy of the method chosen, errors from the approximations are bound to result in a loss of accuracy in the failure probability estimate. Hence, it becomes necessary to quantify the confidence in the estimate. Here, confidence bounds on the estimate are computed using a non-parametric bootstrap method.

**Figure 4.** Schematic representation of bootstrapping.

The underlying idea of a non-parametric bootstrap method is to recreate samples from the original sample by sampling with replacement. The size of each bootstrap sample must be the same as that of the original sample. From each of the bootstrap samples, a statistical quantity of interest (such as failure probability) can be estimated. By repeating the process many times (say *B* times), one can obtain a distribution around the estimate from the original sample. In the current work, the proposed approach is applied to the bootstrap samples of *C* and *R* to obtain confidence bounds on the reliability estimate. Since the original samples of *C* and *R* are themselves scarce, this process of bootstrapping is repeated for *T* (=100) original samples of *C* and *R*. Thus, the quantiles of *β̂* obtained from *T* (=100) iterations are compared with the mean and standard deviation of the quantiles of *β̂<sub>boot</sub>* from the bootstrap samples. Figure 4 presents a schematic representation of the bootstrapping procedure followed in the current work.
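The resampling step can be sketched as follows. This is only an illustration of the non-parametric bootstrap idea, not a reproduction of Algorithm 2: the statistic `normal_fit_beta` is a deliberately simple stand-in for the proposed IS-based estimate, and the sample parameters are assumptions.

```python
import random
from statistics import mean, quantiles, variance


def normal_fit_beta(r_sample, c_sample):
    """Stand-in statistic: reliability index under an assumed normal fit of R and C."""
    spread = (variance(c_sample) + variance(r_sample)) ** 0.5
    return (mean(c_sample) - mean(r_sample)) / spread


def bootstrap_bounds(r_sample, c_sample, statistic, n_boot=100, seed=0):
    """5th and 95th percentiles of the statistic over n_boot bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample sizes.
        r_star = rng.choices(r_sample, k=len(r_sample))
        c_star = rng.choices(c_sample, k=len(c_sample))
        estimates.append(statistic(r_star, c_star))
    cuts = quantiles(estimates, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


# Hypothetical scarce samples of size 50 each.
data_rng = random.Random(1)
r_sample = [data_rng.gauss(10.0, 1.0) for _ in range(50)]
c_sample = [data_rng.gauss(15.0, 1.0) for _ in range(50)]

lower, upper = bootstrap_bounds(r_sample, c_sample, normal_fit_beta)
print(f"bootstrap 5%-95% bounds on beta: [{lower:.2f}, {upper:.2f}]")
```

Any estimator that maps a pair of samples to a scalar can be passed as `statistic`, so the same loop would wrap the IS-based estimate once its CDF and PDF approximations are available.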

In order to test the efficacy of the formulation, different tails for capacity and response were considered. To simulate samples of response and capacity belonging to different tail types, the generalized extreme value (GEV) distribution is used, which takes the shape parameter (*ξ*) as an input along with the location (*θ*) and scale (*σ*) parameters. The shape parameter, or tail index, is a measure of the heaviness of the tails of a distribution. In GEV distributions, the shape parameter affects the lower tail and upper tail differently. In this study, the parameters are chosen such that the response distributions are positively skewed, while the capacity distributions are negatively skewed, to better simulate the difficulties of sampling in the tails. Individually, the shape parameters (*ξ<sub>r</sub>*, *ξ<sub>c</sub>*) correspond to three types of tail heaviness: heavy, medium and light. Here, tail heaviness is considered for the upper tail of *R* and the lower tail of *C*. The location parameter of the capacity distribution (*θ<sub>c</sub>*) is changed while keeping the response location (*θ<sub>r</sub>*) fixed so that each combination corresponds to a failure probability of 10<sup>−4</sup>. Nine study cases result from the different combinations of tails possible for *R* and *C*. The parameters for the nine study cases are presented in Table 1. The distribution parameters are only utilized to generate scarce samples of *R* and *C*. It is to be noted that the GEV distributions used here are continuous over the real line (ℝ) and can take negative values. Though this does not reflect the real-world scenario, it does not hinder the evaluation of the performance of the proposed formulation.


**Table 1.** GEV distribution parameters of *R* and *C* for the nine study cases.

To capture sampling variability, multiple iterations are carried out, and the procedure employed for identifying the variability in the estimates and bootstrap bounds is presented in Algorithm 2.

### **Algorithm 2** Confidence bounds using bootstrap.


$$\beta\_{IS} = \Phi^{-1}(1 - \widehat{p\_f}) \tag{10}$$

The failure probability estimates calculated through Algorithm 2 for each study case are converted into reliability indices using Equation (10). To facilitate comparison, the estimates are divided by the actual reliability index (*β<sub>a</sub>* = 3.71). For all nine study cases, the sample sizes of response (*M*) and capacity (*N*) are both 50. The results of applying the formulation to each study case are presented in Table 2. Values under the 'Original sample' column represent the percentiles of the ratio of the estimates from the original samples (from Step 7 of Algorithm 2) repeated for *T* (=100) iterations. The mean and standard deviation of the percentiles (from Step 8 of Algorithm 2) from the bootstrap repetitions (*B* = 100) are presented under the 'Bootstrap' column. An accurate estimate of the reliability index would be indicated by a value of one, and high precision is indicated by low variability between the percentiles and a low standard deviation in the bootstrap percentiles. It is observed that the estimation is poorer in the cases of heavy-tailed response with medium-tailed capacity and heavy-tailed response with light-tailed capacity. In both these cases, the failure region is situated further into the tail of the heavy-tailed response, where the scarce-sample-based PDF estimation is prone to high errors. In many instances, the probability density drops to zero prematurely, which results in overestimation of reliability.
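The conversion from failure probability to reliability index is a single inverse-normal evaluation. The sketch below uses the standard library's `NormalDist.inv_cdf`; the helper name `reliability_index` is introduced here for illustration.

```python
from statistics import NormalDist


def reliability_index(p_f: float) -> float:
    """Reliability index beta = Phi^{-1}(1 - p_f) for a failure probability p_f."""
    return NormalDist().inv_cdf(1.0 - p_f)


# A failure probability of 1e-4 corresponds to a reliability index of about 3.72,
# in line with the actual index quoted for the nine study cases.
beta_actual = reliability_index(1e-4)
print(round(beta_actual, 3))
```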

**Table 2.** Both *R* and *C* unknown case: percentiles of *β<sub>IS</sub>*/*β<sub>a</sub>*.


In certain reliability applications, either *f<sub>R</sub>* or *f<sub>C</sub>* is known, or it is easier to obtain samples from one of the distributions of *R* and *C*. The proposed method is tested for such scenarios where only one of the distributions is unknown, and during this exercise, the same combinations of tails for *R* and *C* are assumed. Tables 3 and 4 present the results for the *R* unknown and *C* unknown cases, respectively, using the proposed approach. From Table 3, it is observed that both the accuracy and precision of the estimation improved for most of the tail combinations compared to the case where both *R* and *C* are unknown. In the cases of heavy-tailed response with medium-tailed or light-tailed capacity, and in addition, medium-tailed response with light-tailed capacity, the bias in the estimate has increased. This is again due to the erroneous PDF estimation of a heavier-tailed response distribution using a scarce sample. The smaller bias in the case where both *R* and *C* are unknown is due to the interaction between the PDF and CDF approximations.

In Table 4, larger errors are observed for heavy-tailed capacity and medium-tailed response and heavy-tailed capacity and light-tailed response. This suggests that estimating heavy-tailed distribution further into the tails results in larger errors; this is akin to the observations made for the *R* unknown case. Furthermore, the largest errors observed in the *C* unknown case are smaller compared to the largest errors from the *R* unknown case which suggests that TPNT is a better approximation for CDF of heavy tailed distributions compared to adaptive KDE used for PDF approximation. To demonstrate that this is indeed the case, the alternate form of the importance sampling estimator presented in Equation (7) is used for the *R* unknown scenario, wherein CDF of the response distribution is approximated instead of the PDF. Table 5 presents the results, from which it is observed that the bias has reduced compared to the PDF approximation-based estimation. However, this also results in underestimation of reliability in the case of light-tailed response and light-tailed capacity. This is a trade-off between the choice of adaptive-kernel-based PDF approximation and TPNT-based CDF approximation. From these observations, it can be surmised that it would be beneficial to know the heaviness of *R* and *C* so that appropriate estimator can be chosen based on the efficacy of the approximation methods available.


**Table 3.** *R* unknown case: failure probability estimated as per Equation (6) (PDF approximation of *R*): percentiles of *β<sub>IS</sub>*/*β<sub>a</sub>*.

An additional observation is that for the cases with one unknown (either *R* or *C*), the bootstrap standard deviation is larger when the bias in the estimates from the original samples is larger, and smaller for the more accurate cases. Thus, in practice, where there is no actual value to compare with, the bootstrap standard deviation can be used to discern the accuracy of the estimate. However, this does not track well in the case where both *R* and *C* are unknown.


**Table 4.** *C* unknown case: failure probability estimated as per Equation (6) (CDF approximation of *C*): percentiles of *β<sub>IS</sub>*/*β<sub>a</sub>*.

**Table 5.** *R* unknown case: failure probability estimated as per Equation (7) (CDF approximation of *R*): percentiles of *β<sub>IS</sub>*/*β<sub>a</sub>*.


### *Tail-Index Estimation*

Estimation of the tail index is sometimes part of the distribution identification process and, in turn, of CDF estimation, where methods such as the Hill estimator [48], the Pickands estimator, Generalized Pareto fits and others are applied. The Hill estimator considers *k* upper order statistics from a sample to evaluate the tail heaviness. Equation (11) presents the estimator for a positive sample of size *n* with the order statistics *X*<sub>1,*n*</sub> ≤ *X*<sub>2,*n*</sub> ≤ ... ≤ *X*<sub>*n*,*n*</sub>.

$$H\_{k,n} = \frac{1}{k} \sum\_{i=0}^{k-1} \log \frac{X\_{n-i,n}}{X\_{n-k,n}} \tag{11}$$

The estimate of tail heaviness is sensitive to the number of order statistics (*k*) used and to shifting of the sample. Different modifications that address such sensitivity issues exist in the literature [49,50]. As we only need to compare the tails of *R* and *C*, especially when they are very different, obtaining exact estimates is of low priority. Hence, a modified Hill estimator is used, where the sample is mean-shifted and *k* is taken to be 10% of the sample size. For upper-tail estimation, the largest *k* values are used, and for lower-tail estimation, the absolute values of the smallest *k* values are used. The modified Hill estimator is applied to assess the tail heaviness of the upper tail of response and the lower tail of capacity. The sample with the heavier tail is chosen for CDF approximation, and thereby the appropriate form of the estimator is chosen for failure probability estimation.
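The modified estimator described above can be sketched in a few lines. The function name `modified_hill`, the mirroring used for the lower tail, and the two deterministic test samples are assumptions made for illustration; the paper's own implementation details are not reproduced here.

```python
import math


def modified_hill(sample, tail="upper", frac=0.10):
    """Hill estimate on a mean-shifted sample using k = frac * n order statistics."""
    n = len(sample)
    k = max(1, int(frac * n))
    centre = sum(sample) / n
    shifted = [x - centre for x in sample]
    if tail == "lower":
        # Mirror the shifted sample so the smallest values become the largest
        # magnitudes, matching the use of absolute values for the lower tail.
        shifted = [-x for x in shifted]
    ordered = sorted(shifted)
    threshold = ordered[n - k - 1]  # X_{n-k,n} in Equation (11)
    if threshold <= 0.0:
        raise ValueError("threshold order statistic must be positive after shifting")
    return sum(math.log(x / threshold) for x in ordered[n - k:]) / k


# Deterministic illustration: quantiles of a Pareto-type law versus a uniform law.
n = 50
u = [(i + 0.5) / n for i in range(n)]
heavy = [(1.0 - ui) ** -0.5 for ui in u]  # heavy upper tail
light = list(u)                           # bounded upper tail

h_heavy = modified_hill(heavy, "upper")
h_light = modified_hill(light, "upper")
print(h_heavy > h_light)
```

Only the ordering of the two values matters for choosing which sample gets the CDF approximation, so a coarse estimate of this kind is enough for the comparison.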

In the scenario with both *R* and *C* unknown, the case of heavy-tailed *R* and light-tailed *C*, which showed the largest deviation from the actual value, was chosen to test the tail-index-estimate-based improvement of the proposed formulation. The results presented in Table 6 show that the use of tail index estimation improved the estimates. It is to be noted that the modified Hill estimator was successful in contrasting the heaviness of the tails of *R* and *C* in 81 out of 100 iterations.


**Table 6.** Heavy *R* and light *C*: percentiles of *β<sub>IS</sub>*/*β<sub>t</sub>* from the tail-estimation-based improved formulation.

### **5. Reliability Estimation Examples**

Based on the flowchart of the proposed approach in Figure 5, benchmark reliability estimation examples are tested in this section. In all examples, to account for sample variability 100 iterations are used with samples of size 50 for *R* and *C* in each iteration. For each iteration, bootstrap samples for *R* and *C* are generated 100 times.

**Figure 5.** Flowchart representing failure probability estimation using proposed IS approach.

*5.1. Example 1: Concave Limit State 1*

This is a concave limit state example taken from [51].

$$G = 2.62 - u\_2 - 0.15u\_1^2 \tag{12}$$

Both *u*<sub>1</sub> and *u*<sub>2</sub> are standard normal variables, and the concave limit state is of the separable form. Here, *R* is taken as 0.15*u*<sub>1</sub><sup>2</sup>, and *C* is taken as 2.62 − *u*<sub>2</sub>.


**Table 7.** Concave limit state-1: percentiles of *β<sub>IS</sub>*/*β<sub>t</sub>* for *β<sub>t</sub>* = 2.39.

Note: *β<sub>t</sub>* is computed through MCS with 10<sup>8</sup> samples.

From the tail-index estimates, in 88 out of 100 iterations, response *R* is determined to be heavier-tailed than *C*, which is consistent with the analytical form of *R* and *C*. The response distribution resulting from squaring a standard normal variable is a *χ*<sup>2</sup>-distribution with one degree of freedom. In this case, the *χ*<sup>2</sup>-distribution has a heavier tail than the standard normal distribution.

Table 7 provides the results obtained for a target reliability *β<sub>t</sub>* = 2.39. It is observed that the true value is contained within the percentile bounds (0.84–1.12) from the original sample estimates, and the bootstrap bounds obtained for each percentile also contain the true value. The variability of the estimates from the original sample is high, and the standard deviation computed from the bootstrap estimates is reflective of the variability present in the original sample estimates.
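Because this limit state is separable and both inputs are standard normal, the target reliability index can be cross-checked without a large Monte Carlo run by integrating over *u*<sub>1</sub> numerically. The sketch below is an independent check under that observation, not the procedure used in the paper; the function name and the integration settings are assumptions.

```python
from statistics import NormalDist

STD = NormalDist()


def example1_failure_probability(n_steps=4000, u_max=8.0):
    """Trapezoidal integration of P(u2 >= 2.62 - 0.15 u1^2) against the density of u1."""
    h = 2.0 * u_max / n_steps
    total = 0.0
    for j in range(n_steps + 1):
        u1 = -u_max + j * h
        weight = 0.5 if j in (0, n_steps) else 1.0
        # Failure occurs when G = 2.62 - u2 - 0.15 u1^2 <= 0, i.e. u2 >= 2.62 - 0.15 u1^2.
        tail = STD.cdf(-(2.62 - 0.15 * u1 * u1))
        total += weight * STD.pdf(u1) * tail
    return total * h


p_f_check = example1_failure_probability()
beta_check = STD.inv_cdf(1.0 - p_f_check)
print(f"p_f = {p_f_check:.4e}, beta = {beta_check:.2f}")
```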

### *5.2. Example 2: Concave Limit State 2*

Example 2 is a concave limit state taken from [51].

$$G = u\_1^2 - 5u\_2^2 + 45 \tag{13}$$

Both *u*<sub>1</sub> and *u*<sub>2</sub> are standard normal variables, and the concave limit state is of the separable form, where *R* is taken as 5*u*<sub>2</sub><sup>2</sup> and *C* is taken as *u*<sub>1</sub><sup>2</sup> + 45.

Table 8 presents the numerical results obtained for this example by applying the proposed IS approach. From the tail-index estimates, in 100 out of 100 iterations, response *R* is determined to be heavier-tailed than *C*. Here, both the response and capacity distributions, resulting from squaring the standard normal variables *u*<sub>1</sub> and *u*<sub>2</sub>, are *χ*<sup>2</sup>-distributions. However, the upper tail of the response is heavier because scaling a *χ*<sup>2</sup>-distribution results in an increase in tail heaviness.

**Table 8.** Concave limit state-2: percentiles of *β<sub>IS</sub>*/*β<sub>t</sub>* for *β<sub>t</sub>* = 2.82.


Note: *β<sub>t</sub>* is computed through MCS with 10<sup>8</sup> samples.

The percentiles of *β<sub>IS</sub>*/*β<sub>t</sub>* from the original samples show that the target value is contained within the first and third quartile bounds. For all the percentiles, it is observed that the bootstrap bounds also contain the target reliability.

### *5.3. Example 3: Roof Truss Example*

This example is discussed as a case study for reliability estimation in [45]. The schematic in Figure 6 presents a roof truss subjected to a uniformly distributed load *q*, which is transformed into the nodal load *P* = *ql*/4. Equation (14) presents the limit state constructed for the perpendicular deflection Δ<sub>C</sub> at node *C* as:

$$G(\mathbf{x}) = 0.03 - \Delta\_C \tag{14}$$

where

$$
\Delta\_C = \frac{ql^2}{2} \left( \frac{3.81}{A\_c E\_c} + \frac{1.13}{A\_s E\_s} \right) \tag{15}
$$

Here, *A<sub>c</sub>* and *A<sub>s</sub>* are the cross-sectional areas of the reinforced concrete bars and steel bars, respectively. Similarly, the variables *E<sub>c</sub>* and *E<sub>s</sub>* represent the moduli of elasticity, while *l* denotes the length. The input variables are considered to be mutually independent, normally distributed random variables. The parameters of their distributions are presented in Table 9.

**Figure 6.** Schematic of roof truss.

**Table 9.** Mean and SD of random variables for roof truss example.


The limit state is transformed into the *G*(*C*, *R*) = *C* − *R* form, where *C* = 0.03/*q* and *R* = Δ<sub>*C*</sub>/*q* becomes a function of five random variables that correspond to the geometry and material properties of the truss members.
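The regrouping into capacity and response can be checked numerically. The sketch below is an illustration only: the function names are hypothetical and the input values are placeholders chosen for the check, not the entries of Table 9. Dividing the original limit state by *q* > 0 leaves its sign unchanged, so both forms describe the same failure event.

```python
def deflection(q, l, a_c, e_c, a_s, e_s):
    """Deflection at node C, Equation (15)."""
    return q * l ** 2 / 2.0 * (3.81 / (a_c * e_c) + 1.13 / (a_s * e_s))


def capacity(q):
    """C = 0.03 / q, a function of the load only."""
    return 0.03 / q


def response(l, a_c, e_c, a_s, e_s):
    """R = Delta_C / q, a function of the five remaining variables."""
    return l ** 2 / 2.0 * (3.81 / (a_c * e_c) + 1.13 / (a_s * e_s))


# Placeholder inputs for illustration (NOT the values of Table 9)
q, l = 20000.0, 12.0
a_c, e_c = 0.04, 2.0e10
a_s, e_s = 9.82e-4, 1.0e11

g_original = 0.03 - deflection(q, l, a_c, e_c, a_s, e_s)       # Equation (14)
g_separable = capacity(q) - response(l, a_c, e_c, a_s, e_s)    # C - R

# The two forms differ only by the positive factor q
print(g_original, q * g_separable)
```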

Table 10 provides the estimates of reliability and the corresponding bootstrap bounds for a target reliability of *β<sub>t</sub>* = 3.4. It is observed that at each of the percentiles, the bootstrap bound captures the target reliability. From tail-index estimates, in 77 out of 100 iterations, response *R* is determined to be heavier-tailed than *C*.


**Table 10.** Roof truss example: percentiles of *β<sub>IS</sub>*/*β<sub>t</sub>* for *β<sub>t</sub>* = 3.4.

Note: *β<sub>t</sub>* is computed through MCS with 10<sup>8</sup> samples.

### *5.4. Example 4: Propped Cantilever Beam Example*

Figure 7 presents the schematic of the example, which is taken from [52]. Equation (16) presents the original limit state for the maximum deflection of the beam *ν<sub>max</sub>* against the maximum allowable deflection *ν<sub>crit</sub>*:

$$G = \nu\_{\text{max}} - \nu\_{\text{crit}} \tag{16}$$

where the deflection of the beam *ν*(*x*) is given by Equation (17), and for the considered loading condition, the maximum deflection (*ν<sub>max</sub>*) is obtained at *x* = 0.5528*L*.

$$\nu(x) = \frac{q\_0 x^2}{120LEI} (4L^3 - 8L^2 x + 5Lx^2 - x^3) \tag{17}$$

where

$$I = \frac{b\_f d^3 - (b\_f - t\_w)(d - 2t\_f)^3}{12} \tag{18}$$
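The location of the maximum deflection can be reproduced with a few lines of code. The sketch below is an illustration with hypothetical function names; it assumes the standard propped-cantilever deflection shape with a +5*Lx*<sup>2</sup> term, for which the deflection vanishes at the propped end *x* = *L*. Writing *s* = *x*/*L*, the deflection is proportional to *s*<sup>2</sup>(4 − 8*s* + 5*s*<sup>2</sup> − *s*<sup>3</sup>), and the maximum is found by bisection on its derivative.

```python
def second_moment_of_area(b_f, d, t_w, t_f):
    """Equation (18): second moment of area of the I-section."""
    return (b_f * d ** 3 - (b_f - t_w) * (d - 2.0 * t_f) ** 3) / 12.0


def deflection_shape(s):
    """Dimensionless deflection shape, with s = x / L."""
    return s ** 2 * (4.0 - 8.0 * s + 5.0 * s ** 2 - s ** 3)


def deflection(x, q0, length, e_mod, inertia):
    """Deflection nu(x), rewritten in terms of s = x / L."""
    scale = q0 * length ** 4 / (120.0 * e_mod * inertia)
    return scale * deflection_shape(x / length)


def shape_slope(s):
    """Derivative of deflection_shape with respect to s."""
    return 8.0 * s - 24.0 * s ** 2 + 20.0 * s ** 3 - 5.0 * s ** 4


def location_of_max_deflection(lo=0.1, hi=0.9, tol=1e-12):
    """Bisection on the slope, which is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shape_slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s_max = location_of_max_deflection()
print(f"maximum deflection at x = {s_max:.4f} L")
```

The root of the slope is *s* = 1 − 1/√5, which agrees with the location quoted in the text.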

**Figure 7.** Propped cantilever beam with triangular distributed loading.

The limit state is rearranged into the *G*(*C*, *R*) = *C* − *R* form, where *C*(*ν<sub>crit</sub>*, *E*) = *ν<sub>crit</sub>E*, and *R* is a function of the remaining seven random variables. The mean and standard deviation (SD) of the normally distributed random variables are presented in Table 11. The critical displacement *ν<sub>crit</sub>* is varied between 4 and 5 mm to generate three target reliability situations.


**Table 11.** Mean and SD of random variables for propped cantilever beam example.

Note: All random variables follow normal distribution.

Table 12 presents the results obtained from the proposed IS formulation for different target reliability indices (*β<sub>t</sub>*). As the target reliability increases, the variability of both the original sample estimates and the bootstrap sample estimates remains similar.

**Table 12.** Propped cantilever beam: percentiles of *β<sub>IS</sub>*/*β<sub>t</sub>* for different *ν<sub>crit</sub>* and *β<sub>t</sub>*.

Note: *β<sub>t</sub>* values are computed through MCS with 10<sup>8</sup> samples.

### **6. Application to RBDO Examples**

The proposed importance sampling approach is demonstrated on two benchmark and two real-world RBDO examples. Algorithm 3 delineates the steps followed to perform RBDO using the proposed approach of reliability estimation. For the benchmark examples, the efficiency of the proposed approach is compared with that of a crude Monte Carlo approach using the total number of limit state evaluations, which is computed as *N<sub>doe</sub>* × *N<sub>is</sub>* for the proposed approach and *N<sub>doe</sub>* × *N<sub>mcs</sub>* for the MCS-based approach. For the real-world examples, MCS is used only to validate the optima obtained through the proposed approach.

**Algorithm 3** RBDO using proposed importance sampling approach.

1: **d** = {*d*<sub>1</sub>, *d*<sub>2</sub>, ..., *d*<sub>*N*<sub>doe</sub></sub>} ← Generate a design of experiment (DoE) in design variable space.

2: **for** *i* = 1, 2, ..., *N<sub>doe</sub>* **do**


### *6.1. Cantilever Beam Example*

This is a standard RBDO example taken from [53]. In this example, the weight of a cantilever beam (shown in Figure 8) is minimized while considering two failure modes: the bending stress must not exceed the yield strength (19), and the tip displacement must not exceed the allowable displacement limit of 2.5 in (20).

$$G\_s = \sigma\_y - \frac{6L}{wt} \left(\frac{X}{w} + \frac{Y}{t}\right) \tag{19}$$

$$G\_d = D\_0 - \frac{4L^3}{Ewt} \sqrt{\left(\frac{Y}{t^2}\right)^2 + \left(\frac{X}{w^2}\right)^2} \tag{20}$$

The length of the beam (*L*) and the density are held constant, while the width (*w*) and thickness (*t*) of the beam are allowed to change which transforms the objective function from weight to area of cross section, *A* = *wt*. The horizontal (*X*) and vertical (*Y*) loads along with modulus of elasticity (*E*) are random variables whose uncertainty characteristics are presented in Table 13.

**Figure 8.** Cantilever beam under horizontal and vertical loads.

**Table 13.** Random variables for cantilever beam example.


Note: All random variables follow normal distribution.

RBDO of the cantilever beam requires that the failure probability of each limit state does not exceed 1.35 × 10<sup>−3</sup>, which translates to a target reliability index *β<sub>t</sub>* ≥ 3. Thus, the optimization is formulated as:

$$\begin{aligned} \min\_{w,t} \quad & A = wt\\ \text{s.t.} \quad & g\_1 = \Phi^{-1}(1 - Pr(G\_{s} \le 0)) \ge 3\\ & g\_2 = \Phi^{-1}(1 - Pr(G\_{d} \le 0)) \ge 3 \end{aligned} \tag{21}$$
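The correspondence between a failure probability and a reliability index used in these constraints can be illustrated with the Python standard library. The function names below are assumptions made for this sketch.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, Phi


def reliability_index(p_f: float) -> float:
    """beta = Phi^{-1}(1 - P_f), valid for 0 < P_f < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p_f)


def failure_probability(beta: float) -> float:
    """P_f = Phi(-beta), the inverse of reliability_index."""
    return _STD_NORMAL.cdf(-beta)


print(f"P_f at beta = 3: {failure_probability(3.0):.3e}")
print(f"beta at P_f = 1.35e-3: {reliability_index(1.35e-3):.3f}")
```

A constraint such as *g* ≥ 3 is therefore equivalent to bounding the failure probability from above by Φ(−3).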

As stated in Algorithm 3, a DoE of size *N<sub>doe</sub>* = 40 is created in **d** = (*w*, *t*) space using Latin hypercube sampling (LHS). Additionally, four corner points are added to the DoE. At each design point (*d<sub>i</sub>*), samples of response and capacity for both limit states are obtained by simulating uncertainty as per Table 13, considering a sample size of *N<sub>is</sub>* = 50. For limit state *G<sub>s</sub>*, *σ<sub>y</sub>* is considered as the capacity, while the rest of the equation is considered as the response. Similarly, for limit state *G<sub>d</sub>*, *D*<sub>0</sub> × *E* is taken as the capacity by regrouping the random variables in the limit state function to convert it into a separable form [54]. Using these samples, failure probability estimates for both limit states are obtained through the proposed approach. A surrogate model for the reliability index *β* is constructed, and this surrogate is used for constraint evaluation during optimization.
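A DoE of this kind, a Latin hypercube sample augmented with the corners of the design box, can be sketched as follows. This is an illustrative stand-in for whatever LHS routine was actually used; the function names, the seed and the (*w*, *t*) bounds are hypothetical placeholders.

```python
import itertools
import random


def latin_hypercube(n_points, bounds, rng):
    """Return n_points samples with one point per stratum in every dimension."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_points))
        rng.shuffle(strata)  # independent permutation of strata per dimension
        columns.append(
            [lo + (k + rng.random()) / n_points * (hi - lo) for k in strata]
        )
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def corner_points(bounds):
    """All 2^d vertices of the box defined by the (lower, upper) bounds."""
    return list(itertools.product(*bounds))


bounds = [(1.0, 5.0), (1.0, 5.0)]  # hypothetical bounds on (w, t)
rng = random.Random(0)             # assumed seed, for reproducibility only
lhs = latin_hypercube(40, bounds, rng)
corners = corner_points(bounds)
doe = lhs + corners                # 40 LHS points + 4 corners

print(len(doe))
```

In two dimensions the box has four corners, which gives the 44 design points at which reliability is estimated.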

In a similar fashion, MCS-based failure probability estimates are also obtained at the design points using samples of size *N<sub>mcs</sub>* = 10<sup>6</sup>. A surrogate model for the reliability index is constructed using the MCS estimates, and optimization is carried out. At design points that are close to the lower and upper bounds, failure probability estimates could be either zeros or ones. When converted into reliability indices, these estimates become ∞ and −∞, respectively; hence, they are modified during surrogate construction. These singularities are more common in MCS estimation than in IS estimation.
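How the singular estimates are modified is not specified in the text. One common remedy, shown below purely as an assumption, is to keep the estimated failure probability strictly inside (0, 1) before converting it to a reliability index; the floor of half a failure per sample set and the function name are illustrative choices.

```python
from statistics import NormalDist


def clipped_reliability_index(failures: int, n_samples: int) -> float:
    """Reliability index from a failure count, with P_f kept inside (0, 1)."""
    p_floor = 0.5 / n_samples  # assumed floor: half of one failure
    p_f = min(max(failures / n_samples, p_floor), 1.0 - p_floor)
    return NormalDist().inv_cdf(1.0 - p_f)


# No failures and only failures both give finite indices after clipping
beta_no_failures = clipped_reliability_index(0, 10 ** 6)
beta_all_failures = clipped_reliability_index(10 ** 6, 10 ** 6)
print(beta_no_failures, beta_all_failures)
```

The clipped values act as finite upper and lower caps on the surrogate training data; any other capping rule would serve the same purpose.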

The optima obtained from both MCS and IS are compared in Table 14. As the cost of the knowledge of random variables is not quantified, computational cost is only compared with MCS. Additionally, different surrogate choices, such as PRS, RBF, Kriging and weighted average surrogate (WAS), were considered for constructing the *β* surrogate, and the optima obtained by using the WAS model are presented. The accuracy of the surrogate for both constraints is evaluated using a generalized mean square error (GMSE) metric. GMSE for constraint *g*<sub>1</sub> using IS-based estimates is reported as 0.91 (mean of reliability estimates = 2.21), whereas the error from the MCS-based surrogate is 0.09 (mean of reliability estimates = 1.97). For constraint *g*<sub>2</sub>, GMSE (vs. mean) is reported as 0.70 (vs. 1.98) for the IS surrogate and 0.12 (vs. 1.41) for the MCS surrogate.

To validate the results, a reliability index is calculated at the optima using MCS with a sample size of 10<sup>7</sup>, which is also presented in Table 14. It is observed that the surrogate from IS estimates has less global accuracy. However, reliability indices obtained at the optima using MCS (*β<sub>MCS</sub>* at *d*<sup>∗</sup>) suggest that the surrogate is reasonably approximated at the constraint boundaries. The optima obtained from IS and MCS are very close, but the computational savings are hugely in favour of IS (50 vs. 10<sup>6</sup> samples per reliability estimate).


**Table 14.** RBDO results of cantilever beam example.

### *6.2. Bracket Structure Example*

This example is originally from [55], where a bracket structure is subjected to a tip load (*P*) in addition to its own weight due to gravity (*g*), as presented in Figure 9. The weight of the bracket structure is optimized while considering two failure modes:


**Figure 9.** Bracket structure subjected to a tip load.

Equations (22) and (23) present the limit states as:

$$G\_{CD} = f\_y - \sigma\_{B}, \quad \text{where} \tag{22a}$$

$$
\sigma\_{B} = \frac{6M\_B}{w\_{CD}t^2} \tag{22b}
$$

$$M\_B = \frac{PL}{3} + \frac{\rho g w\_{CD} tL^2}{18} \tag{22c}$$

$$G\_{AB} = F\_{buckling} - F\_{AB}, \quad \text{where} \tag{23a}$$

$$F\_{buckling} = \frac{\pi^2 E t w\_{AB}^3 \, 9 \sin^2 \theta}{48L^2} \tag{23b}$$

$$F\_{AB} = \frac{1}{\cos \theta} \left( \frac{3P}{2} + \frac{3\rho g w\_{CD} tL}{4} \right) \tag{23c}$$

During RBDO of the bracket structure, the target reliability index for both limit states is *β* ≥ 2, and the design parameters are the means of the geometrical parameters of the structure: width of CD (*μ<sub>w<sub>CD</sub></sub>*), width of AB (*μ<sub>w<sub>AB</sub></sub>*) and thickness of AB and CD (*μ<sub>t</sub>*), which are bounded between 50 mm and 300 mm. The uncertainty characteristics of the random variables are presented in Table 15. Thus, the RBDO is formulated as:

$$\begin{split} \min\_{w\_{CD}, w\_{AB}, t} \quad & C = \mu\_{\rho} \mu\_{t} \mu\_{L} \left( \frac{4\sqrt{3}}{9} \mu\_{w\_{AB}} + \mu\_{w\_{CD}} \right) \\ \text{s.t.} \quad & g\_1 = \Phi^{-1} (1 - \Pr(G\_{CD} \le 0)) \ge 2 \\ & g\_2 = \Phi^{-1} (1 - \Pr(G\_{AB} \le 0)) \ge 2 \\ & 50 \le \mu\_{w\_{CD}}, \mu\_{w\_{AB}}, \mu\_{t} \le 300 \text{ (in mm)} \end{split} \tag{24}$$

**Table 15.** Random variables for bracket structure example.


The steps enumerated in Algorithm 3 are followed using a DoE of size *N<sub>doe</sub>* = 60. The scarce sample sizes of *R* and *C* in the IS approach are taken as *N<sub>is</sub>* = 75, while *N<sub>mcs</sub>* = 10<sup>6</sup> realizations are used in MCS. The RBDO results are presented in Table 16, which has the same format as that of the first example. GMSE (vs. mean) for constraint *g*<sub>1</sub> using IS-based estimates is reported as 0.58 (vs. 1.37), whereas the error from the MCS-based surrogate is 0.24 (vs. 1.02). For constraint *g*<sub>2</sub>, GMSE (vs. mean) is reported as 0.97 (vs. 3.72) for the IS surrogate and 0.30 (vs. 3.08) for the MCS surrogate. It is observed that the IS estimate-based surrogate has less global accuracy; however, the MCS-based reliability indices computed at the optima show that it approximates reasonably well near the constraint boundaries. It is to be noted that in both engineering examples, we assume that both *R* and *C* are unknown. Any knowledge of the uncertainty characteristics of either *R* or *C* will only improve the accuracy of the reliability estimates.


**Table 16.** RBDO results of bracket structure example.

### *6.3. Torque Arm Example*

This example presents RBDO of torque arm where the mass of the component is to be minimized adhering to a probabilistic constraint on the allowable stress. Unlike previous examples, there is no analytical expression available for limit state evaluation in this case. It is a shape optimization problem originally from Bennett and Botkin [56]. Researchers have used it as a benchmark example for reliability estimation [52,57]. Rahman and Wei [58] perform RBDO where a constant allowable stress limit is considered.

In the current study, seven design variables (see Figure 10) as per [59] are considered for altering the shape of the torque arm. Figure 11 presents the base design of the torque arm around which the optimization is to be performed. Here, a horizontal load (*F<sub>x</sub>* = −2789 N) and a vertical load (*F<sub>y</sub>* = 5066 N) are applied at the right hole while the left hole is fixed. The torque arm has modulus of elasticity *E* = 207 × 10<sup>5</sup> N · cm<sup>−2</sup>, density *ρ* = 7.850 × 10<sup>−3</sup> kg · cm<sup>−3</sup> and Poisson's ratio *ν* = 0.3.

**Figure 10.** Design variables used to modify the shape of the torque arm [59].

**Figure 11.** Loading and boundary conditions with base design parameter values (in cm) [59].

Equation (25) presents the limit state equation for the torque arm.

$$G = \sigma\_{\max}(d\_{i}, F\_{x}, F\_{y}) - \sigma\_{all} \tag{25}$$

where *σ<sub>max</sub>* is the maximum von Mises stress developed in the torque arm, and *σ<sub>all</sub>* is the allowable stress limit. Since there is no analytical expression that relates the design variables with the response *σ<sub>max</sub>*, finite-element analysis is used to compute the stresses developed in the torque arm for a given loading condition. The MATLAB finite-element toolbox CALFEM [60] is used for this purpose. The thickness of the finite elements in the mesh is considered to be 0.3 cm. At the end of each finite-element analysis, the maximum of the stresses is used as *σ<sub>max</sub>* in Equation (25).

For RBDO, the design variables (*d*<sub>1</sub> to *d*<sub>7</sub>), loads (*F<sub>x</sub>* and *F<sub>y</sub>*) and the allowable stress (*σ<sub>all</sub>*) are considered to be uncertain. Each design variable (*d<sub>i</sub>*) is considered to be normally distributed about its nominal value with a coefficient of variation of 10%. The uncertainty characteristics of the remaining random variables are presented in Table 17.

**Table 17.** Mean and SD of random variables in torque arm example.


Equation (26) presents the RBDO formulation of the torque arm where a target reliability index of three is considered.

$$\begin{aligned} \min\_{\{d\_1, \dots, d\_7\}} \quad & \text{Mass} \\ \text{s.t.} \quad & \text{g} = \Phi^{-1}(1 - Pr(G \le 0)) \ge 3 \end{aligned} \tag{26}$$

As per Algorithm 3, a DoE of size *N<sub>doe</sub>* = 200 is generated where reliability is estimated using the IS approach. The sample sizes of *R* = *σ<sub>max</sub>* and *C* = *σ<sub>all</sub>* in the IS approach are taken as *N<sub>is</sub>* = 100. An RBF-based surrogate is constructed using the IS-based reliability estimates. The error (GMSE vs. range) for the RBF surrogate of *β* is found to be 6%, which indicates a good fit. This surrogate is used for evaluating the constraint (*g*) during optimization.

The stress contour of the torque arm design obtained from the optimization is presented in Figure 12. Here, the mean values of the design parameters and the mean loading condition are considered. The maximum von Mises stress is observed to be 523.92 MPa for the optimum design. The optimal mass of the torque arm is 0.801 kg. It can be observed that the mass is distributed to meet the target reliability. An MCS with 10<sup>5</sup> samples is used to validate the reliability of the RBDO optimum obtained from the IS approach, and it is observed that *β<sub>MCS</sub>* = 3.00. This Monte Carlo simulation took 8.7 h using the Parallel Computing Toolbox of MATLAB on a system with an Intel Xeon 10-core 2.20 GHz 64-bit processor and 32 GB of RAM. Table 18 presents the design variable bounds and the optimum design obtained from RBDO.

Though the errors in the individual reliability estimations are quantified via bootstrap, they were not propagated into the surrogate model during RBDO. Using the surrogate model for reliability index evaluation instead of direct evaluation enabled the smoothing of noise from the IS-based estimation which leads to better convergence during optimization. From MCS validation at the optima, it is observed that the proposed approach results in heavier but safer designs. Further analysis is required to understand the error propagation from different stages of the proposed approach.

**Figure 12.** Stress (von Mises stress in MPa) contour of optimum design with mean design parameters (*d*∗) and loading condition of *Fx* = −2789 N and *Fy* = 5066 N.


**Table 18.** Design variable bounds and optimum design of torque arm (in cm).

**Figure 13.** Pareto optimal front corresponding to different values of reliability index estimated through polynomial response surface (PRS).

### *6.4. Car Side-Impact Problem—A Multi-Objective Reliability-Based Design Optimization (MORBDO) Example*

We demonstrate the proposed methodology on an MORBDO example taken from [61]. In this example, the objective is to minimize the weight (*f*<sub>1</sub>) of a car as well as the average rib deflection (*f*<sub>2</sub>) during a crash. A car is subjected to a side-impact based on European Enhanced Vehicle-Safety Committee (EEVC) procedures. The effect of the side-impact on a dummy in terms of head injury criteria, load in the abdomen, pubic symphysis force, viscous criterion, and rib deflections at the upper, middle and lower rib locations are considered as constraints. The MORBDO formulation is made up of seven uncertain design variables (*x*<sub>1</sub>, ..., *x*<sub>7</sub>) and four random variables (*p*<sub>1</sub>, ..., *p*<sub>4</sub>). Equation (27) presents the optimization formulation of the car side-impact problem:

$$\begin{aligned} \min\_{\mu\_{\mathbf{x}}} \quad & f\_1 = f(\mu\_{\mathbf{x}}, \mu\_{\mathbf{p}})\\ \min\_{\mu\_{\mathbf{x}}} \quad & f\_2 = \frac{g\_2(\mu\_{\mathbf{x}}, \mu\_{\mathbf{p}}) + g\_3(\mu\_{\mathbf{x}}, \mu\_{\mathbf{p}}) + g\_4(\mu\_{\mathbf{x}}, \mu\_{\mathbf{p}})}{3} \\ \text{s.t.} \quad & \Phi^{-1} \left( 1 - \Pr \left( g\_i(\mathbf{x}, \mathbf{p}) \le b\_i \right) \right) \ge \beta\_{t}, \\ & i = 1, \dots, 10. \end{aligned} \tag{27}$$

where

*g*<sub>1</sub>(**x**, **p**) ≡ Abdomen load ≤ 1 kN

*g*<sub>2</sub>(**x**, **p**) ≡ Upper rib deflection ≤ 32 mm

*g*<sub>3</sub>(**x**, **p**) ≡ Middle rib deflection ≤ 32 mm

*g*<sub>4</sub>(**x**, **p**) ≡ Lower rib deflection ≤ 32 mm

*g*<sub>5</sub>(**x**, **p**) ≡ Upper viscous criteria ≤ 0.32 m/s

*g*<sub>6</sub>(**x**, **p**) ≡ Middle viscous criteria ≤ 0.32 m/s

*g*<sub>7</sub>(**x**, **p**) ≡ Lower viscous criteria ≤ 0.32 m/s

*g*<sub>8</sub>(**x**, **p**) ≡ Pubic symphysis force ≤ 4.0 kN

*g*<sub>9</sub>(**x**, **p**) ≡ Velocity of B-pillar at middle point ≤ 10 mm/ms

*g*<sub>10</sub>(**x**, **p**) ≡ Velocity of front door at B-pillar ≤ 15.7 mm/ms;

1.0 ≤ *x*<sub>1</sub> ≤ 1.5, 0.45 ≤ *x*<sub>2</sub> ≤ 1.0, 0.5 ≤ *x*<sub>3</sub> ≤ 1.5, 0.5 ≤ *x*<sub>4</sub> ≤ 1.5, 0.875 ≤ *x*<sub>5</sub> ≤ 2.625, 0.4 ≤ *x*<sub>6</sub> ≤ 1.2, 0.4 ≤ *x*<sub>7</sub> ≤ 1.2, *μ<sub>p1</sub>* = 0.345, *μ<sub>p2</sub>* = 0.192, *μ<sub>p3</sub>* = *μ<sub>p4</sub>* = 0.

Analytical expressions of the objective functions and constraints, as well as the physical descriptions of the design variables (*x*<sub>1</sub> to *x*<sub>7</sub>) and random variables (*p*<sub>1</sub> to *p*<sub>4</sub>), are presented in Appendix C.

In order to solve the MORBDO problem as per Algorithm 3, a DoE of size *N<sub>doe</sub>* = 200 is generated within the design bounds, where the reliability of the ten constraints (*g*<sub>1</sub>, ..., *g*<sub>10</sub>) is estimated using the IS approach. In this example, the number of design variables is seven; hence, 2<sup>7</sup> = 128 corner points are sampled first. Next, a space-filling sample is generated using LHS with 10 points per dimension (7 × 10 = 70 in total), following a rule of thumb in DoE sampling [62]. Without loss of generality, two more points are added to round the total to 128 + 70 + 2 = 200. The sample sizes of response and capacity during IS are taken as *N<sub>is</sub>* = 50. PRS surrogates for the reliability indices of the ten constraints are constructed using the IS-based estimates. In order to improve the surrogate accuracy, an additional DoE of size 200 is generated within adjusted design bounds. The final surrogate errors (GMSE vs. range) for all constraints are found to range between 3.76% and 8.38%. Using these surrogates for constraint evaluation, the MOO problem presented in Equation (27) is solved using NSGA-II [63].

The Pareto optimal fronts corresponding to the different target reliability indices (*β<sub>t</sub>* = [2.5, 3.0, 3.5]) along with the deterministic Pareto optimal front are presented in Figure 13. The NSGA-II algorithm is applied to the four instances using a population size of 200 for 100 generations. The GA uses the following parameters: number of offspring is 50, probability (*p<sub>c</sub>*) of simulated binary crossover (SBX) is 0.9, crossover parameter (*η<sub>c</sub>*) is 20, probability of mutation (*p<sub>m</sub>*) is 0.9, and the mutation parameter (*η<sub>m</sub>*) is 50.

It is evident from Figure 13 that as the targeted reliability increases, the respective reliable Pareto optimal front shifts inside the feasible criterion space and away from the deterministic Pareto optimal front to ensure more reliable solutions. The Pareto solutions were further validated using MCS (*N<sub>mcs</sub>* = 10<sup>6</sup>), and all the solutions were found to meet the target reliability.

### **7. Conclusions**

A scarce sample-based importance sampling approach to estimate reliability is proposed when there is little or no information about the uncertainty characteristics of the random variables involved. The proposed formulation was tested on different tail heaviness for *R* and *C* distributions. In the case of one of the distributions being heavy-tailed, a tail-index estimate-based improvement to choose between PDF and CDF approximation was employed and shown to improve the accuracy (50th percentile by 1.9 times). Confidence bounds on the reliability estimate obtained through the bootstrap procedure have been shown to be indicative of the accuracy of the estimate. The proposed IS approach has been applied for reliability estimation and RBDO examples and found to be effective in terms of computational savings (50 × 44 = 2200 for cantilever beam and 75 × 60 = 4500 for bracket structure example) as compared to MCS where the sample size for each reliability estimate was 10<sup>6</sup>. The approach is demonstrated on a non-analytical RBDO example which yielded a design that met the target reliability index (validated by MCS). The proposed approach has also been demonstrated on the car side-impact problem which is a multi-objective reliability-based design optimization example.

While the tail-index estimate-based alternative reduces the errors from the approximation, establishing the superiority of CDF approximation versus PDF approximation was only achieved through post-analysis of results for the specific methods used in this work. Future work could include incorporating region-wise best methods for both PDF and CDF approximation, using a suite of methods assessed through cross-validation errors. Active learning approaches could be employed during optimization to further reduce the number of design points, thereby reducing the number of reliability estimations required to obtain the optima.

**Author Contributions:** Conceptualization, K.P. and P.R.; investigation, K.P. and D.Y.; methodology, K.P. and P.R.; supervision, P.R.; validation, K.P. and D.Y.; writing—original draft, K.P.; writing—review and editing, K.P., D.Y. and P.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** MATLAB® codes used to generate the results shall be provided upon request.

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

### **Appendix A. Kernel Density Estimation (KDE)**

Let (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*) be an i.i.d. sample from a distribution with a PDF (*f*), then the kernel density estimate of *f* is given as

$$\hat{f}\_h(x) = \frac{1}{nh} \sum\_{i=1}^n K\left(\frac{x - x\_i}{h}\right) \tag{A1}$$

where *h* is a smoothing parameter called bandwidth, *n* is the sample size, and *K*(.) is a kernel, which is non-negative, integrates to one and is centered at zero. Different kernel functions can be used, such as normal, uniform, Epanechnikov, triangle and others, and the bandwidth is selected based on the sample data chosen. The choice of the bandwidth influences the variance and the bias of the estimator. The performance of the kernel density estimator depends more on the choice of the bandwidth than on the choice of the kernel. Although KDE is the most popular non-parametric approach to density estimation, it has some implementation issues, such as bandwidth selection, local adaptivity and boundary bias. While KDE works well for data following a normal distribution, it performs poorly when estimating heavy-tailed distributions, especially in the tail region, which is our region of interest. Hence, the adaptive KDE proposed by [64] is chosen. The MATLAB® implementation of adaptive KDE for 1D [65] was employed in the current work for PDF approximation.
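For illustration, Equation (A1) with a normal kernel can be written in a few lines. The sketch below uses a fixed bandwidth and is therefore not the adaptive KDE of [64,65]; the function names and the normal-reference bandwidth rule are assumptions made for this example.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal kernel: non-negative, integrates to one, centered at 0."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x: float, sample, h: float) -> float:
    """Fixed-bandwidth kernel density estimate of Equation (A1)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


def normal_reference_bandwidth(sample) -> float:
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


data = [-1.0, 0.0, 2.0]  # toy sample for illustration
h = normal_reference_bandwidth(data)
print(kde(0.5, data, h))
```

A single global bandwidth tends to oversmooth the bulk or undersmooth the tails, which is the limitation that motivates an adaptive bandwidth.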

### **Appendix B. Third-Order Polynomial Normal Transformation Technique (TPNT)**

Hong and Lind [66] proposed this method of approximating a CDF where, given the order statistics *ζ*<sub>1</sub> ≤ *ζ*<sub>2</sub> ≤ ... ≤ *ζ<sub>N</sub>* obtained from a sample realization of a random variable *Z*, the fractiles are constrained through the "sample rule" in the following manner:

$$\{\zeta\_{i}, F\_{Z}(\zeta\_i)\} = \left\{\zeta\_{i}, \frac{i}{N+1}\right\}, \quad i = 1, 2, \dots, N \tag{A2}$$

where *F<sub>Z</sub>*(.) is the cumulative distribution function of *Z*. In this method, a third-order polynomial relationship between *ζ* and a normal transformation of *F<sub>Z</sub>*(*ζ*) is assumed as presented in Equation (A3)

$$\zeta = \sum\_{k=0}^{3} a\_k \eta^k \tag{A3}$$

where

$$\eta = \Phi^{-1}(F\_{Z}(\zeta))\tag{A4}$$

Here, Φ<sup>−1</sup>(.) is the inverse of the standard normal distribution function. The coefficients of the polynomial in Equation (A3) are found through least squares minimization of the error, *ε*:

$$\varepsilon = \sum\_{j \in J\_s} \left( \zeta\_j - \sum\_{k=0}^3 a\_k (\eta\_j)^k \right)^2 \tag{A5}$$

where *J<sub>s</sub>* is the set of data points chosen for the parameter estimation, which usually comprises all *N* sample points. Two constraints, *a*<sub>2</sub><sup>2</sup> − 3*a*<sub>1</sub>*a*<sub>3</sub> < 0 and *a*<sub>3</sub> > 0, are imposed to ensure monotonicity of the third-order polynomial curve, since they keep its derivative *a*<sub>1</sub> + 2*a*<sub>2</sub>*η* + 3*a*<sub>3</sub>*η*<sup>2</sup> positive for all *η*.

At a new fractile *ζ*<sub>0</sub>, the probability *F<sub>Z</sub>*(*ζ*<sub>0</sub>) is determined through Φ(*η*<sub>0</sub>), where *η*<sub>0</sub> is obtained by solving Equation (A3), with substitution of *ζ* by *ζ*<sub>0</sub>.
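The steps of this appendix can be sketched in plain Python. The code below is an illustration under stated assumptions, not the implementation used in this work: it solves the least-squares problem without enforcing the monotonicity constraints and only checks them afterwards, using the fact that the derivative *a*<sub>1</sub> + 2*a*<sub>2</sub>*η* + 3*a*<sub>3</sub>*η*<sup>2</sup> is positive for all *η* when *a*<sub>3</sub> > 0 and its discriminant is negative. All function names and the synthetic demonstration sample are hypothetical.

```python
from statistics import NormalDist

_PHI = NormalDist()  # standard normal distribution


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_tpnt(sample):
    """Unconstrained least-squares fit of zeta = sum_k a_k eta^k."""
    zetas = sorted(sample)                                       # order statistics
    n = len(zetas)
    etas = [_PHI.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]  # (A2), (A4)
    # Normal equations of the cubic least-squares problem (A5)
    ata = [[sum(e ** (j + k) for e in etas) for k in range(4)] for j in range(4)]
    atz = [sum(z * e ** j for z, e in zip(zetas, etas)) for j in range(4)]
    return solve_linear(ata, atz)


def is_monotonic(a):
    """True if the cubic has a positive derivative for every eta."""
    return a[3] > 0.0 and a[2] ** 2 - 3.0 * a[1] * a[3] < 0.0


def tpnt_cdf(zeta0, a, lo=-8.0, hi=8.0, tol=1e-10):
    """F_Z(zeta0) = Phi(eta0), with eta0 found by bisection on (A3)."""
    def poly(eta):
        return a[0] + a[1] * eta + a[2] * eta ** 2 + a[3] * eta ** 3
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poly(mid) < zeta0:
            lo = mid
        else:
            hi = mid
    return _PHI.cdf(0.5 * (lo + hi))


# Synthetic demonstration: a sample that is exactly cubic in eta
N = 99
demo_sample = [e + 0.1 * e ** 3
               for e in (_PHI.inv_cdf(i / (N + 1)) for i in range(1, N + 1))]
coeffs = fit_tpnt(demo_sample)
print(coeffs, is_monotonic(coeffs), tpnt_cdf(0.0, coeffs))
```

If the monotonicity check fails for a given sample, a constrained solver would be needed in place of the normal equations; the bisection step relies on the fitted cubic being increasing.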

### **Appendix C. Car Side-Impact Problem**

The analytical expressions of the objective function and constraint functions are given below:

*f*<sub>1</sub>(**x**) = 1.98 + 4.9*x*<sub>1</sub> + 6.67*x*<sub>2</sub> + 6.98*x*<sub>3</sub> + 4.01*x*<sub>4</sub> + 1.78*x*<sub>5</sub> + 0.00001*x*<sub>6</sub> + 2.73*x*<sub>7</sub>,

*g*<sub>1</sub>(**x**, **p**) = 1.16 − 0.3717*x*<sub>2</sub>*x*<sub>4</sub> − 0.00931*x*<sub>2</sub>*p*<sub>3</sub> − 0.484*x*<sub>3</sub>*p*<sub>2</sub> + 0.01343*x*<sub>6</sub>*p*<sub>3</sub>,

*g*<sub>2</sub>(**x**, **p**) = 28.98 + 3.818*x*<sub>3</sub> − 4.2*x*<sub>1</sub>*x*<sub>2</sub> + 0.0207*x*<sub>5</sub>*p*<sub>3</sub> + 6.63*x*<sub>6</sub>*x*<sub>9</sub> − 7.7*x*<sub>7</sub>*x*<sub>8</sub> + 0.32*p*<sub>2</sub>*p*<sub>3</sub>,

*g*<sub>3</sub>(**x**, **p**) = 33.86 + 2.95*x*<sub>3</sub> + 0.1792*p*<sub>3</sub> − 5.057*x*<sub>1</sub>*x*<sub>2</sub> − 11*x*<sub>2</sub>*p*<sub>1</sub> − 0.0215*x*<sub>5</sub>*p*<sub>3</sub> − 9.98*x*<sub>7</sub>*p*<sub>1</sub> + 22*p*<sub>1</sub>*p*<sub>2</sub>,

*g*<sub>4</sub>(**x**, **p**) = 46.36 − 9.9*x*<sub>2</sub> − 12.9*x*<sub>1</sub>*p*<sub>1</sub> + 0.1107*x*<sub>3</sub>*p*<sub>3</sub>,

*g*<sub>5</sub>(**x**, **p**) = 0.261 − 0.0159*x*<sub>1</sub>*x*<sub>2</sub> − 0.188*x*<sub>1</sub>*p*<sub>1</sub> − 0.019*x*<sub>2</sub>*x*<sub>7</sub> + 0.0144*x*<sub>3</sub>*x*<sub>5</sub> + 0.0008757*x*<sub>5</sub>*p*<sub>3</sub> + 0.08045*x*<sub>6</sub>*x*<sub>9</sub> + 0.00139*p*<sub>1</sub>*p*<sub>4</sub> + 0.00001575*p*<sub>3</sub>*p*<sub>4</sub>,

*g*<sub>6</sub>(**x**, **p**) = 0.214 + 0.00817*x*<sub>5</sub> − 0.131*x*<sub>1</sub>*p*<sub>1</sub> − 0.0704*x*<sub>1</sub>*p*<sub>2</sub> + 0.03099*x*<sub>2</sub>*x*<sub>6</sub> − 0.018*x*<sub>2</sub>*x*<sub>7</sub> + 0.0208*x*<sub>3</sub>*p*<sub>1</sub> + 0.121*x*<sub>3</sub>*p*<sub>2</sub> − 0.00364*x*<sub>5</sub>*x*<sub>6</sub> + 0.0007715*x*<sub>5</sub>*p*<sub>3</sub> − 0.0005354*x*<sub>6</sub>*p*<sub>3</sub> + 0.00121*p*<sub>1</sub>*p*<sub>4</sub> + 0.00184*x*<sub>9</sub>*p*<sub>3</sub> − 0.018*x*<sub>2</sub><sup>2</sup>,

*g*<sub>7</sub>(**x**, **p**) = 0.74 − 0.61*x*<sub>2</sub> − 0.163*x*<sub>3</sub>*p*<sub>1</sub> + 0.001232*x*<sub>3</sub>*p*<sub>3</sub> − 0.166*x*<sub>7</sub>*p*<sub>2</sub> + 0.227*x*<sub>2</sub><sup>2</sup>,

*g*<sub>8</sub>(**x**, **p**) = 4.72 − 0.5*x*<sub>4</sub> − 0.19*x*<sub>2</sub>*x*<sub>3</sub> − 0.0122*x*<sub>4</sub>*p*<sub>3</sub> + 0.009325*x*<sub>6</sub>*p*<sub>3</sub> + 0.000191*p*<sub>4</sub><sup>2</sup>,

*g*<sub>9</sub>(**x**, **p**) = 10.58 − 0.674*x*<sub>1</sub>*x*<sub>2</sub> − 1.95*x*<sub>2</sub>*p*<sub>1</sub> + 0.02054*x*<sub>3</sub>*p*<sub>3</sub> − 0.0198*x*<sub>4</sub>*p*<sub>3</sub> + 0.028*x*<sub>6</sub>*p*<sub>3</sub>,

*g*<sub>10</sub>(**x**, **p**) = 16.45 − 0.489*x*<sub>3</sub>*x*<sub>7</sub> − 0.843*x*<sub>5</sub>*x*<sub>6</sub> + 0.0432*p*<sub>2</sub>*p*<sub>3</sub> − 0.0556*p*<sub>2</sub>*p*<sub>4</sub> − 0.000786*p*<sub>4</sub><sup>2</sup>.

Description of the design variables (*x*<sub>1</sub> to *x*<sub>7</sub>) and random variables (*p*<sub>1</sub> to *p*<sub>4</sub>) (standard deviation in brackets)


### **References**


### *Article* **Is NSGA-II Ready for Large-Scale Multi-Objective Optimization?**

**Antonio J. Nebro 1,2, Jesús Galeano-Brajones 3, Francisco Luna 1,2,\* and Carlos A. Coello Coello <sup>4</sup>**


**Abstract:** NSGA-II is, by far, the most popular metaheuristic that has been adopted for solving multi-objective optimization problems. However, its most common usage, particularly when dealing with continuous problems, is circumscribed to a standard algorithmic configuration similar to the one described in its seminal paper. In this work, our aim is to show that the performance of NSGA-II, when properly configured, can be significantly improved in the context of large-scale optimization. To this end, we leverage a combination of irace, a tool for automated algorithm configuration, and a highly configurable version of NSGA-II available in the jMetal framework. Two scenarios are devised: first, solving the Zitzler–Deb–Thiele (ZDT) test problems, and second, dealing with a binary real-world problem of the telecommunications domain. Our experiments reveal that an auto-configured version of NSGA-II can properly address test problems ZDT1 and ZDT2 with up to 2<sup>17</sup> = 131,072 decision variables. The same methodology, when applied to the telecommunications problem, shows that significant improvements can be obtained with respect to the original NSGA-II algorithm when solving problems with thousands of bits.

**Keywords:** NSGA-II; auto-configuration and auto-design of metaheuristics; large-scale multi-objective optimization; real-world problems optimization

### **1. Introduction**

Since the publication of the seminal paper of Deb et al. [1] presenting the Nondominated Sorting Genetic Algorithm-II (NSGA-II) over twenty years ago, this algorithm has become the standard metaheuristic for solving multi-objective optimization problems. Since then, NSGA-II has been included in a large number of works as a reference against which newly proposed approaches are compared (e.g., [2–4]). Additionally, it is normally the first-choice solver for dealing with real-world problems [5–8]. Its popularity can be easily assessed by looking at the number of citations to [1] (e.g., in Google Scholar or Clarivate Analytics).

NSGA-II is a generational genetic algorithm characterized by applying a dominance ranking scheme to foster convergence and the crowding distance density estimator to promote diversity. These components are used in the replacement step prior to building up the population for the next generation of the algorithm. In most of the studies involving NSGA-II, particularly when continuous problems are tackled, it is configured according to a parameterization mimicking the one used when it was originally introduced in [1], namely: population and offspring population size of 100, Simulated Binary Crossover (probability: 0.9, distribution index: 20.0), and Polynomial-based Mutation (probability: 1/*L*, where *L* is the number of decision variables of the problem, distribution index: 20.0).
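The two components named above, dominance ranking and the crowding distance, can be sketched compactly for minimization problems. The code below is a simple illustration with hypothetical function names; it is neither the bookkeeping-based fast non-dominated sort of [1] nor the jMetal implementation.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_fronts(points):
    """Dominance ranking: list of fronts, each a sorted list of indices."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(points, front):
    """Crowding distance of every index in one front."""
    distance = {i: 0.0 for i in front}
    n_objectives = len(points[front[0]])
    for m in range(n_objectives):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        # Boundary solutions are always preferred
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (hi - lo)
    return distance


objectives = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]  # toy bi-objective values
fronts = non_dominated_fronts(objectives)
crowding = crowding_distance(objectives, fronts[0])
print(fronts, crowding)
```

In the replacement step, solutions are kept front by front, and the last front that does not fit entirely is truncated by preferring larger crowding distances.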

**Citation:** Nebro, A.J.; Galeano-Brajones, J.; Luna, F.; Coello Coello, C.A. Is NSGA-II Ready for Large-Scale Multi-Objective Optimization? *Math. Comput. Appl.* **2022**, *27*, 103. https://doi.org/10.3390/mca27060103

Academic Editor: Efrén Mezura-Montes

Received: 26 October 2022 Accepted: 28 November 2022 Published: 30 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

It is well known that the performance of a metaheuristic in solving a given problem depends, to a large extent, on its parameter settings [9], so the motivation behind this work is to carry out an experimental study to determine to what extent the search capacity of NSGA-II can be improved if it is properly configured. We focus this study on the context of large-scale optimization problems, i.e., those problems having more than 100 decision variables.

The methodology that we have applied consists, first, of using a highly configurable version of NSGA-II, which is available in jMetal, a Java-based optimization framework [10,11]. We assume that any multi-objective genetic algorithm using dominance ranking and the crowding distance in the replacement step is an NSGA-II variant. That version, referred to as AutoNSGA-II, has been made more extensible and flexible so that: (i) it can adopt an external archive to store the non-dominated solutions, (ii) the offspring population size can be different from the population size, and (iii) the variation operators can be taken from an extended set of different crossover and mutation operators besides Simulated Binary Crossover and Polynomial-based Mutation. Second, we use the irace tool [12] to automatically find the best AutoNSGA-II configurations from a set of training instance problems.

We consider two scenarios: one consisting of solving the Zitzler–Deb–Thiele (ZDT) [13] test suite, starting with 2048 decision variables, and another one dealing with a real-world binary telecommunication problem where the solutions can have thousands of bits, which aims to minimize the energy consumption and increase the provided bandwidth in an ultra-dense 5G (fifth generation) network. It is important to emphasize that the purpose of this work is not to compare NSGA-II against state-of-the-art algorithms designed to solve large-scale multi-objective problems but to empirically assess to what extent the performance of NSGA-II can be enhanced when properly configured in these two scenarios.

The rest of this paper is organized as follows. The next section reviews the related literature and identifies the research gap covered in this work. Section 3 elaborates on the components required to auto-configure NSGA-II with irace, as well as the two target scenarios used to assess the performance of AutoNSGA-II. The results obtained in the experiments conducted are analyzed in Section 4. Finally, Section 5 discusses the main conclusions drawn and proposes some lines for future research.

### **2. Related Work**

The auto-configuration (or auto-tuning) of metaheuristics is an open research field that studies the design of tools that follow a machine learning approach: given a set of problems used as a training set, they automatically find an accurate parameterization of the algorithm that is expected to work well on a validation set and, consequently, on similar problems. A further step is the auto-design of metaheuristics, which, given a set of components, is able to create a full algorithm specifically tailored to the training and validation sets. In the field of multi-objective metaheuristics, these issues have been studied in several papers, such as [14–16].

Focusing on NSGA-II, the idea of auto-tuning a configurable version of it by combining jMetal and irace was presented in [17], where the Walking Fish Group (WFG) [18] test suite was used as the training set, and the resulting NSGA-II variant was validated with the same problems plus the Deb–Thiele–Laumanns–Zitzler (DTLZ) [19] test suite. The reported results showed that that version globally outperformed the original NSGA-II in most of the problems when applying four quality indicators. A similar approach has been used in this paper to address large-scale multi-objective optimization problems.

Indeed, large-scale multi-objective optimization is a hot research topic, mainly motivated by the fact that many real-world problems contain hundreds or even thousands of decision variables (e.g., the training of deep neural networks). Consequently, the search space becomes huge and traditional metaheuristics have difficulties finding accurate solutions. One of the first works in this line is [20], where eight multi-objective metaheuristics, including NSGA-II, were tested on the ZDT problems scaling the variables up to 2048. Paper [21] presents a survey of recent proposals, but none of them is based on applying auto-configuration to an existing algorithm.

### **3. Materials and Methods**

In this section, we describe the configurable version of NSGA-II available in jMetal and the experimental methodology adopted, which includes the two scenarios considered, the auto-configuration process with irace, and the computing environments.

### *3.1. Component-Based NSGA-II*

The implementations of NSGA-II in jMetal have evolved over time. Keeping as a reference the behavior of a generic evolutionary algorithm, following the pseudo-code included in Algorithm 1, the first implementation provided by the release presented in [10] was based on a single and large method (130 lines of Java code) that contained all the steps of the algorithm. In the jMetal 5 release [11], this approach was replaced by an abstract class that closely mimicked the pseudo-code, which improved the modularity and reusability of the code. The last implementation, presented in [17], is based on a component-based architecture, where all the steps of an evolutionary algorithm are objects; this scheme offers an enhanced degree of flexibility that allows the generation of evolutionary algorithms in a dynamic way from a repository of components. This architecture is the basis of the AutoNSGA-II algorithm that we will use in this work.

### **Algorithm 1** Pseudo-code of an evolutionary algorithm.

1: *P*(0) ← GenerateInitialSolutions()
2: *t* ← 0
3: Evaluation(*P*(0))
4: **while not** TerminationCriterionIsMet() **do**
5: *P*′(*t*) ← Selection(*P*(*t*))
6: *Q*(*t*) ← Variation(*P*′(*t*))
7: Evaluate(*Q*(*t*))
8: *P*(*t* + 1) ← Replacement(*P*(*t*), *Q*(*t*))
9: *t* ← *t* + 1
10: **end while**
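The component-based idea behind this pseudo-code can be sketched in Python by passing every step as a callable. This is only an illustration of the architecture: the function and parameter names are hypothetical, and the toy components at the bottom (a single-objective problem minimizing x²) are not part of jMetal.

```python
import random

def evolutionary_algorithm(create, evaluate, terminate, select, vary, replace):
    """Generic loop of Algorithm 1; every step is a pluggable component."""
    population = create()
    fitness = [evaluate(s) for s in population]
    t = 0
    while not terminate(t, population, fitness):
        mating_pool = select(population, fitness)
        offspring = vary(mating_pool)
        offspring_fitness = [evaluate(s) for s in offspring]
        population, fitness = replace(population, fitness, offspring, offspring_fitness)
        t += 1
    return population, fitness

# Toy components (illustrative only): minimize x^2 over real numbers.
rng = random.Random(1)

def binary_tournament(pop, fit):
    def pick():
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if fit[i] <= fit[j] else pop[j]
    return [pick() for _ in pop]

def gaussian_variation(pool):
    return [x + rng.gauss(0.0, 0.1) for x in pool]

def elitist_replacement(pop, fit, off, off_fit):
    # Keep the best len(pop) solutions among parents and offspring.
    ranked = sorted(zip(fit + off_fit, pop + off))[:len(pop)]
    return [s for _, s in ranked], [f for f, _ in ranked]

final_pop, final_fit = evolutionary_algorithm(
    create=lambda: [i + 0.5 for i in range(-10, 10)],
    evaluate=lambda x: x * x,
    terminate=lambda t, pop, fit: t >= 30,
    select=binary_tournament,
    vary=gaussian_variation,
    replace=elitist_replacement,
)
```

Swapping any single callable (for example, a different termination test) changes the algorithm without touching the loop, which is the property that makes automatic configuration over components possible.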

The component types and some of the available instances are shown in Table 1. There are three strategies to create a population of solutions: random, Latin hypercube sampling, and the strategy used in some scatter search algorithms (e.g., AbySS [22]). The evaluation of a population can be performed sequentially or in parallel using the processor cores (multithreaded evaluation). There are four components to indicate the stopping condition, ranging from the typical computation of a maximum number of evaluations to reaching a certain level in a quality indicator; in the latter case, a maximum number of evaluations must also be set to cope with situations where the stopping condition is never fulfilled. The most commonly used selection scheme in NSGA-II is a binary tournament, but we have generalized it to an *n*-ary tournament and added a random selection. As NSGA-II is a genetic algorithm, the variation component applies both crossover and mutation, and the replacement component characterizing NSGA-II is the one based on ranking and a density estimator.


**Table 1.** Component catalog in jMetal for evolutionary algorithms.

### *3.2. Parameter Space for Auto-Configuring NSGA-II*

The automatic configuration of our AutoNSGA-II is based on a parameter space that is composed of several elements coming both from the particular selected components and from specific algorithmic parameters. We have to take into account that a number of components are fixed: the evaluation is sequential, the termination is by evaluations, and the replacement is performed based on a ranking procedure (non-dominated sorting) and the use of a density estimator (crowding distance).

Currently, the implementation of AutoNSGA-II can deal with both continuous and binary problems. The full parameter space for solving both types of problems is detailed in Table 2. There is a first group of common parameters that is not dependent on the encoding, and then we include those that are specific for either continuous or binary decision variables.

The population size is fixed to 100 solutions unless an external archive is used. The algorithm can optionally use an external archive of capacity 100 to store the non-dominated solutions; in that case, the result of the algorithm is the external archive, and otherwise it is the population. Furthermore, when using an archive, the population size can vary from 10 to 200, and the crowding distance estimator is used to promote diversity when the archive is full (i.e., the solution having the lowest crowding distance value is removed). While the standard NSGA-II is a generational evolutionary algorithm, we can configure the offspring population size from 1 (i.e., steady-state) to a maximum of 400 solutions.

Next, we describe the parameters for real-coded multi-objective optimization problems. As commented in the previous section, there are three possible strategies for creating the initial population (random, Latin hypercube sampling, and scatter search). The variation component can choose between two crossover operators (SBX and BLX\_ALPHA) and four mutation operators (uniform, polynomial, linked polynomial, and non-uniform). The operators can have common parameters (e.g., the crossover probability) and specific parameters (e.g., the distribution index for the SBX crossover is a value in the range [5.0, 400.0]). The mutation probability is problem-dependent and is usually set to 1/*n* (where *n* is the number of decision variables), so we consider a mutation probability factor, which is a value between 0.0 and 2.0, in such a way that the effective mutation probability is the product of that factor and 1/*n*. The repair strategies (random, round, bounds) are applied when a variation operator produces values out of bounds.
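The two mechanisms just described can be illustrated with a short Python sketch. The first function computes the effective mutation probability as the factor times 1/*n*. The second one shows one plausible reading of the three repair strategies; the semantics used here (random: draw a new value within the bounds; bounds: clamp to the violated bound; round: assign the opposite bound) are an assumption suggested by the strategy names, and the function names are hypothetical.

```python
import random

def effective_mutation_probability(factor, n_variables):
    """Mutation probability obtained by scaling the usual 1/n rate."""
    return factor * (1.0 / n_variables)

def repair(value, lower, upper, strategy, rng=random):
    """Bring an out-of-bounds value back into [lower, upper] (assumed semantics)."""
    if lower <= value <= upper:
        return value
    if strategy == "random":
        return rng.uniform(lower, upper)
    if strategy == "bounds":
        return lower if value < lower else upper
    if strategy == "round":
        return upper if value < lower else lower
    raise ValueError(f"unknown repair strategy: {strategy}")
```

For instance, a factor of 0.45 on a problem with 2048 variables yields an effective mutation probability of 0.45/2048, i.e., roughly 2.2 × 10⁻⁴ per variable.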



**Table 2.** Parameter space of AutoNSGA-II for real- and binary-coded problems.

The operators and parameters used to solve binary problems include single-point, HUX, and uniform crossover, while the mutation operator is bit-flip. We have also used here a mutation factor between 0.0 and 2.0 to modulate the effect of the mutation operator.
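To make the structure of this parameter space more concrete, the following Python sketch draws one random configuration for the real-coded case, using only the ranges and categories quoted in the text. The dictionary keys are hypothetical names chosen for illustration, not necessarily those used in jMetal or in Table 2, and parameters whose ranges are not stated in the text (e.g., the crossover probability) are omitted.

```python
import random

def sample_real_coded_configuration(rng):
    """Draw one random AutoNSGA-II-like configuration (illustrative names)."""
    config = {"algorithmResult": rng.choice(["population", "externalArchive"])}
    # The population size is only a free parameter when an archive is used.
    if config["algorithmResult"] == "externalArchive":
        config["populationSize"] = rng.randint(10, 200)
    else:
        config["populationSize"] = 100
    config["offspringPopulationSize"] = rng.randint(1, 400)
    config["createInitialSolutions"] = rng.choice(
        ["random", "latinHypercubeSampling", "scatterSearch"])
    config["crossover"] = rng.choice(["SBX", "BLX_ALPHA"])
    # Operator-specific parameters exist only for the selected operator.
    if config["crossover"] == "SBX":
        config["sbxDistributionIndex"] = rng.uniform(5.0, 400.0)
    config["mutation"] = rng.choice(
        ["uniform", "polynomial", "linkedPolynomial", "nonUniform"])
    config["mutationProbabilityFactor"] = rng.uniform(0.0, 2.0)
    config["repairStrategy"] = rng.choice(["random", "round", "bounds"])
    return config
```

The conditional branches reflect the fact that some parameters only make sense when a particular component is selected, which is why the space is hierarchical rather than a flat Cartesian product.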

### *3.3. Experimental Methodology*

Our aim in this paper is to carry out an empirical study to determine if NSGA-II can address large-scale multi-objective optimization problems if it is properly configured. To this end, we designed two trial scenarios (a real-coded benchmark problem and a binary-coded real-world problem) and conducted a set of experiments divided into two phases, namely, auto-configuring NSGA-II with irace with a simple set of instances and performance assessment over a wider testbed.

### 3.3.1. Scenarios

The first scenario faces continuous benchmark problems; concretely, we have chosen the ZDT instances. These problems were used in the scalability study presented in [20], where a number of algorithms, including NSGA-II, were applied to optimize the problem family configured with up to 2048 variables. In that work, the solvers stopped the search when they found an approximated front whose Hypervolume (HV) was higher than 95% of the HV of the front used as reference. Those algorithms requiring the fewest evaluations to fulfill that condition were considered the fastest. A limit of ten million evaluations was also set so that an algorithmic execution reaching such a limit before obtaining an acceptable front was considered unsuccessful. In our scenario, we keep the same stopping condition, but the limit for failed executions is raised to 25 million evaluations and we configure ZDT instances starting from 2048 variables up to 131,072 variables.
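The stopping rule of this scenario can be sketched in Python for bi-objective minimization problems such as the ZDT family. The sketch computes the hypervolume of a front with respect to a reference point and then checks the 95% threshold and the evaluation limit. The reference point is an assumption (the text does not state how it is chosen), and the function names are illustrative.

```python
def hypervolume_2d(front, reference_point):
    """Area dominated by a 2-objective minimization front, bounded by reference_point."""
    rx, ry = reference_point
    points = sorted(p for p in front if p[0] < rx and p[1] < ry)
    volume, best_y = 0.0, ry
    for x, y in points:
        # Only points improving the second objective add a new rectangle.
        if y < best_y:
            volume += (rx - x) * (best_y - y)
            best_y = y
    return volume

def check_termination(front, reference_front, reference_point, evaluations,
                      hv_ratio=0.95, max_evaluations=25_000_000):
    """Return (stop, success): success means the HV target was reached."""
    target = hv_ratio * hypervolume_2d(reference_front, reference_point)
    success = hypervolume_2d(front, reference_point) >= target
    return (success or evaluations >= max_evaluations), success
```

A run that stops with `success` equal to `False` corresponds to what the text calls an unsuccessful execution: the evaluation limit was reached before obtaining an acceptable front.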

The second scenario considers a binary real-world problem from the domain of telecommunications, specifically in the context of 5G networks. A key enabling technology for these networks to meet their expected performance in terms of data rates, latency, etc. [23] lies in deploying many small base stations (SBSs) close to end-users, which allows better re-use of the electromagnetic spectrum, improves the signal quality, and reduces the communication latency [24]. Such deployments are known as Ultra-Dense Networks (UDNs). Dimensioned to satisfy a given demand, UDNs incur considerable power consumption because of the number of SBSs that are operating. If no action is taken, this energy consumption also occurs in periods of low demand (e.g., in commercial centers or office buildings outside business hours). A well-known and standardized approach to reducing the electricity bill is to switch off a subset of the SBSs when they are underutilized. This poses a multi-objective optimization problem, named CSO (cell switch-off), which, given a set of SBSs, has to determine which subset must be turned on/off (binary decision) in order to minimize the power consumption and maximize the capacity provided to the users [25–27]. A detailed definition of the problem can be found in Appendix A. Note that this is a large-scale multi-objective optimization problem, as seminal studies have anticipated that deployments with an SBS every few meters might be required [24]. We have scaled up to about 12,000 cells per km<sup>2</sup> in this work. Figure 1 shows an example of a UDN deployment with macro and micro base stations and small cells, where the on-off state of each one corresponds to one bit of the solutions.

3.3.2. Auto-Configuration and Performance Assessment

We now describe the phases of the experiments, namely, the use of irace to approximate the best configurations of AutoNSGA-II and the comparison of the obtained NSGA-II versions with the original one. We would like to point out that irace uses an iterative approach that samples the space of all possible configurations defined in Table 2 according to a particular distribution, selects the best configurations from the newly sampled ones by means of racing, and updates the sampling distribution to bias the sampling towards the best configurations. Therefore, it is a heuristic algorithm that does not guarantee the global optimal algorithmic configuration is found, as the sampling is limited to a maximum number of evaluations for which the algorithm is run with the sampled configuration on a given instance.
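The sample, race, and update cycle described above can be illustrated with a deliberately simplified Python sketch for a single numerical parameter. It is a toy, not irace: irace relies on statistical tests to discard configurations during a race and handles mixed, conditional parameter spaces, whereas here candidates are simply ranked by mean cost and the worse half is dropped after each instance. All names and default values are hypothetical.

```python
import random
import statistics

def toy_configurator(cost, instances, lower, upper,
                     iterations=6, samples=10, n_elites=3, seed=0):
    """Toy sample-race-update loop over one real-valued parameter."""
    rng = random.Random(seed)
    spread = (upper - lower) / 2.0
    elites = []
    for _ in range(iterations):
        # 1) Sample: uniformly at first, later around a randomly chosen elite.
        candidates = list(elites)
        while len(candidates) < samples:
            if not elites:
                candidates.append(rng.uniform(lower, upper))
            else:
                value = rng.gauss(rng.choice(elites), spread)
                candidates.append(min(upper, max(lower, value)))
        # 2) Race: evaluate instance by instance, dropping the worse half.
        results = {i: [] for i in range(len(candidates))}
        for instance in instances:
            for i in results:
                results[i].append(cost(candidates[i], instance))
            if len(results) > n_elites:
                ranked = sorted(results, key=lambda i: statistics.mean(results[i]))
                keep = max(n_elites, len(ranked) // 2)
                results = {i: results[i] for i in ranked[:keep]}
        # 3) Update: keep the elites and narrow the sampling distribution.
        ranked = sorted(results, key=lambda i: statistics.mean(results[i]))
        elites = [candidates[i] for i in ranked[:n_elites]]
        spread *= 0.5
    return elites[0]
```

Because the number of sampled configurations and evaluated instances is bounded, such a procedure returns a good configuration for the training instances rather than a provably optimal one, in line with the heuristic nature noted above.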

In order to use irace, a number of inputs are required:


In the continuous benchmark problem scenario, the common parameters and those for real-coded variables are used. The training set consists of five ZDT problems with their default number of decision variables: 30 for ZDT1, ZDT2, and ZDT3, and 10 for ZDT4 and ZDT6. The executable is a jar file including jMetal code that, after solving a problem with a particular AutoNSGA-II configuration, applies the hypervolume quality indicator by using a reference front for the problem (as the ZDT problems are synthetic, reference fronts representing a subset of the Pareto fronts are available). Once irace has found a compromise configuration of AutoNSGA-II for the training set, this version of NSGA-II is compared with the original NSGA-II.

In the case of the CSO problem, irace receives the common and binary-coded parameters of Table 2. Evaluating a typical instance of this problem requires a significant amount of time, so generating and evaluating 100,000 configurations is infeasible. Our approach has been to define a small instance (with 1170 bits) that is used for training. As the Pareto front for this problem is unknown, we have defined a reference point (which is required to compute the hypervolume) after inspecting several approximated fronts reached in a number of pilot tests. We have taken the extreme points of these fronts and added an offset in a conservative way to ensure that any approximated front computed by AutoNSGA-II would dominate those points. The reference point is then the result of taking the highest values per dimension of the extreme points. As with the benchmark problems, the configuration found for AutoNSGA-II will be compared with the standard NSGA-II on a set of realistic problem instances.
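The construction of the reference point can be sketched as follows in Python. The sketch assumes that all objectives are expressed in minimization form (e.g., capacity negated) and that the offset is an additive margin per objective; the text does not specify how the offset is computed, so both the `offsets` argument and the function name are assumptions.

```python
def reference_point_from_fronts(fronts, offsets):
    """Worst value per objective over the extreme points of pilot fronts, plus a margin."""
    n_objectives = len(offsets)
    extremes = []
    for front in fronts:
        for m in range(n_objectives):
            # Extreme point of this front for objective m (largest value).
            extremes.append(max(front, key=lambda point: point[m]))
    worst = [max(point[m] for point in extremes) for m in range(n_objectives)]
    return tuple(w + off for w, off in zip(worst, offsets))
```

With two pilot fronts `[(1, 5), (3, 2)]` and `[(2, 6), (4, 1)]` and offsets `(0.5, 1.0)`, the resulting reference point is `(4.5, 7.0)`, which is dominated by every point of both fronts.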

### 3.3.3. Computing Environments

Running irace for algorithm auto-configuration can require a significant amount of computer power. The experiments on the ZDT problems have been executed in a virtualization environment located at the Ada Byron Research Center at the University of Málaga (Spain). We have used a virtual machine with Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60 GHz processor (64 cores) and 64 GB of RAM. The operating system is Ubuntu 21.04, and the versions of Java and irace are, respectively, JDK 14 and 3.4.1. The version of jMetal is 6.0-SNAPSHOT.

The experimentation conducted on the CSO problem, which is very computationally demanding, has been deployed on the facilities of the Supercomputing and Bioinformatics Center of the Universidad de Málaga, named Picasso. It is a heterogeneous computing platform composed of several clusters with up to 30,616 computing cores. The full hardware description can be found at http://www.scbi.uma.es/site/scbi/hardware, accessed on 25 October 2022. As the true Pareto front is not known for this real-world problem, the stopping condition here is to reach a predefined number of function evaluations; runtimes are therefore not relevant for this study, and the executions can be performed in this heterogeneous environment. Each execution is submitted to Picasso using slurm, a cluster job manager, which allocates it to the first available computing core.

### **4. Results**

In this section, we present and analyze the results obtained after applying the experiments in the two scenarios described above.

### *4.1. ZDT Benchmark*

In Table 3, we include the default settings of NSGA-II and the configuration of AutoNSGA-II found by irace. If we compare the two algorithms, we observe that none of the default parameters of NSGA-II is kept by AutoNSGA-II. The auto-configured algorithm uses an external archive with population and offspring population sizes of 56 and 14, respectively (the default values are 100 in both populations). It is worth noting that the traditionally used Simulated Binary Crossover (SBX) and Polynomial-based Mutation are replaced by BLX\_alpha crossover and non-uniform mutation. The configuration obtained by irace sets *α* = 0.94 for BLX\_alpha, which introduces additional diversity in the population; this complements the controlled effect of the non-uniform mutation, which is configured with a perturbation of 0.3 and a mutation factor of 0.45 applied to the 1/*n* scheme used for the mutation rate.


**Table 3.** Settings of NSGA-II and AutoNSGA-II for the ZDT problems.


We have executed both NSGA-II variants in the first scenario. The results obtained are presented in Table 4, which includes the computing times and evaluations required to reach the stopping condition. It is worth mentioning that we conducted a set of preliminary experiments, which revealed that the computing times and the number of evaluations per algorithm–problem combination were roughly similar across runs, so performing a number of independent runs and reporting mean values would not add relevant information. This has to be weighed against the fact that some runs take hours or even days to complete. Consequently, the figures in Table 4 are the result of single executions.

If we focus on ZDT1 and 2048 variables, we observe that AutoNSGA-II needs 182,356 evaluations against the 1,250,500 required by NSGA-II. As a consequence, the computing times are reduced from 0.13 to 0.02 h (453 and 87 s, respectively), so AutoNSGA-II is about 4.6 times faster than NSGA-II. This behavior continues until the number of variables increases up to 16,384, as NSGA-II is unable to solve ZDT1 with 32,768 variables; however, AutoNSGA-II is able to reach an approximated front that satisfies the stopping condition (95% of the HV of the reference front) for ZDT1 with 131,072 decision variables. The figures of ZDT2 are similar to those of ZDT1.

In the case of ZDT3, the number of evaluations decreases for NSGA-II compared to the ones of ZDT1 and ZDT2, while they increase for AutoNSGA-II, which is around 4.2 times faster. For this problem, NSGA-II fails to solve ZDT3 with 32,768 variables, while AutoNSGA-II is not capable of doing so with the largest number of variables. The results for ZDT6 reveal that AutoNSGA-II is about 18 times faster than NSGA-II and solves the problem with up to 65,536 variables, while NSGA-II can only solve it with up to 8192. The ZDT4 problem deserves special attention. Neither algorithm was able to solve it for 2048 variables, so we decided to re-run the auto-configuration process by using only ZDT4 as the training set. The settings obtained by irace are similar to those shown in Table 3 except for the mutation operator, which is linked polynomial mutation [28] (distribution index = 18.49, mutation probability factor = 0.28, and mutation repair strategy = random). With these parameter values, AutoNSGA-II has been able to solve ZDT4 with 2048 variables in less than 25 million evaluations.


**Table 4.** Results for NSGA-II and AutoNSGA-II on the ZDT benchmark. The last row shows the time and evaluations of AutoNSGA-II using a specific configuration for the ZDT4 problem.

\* This instance has used a specifically tuned configuration by irace.

From these results, we can state that the use of auto-configuration for NSGA-II produces a variant that is not only faster than NSGA-II on all problems except for ZDT4 but is also capable of scaling up to more than 100,000 variables in the case of problems ZDT1 and ZDT2, which is a remarkable outcome of our study. Using the five instances as a training set for the auto-configuration process has had the consequence of finding a suitable parameterization for four problems at the expense of a detriment in ZDT4.

The ZDT benchmark was proposed more than 20 years ago, and its problems are considered easy to solve, so we could consider our findings as a kind of lower bound of the capabilities of NSGA-II to solve scalable problems. We could also argue that the time required to solve ZDT1 and ZDT2 with 131,072 variables is more than four days, but we have to consider that we have used virtual machines and we have not applied any optimization technique (e.g., parallelism), so those times could be significantly reduced.

### *4.2. The CSO Problem*

The resulting configuration of AutoNSGA-II and how it contrasts with the typical NSGA-II settings for binary encodings are shown in Table 5. In this case, the main differences are again the presence of an external archive, the size reduction in the two populations (from 100 to 93 and 32 individuals, respectively), an almost doubled mutation factor (1.7), and a higher selection pressure, since a tournament size of 9 is adopted instead of 2.

**Table 5.** Settings for NSGA-II and AutoNSGA-II for the CSO problem.


In this experimental scenario, the goal is not to reach an approximated Pareto front with a given quality level but to approximate the best possible set of non-dominated solutions. To do so, we have used nine different families of CSO instances with an increasing density, not only in the SBSs deployed in the network (i.e., the problem size) but also in the number of existing users that represents the actual demand for data traffic. Three density levels for each parameter have been considered, namely Low, Medium, and High (L, M, and H, respectively), whose full specification is included in Table A1 in the Appendix. The combination of these density levels results in nine families of instances that have already been addressed in previous works [26,27]. We use the term "family" because the generation of these instances involves random processes for the deployment of both users and SBSs. To address this issue, we have considered here the same 50 random seeds for the two algorithms so that both NSGA-II and AutoNSGA-II face exactly the same generated instances. Two statistical measures of the HV indicator of the approximated Pareto fronts are computed: the mean and the standard deviation (see Table 6). Finally, as we do not have the true Pareto front for this real-world problem, the stopping condition differs from that of the benchmark problems addressed in the previous section: a maximum number of function evaluations has been used, which increases with the size of the instances: 100,000, 150,000, and 250,000 for L{X}, M{X}, and H{X}, respectively, with X = {L,M,H}. To obtain a reliable value of the HV indicator, we have first built a reference Pareto front composed of all the non-dominated solutions found by all the algorithms for each instance, and then we have normalized each approximated front prior to computing the HV value, thus avoiding the effect of the different scaling in the problem objectives.
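The last step, building a reference front and normalizing each approximated front before computing the HV, can be sketched in Python as follows. The sketch assumes minimization, objective vectors stored as tuples, and min-max normalization with the bounds of the reference front; the exact normalization used in the experiments is not stated in the text, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def build_reference_front(fronts):
    """Non-dominated subset of the union of all approximated fronts."""
    union = list(dict.fromkeys(p for front in fronts for p in front))
    return [p for p in union if not any(dominates(q, p) for q in union)]

def normalize(front, reference_front):
    """Min-max normalize a front with the per-objective bounds of the reference front."""
    n_obj = len(reference_front[0])
    lo = [min(p[m] for p in reference_front) for m in range(n_obj)]
    hi = [max(p[m] for p in reference_front) for m in range(n_obj)]
    return [
        tuple((p[m] - lo[m]) / (hi[m] - lo[m]) if hi[m] > lo[m] else 0.0
              for m in range(n_obj))
        for p in front
    ]
```

After normalization, both objectives of the reference front span [0, 1], so an objective measured in large units (such as power) no longer dominates the HV value.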


**Table 6.** HV indicator for the nine CSO problem families (Mean±Standard deviation).

The HV values reached by NSGA-II and AutoNSGA-II are shown in Table 6, where we have used a gray background to highlight the best (highest) value of the indicator. The conclusion is clear: AutoNSGA-II consistently outperforms NSGA-II in all combinations of densities in the UDN. These differences are remarkable, considering the normalization of the approximated fronts. If we analyze the effect of the density in more detail, we can also observe that when the density of users is Low, i.e., families {X}L (rows 1, 4 and 7), the average HV improvement of AutoNSGA-II over NSGA-II is 0.18, whereas it is slightly lower for families {X}M and {X}H (0.15 and 0.11, respectively). This is a very interesting point for the radio network designer (the decision-maker in the CSO problem) because substantially improved solutions can be reached in periods of very low demand, thus saving more energy. All these results are statistically significant at a 95% confidence level using either an ANOVA I or a Kruskal–Wallis test, depending on the normality of the samples, which is checked beforehand by a Kolmogorov–Smirnov test.

In order to better support these claims, in Figure 2, we also show the *50%-attainment surfaces* [29] of the nine families of CSO instances. It can be seen that, averaged over all the approximated fronts, the attainment surfaces of AutoNSGA-II cover regions of the solution space with very large energy savings (left-hand side of the plots), which NSGA-II is unable to reach. This particularly holds in the plots of the first column (i.e., families {X}L), corroborating the previous analysis of the HV values. Note that this is a key issue in the deployment of 5G networks, as this problem objective actually computes the instantaneous power consumption, so even small reductions have a deep impact on the electricity bill over a month/year timeframe for a network operator.

**Figure 2.** Attainment surfaces for the nine CSO instance families.

### **5. Conclusions**

This work has shown how well-designed optimization software, in combination with an automatic configuration tool such as irace, allows tuning the NSGA-II algorithm to deal with large-scale multi-objective optimization problems. By properly adjusting the algorithm components in a methodology that involves updating not only the application rates but also the type of operators used, the auto-configured version of NSGA-II, named AutoNSGA-II, has been successfully evaluated over fairly different scenarios. On the one hand, AutoNSGA-II has been able to address instances of the continuous ZDT problem family (ZDT1 and ZDT2) with up to 2<sup>17</sup> = 131,072 decision variables, being considerably faster (in terms of the number of function evaluations and thus the execution time) than the canonical NSGA-II in reaching approximated Pareto fronts with 95% of the HV indicator of the true Pareto front. On the other hand, in a more application-oriented context, AutoNSGA-II has been able to improve upon NSGA-II when addressing a combinatorial optimization problem in ultra-dense 5G networks, where a subset of cells has to be selected to be switched off in order to reach a trade-off between energy consumption and quality of service. The new algorithmic configuration has been able to reach approximated Pareto fronts that contain, in particular, more energy-efficient solutions than those computed by the standard NSGA-II.

A line of work that is worth addressing in the future is to repeat our experiments with the ZDT problems but using each problem separately as a training set aimed at determining, first, whether the performance of AutoNSGA-II can be improved (in terms of reducing the number of evaluations and then reducing the computing time) and, second, to analyze the obtained NSGA-II configurations for each problem to detect common parameter values or components.

Using a methodology for the automated tuning of algorithms such as NSGA-II makes sense in the context of dealing with real-world problems, as our study with the CSO problem has shown. Applying this approach, with our combination of jMetal and irace, to other problems is also a matter for further research.

**Author Contributions:** Conceptualization, A.J.N., J.G.-B., F.L. and C.A.C.C.; methodology, A.J.N., F.L. and C.A.C.C.; software, A.J.N. and J.G.-B.; validation, A.J.N., J.G.-B. and F.L.; analysis, A.J.N., J.G.-B., F.L. and C.A.C.C.; writing—original draft preparation, A.J.N. and J.G.-B.; writing—review and editing, A.J.N., J.G.-B., F.L. and C.A.C.C.; All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been partially funded by the Spanish Ministry of Science and Innovation via grants PID2020-112540RB-C41 and PID2020-112545RB-C54, by the European Union NextGenerationEU/PRTR under grant TED2021-131699B-I00 (AEI/FEDER, UE), and the Andalusian PAIDI program with grants P18-RT-2799, A-TIC-608-UGR20, P18.RT.4830, and PYC20-RE-012-UGR. Carlos A. Coello Coello acknowledges support from CONACyT grant no. 2016-01-1920 (Investigación en Fronteras de la Ciencia 2016).

**Data Availability Statement:** A repository containing the source codes will be publicly available if the paper is accepted.

**Acknowledgments:** The authors would like to thank Picasso, the supercomputer at the Supercomputing and Bioinformatics centre of the Universidad de Málaga, for providing its services to perform the experiments (http://www.scbi.uma.es/, accessed on 25 October 2022).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. UDN Modeling and Instances**

This work considers a service area of 500 × 500 meters, which has been discretized using a grid of 100 × 100 points (also called "pixels" or area elements), each covering a 25 m<sup>2</sup> area, where the signal power is assumed to be constant. In addition to that, vertical densification has been taken into account by considering three vertical area elements, i.e., 25 m of height.

Ten different regions have been defined with different propagation conditions. To compute the received power at each point, *Prx*[*dBm*], the following model has been used:

$$P\_{rx}[dBm] = P\_{tx}[dBm] + P\_{Loss}[dB] \tag{A1}$$

where *Prx* is the received power in dBm, *Ptx* is the transmitted power in dBm, and *PLoss* is the global signal losses, which depend on the given propagation region, and are computed as:

$$P\_{Loss}[dB] = GA + PA \tag{A2}$$

where *GA* is the total gain of both antennas, and *PA* is the transmission losses in space, computed as:

$$PA\left[dB\right] = \left(\frac{\lambda}{2\cdot\pi\cdot d}\right)^K\tag{A3}$$

where *d* is the Euclidean distance to the corresponding sector at the SBS, and *K* is the exponent loss, which randomly ranges in [2.0, 4.0] for each of the 10 different regions. The Signal-to-Interference plus Noise Ratio (SINR) for UE *k*, is computed as:

$$SINR\_k = \frac{P\_{rx,j,k}[mW]}{\sum\_{i=1}^{M} P\_{rx,i,k}[mW] - P\_{rx,j,k}[mW] + P\_n[mW]} \tag{A4}$$

where *P<sub>rx,j,k</sub>* is the power received by UE *k* from cell *j*, the summation is the total power received by UE *k* from all the cells operating at the same frequency as *j* (indexed by *i*), and *Pn* is the noise power, computed as:

$$P\_n[dBm] = -174 + 10 \cdot \log\_{10} BW\_j \tag{A5}$$

where *BWj* is the bandwidth of cell *j*, defined as 10% of the SBS operating frequency, which is the same for all the cells it deploys (see Table A1).
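As a minimal sketch of Equations (A4) and (A5), the following Python snippet converts received powers from dBm to mW and computes the linear SINR for one UE. The function names and the numeric values in the example are illustrative assumptions and are not taken from Table A1; the received powers are treated as given inputs rather than derived from Equations (A1) to (A3).

```python
import math
from typing import Sequence


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def noise_power_dbm(bandwidth_hz: float) -> float:
    """Noise power of Equation (A5): -174 dBm/Hz plus 10*log10(BW)."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz)


def sinr(rx_dbm: Sequence[float], serving: int, bandwidth_hz: float) -> float:
    """Linear SINR of Equation (A4) for a single UE.

    rx_dbm holds the received power (dBm) from every cell operating at the
    same frequency as the serving cell, including the serving cell itself.
    """
    rx_mw = [dbm_to_mw(p) for p in rx_dbm]
    signal = rx_mw[serving]
    interference = sum(rx_mw) - signal  # total co-channel power minus the signal
    noise = dbm_to_mw(noise_power_dbm(bandwidth_hz))
    return signal / (interference + noise)


# Hypothetical values: three co-channel cells and 100 MHz of bandwidth.
value = sinr([-60.0, -75.0, -80.0], serving=0, bandwidth_hz=100e6)
print(f"SINR = {value:.2f} (linear), {10.0 * math.log10(value):.2f} dB")
```

Working in mW inside the ratio matters because powers expressed in dBm cannot be summed directly.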

Finally, the UE's capacity has been calculated according to the MIMO model described in [30]. Thus, we assume that the transmission power from each antenna is *Ptx*/*ntx*, where *ntx* indicates the number of transmitting antennas. Then, if we consider the subchannels to be uncoupled, their capacities can be added, and the overall channel capacity of UE *k* can be estimated using the Shannon capacity formula:

$$C\_{k}^{j}[bps] = BW\_{k}^{j}[Hz] \cdot \sum\_{i=1}^{r} \log\_{2} \left( 1 + \frac{SINR\_{k} \cdot \lambda\_{i}}{n\_{tx}} \right) \tag{A6}$$

where √*λ<sub>i</sub>* is the *i*-th singular value of the channel matrix **H**, of dimensions *nrx* × *ntx* (i.e., number of receiving antennas × number of transmitting antennas). Note that both *nrx* and *ntx* depend on the cell type (see Table A1). *BW<sub>k</sub><sup>j</sup>* is the bandwidth assigned to UE *k* when connected to cell *j*, assuming a round-robin schedule, that is:

$$BW\_k^j = \frac{BW\_j}{N\_j} \tag{A7}$$

where *Nj* is the number of UEs connected to cell *j*, and the UEs are connected to the cell that provides the highest SINR, regardless of its type.
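The capacity computation of Equations (A6) and (A7) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the values *λ<sub>i</sub>* (squared singular values of **H**) are passed in directly instead of being computed from a channel matrix, and the example numbers were chosen only to make the arithmetic easy to check.

```python
import math
from typing import Sequence


def round_robin_bandwidth(cell_bandwidth_hz: float, n_users: int) -> float:
    """Equation (A7): the cell bandwidth is split evenly among its UEs."""
    if n_users < 1:
        raise ValueError("a serving cell must have at least one UE")
    return cell_bandwidth_hz / n_users


def mimo_capacity_bps(
    bandwidth_hz: float,
    sinr_linear: float,
    eigenvalues: Sequence[float],
    n_tx: int,
) -> float:
    """Equation (A6): Shannon capacities summed over uncoupled subchannels.

    eigenvalues are the lambda_i terms, i.e. the squared singular values of
    the channel matrix H (assumed to be precomputed by the caller).
    """
    return bandwidth_hz * sum(
        math.log2(1.0 + sinr_linear * lam / n_tx) for lam in eigenvalues
    )


# Hypothetical example: a 100 MHz cell shared by 4 UEs, 8 transmit antennas.
bw = round_robin_bandwidth(100e6, 4)
print(mimo_capacity_bps(bw, sinr_linear=8.0, eigenvalues=[1.0, 3.0], n_tx=8))
```

With these inputs the two subchannels contribute log2(2) = 1 and log2(4) = 2 bits/s/Hz, so the capacity is 3 × 25 MHz = 75 Mbps.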

In order to build a heterogeneous network, three different types of cells of increasing size and decreasing frequency are considered: femtocells, picocells, and microcells. Recall that these cells are generated by the antennas installed in a given sector of an SBS. Figure A1 illustrates the three configurations used in our modeling. In the first row, the three SBSs have the three sectors and all their cells switched on (in operation). Thus, the mapping to the binary string that represents a tentative solution, included below each subfigure, has all the genes set to 1. In the second row, we have included several solutions with a subset of cells switched off, with the corresponding genes set to 0. It should also be noted that the number of transmitting antennas of each cell type increases with frequency, being 8, 64, and 256 transmitting antennas, respectively, for micro, pico, and femtocells. In the same way, we assume that high-capacity UEs, which will preferably connect to small cells (pico and femtocells), will implement a higher number of receiving antennas (4 and 8 for pico and femtocells, respectively).

**Figure A1.** Configuration of the SBSs, sectors, and cells used in this work, as well as its mapping into a binary encoded representation.

With the system configuration described above, the actual deployment of the cells is carried out via the placement of SBSs in the working area, using a random rotation angle for the sectors, which determines the orientation of the different cell beams. Then, both SBSs and UEs are deployed using independent Poisson Point Processes (PPP) with different densities, defined by *λ<sub>P</sub><sup>Cell</sup>* and *λ<sub>P</sub><sup>UE</sup>*, respectively.

The power consumption of a transmitter is computed based on the model presented in [31], which considers that the device is transmitting over the fiber backhauling. Therefore, the regular power consumption of cell *j*, *Pj*, is expressed as:

$$P\_{j} = \alpha \cdot P + \beta + \delta \cdot S + \rho \tag{A8}$$

where *P* denotes the transmitted or radiated power of the transmitter, coefficient *α* represents the efficiency of the transmission power produced by a radio frequency amplifier and feeder losses, the power dissipated due to signal processing and site cooling is denoted by *β*, and the dynamic power consumption per unit of data is given by *δ*, where *S* is the actual traffic demand provided by the serving cell. Finally, the power consumption of the transmitting device is represented by the coefficient *ρ*. However, in order to obtain an accurate power consumption model, the power consumed by the air conditioning and power supply of the SBS should also be taken into account [32]. This has been called maintenance power and is set to 2 W per SBS for any SBS containing at least one active cell.
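A minimal sketch of Equation (A8) together with the maintenance power rule is given below. The function names and the coefficient values in the example are placeholders chosen for illustration, not the values of Table A1, and the sketch assumes that an SBS with no active cell consumes no power.

```python
from typing import Sequence


def cell_power(p_tx: float, alpha: float, beta: float,
               delta: float, traffic: float, rho: float) -> float:
    """Regular power consumption of one cell, Equation (A8)."""
    return alpha * p_tx + beta + delta * traffic + rho


def sbs_power(active_cell_powers: Sequence[float],
              maintenance_w: float = 2.0) -> float:
    """Power of one SBS: its active cells plus the maintenance power.

    The maintenance power (2 W per SBS in the text) is only added when the
    SBS has at least one active cell; an SBS with every cell switched off
    is assumed here to consume nothing.
    """
    if not active_cell_powers:
        return 0.0
    return sum(active_cell_powers) + maintenance_w


# Placeholder coefficients, for illustration only.
p_cell = cell_power(p_tx=1.0, alpha=4.0, beta=5.0, delta=0.5, traffic=2.0, rho=1.0)
print(p_cell, sbs_power([p_cell, 3.0]), sbs_power([]))
```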

The detailed parametrization of the scenarios addressed is included in Table A1, in which the Equation column links each parameter to the corresponding equation in the formulation detailed above. The names in the last nine columns, XY, represent the deployment densities of SBSs and UEs, respectively: X ∈ {L, M, H} denotes a low-, medium-, or high-density deployment of SBSs (*λ<sub>P</sub><sup>Cell</sup>* parameter of the PPP), and Y ∈ {L, M, H} denotes a low, medium, or high density of deployed UEs (*λ<sub>P</sub><sup>UE</sup>* parameter of the PPP, in the last row of the table). The parameters *Gtx* and *f* of each type of cell refer to the transmission gain and the operating frequency (and its available bandwidth) of the antenna, respectively, while *ntx* and *nrx* are the number of transmitting and receiving antennas. Finally, the parameters of the previously described power consumption model are also included. Nine instances have therefore been used in this work to assess the performance of the different metaheuristics and their hybridization with the problem-specific operators.


**Table A1.** Model parameters for users and base stations.

### *Problem Formulation and Objectives*

Let B be the set of randomly deployed SBSs. A solution to the CSO problem is a binary string *s* ∈ {0, 1}<sup>|B|</sup>, where *si* indicates whether the cell *i* of a given SBS is activated or not. The first objective to be minimized is, therefore, computed as:

$$\min f\_{\text{Power}}(\mathbf{s}) = \sum\_{i=1}^{|\mathcal{B}|} s\_i \cdot P\_i \tag{A9}$$

where *Pi* is the power consumption of SBS *i* (Equation (A8)). Note that *Pi* includes both the transmission power of every cell *i* in the SBSs and its maintenance power.

Let U be the set of UEs, deployed as described in the previous section, and C be the entire set of cells contained in B. In order to compute the total capacity of the system, each UE is first assigned to the active cell that provides it with the highest SINR. Let A(*s*) ∈ {0, 1}<sup>|U|×|C|</sup> be the matrix where *aij* = 1 if *sj* = 1 and cell *j* serves UE *i* with the highest SINR, and *aij* = 0 otherwise. Then, the second objective to be maximized, which is the total capacity provided to all UEs, is calculated as:

$$\max f\_{Cap}(\mathbf{s}) = \sum\_{i=1}^{|\mathcal{U}|} \sum\_{j=1}^{|\mathcal{C}|} s\_j \cdot a\_{ij} \cdot BW\_i^j \tag{A10}$$

where *BW<sub>i</sub><sup>j</sup>* is the shared bandwidth of cell *j* provided to UE *i* (Equation (A7)). These two objectives clearly conflict with each other: switching off base stations reduces the power consumption of the network, but it also degrades the capacity received by the users, as the UE–cell distance increases (raising the propagation losses) at the same time as the available bandwidth to serve users is reduced.
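To make the two objectives concrete, the sketch below evaluates Equations (A9) and (A10) for a given binary string. It is a simplified illustration, not the authors' implementation. It assumes that a per-cell power value is already available (maintenance power is not modeled separately) and that the SINR matrix is fixed, whereas in the full model the interference changes with the set of active cells. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def evaluate(
    s: Sequence[int],
    cell_power_w: Sequence[float],
    cell_bandwidth_hz: Sequence[float],
    sinr: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (f_power, f_cap) for a switch-on/off string s.

    sinr[i][j] is the SINR of UE i with respect to cell j, assumed here to
    be precomputed and independent of s (a simplification).
    """
    active: List[int] = [j for j, bit in enumerate(s) if bit == 1]
    f_power = sum(cell_power_w[j] for j in active)  # Equation (A9)
    if not active:
        return f_power, 0.0

    # a_ij = 1 for the active cell j that gives UE i its highest SINR.
    serving = [max(active, key=lambda j: row[j]) for row in sinr]

    # Round-robin sharing, Equation (A7): BW_j / N_j for every served UE.
    load = {j: serving.count(j) for j in active}
    f_cap = sum(cell_bandwidth_hz[j] / load[j] for j in serving)  # Equation (A10)
    return f_power, f_cap


# Hypothetical instance with three cells and three UEs; cell 1 is switched off.
print(evaluate(
    s=[1, 0, 1],
    cell_power_w=[10.0, 20.0, 30.0],
    cell_bandwidth_hz=[100.0, 200.0, 300.0],
    sinr=[[5.0, 9.0, 1.0], [2.0, 9.0, 3.0], [1.0, 9.0, 4.0]],
))
```

In this example every UE would prefer cell 1, so switching it off saves 20 W but forces the UEs onto cells 0 and 2, which illustrates the conflict between the two objectives.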

### **References**


### *Article* **Knowledge-Driven Multi-Objective Optimization for Reconfigurable Manufacturing Systems**

**Henrik Smedberg 1,\*, Carlos Alberto Barrera-Diaz 1, Amir Nourmohammadi 1, Sunith Bandaru <sup>1</sup> and Amos H. C. Ng 1,2**


**Abstract:** Current market requirements force manufacturing companies to face production changes more often than ever before. Reconfigurable manufacturing systems (RMS) are considered a key enabler in today's manufacturing industry to cope with such dynamic and volatile markets. The literature confirms that the use of simulation-based multi-objective optimization offers a promising approach that leads to improvements in RMS. However, due to the dynamic behavior of real-world RMS, applying conventional optimization approaches can be very time-consuming, specifically when there is no general knowledge about the quality of solutions. Meanwhile, Pareto-optimal solutions may share some common design principles that can be discovered with data mining and machine learning methods and exploited by the optimization. In this study, the authors investigate a novel knowledge-driven optimization (KDO) approach to speed up the convergence in RMS applications. This approach generates generalized knowledge from previous scenarios, which is then applied to improve the efficiency of the optimization of new scenarios. This study applied the proposed approach to a multi-part flow line RMS that considers scalable capacities while addressing the tasks assignment to workstations and the buffer allocation problems. The results demonstrate how a KDO approach leads to convergence rate improvements in a real-world RMS case.

**Keywords:** multi-objective optimization; knowledge discovery; reconfigurable manufacturing system; simulation

### **1. Introduction**

Current trends in the manufacturing industry are challenging companies to cope with demand variations and fluctuating production volumes. Companies are required to rapidly adjust the functionalities of their manufacturing systems to critically manage the needs of this dynamic market to stay competitive [1]. By implementing Reconfigurable Manufacturing Systems (RMSs), companies can efficiently meet the requirements of the competitive market [2]. RMSs enable cost-effective means to meet dynamic market demands by reconfiguring, among other aspects, their resources (e.g., machines, operators, buffers, etc.) and the process plan of the manufacturing system [3].

Today's manufacturing industry is affected by disruptions and shortages of components caused by extraordinary situations such as a global pandemic or war. These disruptions, combined with increasingly shortened product life cycles, mean that manufacturing organizations are required to ramp products up and down more frequently by modifying their production volumes more often than ever before [4]. Therefore, finding ways to address today's dynamic market demands more efficiently constitutes a crucial research area in the RMS community.

**Citation:** Smedberg, H.; Barrera-Diaz, C.A.; Nourmohammadi, A.; Bandaru, S.; Ng, A.H.C. Knowledge-Driven Multi-Objective Optimization for Reconfigurable Manufacturing Systems. *Math. Comput. Appl.* **2022**, *27*, 106. https://doi.org/10.3390/mca27060106

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 7 October 2022 Accepted: 7 December 2022 Published: 9 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Although an RMS may be able to meet the dynamic requirements in the market, designing and configuring the RMS is no trivial task. Simulation techniques, particularly discrete event simulation, have proven to be a powerful tool for the manufacturing industry to assess the capabilities of their production systems [5,6]. Often, several conflicting objectives are used to simultaneously measure the quality of the system. Combining simulation techniques with Multi-Objective Optimization (MOO), i.e., Simulation-based Multi-objective Optimization (SMO), has been a successful approach for optimizing RMSs in the literature [4,7,8]. A general Multi-Objective Optimization Problem (MOOP) can be defined as:

$$\begin{array}{ll}\text{Minimize:} & F(\mathbf{x}) = [f\_1(\mathbf{x}), \dots, f\_M(\mathbf{x})]^T\\\text{Subject to:} & \mathbf{x} \in S\end{array}$$

for *M* objective functions, in the constrained and feasible search space *S*, where **x** = [*x*<sub>1</sub>, ..., *x<sub>N</sub>*]<sup>*T*</sup> is a vector of *N* decision variables. Due to the structure of MOOPs, a MOOP solution can be seen to inhabit two distinct spaces, the decision space and the objective space, and the objective functions can be seen as a mapping from the former to the latter. The goal of MOO is to find a set of solutions that together represent the so-called Pareto-optimal front: the set of solutions in *S* that are not *dominated* by any other solution to the MOOP, where one solution dominates another if it is no worse in every objective and strictly better in at least one.
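The dominance relation can be illustrated with a short sketch that filters the non-dominated points from a finite set of objective vectors, assuming all objectives are minimized as in the formulation above. The function names and the sample points are illustrative; the quadratic-time filter is chosen for clarity rather than efficiency.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point in the set dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective example: (3, 3) is dominated by (2, 2).
print(non_dominated([(1, 5), (2, 2), (3, 3), (5, 1)]))
```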

Due to complex aspects, such as the stochastic failures of resources and equipment that can be modeled using simulation techniques, exact methods are often omitted from consideration when optimizing SMO problems; instead, evolutionary algorithms are used. Multi-Objective Evolutionary Algorithms (MOEAs) are optimization techniques developed to mimic fundamental principles of evolution found in nature. A well-known example is the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [9], which is inspired by Darwinian survival of the fittest and evolves a population of solutions over a number of generations to converge on the Pareto-optimal front.

Although MOEAs are a powerful tool for solving a wide range of MOOPs, they generate many non-optimal solutions during the optimization process. Since these are largely eliminated during optimization and rarely considered in the decision-making process, much of the computational effort spent evaluating them is wasted, especially given the very time-consuming simulations in SMO. Observing that most analysis of MOOP solutions focuses solely on the objective space and mostly disregards the dominated solutions, the authors of [10] present Knowledge-Driven Optimization (KDO): the idea of employing knowledge discovery methods to describe the decision-makers' (DMs') preferences in the objective space in terms of knowledge about the solutions in the decision space, and then using this knowledge to drive the search towards faster convergence on better solutions. This can be achieved in two ways: *offline*, where knowledge is generated from a previous scenario or case and used to improve the convergence in a future scenario, or *online*, where knowledge discovery is integrated into the optimization process as part of the MOEA itself to drive the search towards better convergence in the current scenario. In this paper, we investigate an offline KDO approach.

In this work, we employ a knowledge-driven NSGA-II for a real-world Multi-Part Flow Line (MPFL) to optimize the RMS configuration by considering scalable capacities and fluctuating production volumes. The MOOP formulation addresses task allocation to workstations as well as buffer allocation while maximizing throughput (THP) and minimizing total buffer capacity (TBC). In Section 5, we show how the new approach is able to speed up the convergence towards non-dominated solutions for new scenarios by utilizing knowledge discovered from initial scenarios. The scope of the paper is limited to proposing and showcasing this knowledge-driven approach and comparing the effects of utilizing knowledge in the form of decision rules discovered from one variant of the considered RMS, to speed up the convergence rate of another variant of the same RMS.

### **2. Background**

Simulation and optimization techniques have successfully been used in the context of manufacturing in the literature. However, the analysis of solutions is often limited to manual methods and mostly focuses on the objective space. This section offers a background of simulation and optimization, knowledge discovery in MOO, and knowledge-driven optimization.

### *2.1. Simulation and Optimization in Manufacturing Systems*

Regardless of the benefits of RMS compared to traditional manufacturing systems in accommodating demand and capacity fluctuations, the design and management of these systems is considered a complex combinatorial NP-hard problem, which can be handled with simulation and optimization tools [7,11,12]. When it comes to RMS problems, meta-heuristic methods such as genetic algorithms have become very popular in the literature because they have shown better performance in generating near-optimal solutions [7]. In addition, simulation has been a satisfactory tool to support the modeling and analysis of manufacturing systems for many years [13]. Because of the complexity and dynamism inherent in manufacturing systems, engineers and DMs supported by simulation tools can perform better analysis and, therefore, obtain a better understanding of the real-world systems [14]. Concerning RMS, simulation has been identified in the literature as a supportive technique to handle the uncertainty found in these types of dynamic, evolving systems [15]. Still, considering that the complexity of today's manufacturing systems is growing and that they need to consider a range of possible scenarios with a large number of variables to model and analyze, the use of simulation tools alone becomes impractical. Alternatively, optimization methods could be employed to solve larger-scale NP-hard problems [7]. However, the majority of prior studies that applied optimization methods to RMS reduced the problem by excluding variability and stochasticity (e.g., machine failures), thereby providing imprecise solutions. Studies that employed simulation and optimization separately have thus shown some of the above-mentioned shortcomings. To overcome these drawbacks, simulation-based optimization combines the benefits of simulation and optimization. In the literature, simulation-based optimization has successfully led to improvements in manufacturing systems. Consequently, SMO could lead to improvements in current RMSs [11,13].

RMSs need to address three main challenges, namely: (i) the system configuration, (ii) the process planning, and (iii) the components of the system [3]. The system configuration targets the physical arrangement of the resources (e.g., operators, machines, etc.) in the system [2]. This challenge is usually addressed by optimizing the resource assignment to workstations (WSs). The process planning targets the task allocation and balancing throughout the WSs [16]. This challenge is usually addressed by optimizing the work tasks allocation. Lastly, the components of the system address the appropriate number and type of components (e.g., buffers, operators, machines, etc.) in the system to reach the established capacity goal [13]. This challenge is usually addressed by optimizing the number of resources to perform the tasks. Although simulation-based optimization has been employed to address RMS problems previously in the literature, the use of SMO to address several or all of these challenges simultaneously is sporadic.

### *2.2. Knowledge Discovery in MOO*

Methods for knowledge discovery in the decision space of MOO solutions are not conventional in the multi-criteria decision-making literature, which mostly focuses on manual methods for analyzing the solutions in the objective space. However, Ref. [10] offers a survey of data mining and machine learning methods that have been employed for knowledge discovery to support decision-making in MOO. The process of *innovization* [17] was developed as a way of finding innovative design principles to describe the Pareto-optimal front. Innovization was initially described as a manual process of formulating relationships between correlated regions of the objective space using appropriate regression models; however, it has since been automated using genetic programming [18]. Simulation Based Innovization (SBI) is another method for knowledge discovery in MOO [19,20]. SBI trains a decision tree with the distance to a user-defined reference point (a point describing a DM's aspiration) in the objective space as the regression target. The DM then chooses a threshold for the distance to the reference point to find rules that describe the decision space for the solutions within this threshold. A further application of knowledge discovery methods to the analysis of solutions is offered by [21], where the authors used clustering in both the objective and decision spaces, as well as association rule analysis, in cantilever design optimization problems.

### Flexible Pattern Mining

Although previous approaches have successfully utilized common data mining and machine learning methods for knowledge discovery, these methods were not developed specifically for the intended use in MOO, and may not be fully able to manage the typical characteristics of MOOP solutions, such as different variable types (continuous, discrete and ordinal, and nominal) [10]. However, a method that has been specifically developed for knowledge discovery in MOO is Flexible Pattern Mining (FPM) [22]. FPM was developed to extend sequential pattern mining [23] using the Apriori algorithm [24] for finding decision rules. While sequential pattern mining finds rules of the form {*xi* = *c*} for a variable *xi* and constant value *c*, FPM is additionally able to find rules of the forms {*xi* < *c*}, {*xi* ≤ *c*}, {*xi* > *c*} and {*xi* ≥ *c*}. To run FPM, the DM is required to supply a *selected* and an *unselected* set of solutions, and these selections are made in the objective space. With these selections as input, FPM then generates rules that separate the selected set from the unselected set in terms of the variables in the decision space. Typically, the DM may choose the non-dominated solutions as the selected set and the remaining solutions as the unselected set. Each rule generated by FPM has an associated *significance* or *sig* value, which is the fraction of solutions in the selected set that are covered by the rule, and a similar *unselected significance* or *unsig* for the fraction of solutions in the unselected set. An interesting and meaningful FPM-rule has a high *sig* and a low *unsig*, and thereby describes only the solutions in the selected set. Rule interactions can also be considered by combining several FPM-rules and evaluating their combined *sig* and *unsig*. For example, the three individual rules {*x*<sub>1</sub> < *c*<sub>1</sub>}, {*x*<sub>2</sub> > *c*<sub>2</sub>} and {*x*<sub>3</sub> = *c*<sub>3</sub>} can be combined into the three-level rule interaction {*x*<sub>1</sub> < *c*<sub>1</sub> ∧ *x*<sub>2</sub> > *c*<sub>2</sub> ∧ *x*<sub>3</sub> = *c*<sub>3</sub>}.
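The *sig* and *unsig* values of a rule interaction can be computed as in the sketch below. It only evaluates given rules and does not reproduce the FPM mining procedure itself. The rule encoding as `(variable index, operator, constant)` tuples, the function names, and the sample data are assumptions made for illustration.

```python
import operator
from typing import Callable, Dict, Sequence, Tuple

Rule = Tuple[int, str, float]  # (variable index, operator symbol, constant)

OPS: Dict[str, Callable[[float, float], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def covers(rules: Sequence[Rule], x: Sequence[float]) -> bool:
    """True if solution x satisfies every rule of the interaction."""
    return all(OPS[op](x[i], c) for i, op, c in rules)


def significance(rules: Sequence[Rule], solutions: Sequence[Sequence[float]]) -> float:
    """Fraction of the given solutions covered by the rule interaction."""
    if not solutions:
        return 0.0
    return sum(covers(rules, x) for x in solutions) / len(solutions)


# Hypothetical data: two decision variables, rule {x0 < 2.0 AND x1 > 4.0}.
selected = [(1.0, 5.0), (1.5, 6.0)]
unselected = [(1.2, 2.0), (3.0, 7.0), (1.9, 4.5), (2.5, 3.0)]
rule = [(0, "<", 2.0), (1, ">", 4.0)]
print("sig =", significance(rule, selected), "unsig =", significance(rule, unselected))
```

Here the interaction covers both selected solutions and one of the four unselected ones, giving *sig* = 1.0 and *unsig* = 0.25.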

### *2.3. Knowledge-Driven Optimization*

Knowledge discovery methods can be a powerful tool in decision-making; however, in this manner, the knowledge is only used by the DMs. The term *Knowledge-Driven Optimization* (KDO) is used when knowledge discovered from good or preferred MOOP solutions is fed back into the optimization algorithm to affect the convergence behaviour, or used to update the MOOP formulation itself to make the search more efficient. The former is called online KDO, while the latter refers to offline KDO [10].

A key difference between online and offline KDO is that, since the knowledge used for the former is discovered during the search process from the best-so-far solutions, it does not necessarily describe the optimal solutions to the MOOP. On the other hand, with the assumption that an optimizer converges close to the Pareto-optimal front, offline KDO has access to "pure" knowledge directly describing the optimal (or preferred) solutions.

### 2.3.1. Online KDO

*Online KDO* in a MOEA involves a specific knowledge discovery step to generate knowledge from previous or current solutions, and is able to use this knowledge to affect the convergence behaviour and more effectively generate better or preferred solutions. Online KDO algorithms have been implemented to involve an additional step after the ordinary evolutionary process that finds knowledge for feeding into the evolutionary operators for the next generation. An example of an approach like this is shown in [25,26], where FPM rules are generated to build a distribution over the preferred solutions close to the reference point in preference-based MOO. This distribution is then sampled in a new mutation operator for the next generation of solutions. Another approach using FPM rules is presented in [27], where the rules are used as constraints in the decision space.

Approaches that train a classifier between *good* and *bad* solutions have also been proposed. In [28], a classifier was trained online to differentiate between dominated and non-dominated solutions, and in [29], a classifier is trained online to find constraint violating solutions. In both papers, the classifier was used before the solutions were evaluated, in order to save time by not evaluating poor solutions. Recently, approaches that use innovization online have also been proposed [30,31].

### 2.3.2. Offline KDO

*Offline KDO* refers to when knowledge about MOOP solutions is generated offline, after an optimization run has finished, and is used to benefit future optimizations of the same or similar cases, or to give insights that can lead to an updated MOOP formulation. Only when DMs fully understand the MOOP and its solutions are they able to make an informed decision.

In [32], SBI was used to find decision rules about solutions close to a user-defined reference point, which were then used as constraints in a second optimization run to generate more non-dominated solutions. This served both to discover more preferred solutions and to validate the method by showing that it generates actionable knowledge. It has also been found that leveraging domain knowledge can greatly benefit the optimization [33,34]. This type of knowledge is not generated from previous optimizations, but from the experience and intuitions of veteran DMs. In [35], domain knowledge was used to develop specialized design heuristics to speed up the convergence of a multi-objective satellite design system problem.

Offline KDO is similar to the concept of *transfer learning* in the machine learning literature [36], where a model able to perform a specific task is also able to perform or jump-start the learning process of another related task. In this paper, we focus on using offline KDO in order to generate knowledge from an initial scenario that can be applied to benefit the search in a new scenario. In the next section, we present an illustration of how offline KDO can be implemented.

### **3. Illustration of Offline KDO**

Knowledge generated from MOOP solutions obtained in one scenario may be beneficial for future scenarios of similar MOOPs. Preferred solutions in the objective space may have a certain structure in the decision space that can be exploited to ensure a faster convergence towards the Pareto-optimal front or a greater density of preferred solutions. In this paper, we consider the knowledge generated through the FPM procedure [22] and the openly available implementation in the web-based decision support system Mimer (Mimer: https://assar.his.se/mimer/, accessed on 6 October 2022).

In this section, we want to showcase an example of how simple knowledge about non-dominated solutions can help to speed up the convergence and generate even more non-dominated solutions. We show how knowledge in the form of FPM-rules can be applied as constraints in the decision space to focus the search for non-dominated solutions in different parts of the Pareto-front.

### *Illustrative Example*

We showcase an example of offline KDO on the RE3-5-4 problem from the RE suite of real-world (inspired) test-problems [37]. We show how it is possible to use FPM to generate rules that describe non-dominated solutions, and then use these rules as box-constraints for the decision variables of the MOOP for a different optimization run. Without first generating knowledge about an initial solution set, this approach would not be possible. We also compare this offline approach with simply constraining the decision space to focus the search without relying on any knowledge.

RE3-5-4 is a three-objective engineering problem with a mathematical formulation, based on the vehicle crash-worthiness design problem described in [38]. The objectives of RE3-5-4 are: (*f*0) minimize the weight of the vehicle, (*f*1) minimize the acceleration characteristics in the crash, and (*f*2) minimize the toe-board intrusion during the crash, while the variables (*x*0–*x*4) each relate to the thickness of a different support member in the frontal structure of the vehicle.

Figure 1 shows non-dominated solutions generated from a single run on the RE3-5-4 problem with a budget of 6000 function evaluations, which resulted in 2344 non-dominated solutions. The structure of the objective space clearly shows three distinct, disconnected clusters of solutions. A DM would not only be interested in what causes solutions to end up in these different clusters in terms of the decision space, but also how to focus the search to further saturate these regions with more trade-off solutions. We can use FPM for each of the clusters, to find knowledge for the respective solutions in terms of the decision space. The FPM procedure requires a selected and an unselected set of solutions. We run FPM three times using Mimer, each time with the non-dominated solutions from one of the clusters as the selected set and the remaining solutions from the entire solution set as the unselected set, thus finding rules that describe the non-dominated solutions in each cluster. The resulting FPM rules are shown in Table 1.

**Figure 1.** Non-dominated solutions from RE3-5-4.

**Table 1.** Rule interactions found by using FPM for each of the clusters shown in Figure 1.


FPM was run with a minimum significance of 100% in each case, meaning that all discovered rules completely covered the selected set, and the results still show that the rules discriminate between the selected and unselected set, given the low unselected significance. However, the rule interaction found for cluster 2 had an *unsig* of 22.68%. This means that the rule interaction also describes 22.68% of the solutions in the unselected set, which would lead to a lower search pressure towards the non-dominated solutions within this cluster when used for offline KDO.

With this knowledge about the different clusters in hand, we ran additional optimizations, focusing on each of these clusters separately. We used the rule interactions found using FPM as bounds to constrain the decision space, and ran an optimization with a total of 2000 function evaluations for each respective rule interaction. These three solution sets were then combined, and the non-dominated solutions from these combined runs are shown in Figure 2. This offline approach resulted in 3070 non-dominated solutions, with the same total function evaluations (6000) as the original run.
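One way to turn a rule interaction into box constraints is to tighten the lower and upper bound of each variable that a rule mentions, as sketched below. This is an illustration only. The rule encoding as `(variable index, operator, constant)` tuples and the function name are assumptions, the bounds and thresholds in the example are placeholders rather than the values of Table 1, and strict inequalities are treated as closed bounds, which is a simplification for continuous variables.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[int, str, float]  # (variable index, operator symbol, constant)
Bounds = Tuple[float, float]   # (lower, upper)


def apply_rules(bounds: Sequence[Bounds], rules: Sequence[Rule]) -> List[Bounds]:
    """Return new variable bounds tightened by the given rules."""
    new = [list(b) for b in bounds]
    for i, op, c in rules:
        if op in ("<", "<="):
            new[i][1] = min(new[i][1], c)   # rule caps the upper bound
        elif op in (">", ">="):
            new[i][0] = max(new[i][0], c)   # rule raises the lower bound
        elif op == "=":
            new[i][0] = max(new[i][0], c)
            new[i][1] = min(new[i][1], c)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if new[i][0] > new[i][1]:
            raise ValueError(f"rules leave no feasible range for variable {i}")
    return [(lo, hi) for lo, hi in new]


# Placeholder bounds and thresholds for three variables.
original = [(1.0, 3.0), (1.0, 3.0), (1.0, 3.0)]
rules = [(0, "<", 2.0), (1, ">", 1.5), (0, ">=", 1.2)]
print(apply_rules(original, rules))
```

The tightened bounds can then be passed to the optimizer in place of the original variable ranges, so no change to the algorithm itself is required.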

**Figure 2.** Non-dominated solutions from RE3-5-4 using offline KDO.

We also compare this offline KDO approach with the crude method of simply constraining the objective space to focus the search on the three clusters. The clusters can be classified by the objective space bounds shown in Table 2. To make the comparison with the offline KDO approach fair, we gave this crude method a budget of 4000 function evaluations for each cluster, since the offline KDO approach was able to utilize knowledge from an initial 6000 solutions. We combined the final solution sets from each cluster into one. This approach resulted in 2858 total non-dominated solutions, which are shown in Figure 3.

| Cluster | Bounds |
|---|---|
| 1 | *f*<sub>2</sub> > 0.13 |
| 2 | *f*<sub>2</sub> < 0.13 ∧ *f*<sub>0</sub> < 1680 |
| 3 | *f*<sub>2</sub> < 0.13 ∧ *f*<sub>0</sub> > 1680 |

**Table 2.** Objective space bounds for each of the clusters as shown in Figure 1.

**Figure 3.** Non-dominated solutions from RE3-5-4 using bounded objective space.

We compare the baseline run with the offline KDO approach and the bounded objective space approach by using the hypervolume metric (HV) [39] and by counting the contribution of each run to the composite front produced by combining the solutions from the three approaches. The composite front is shown in Figure 4, and the resulting HV and contribution to the composite front are shown in Table 3. The offline KDO approach resulted in a slightly greater HV and a greater contribution to the composite front, indicating that it outperforms the other two approaches on this problem.
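The contribution count can be sketched as follows, assuming that all objectives are minimized and that a point contributes if no point in the pooled set dominates it. The function names and the toy two-objective fronts are hypothetical; the real comparison uses the three-objective RE3-5-4 fronts.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def composite_contribution(fronts: Dict[str, List[Point]]) -> Dict[str, int]:
    """Count how many points of each approach lie on the composite front.

    Identical points found by several approaches are counted once per approach.
    """
    pool = [(name, tuple(p)) for name, points in fronts.items() for p in points]
    counts = {name: 0 for name in fronts}
    for name, p in pool:
        if not any(dominates(q, p) for _, q in pool):
            counts[name] += 1
    return counts


# Toy fronts with two objectives (hypothetical values).
fronts = {
    "baseline": [(1.0, 5.0), (3.0, 3.0), (5.0, 2.0)],
    "offline_kdo": [(1.0, 4.0), (2.0, 3.0), (5.0, 1.0)],
    "bounded": [(2.0, 4.0), (4.0, 2.0)],
}
contribution = composite_contribution(fronts)
print(contribution)
```

The pairwise check is quadratic in the pool size, which is acceptable for a few thousand points but would need a faster sort for much larger sets.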

**Figure 4.** Composite front of the solutions from Figures 1–3.

**Table 3.** HV score and contribution to composite front of the three approaches.


Since the offline KDO approach is utilizing knowledge discovered from a previous run, it is expected to have a higher performance. However, this example demonstrates that simply adding knowledge as box constraints in the decision space is enough to greatly improve the performance of an optimization run. This example also highlighted that applying a similar approach, by constraining the objective space, is not as effective as this offline KDO approach. This example shows the potential of incorporating offline knowledge into a MOO pipeline by spending a portion of the function evaluation budget on generating solutions, then finding knowledge about high performing solutions, and then utilizing this knowledge offline, for the remaining function evaluation budget, to reach a faster convergence on more preferred solutions.

### **4. Real-World RMS Problem**

The considered RMS comes from an MPFL setup implemented at a truck manufacturer in Sweden. The case is based on pedal car production, where two product families are manufactured. The MPFL is composed of three reconfigurable WSs, to which operators can be added, relocated, or removed in order to cope with production changes (e.g., volume or capacity changes). Both products need to be produced at specific volumes. As the total production capacity or the production volumes fluctuate, the system configuration, the process plan, and the components of the system change to meet the new scenario. The changes include the number of operators employed, the assignments of operators to the WSs, the assignments of tasks to WSs, and the buffers' capacities. The company was interested in different scenarios. Initially, they wanted to investigate the system's capacity with seven operators for the specific production volumes, 70/30 and 30/70. These proportions determine the share of the total production allotted to each of the two product parts. For example, a proportion of 70/30 means that 70% of the total parts produced should be part A, and the remaining 30% should be part B. Furthermore, the company also wanted to investigate how much capacity could be gained by adding one or two extra operators to the system, including information on how to reconfigure the system, how to re-balance the tasks, and how to re-assess the capacities of the buffers. Therefore, as the proportion and volume change, the RMS evolves accordingly. The assumptions of the RMS are:

• An MPFL consisting of several WSs produces several products under different volumes;


The mathematical problem formulation for the considered MPFL-RMS is detailed in [40].

In this paper, we consider an SMO problem using Throughput (THP) and Total Buffer Capacity (TBC) as objectives while striving for the optimal buffer and task allocation for the different scenarios. The total manufacturing time is 336.38 s for part A and 293.38 s for part B, divided into 29 and 24 tasks, respectively. The task precedence relations for both products are shown in Figure 5. Note that each task can be assigned to only one WS.

**Figure 5.** Precedence relation of the tasks for both products.

### *SMO Approach*

The architecture of the SMO approach used can be divided into two major components: the simulation engine and the optimization engine, which are tightly integrated. For the simulation engine, the discrete event simulation software FACTS Analyzer [41] was employed for modeling the production system and simulating the studied scenarios. The optimization engine was implemented in the well-known platform MATLAB. The integration between the simulation and optimization engines allows an accurate representation of a realistic production line involving many types of model variables regardless of their nature (e.g., failure, availability, mean time to repair, process time) while avoiding the simplification found in other production line optimization studies. The process begins in the optimization engine, where custom-made encoding and decoding mechanisms generate feasible RMS solutions that are then automatically mapped to the simulation engine. The simulation engine then uses the received combination of input variables to run the simulation on the model. The results from the simulation experiments are fed back to the optimization engine in order to be evaluated by the optimization algorithm in terms of the designated conflicting objectives. This process, in which the optimization engine evaluates the output of the simulation in order to propose a new combination of input parameters to be simulated, is repeated until the results converge to a set of optimal solutions or the stopping criterion is reached (i.e., a predefined number of generations).

Because NSGA-II performs well when handling up to three conflicting objectives and is known as an effective MOEA for complex combinatorial problems, a customized NSGA-II with specific encoding and decoding mechanisms for RMS was implemented within the optimization engine to generate feasible solutions [4,12]. There are three main factors behind the success of NSGA-II: the fast non-dominated sorting, which establishes the dominance relationships between pairs of solutions; the elitism mechanism, which keeps the best solutions; and the crowding distance calculation, which ranks the solutions within each front so as to maintain diversity. The general steps of the customized NSGA-II for RMS are shown in Algorithm 1.
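As an illustration of the third factor, the crowding distance of the standard NSGA-II can be sketched as below. This is a generic textbook version written for this illustration, not the customized implementation used in the study (which was written in MATLAB), and the example front is hypothetical.

```python
import math
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Crowding distance of each solution in one non-dominated front.

    Boundary solutions of every objective receive infinite distance, so
    they are always preferred when a front has to be truncated.
    """
    n = len(front)
    if n == 0:
        return []
    distance = [0.0] * n
    for k in range(len(front[0])):
        # Indices of the solutions sorted by objective k.
        order = sorted(range(n), key=lambda i: front[i][k])
        f_min, f_max = front[order[0]][k], front[order[-1]][k]
        distance[order[0]] = distance[order[-1]] = math.inf
        if f_max == f_min:
            continue  # all values equal: this objective adds nothing
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            distance[order[pos]] += gap / (f_max - f_min)
    return distance


# Hypothetical front with two objectives.
example_front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
distances = crowding_distance(example_front)
print(distances)
```

Solutions with a larger distance lie in less crowded regions of the front and are kept first, which is how diversity is maintained.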

### **Algorithm 1** Enhanced SMO-NSGA-II

**Require:** Generation limit *Gmax*; Population size; Precedence relation; RMS inputs regarding WSs, buffers, resources and constraints


Because the considered RMS is encoded differently from a standard MOOP solved by NSGA-II, the variables for the number of WSs and for the task and buffer assignments are encoded as random keys in the enhanced algorithm, and on Line 4, they are decoded into feasible input for the simulation model. On Line 5, these decoded solutions are sent to and evaluated by the simulation engine, and the objective values are sent back to the algorithm. A complete description of the enhanced algorithm is provided in [40].
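The idea of decoding random keys into a feasible solution can be illustrated with a generic sketch. It is not the decoder described in [40]: it only assumes that each task carries one real-valued key and that, among the tasks whose predecessors are already scheduled, the one with the smallest key is taken next. The function name, task labels, and keys are hypothetical.

```python
from typing import Dict, List, Set


def decode_sequence(keys: Dict[str, float], predecessors: Dict[str, Set[str]]) -> List[str]:
    """Turn random keys into a task order that respects the precedence relation."""
    remaining = set(keys)
    scheduled: List[str] = []
    done: Set[str] = set()
    while remaining:
        # Tasks whose predecessors have all been scheduled already.
        available = [t for t in remaining if predecessors.get(t, set()) <= done]
        if not available:
            raise ValueError("precedence relation contains a cycle")
        # Smallest key first; the task name breaks ties deterministically.
        chosen = min(available, key=lambda t: (keys[t], t))
        scheduled.append(chosen)
        done.add(chosen)
        remaining.remove(chosen)
    return scheduled


# Hypothetical chromosome: task A2 requires A1, and A3 is unconstrained.
keys = {"A1": 0.9, "A2": 0.1, "A3": 0.5}
predecessors = {"A2": {"A1"}}

order = decode_sequence(keys, predecessors)
print(order)
```

Because any vector of keys decodes to a feasible order, standard real-valued crossover and mutation can be applied without a repair step, which is the usual motivation for random-key encodings.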

### **5. Experimental Results**

In this section, we present the results from the initial optimizations, the knowledge we were able to discover from the solutions to these optimizations, and new results from an offline KDO study using this discovered knowledge. We investigate the improvement in convergence towards the Pareto-optimal front by applying FPM rules as constraints in the decision space. All optimizations refer to the real-world RMS problem described in Section 4. All knowledge discovery was performed using the openly available web-based decision-support system Mimer, enabling the knowledge discovery framework described in [42].

### *5.1. Optimization Results*

We ran six optimizations initially, one scenario for each pair of number of operators (7, 8, 9) and proportion (70/30, 30/70). Each optimization run had a budget of 500 generations and a population size of 50. The resulting non-dominated solutions from these runs are shown in terms of their task allocation in Figure 6, where each row represents one solution and each column represents one task for either product A or B, and the color of the cell shows the WS it was assigned to. In total, 72 non-dominated solutions were found in these scenarios altogether.

**Figure 6.** Task allocations from the non-dominated solutions from all scenarios in the initial optimizations.

From the figure, it is clear that most of the non-dominated solutions in each scenario share common task allocations. However, all solutions shown in Figure 6 are distinct and have varying buffer allocations which are not shown here.

The number of non-dominated solutions from each scenario is shown in Table 4, where we can see that the number of non-dominated solutions in each scenario varies from 10 to 19, except for the scenario of nine operators with a proportion of 70/30, where only one non-dominated solution was found. The objective space of the non-dominated solutions from all scenarios is also shown in Figure 7, where it is clear that increasing the number of operators, as expected, has a definite impact on the throughput.


**Table 4.** Number of non-dominated solutions found for each scenario in the initial optimizations.

### *5.2. Knowledge Discovery*

Because the system can both increase and decrease the number of operators and change the proportion between the two parts, in this paper we are interested in finding generalized knowledge about each number of operators and each proportion. In other words, we ask whether knowledge generated from the previous scenarios with seven operators can be generalized to improve the optimization process for future scenarios with seven operators but new proportions, whether knowledge generated from the scenarios with a proportion of 30/70 can be used in future scenarios with different numbers of operators, and so on for each group of scenarios.

**Figure 7.** Objective space of the non-dominated solutions from all scenarios in the initial optimizations.

We generate knowledge in the form of decision rules using the FPM procedure. From the initial results, we are interested in generating knowledge from five groups of scenarios: the three groups where the number of operators equals 7, 8, or 9, and the two groups where the proportion equals 30/70 or 70/30. In order to run FPM, we merged the solutions from all optimization scenarios into one combined dataset so that all cases are taken into account for each group. For each group, we ran FPM with the non-dominated solutions from the scenarios in the group as the selected set, and all remaining solutions (dominated and non-dominated) from all scenarios as the unselected set. In this way, the generated knowledge is general between scenarios in the MOOP.

For this knowledge discovery, we focus only on the task allocation. Even so, the number of decision variables is high (53), which affects the run-time of the FPM procedure. For this reason, the maximum level of rule interactions was limited to 4, i.e., only interactions of at most four FPM rules are considered. As we can see in Figure 6, many non-dominated solutions in one scenario share the same task allocations for many tasks. Therefore, FPM is expected to find many rules with high significance. However, the descriptive rules are those that have a high significance while simultaneously having a low unselected significance, since these rules more accurately distinguish between the selected and unselected sets of solutions.

Table 5 shows the FPM rules discovered for each group of scenarios, indicating the tasks that best distinguish the non-dominated solutions in each group from the other solutions. For all groups, the parameter for the minimum required significance was kept constant at 90% when running the FPM procedure. For the scenarios with seven operators, a rule interaction was found with a significance of 100% and an unselected significance of 10.13%, meaning that all non-dominated solutions in the scenarios support the rules, while only 10.13% of the solutions in the unselected set support the rules. The rule interaction with the highest ratio between significance and unselected significance was found in the scenarios with nine operators, perhaps indicating that the non-dominated solutions in these scenarios are easier to distinguish from the rest. The scenarios with the lowest ratio are those where the proportion is 30/70, perhaps indicating that the non-dominated solutions in these scenarios are more difficult to distinguish.


**Table 5.** Rule interactions found by using FPM for each of the scenarios from the initial optimization.

### *5.3. Offline Knowledge-Driven Optimization*

The rules for the different groups were applied to ten new scenarios, using the proportions of 40/60 and 60/40 between parts A and B for 7, 8, and 9 operators, and using 6 and 10 operators with the proportions of 30/70 and 70/30. We compare standard optimization runs for the new scenarios versus runs using the offline KDO approach of applying the rules presented in Table 5 as constraints in the decision space. Due to the high computational cost involved in the evaluation of each solution, each scenario was given an evaluation budget of 2500 solutions (50 generations with a population size of 50).

We compare the rate of convergence of the standard MOO approach and the offline KDO approach in each separate scenario by plotting the Hypervolume (HV) [39] contribution at each generation. Figure 8 shows the convergence plots for all scenarios. We also consider the Area Under the Curve (AUC) of the convergence plots as a quantitative score for the convergence rate. The AUC scores for all scenarios are shown in Table 6.
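The AUC score can be sketched with the trapezoidal rule, assuming one HV value per generation and unit spacing between generations. The source does not state which integration rule was used, so this choice is an assumption, and the two HV series below are hypothetical.

```python
from typing import Sequence


def convergence_auc(hv_per_generation: Sequence[float]) -> float:
    """Area under an HV-versus-generation curve (trapezoidal rule, unit steps)."""
    hv = list(hv_per_generation)
    return sum((a + b) / 2.0 for a, b in zip(hv, hv[1:]))


# Hypothetical HV histories for one scenario (not values from Table 6).
hv_standard = [0.0, 0.2, 0.4, 0.5]
hv_offline_kdo = [0.0, 0.4, 0.5, 0.5]

auc_standard = convergence_auc(hv_standard)
auc_offline_kdo = convergence_auc(hv_offline_kdo)
print(auc_standard, auc_offline_kdo)
```

Both toy runs end at the same HV, but the run that rises earlier accumulates the larger area, which is why the AUC serves as a score for the convergence rate and not only for the final quality.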

**Figure 8.** Convergence plots of the scenarios with 7 operators (**top left**), 8 operators (**top center**), 9 operators (**top right**), 6 operators (**lower left**), and 10 operators (**lower right**).

The results show that the offline KDO approach leads to faster convergence in all scenarios with operators equal to 7, 8, and 9 and the new proportions of 40/60 and 60/40 compared with the standard approach, and most of the scenarios using the proportions of 30/70 and 70/30 and the new numbers of operators of 6 and 10. In fact, only in the scenario with 10 operators and a proportion of 30/70 did the offline KDO approach not lead to faster convergence. However, the convergence plots for six operators are very similar for both approaches. This indicates that the offline KDO approach leads to an improved convergence rate for the current RMS MOOP when considering new proportions for the initial assignment of 7, 8, or 9 operators, but may be slightly less fruitful for scenarios with new numbers of operators and the original proportions.


**Table 6.** Area Under the Curve (AUC) of the convergence plots shown in Figure 8 for the different cases of Number of Operators (NO), proportion, and optimization approach.

Bold values indicate higher AUC scores.

### **6. Discussion**

The initial results provided solutions for six different scenarios from the same real-world RMS MOOP involving task allocation. The approach presented in this paper demonstrates the use of a knowledge discovery method to generate decision rules, which are applied in future, different scenarios to help the optimization algorithm reach faster convergence on non-dominated solutions. We grouped scenarios where the numbers of operators were the same and the proportions were different, in order to find out whether there is generalized knowledge that can be applied across scenarios with the same number of operators, and we did the same for scenarios with the same proportions and different numbers of operators.

Although the initial optimization runs had a very high evaluation budget, they did not produce many non-dominated solutions in each scenario. In the scenario with nine operators and a proportion of 70/30, only a single non-dominated solution was found. This means that the optimization did not find a diverse set of solutions for this scenario, which might in turn mean that the knowledge generated is not general enough. Despite this, as shown in Figure 8 and Table 6, the offline KDO approach did result in faster convergence compared to the standard approach for both scenarios with nine operators.

To generate knowledge about the different scenarios, we used Flexible Pattern Mining (FPM) to find decision rules. However, the number of aspects in terms of decision variables considered by this study increases the complexity of the SMO and its knowledge discovery post-optimal analysis exponentially. For this reason, the number of rule interactions considered in each scenario was limited to four. Nonetheless, the rules extracted reveal knowledge regarding which tasks are more important and therefore need to be prioritized for finding competitive solutions with respect to different criteria. However, finding more complex rule interactions could potentially lead to more precise knowledge which might be of further benefit.

FPM is expected to identify the rules that are the most interesting to the decision-maker. Considering the rules discovered by FPM and shown in Table 5, we can see that, across all groups of scenarios, some tasks are repeated in the rule interactions, namely *A*<sub>14</sub>, *A*<sub>17</sub>, *B*<sub>4</sub>, *B*<sub>7</sub>, and *B*<sub>23</sub>. This indicates that these tasks have higher importance for more general scenarios. However, since no rules are common to all scenario groups, it is likely that no rule describes a completely general scenario for the considered RMS MOOP. Only three of these five tasks (*A*<sub>17</sub>, *B*<sub>4</sub>, *B*<sub>7</sub>) have the same rule in the different scenario groups.

In the presented offline KDO approach, we applied the discovered rules as hard constraints in the decision space by limiting the values that the corresponding variables could take on. Although this approach did result in faster convergence in most cases, it does not guarantee that the solutions found are Pareto-optimal, since it limits the search space. In addition, the significance of the rule interactions used might also impact the quality of the final non-dominated solutions found. This concern is reinforced by the possibility that the solutions found in the initial optimization runs did not converge close to the true Pareto-optimal front. However, in the case of SMO, where each evaluation can take a very long time, decision-makers are more interested in finding *good enough* solutions fast than in finding the true Pareto-optimal solutions.

We used knowledge found from six initial scenarios to drive the search for faster convergence in 10 new scenarios. When using the offline KDO approach, the results show a larger increase in the convergence rate in the scenarios with the initial numbers of operators and different proportions than in the scenarios with the initial proportions and different numbers of operators. This indicates that, for the considered MOOP, more general knowledge may be derived for the scenarios grouped by the number of operators. The rule interactions also confirm this for the scenarios grouped by the proportion of 30/70, where the unselected significance is high compared to the rest of the scenario groups. This implies that it was more difficult to find rules that distinguish this group. One possible explanation for why the offline KDO approach did not lead to more of an improvement in the convergence rate for these scenarios is that no simple rule interaction is able to capture the distinguishing features regarding the proportions in the initial results. Tasks that are more important for changes in the proportion may be overshadowed by tasks more important for differences in the number of operators.

In this paper, we only considered the variables related to task allocation in the knowledge discovery and offline KDO approach. However, it would also be interesting to investigate the possible convergence rate improvement from also generating knowledge about the operators' assignments to WSs and the capacities of the inter-station buffers.

### **7. Conclusions**

In this paper, we propose the use of an offline KDO approach for increased convergence rates in a real-world reconfigurable manufacturing system simulation-based multi-objective optimization problem. We first showcase an offline KDO approach for populating non-dominated solutions in the real-world inspired test problem RE3-5-4, by dividing found solutions into different clusters and finding specific knowledge for each cluster. This knowledge was then used to constrain the decision space to guide the optimization to converge on more non-dominated solutions. This approach was also shown to outperform a crude approach of constraining the objective space. We use a similar offline KDO approach on the real-world RMS problem.

RMSs are considered a key enabler for manufacturing systems to produce the required capacity and volume when needed. However, prior research in real-scale industrial applications is sporadic and seemingly ignores the importance of post-optimal analysis on the combined decision-objective space for supporting decision-making about the requirements of the future system. The use of offline KDO on SMO data sets of RMS is a novel area that can support the RMS research community, and accordingly, this paper illustrates an example of how it can be achieved.

In this paper, we considered knowledge discovery through the FPM procedure to generate if-then decision rules about the decision variables in relation to selections made in the objective space of the solutions. We considered variables related to task assignment in workstations. The results show how the offline KDO approach was able to lead the optimization to faster convergence in the majority of tested scenarios of new proportions and numbers of operators; however, for the considered MOOP, the offline KDO approach leads to a greater improvement in scenarios based on new proportions.

In addition to offline KDO, rules discovered through the FPM procedure can also be used to inform the decision-maker about various aspects of the MOOP and the solutions. Actionable insights from a post-optimal analysis using FPM for knowledge discovery may lead to improvements in the MOOP formulation and be a tool in decision-making.

In future work, we would like to further investigate how the qualities of the generated rules correlate with the convergence when using offline KDO on more real-world applications. In this study, only knowledge in the form of FPM rules was considered for offline KDO. Future work should also focus on finding other appropriate knowledge representations for RMS applications.

**Author Contributions:** Conceptualization, H.S., C.A.B.-D., S.B. and A.H.C.N.; methodology, H.S. and C.A.B.-D.; software, H.S., C.A.B.-D. and A.N.; validation, H.S. and C.A.B.-D.; formal analysis, H.S.; investigation, H.S. and C.A.B.-D.; data curation, H.S. and C.A.B.-D.; writing—original draft preparation, H.S. and C.A.B.-D.; writing—review and editing, H.S., C.A.B.-D., S.B., A.H.C.N. and A.N.; visualization, H.S.; supervision, S.B. and A.H.C.N.; project administration, H.S. and C.A.B.-D.; funding acquisition, A.H.C.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the Knowledge Foundation (KKS), Sweden, through the KKS Profile Virtual Factories with Knowledge-Driven Optimization, VF-KDO, Grant No. 2018-0011.

**Data Availability Statement:** All reported data will be made available upon acceptance for publication.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

### **References**


### *Article* **An Experimental Study of Grouping Mutation Operators for the Unrelated Parallel-Machine Scheduling Problem**

**Octavio Ramos-Figueroa \*, Marcela Quiroz-Castellanos, Efrén Mezura-Montes and Nicandro Cruz-Ramírez**

Artificial Intelligence Research Institute, Universidad Veracruzana, Campus Sur, Calle Paseo Lote II, Sección Segunda 112, Nuevo Xalapa, Veracruz 91097, Mexico

**\*** Correspondence: oivatco.rafo@gmail.com

**Abstract:** The Grouping Genetic Algorithm (GGA) is an extension to the standard Genetic Algorithm that uses a group-based representation scheme and variation operators that work at the group-level. This metaheuristic is one of the most used to solve combinatorial optimization grouping problems. Its optimization process consists of different components, although the crossover and mutation operators are the most recurrent. This article aims to highlight the impact that a well-designed operator can have on the final performance of a GGA. We present a comparative experimental study of different mutation operators for a GGA designed to solve the Parallel-Machine scheduling problem with unrelated machines and makespan minimization, which comprises scheduling a collection of jobs in a set of machines. The proposed approach is focused on identifying the strategies involved in the mutation operations and adapting them to the characteristics of the studied problem. As a result of this experimental study, knowledge of the problem-domain was gained and used to design a new mutation operator called 2-Items Reinsertion. Experimental results indicate that the state-of-the-art GGA performance considerably improves by replacing the original mutation operator with the new one, achieving better results, with an improvement rate of 52%.

**Keywords:** grouping genetic algorithm; grouping mutation operator; grouping problem; unrelated parallel-machine scheduling

### **1. Introduction**

Over the last decades, the interest of the scientific community in solving Combinatorial Optimization Problems (COPs) has grown considerably since these types of problems emerge in many practical issues in industry, logistics, and engineering. In general, the optimization of a COP comprises the search for suitable values for a set of discrete variables, so that the objective function is optimized while satisfying the given conditions and constraints. Thus, the solution of this type of problem can involve a feasible disposition, grouping, order, or selection of discrete objects that typically are finite in number [1]. It is well-known that many COPs have high complexity, and in the worst-case scenario, there is no efficient algorithm that solves all their possible cases optimally. Such problems belong to the NP-hard class [2]. In this context, this work focuses on grouping problems, a special type of COP that in general consists of looking for an efficient arrangement of a set of elements among a collection of groups [1].

Parallel-Machine Scheduling (PMS) is a classical NP-hard grouping problem, consisting of looking for the most efficient sequential scheduling of a set of *n* jobs *N* = {*j*1,..., *jn*} among a collection of *m* parallel-machines *M* = {*i*1,..., *im*}, in such a way that each machine *i* can process only one job *j* at a time, and each job *j* must be processed by a single machine *i* [3].

The PMS variants can consider different parameters in the problem definition, such as resource and scheduling environments, job characteristics, and optimization criteria, among others. The most general classification of PMS problems is according to the machine environment. In this sense, this work focuses on a variant that belongs to the class with unrelated machines, i.e., each machine can require a different time to process each job, and there is not a behavior pattern with respect to the speed of the machines with a machine always being the fastest or the slowest one (Unrelated Parallel-Machine Scheduling, UPMS). This problem family has received much recognition due to its numerous real-world applications [4–6]. Although a large number of mathematical models have been proposed, the exact approaches can solve only small instances in a reasonable time [7]. Given the complexity of several UPMS variants, most approaches are metaheuristic algorithms, such as local searches, swarm intelligence, and evolutionary algorithms. The state of the art contains local searches such as the Hill Climbing [8], the Iterated Greedy Algorithm [9], the Variable Neighborhood Descent [10], and the GRASP Algorithm [11]. In the same spirit, the literature includes several swarm intelligence algorithms, such as the Worm Optimization Algorithm [12], the Firefly Algorithm [13], the Artificial Bee Colony [14], and the Fruit Fly Optimization Algorithm [15]. Additionally, we identified several evolutionary algorithms such as the Genetic Algorithm [16], the Genetic Programming [17], and the Imperialist Competitive Algorithm with memory [18]. Finally, the specialized literature includes some memetic algorithms [19,20]. The literature review reveals that there are a wide variety of UPMS problems, each with particular characteristics and challenges. Given the increasing appearance of these problems, there exists a trend to explore the algorithmic behavior of different metaheuristic approaches that can work well or badly according to the properties of the variant of the problem to solve. One of the main challenges in the development of high-performance algorithms for UPMS problems is the design of efficient strategies that work together with the features of the problem variant to find high-quality solutions.

**Citation:** Ramos-Figueroa, O.; Quiroz-Castellanos, M.; Mezura-Montes, E.; Cruz-Ramírez, N. An Experimental Study of Grouping Mutation Operators for the Unrelated Parallel-Machine Scheduling Problem. *Math. Comput. Appl.* **2023**, *28*, 6. https://doi.org/10.3390/mca28010006

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 7 December 2022 Revised: 26 December 2022 Accepted: 29 December 2022 Published: 5 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

This work addresses the UPMS variant known as the *R*||*Cmax* problem, where the machines {*i*1,..., *im*} are unrelated, the jobs {*j*1,..., *jn*} cannot be preempted, and the objective is to minimize the maximum completion time *Cmax*, i.e., the processing time *Ci* required by the machine *i* that finishes last.
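The objective can be made concrete with a minimal sketch that evaluates *Cmax* for a given assignment of jobs to machines. The matrix layout `p[i][j]` (processing time of job *j* on machine *i*), the function name `makespan`, and the instance values are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple


def makespan(p: Sequence[Sequence[float]], assignment: Sequence[int]) -> Tuple[float, List[float]]:
    """Return (Cmax, per-machine completion times) for one schedule.

    p[i][j] is the processing time of job j on machine i, and
    assignment[j] is the index of the machine that processes job j.
    """
    loads = [0.0] * len(p)
    for job, machine in enumerate(assignment):
        loads[machine] += p[machine][job]
    return max(loads), loads


# Hypothetical instance: 2 unrelated machines, 4 jobs.
p = [
    [4, 2, 7, 3],  # processing times on machine 0
    [3, 5, 2, 6],  # processing times on machine 1
]
assignment = [0, 0, 1, 1]  # jobs 0 and 1 on machine 0, jobs 2 and 3 on machine 1

cmax, loads = makespan(p, assignment)
print(cmax, loads)
```

Since there is no preemption and the order of jobs on a machine does not affect its completion time, a schedule is fully described by the grouping of jobs per machine, which is what makes *R*||*Cmax* a grouping problem.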

It is well-known that the problem *R*||*Cmax* belongs to the class NP-hard [2]. Hence, over the past forty years, different approaches have been studied to try to solve it efficiently. The specialized literature includes deterministic methods [21,22], two-phase algorithms (or rounding methods) [23,24], and branch and bound algorithms [3,25]. The literature also includes distinct metaheuristic algorithms for *R*||*Cmax*, covering proposals based on local searches [3,26], the swarm intelligence algorithm Particle Swarm Optimization (PSO) [1], the Genetic Algorithm (GA) [27], and some hybrid approaches [3]. Within the scope of this review, the approaches based on local searches have shown the best performance in solving the problem *R*||*Cmax*. The state of the art highlights the results reached by the Iterated Greedy Local Search (NVST-IG+) proposed by Fanjul-Peyro and Ruiz in 2009, considered one of the best solution methods designed for the problem of interest so far. The key to the success of NVST-IG+ is the incorporation of some techniques to control the way in which the jobs and machines are selected and manipulated during the construction of the neighborhoods [26].

In [1], we presented one of the most recent related works; the experimental results suggested that, for the 1400 test instances studied, a GA with a group-based representation (GGA) performs better than a GA with an extended permutation solution encoding and a PSO with a machine-based representation scheme. That GGA was an adaptation of the GGA-CGT designed by Quiroz-Castellanos et al. for the Bin Packing Problem [28]. According to Quiroz-Castellanos et al., the performance of the GGA-CGT is related mainly to the mutation operator, which alone is capable of finding quality solutions. Mutation is one of the most used genetic operators in GGAs. Commonly, mutation operators promote the exploration of the search space by slightly altering the genetic material of a solution. This behavior is useful for a GGA mainly when it is converging to a local optimum, since it provides the capacity to redirect the search to other areas. Section 2.5 includes an experimental study with different parameter configurations showing that the performance of the GGA proposed in [1] is mainly related to the crossover operator, while the mutation operator has a low impact. This observation motivates this work, which aims to study the performance of different grouping mutation operators, to identify the strategies that positively impact their performance, to employ those strategies in the design of a new operator, and to incorporate that operator into the GGA in order to improve its performance when solving *R*||*Cmax*.

This paper continues as follows. Section 2 describes the components and the problem-domain heuristics of the GGA for *R*||*Cmax*. Section 3 reviews the state-of-the-art grouping mutation operators. Section 4 contains the experimental design proposed to analyze the impact of different strategies on the performance of grouping mutation operators. Section 5 compares the GGA performance with the new and the old mutation operators to analyze the improvement rate. Finally, Section 6 summarizes the conclusions and future paths of research.

### **2. Grouping Genetic Algorithm for** *R||Cmax*

The state of the art suggests that the GGA is one of the most used metaheuristics to solve grouping problems. Such popularity is related to its promising results and its flexibility to adopt new ideas to handle the constraints and conditions of the problem to be solved [1,29,30].

The GGA is an extension to the standard GA; therefore, it has a similar procedure. The GGA starts with the generation of the initial population, generally in a random way. Next, selection strategies and variation operators, mainly crossover and mutation, are used iteratively so as to find better solutions. Each iteration represents a generation that starts utilizing a selection strategy to pick some individuals of the population based on their fitness values; then, the genetic material of the selected individuals is recombined with the crossover operator to generate offspring. Subsequently, the offspring are added to the population using a replacement strategy. Finally, some individuals, chosen with a selection strategy, are slightly modified with the mutation operator. In this way, the GGA iterates performing the before-mentioned procedure until some stopping criterion (e.g., the maximum number of generations, the maximum search time, convergence of solutions, or finding an optimal solution) is met.

One of the main features of the GGA is the group-based scheme that it uses to encode and manage solutions in the search space. According to Falkenauer, this is a more natural way of representing solutions to grouping problems. Moreover, it helps to reduce the search space since it produces fewer isomorphic solutions than a traditional representation scheme [31]. In this encoding, each gene represents a group that contains the collection of elements that correspond to it. Therefore, the length of a solution is equal to the number of groups that it includes.

Another important aspect to consider when developing a GGA is the design of variation operators such as crossover and mutation, since they must work at the group level. With this feature, operators can perform procedures in a more controlled way, determining which groups and elements vary according to the constraints and objectives of the problem to solve. The crossover operator uses two or more solutions of the current population to recombine their genetic material, creating offspring with new characteristics. This operator gives the GGA the ability to converge on the most promising areas identified during the search. One of the advantages of crossover operators for the group-based encoding is that they can use the quality of the groups to determine how parents transmit the genetic material to their offspring, which allows a more controlled search. On the other hand, the mutation operator gives the GGA the ability to explore new areas of the search space by producing small modifications to the genetic material of some solutions. This procedure is helpful for a GGA mainly when addressing highly constrained grouping problems, where there is a high probability of converging to local optima. The slight alterations performed by the mutation operator can generate solutions in other regions of the search space, avoiding premature convergence [32].

The next sections describe the elements of the state-of-the-art GGA for *R*||*Cmax*, the object of study in this work, including the population initialization strategy, the variation operators, selection and replacement strategies, and the problem-domain heuristics. This algorithm is an adaptation of the Grouping Genetic Algorithm with Controlled Gene Transmission (GGA-CGT) introduced by Quiroz-Castellanos et al. to solve the Bin Packing problem [28]. The details of the heuristic used to generate the initial population, as well as the mutation and crossover operators appear in the work of Ramos-Figueroa et al. [1], while the remaining mechanisms and operators, as well as the parameter settings can be consulted in the work of Quiroz-Castellanos et al. [28].

### *2.1. Genetic Encoding, Fitness Function, and Initial Population*

The GGA uses the group-based representation scheme to encode and manipulate solutions, where each machine *i* is a gene (or group) *Gi* that will include a set of jobs. Therefore, all solutions have the same number of genes, equal to the number of machines *m*. The quality of each machine *i* is equal to the time it takes to process its assigned jobs, denoted as *Ci*. Thus, the quality of a solution *Cmax* is equal to the *Ci* value of the machine with the longest processing time. The initial population is generated in a random manner by running the well-known Min() heuristic on random permutations of the *n* jobs [33]. For each job *j*, Min() calculates the equation *Ci* = *Ci* + *pij* for all the machines, where *pij* indicates the time that the machine *i* needs to process the job *j*. In this way, Min() assigns the job *j* to the machine *i* that generates the lowest *Ci* value.

Figure 1 describes the procedure followed by the population initialization strategy. To give a comprehensive description, Figure 1a includes an example instance *I* represented as a matrix with *m* = 4 machines depicted by the columns and *n* = 10 jobs represented by the rows. The example starts from a permutation (Figure 1b) of the ten jobs, {*j*9, *j*5, *j*2, *j*6, *j*3, *j*8, *j*4, *j*7, *j*1, *j*10}, used to generate the partial solution shown in Figure 1c. The partial solution is built from the first nine jobs in the permutation, {*j*9, *j*5, *j*2, *j*6, *j*3, *j*8, *j*4, *j*7, *j*1}, and the instance *I* using the heuristic Min(). To exemplify how Min() works, Figure 1d shows the complete solution resulting from the assignment of the last job in the permuted list (i.e., *j*10). Following the Min() procedure described above, the processing time *Ci* of each machine plus the time that it requires to process the job *j*10 is as follows: *C*1 = 26 + 8 = 34, *C*2 = 25 + 20 = 45, *C*3 = 20 + 18 = 38, and *C*4 = 10 + 28 = 38. Hence, Min() assigns the job *j*10 to the machine *i*1, since it generates the lowest *Ci* value. It is important to note that if two or more machines produce the same *Ci* value, this allocation heuristic assigns the job to the first of them in the order from *i*1 to *im*. Finally, Figure 1d also shows the fitness value of the generated solution, which is equal to the longest processing time *Ci*, in this case *C*1 = 34, outlined in bold.

**Figure 1.** Population initialization strategy.
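The Min() step described above can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the names `min_heuristic` and `random_individual`, the 0-based indices, and the storage of the instance as a matrix `p[j][i]` (rows are jobs, columns are machines, as in Figure 1a) are assumptions made for illustration. The usage lines reproduce only the last assignment of the Figure 1 example, using the loads and processing times quoted in the text.

```python
import random

def min_heuristic(p, jobs, m, assignment=None, loads=None):
    """Greedy Min() sketch: give each job to the machine with the lowest resulting C_i.

    p[j][i] is the time machine i needs for job j (hypothetical layout).
    """
    assignment = [[] for _ in range(m)] if assignment is None else assignment
    loads = [0] * m if loads is None else loads
    for j in jobs:
        # Completion time of every machine if it received job j.
        candidates = [loads[i] + p[j][i] for i in range(m)]
        # min() keeps the first minimum, so ties go to the lowest machine index.
        best = min(range(m), key=lambda i: candidates[i])
        assignment[best].append(j)
        loads[best] = candidates[best]
    return assignment, loads

def random_individual(p, m, rng):
    """One initial individual: Min() applied to a random permutation of the jobs."""
    jobs = list(range(len(p)))
    rng.shuffle(jobs)
    return min_heuristic(p, jobs, m)

# Last step of the Figure 1 example: loads before j10 and the times of j10.
_, loads = min_heuristic([[8, 20, 18, 28]], [0], 4, loads=[26, 25, 20, 10])
print(loads, max(loads))  # prints [34, 25, 20, 10] 34
```

Relying on `min()` returning the first minimum is one simple way to obtain the tie-breaking rule stated in the text (the first machine from *i*1 to *im*).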

### *2.2. Adapted Gene-Level Crossover Operator*

The GGA uses the Adapted Gene-Level Crossover (AGLX) operator, a variant of the GLX operator proposed by Quiroz-Castellanos et al. [28] that produces two child solutions from two parent solutions. Algorithm 1 presents the procedure of AGLX. We denote *S′* = Sort(*S*) the solution derived from *S* by sorting its machines in increasing order of their *Ci* values (lines 1 and 2). Thus, AGLX first transmits the machines that process their jobs fastest and then the slowest ones (lines 3–6). In this way, the first child *C*1 starts by inheriting the fastest machine from the first parent *S′*1, next the fastest machine from the second parent *S′*2, then the second-fastest machine from the first parent *S′*1, and so on (line 4). Similarly, the second child *C*2 receives genes alternately from both parents but starting with the fastest machine from the second parent *S′*2 (line 5). We denote *C* = Inherit(*C*, *ia*, *ib*) the child solution *C* updated with the machines *ia* and *ib*, one from each parent. It is important to remark that before inheriting each machine, the Inherit() function verifies that it has not already been transmitted to the child *C* by the other parent; otherwise, the machine is discarded. Likewise, before inheriting each job, this function checks that it has not already been transmitted; otherwise, the job is discarded to avoid duplicated jobs. It is important to note that in most cases this procedure generates incomplete, and hence infeasible, solutions, since some jobs can be missed during the crossover process. Therefore, it is necessary to re-insert the missed jobs to transform the solutions into feasible ones (lines 7 and 8). We denote *MJ*[] = MissedJobs(*C*) the set of jobs missed during the genetic material transmission of a child *C*. Finally, the missed jobs *MJ* are permuted and re-inserted with the Min() heuristic described above (lines 9–12). We denote *MJ′*[] = Permute(*MJ*[]) the set of jobs derived from *MJ*[] by permuting it with a uniform distribution and *C′* = Min(*C*, *MJ′*[]) the child solution obtained from the re-insertion of the jobs in *MJ′*[] into the solution *C*.

### **Algorithm 1** AGLX operator

**Input:** Two parent solutions *S*1 and *S*2, and the number of machines *m*.

**Output:** Two offspring solutions *C′*1 and *C′*2.

1. *S′*1 = Sort(*S*1);
2. *S′*2 = Sort(*S*2);
3. **for each** machine *i* in *S′*1 and *S′*2 **do**
4. *C*1 = Inherit(*C*1, *S′*1[*i*], *S′*2[*i*]);
5. *C*2 = Inherit(*C*2, *S′*2[*i*], *S′*1[*i*]);
6. **end for**
7. *MJ*1[] = MissedJobs(*C*1);
8. *MJ*2[] = MissedJobs(*C*2);
9. *MJ′*1[] = Permute(*MJ*1[]);
10. *MJ′*2[] = Permute(*MJ*2[]);
11. *C′*1 = Min(*C*1, *MJ′*1[]);
12. *C′*2 = Min(*C*2, *MJ′*2[]);
13. **end process.**

Figure 2 describes the process of the AGLX operator with an example that contains two parent solutions for the test instance of Figure 1a with four machines (groups). The ten jobs, from *j*1 to *j*10, are distributed among the four machines, from *i*1 to *i*4, and the time that each machine *i* requires to process its assigned jobs, from *C*1 to *C*4, is stored in the vector *Ci*. Figure 2a depicts the transmission process. It shows the two parents with their groups in increasing order, which indicates the gene transmission sequence, i.e., from best (lowest *Ci*) to worst (highest *Ci*). Figure 2b indicates the way the repeated genetic material is handled. It contains the two solutions produced during the transmission process, which only keep the machine *i* of the parent in which it appears first according to the gene transmission sequence. Furthermore, this figure includes the repeated jobs, highlighted in bold, that must be removed from the machine with the highest processing time *Ci*. Lastly, this figure shows a list with the jobs *MJ* missed during the transmission process. Figure 2c contains the partial solution resulting from the transmission process without the repeated genetic material, as well as a permutation of the jobs in *MJ*. Finally, Figure 2d shows the complete solutions resulting from the assignment of the missed jobs with the heuristic Min(). The processing time *Ci* of each machine *i*, as well as the operations performed by the Min() heuristic to assign the missed jobs, can be calculated using the example instance presented in Figure 1a.
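The AGLX steps can be illustrated with the following Python sketch. It is written under stated assumptions and is not the authors' code: a solution is assumed to be a list of *m* job lists (position *i* is machine *i*), the instance is a hypothetical matrix `p[j][i]`, and the helper names `loads_of`, `aglx_child`, and `aglx` are invented here. Repeated jobs are dropped from the machine that is transmitted later, which is how the sketch interprets the Inherit() description.

```python
import random

def loads_of(sol, p):
    """Processing time C_i of every machine; sol[i] lists the jobs of machine i."""
    return [sum(p[j][i] for j in jobs) for i, jobs in enumerate(sol)]

def aglx_child(first, second, p, rng):
    """Build one child, alternating machines of both parents from lowest to highest C_i."""
    m = len(first)
    child = [None] * m   # None marks a machine that has not been inherited yet
    placed = set()       # jobs already transmitted to the child

    def order(sol):      # Sort(): machine indices by increasing C_i
        c = loads_of(sol, p)
        return sorted(range(m), key=lambda i: c[i])

    def inherit(sol, i):
        if child[i] is None:  # skip machines already sent by the other parent
            child[i] = [j for j in sol[i] if j not in placed]  # drop repeated jobs
            placed.update(child[i])

    for ia, ib in zip(order(first), order(second)):
        inherit(first, ia)
        inherit(second, ib)

    missed = [j for j in range(len(p)) if j not in placed]  # MissedJobs()
    rng.shuffle(missed)                                      # Permute()
    c = loads_of(child, p)
    for j in missed:                                         # Min() over all machines
        best = min(range(m), key=lambda i: c[i] + p[j][i])
        child[best].append(j)
        c[best] += p[j][best]
    return child

def aglx(s1, s2, p, rng):
    """Two children: the second one starts from the second parent."""
    return aglx_child(s1, s2, p, rng), aglx_child(s2, s1, p, rng)
```

Calling `aglx(s1, s2, p, random.Random(1))` with two parents for the same instance returns two children in which every job appears exactly once.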


**Figure 2.** AGLX operator.

### *2.3. Download Mutation Operator*

The GGA includes the Download mutation operator, which modifies two genes of a solution in two phases. Algorithm 2 contains the procedure of the Download mutation operator. In the first phase, called download, the operator clusters the genes (machines) into two sets, *W* and *O* (line 1). We denote *W*, *O* = Cluster(*S*) the sets derived by grouping the machines in the solution *S* in such a way that *W* includes the machines with a processing time *Ci* equal to the makespan *Cmax*, while *O* holds those with a processing time *Ci* less than the makespan *Cmax*. Next, one machine is randomly selected from each set, *w* from *W* and *o* from *O* (lines 2 and 3). We denote *i* = Pick(*M*) the machine *i* randomly selected from the set of machines *M* with a uniform distribution. Subsequently, the jobs in the selected machines are released (line 4). We denote *S′*, *RJ*[] = Download(*S*, *w*, *o*) the solution derived by releasing the jobs of the machines *o* and *w*, which are placed in the set *RJ*[]. Finally, the order of the jobs in *RJ*[] is modified with the Permute() function mentioned above, giving rise to the set *RJ′*[] (line 5). In the second phase, the released jobs are redistributed among the selected machines *w* and *o* with the heuristic Best() (lines 6–8). We denote *S′* = Best(*S′*, *j*, *w*, *o*) the solution obtained by applying the Best() heuristic. For each job *j*, this heuristic evaluates *Cw* = *Cw* + *pwj* and *Co* = *Co* + *poj*, where *Cw* and *Co* represent the processing time currently assigned to machines *w* and *o*, respectively, and *pwj* and *poj* are the times that machines *w* and *o* require to process the job *j*. Best() then assigns *j* to the machine that yields the lowest *Ci* value. The main difference between the reassignment heuristics Min() and Best() is that Min() re-inserts the jobs considering all the machines, while Best() re-inserts them considering only the two selected machines *o* and *w*.
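The two phases of the Download operator can be sketched as follows. This is an illustrative Python sketch under assumptions, not the published implementation: solutions are lists of job lists, `p[j][i]` is a hypothetical processing-time matrix, the function name `download_mutation` is invented, and two details that the text leaves open are fixed arbitrarily (ties in Best() go to machine *w*, and the solution is returned unchanged when the set *O* is empty).

```python
import random

def download_mutation(sol, p, rng):
    """Release the jobs of one makespan machine w and one other machine o,
    then redistribute them between w and o with a Best()-style rule."""
    m = len(sol)
    loads = [sum(p[j][i] for j in jobs) for i, jobs in enumerate(sol)]
    cmax = max(loads)
    W = [i for i in range(m) if loads[i] == cmax]   # machines defining the makespan
    O = [i for i in range(m) if loads[i] < cmax]    # all the other machines
    new = [list(g) for g in sol]                    # work on a copy
    if not O:                                       # assumption: nothing to pair with
        return new
    w, o = rng.choice(W), rng.choice(O)             # Pick() on each set
    released = new[w] + new[o]                      # Download()
    new[w], new[o] = [], []
    rng.shuffle(released)                           # Permute()
    cw = co = 0
    for j in released:                              # Best(): only w and o compete
        if cw + p[j][w] <= co + p[j][o]:            # assumption: ties go to w
            new[w].append(j)
            cw += p[j][w]
        else:
            new[o].append(j)
            co += p[j][o]
    return new
```

Because only *w* and *o* are rebuilt, every other machine of the solution keeps its jobs, which is the difference from Min() that the text highlights.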


Figure 3 describes the mutation process of the Download operator with an example that contains an initial solution for the instance presented in Figure 1a with four genes (groups). The ten jobs, from *j*1 to *j*10, are distributed among four groups, from *i*1 to *i*4, and the time that each group *i* requires to process its assigned jobs, from *C*1 to *C*4, is stored in the vector *Ci*. Figure 3a shows the result of clustering the machines with processing time *Ci* equal to the makespan *Cmax* in the set *W* = {*i*1} and the remaining machines in the set *O* = {*i*2, *i*3, *i*4}. Figure 3b indicates the machines *w* = *i*1 and *o* = *i*4, outlined in bold, randomly selected from the sets *W* and *O*, respectively. Figure 3c contains the solution with the selected machines, outlined in bold, downloaded by releasing their jobs and placing them in the box of released jobs *RJ*. Finally, Figure 3d shows a permutation of the jobs in *RJ* and the result of reinserting them with the allocation heuristic Best(). The processing time *Ci* of each machine *i*, as well as the operations performed by the allocation heuristic Best() to assign the released jobs, can be calculated using the example instance *I* presented in Figure 1a. In this example, the quality of the mutated solution is better than that of the initial solution, which illustrates how the Download mutation operator can improve a solution.

**Figure 3.** Download mutation operator.

### *2.4. Selection and Replacement Strategies*

The GGA employs an adaptation of the controlled reproduction technique proposed by Quiroz-Castellanos et al. [28], which uses an elitist approach together with two inverted rankings to give all the solutions a chance to contribute to the next generation while forcing the survival of the best solutions. The replacement strategy preserves the population diversity and the best solutions by replacing individuals with duplicated fitness and the solutions with the worst fitness with new offspring. Algorithm 3 contains the procedure of the ranking strategy. First, this algorithm ranks the population (line 1). We denote *P* = Rank(*P*) the individuals arranged by sorting them from best to worst according to their fitness. Next, if there are solutions with repeated fitness, only one of them keeps its position in the ranking, and the others are placed at the end of the ordered list (line 2). We denote *P* = Rearrange(*P*) the population ranking resulting from placing the solutions with repeated fitness at the end of the ordered list. Subsequently, the solutions in *P* are distributed among the sets *G*, *R*, and *B* (line 3). We denote *G*, *R*, *B* = Distribution(*P*) the sets obtained by placing the ranked solutions in the sets *G*, *R*, and *B*. In this way, *G* includes the best *nc* solutions, where *nc* is a parameter to be configured that determines the number of individuals selected for the crossover process of each generation. On the other hand, the set *R* contains the solutions in the population *P* without the best *nc*/2 solutions. Finally, the set *B* holds the best |*B*| individuals, called elite solutions, which receive special treatment since they have the best characteristics of the population. Therefore, |*B*| is another parameter to be configured.
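A compact way to picture the ranking strategy is the Python sketch below. It is only an interpretation of the description above, with assumed details: the population is represented by a list of fitness values (makespans, lower is better), the functions `rank_population` and `distribute` are hypothetical names, and the sets *G*, *R*, and *B* are returned as lists of indices into the population.

```python
def rank_population(fitness):
    """Rank() and Rearrange(): indices sorted from best to worst fitness, with
    every extra copy of a repeated fitness value moved to the end of the list."""
    order = sorted(range(len(fitness)), key=lambda k: fitness[k])
    seen, unique, repeated = set(), [], []
    for k in order:
        (repeated if fitness[k] in seen else unique).append(k)
        seen.add(fitness[k])
    return unique + repeated

def distribute(ranked, nc, nb):
    """Distribution(): build the sets G, R, and B from the ranked population."""
    G = ranked[:nc]        # the best nc solutions
    R = ranked[nc // 2:]   # the population without the best nc/2 solutions
    B = ranked[:nb]        # the elite set, with |B| = nb
    return G, R, B

# Hypothetical makespans of a population of five solutions.
ranked = rank_population([34, 30, 34, 28, 30])
print(ranked)                    # prints [3, 1, 0, 4, 2]
print(distribute(ranked, 2, 1))  # prints ([3, 1], [1, 0, 4, 2], [3])
```

In the example, the second copies of the fitness values 30 and 34 are pushed to the end of the ranking, where the replacement strategy can later overwrite them.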


Given this hierarchical structure of solutions, *nc*/2 parent solutions are randomly taken from the set *G*, and the remaining *nc*/2 parents are randomly picked from the solutions in the set *R*. In this way, each pair of parents is created with one parent selected from the set *G* and the other one from the set *R*. Since a solution can be selected more than once, it is necessary to verify that the two parents of a pair are not the same solution. After applying the crossover operator to each pair of parents, the new individuals are incorporated into the ranked population *P* in the following way. Half of the generated children replace the parents selected from the set *R*, and the remaining offspring replace first the solutions with repeated fitness and then those with worse fitness, i.e., the solutions at the end of the ranked population *P*.

Once the replacement strategy is applied, the population is rearranged again with the same ranking strategy, described in Algorithm 3, to later select the best *nm* solutions for mutation, where *nm* is a parameter to be configured that determines the number of solutions mutated in each generation. When applying the mutation operator, if a solution belongs to the elite group *B*, the solution is first cloned and later mutated. The clones can be inserted into the population, replacing first the solutions with repeated fitness and then those with worse fitness.

### *2.5. Impact Analysis of Crossover and Mutation Rate on GGA*

In order to identify the impact of each variation operator (crossover and mutation) on the GGA performance, an experimental study was performed using three different values, 20, 40, and 60, for the number of individuals selected for the crossover process (*nc*) and the number of solutions to be mutated (*nm*). In this way, the GGA was run with the 9 configurations (*Conf*) generated from all possible combinations of the three values for these two parameters: *Conf*1: *nc* = 20, *nm* = 20, *Conf*2: *nc* = 20, *nm* = 40, ... *Conf*9: *nc* = 60, *nm* = 60. Figure 4 shows a bar graph of the results obtained from this study, where each bar represents 1 of the 9 configurations grouped according to the number of mutated solutions (*nm*), and each pattern indicates the number of individuals selected for the crossover process (*nc*): squares = 20, waves = 40, and circles = 60. As Figure 4 indicates, the GGA performance tends to improve (lower error rate) as the number of individuals considered for the crossover and mutation processes increases, although the crossover operator shows a higher impact on its performance. This behavior is different from the one presented by the GGA-CGT, where the mutation operator has the greatest positive impact on the final performance of the algorithm. The results and conclusions obtained from this study motivated the review of the mutation operator, exploring different strategies in order to include those that increase the impact of this operator on the final performance of the GGA when solving the *R*||*Cmax* problem.


**Figure 4.** Impact analysis of the parameters: number of individuals selected for crossover *nc* and number of mutated solutions *nm* in the GGA final performance.

### **3. Grouping Mutation Operators**

The mutation is a genetic operator generally used to control population diversity during the GGA search process. The mutation operators for the GGA are called grouping mutation operators since they work at the group level. That is, they select *g* groups using some criterion (such as selecting the best, the worst, or random groups) to slightly modify them employing different operations. According to Ramos-Figueroa et al. [32], the specialized literature holds seven mutation operators designed for GGAs in addition to the Download operator. Three of them, the Swap, the Insertion, and the Item Elimination, perform small alterations in the solutions with operations directly applied to some items of the selected groups. In contrast, the remaining operators, called Elimination, Creation, Merge and Split, and Reordering, promote more severe disturbances in solutions since they perform operations involving all the items of the selected groups.

The seven mutation operators have been used to solve a wide variety of grouping problems with different conditions and constraints. Due to these differences, mutation operators must be adapted to the characteristics of the problem to be solved. As a result, grouping mutation operators for the *R*||*Cmax* problem can differ in the tactics that they use to select the jobs and machines involved in the mutation operations, the strategies employed to handle the jobs and the selected machines, and the problem-domain heuristics included. The following sections show the general procedure of four state-of-the-art grouping mutation operators: Swap, Insertion, Elimination, and Merge and Split. This study contemplates the best state-of-the-art mutation operators that apply to the *R*||*Cmax* problem, discarding the infeasible ones and those that have not shown outstanding performance. A more detailed description of the procedures of the seven group-oriented mutation operators can be found in [32], together with a compilation of other mutation operators applied to different grouping problems and the parameter-setting approaches that they use. It is important to note that, unlike the Download operator, none of the four mutation operators described below has been used to solve the *R*||*Cmax* grouping problem. This motivates the experimental study, whose main objective is to explore the performance of the most used mutation operators when solving *R*||*Cmax*.

### *3.1. The Swap Operator*

The Swap operator selects two groups to later pick *k* items from each selected group and exchange the items from one group to another. Due to its way of working, it can be adapted and used to solve grouping problems with different constraints and conditions. Thanks to this quality, the Swap operator has been used to solve classic problems such as Bin Packing [34] as well as new problems such as Maximally Diverse [35].

### *3.2. The Insertion Operator*

Similar to the Swap operator, the Insertion operator selects two groups, picks *k* items from one of them, and inserts those items into the other group. This operator has been used to solve problems ranging from classic ones such as Graph Coloring [36] to newer ones such as Group Stock Portfolio [37], covering problems with different constraints and conditions [38].

### *3.3. The Elimination Operator*

The Elimination operator chooses *g* groups to remove them, release their items, and re-insert them by applying problem-domain heuristics, for example, the heuristic Min() used by the state-of-the-art GGA for *R*||*Cmax*. According to the scope of the literature review, this is the most used mutation operator to solve grouping problems because it has shown promising results, mainly in classic problems such as Bin Packing [28], Cell Formation [39], Multiple Knapsack [40], and Timetabling [41].

### *3.4. The Merge and Split Operator*

The Merge and Split, also known as Division and Combination operator, works in two phases. In the first stage, it selects two groups and transforms them into a single one. Then, in the second stage, it picks a group to distribute its items between two distinct groups. Merge and Split has been used to solve grouping problems such as Cell Formation [42] and Multivariate Micro-aggregation [43].

### **4. Computational Experiments**

This section presents the experimental design proposed to analyze the way different elements involved in the mutation process can impact the performance of grouping mutation operators. The objective of this work is to design an efficient grouping mutation operator that includes the best features identified during the experimentation, to later incorporate it to the state-of-the-art GGA for *R*||*Cmax* to improve its performance [1]. The experimental design consists of four phases. The first stage covers the analysis of the state-of-the-art grouping mutation operators to determine which one has the best performance for *R*||*Cmax*. The second phase comprises an exploratory analysis to observe the influence of the numbers of machines and jobs involved in mutation operations. The third phase includes the assessment of different machine selection strategies, including biased, random, and mixed approaches. Finally, the fourth phase studies the contribution of distinct rearrangement heuristics based on insertion and swap operations. The main objective of these strategies is to reorganize some jobs of the solutions applying more complex and expensive processes. Although they involve a computational cost, they are of vital importance when the mutation operator alone is unable to leave a local optimum. The information collected is used to design an efficient grouping mutation operator for *R*||*Cmax*.

The performance assessment of each operator involves solving 1400 test instances introduced by Fanjul-Peyro in 2010, distributed among 7 sets [26]. The first 5 sets differ in the range employed to generate the *pij* values with a uniform distribution: *U*(1, 100), *U*(10, 100), *U*(100, 120), *U*(100, 200), and *U*(1000, 1100). From the remaining sets, one includes instances with correlated machines (*MacCorr*) and the other instances with correlated jobs (*JobsCorr*). These instances can consider 100, 200, 500, or 1000 jobs and 10, 20, 30, 40, or 50 machines. Each set contains 200 instances, 10 for each combination of the number of machines *m* and jobs *n*.

To analyze the performance of each operator, we generate a population of 100 individuals with the heuristic Min() and then mutate them for 500 generations. For a fair comparison, we use the same seed for each operator. Finally, we use the average Relative Percentage Deviation (*RPD*) to compare the operators' performance. Given an instance *i*, the *RPD* is defined as in (1), where *Cmax*(*i*) denotes the *Cmax* value found by the operator, and *C*∗*max*(*i*) represents the best *Cmax* found by the commercial solver CPLEX within two hours. Thus, *RPD* indicates the relative deviation of the evaluated grouping mutation operators from CPLEX.

$$RPD = \frac{C_{max}(i) - C^{*}_{max}(i)}{C^{*}_{max}(i)} \tag{1}$$
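As a small numeric illustration of (1), the following Python sketch computes the *RPD* of one instance and the average over several instances. The function names and all the numbers are hypothetical and serve only to show the arithmetic; they are not results from the experiments.

```python
def rpd(cmax, cmax_ref):
    """Relative deviation of a makespan from the reference makespan, as in (1)."""
    return (cmax - cmax_ref) / cmax_ref

def average_rpd(found, reference):
    """Average RPD over a set of instances (two lists of equal length)."""
    return sum(rpd(c, r) for c, r in zip(found, reference)) / len(found)

# Hypothetical values: an operator reaches 34 where the reference is 32.
print(rpd(34, 32))                        # prints 0.0625
print(average_rpd([34, 50], [32, 50]))    # prints 0.03125
```

An *RPD* of 0 means that the operator matched the reference value, and larger values indicate a larger gap.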

### *4.1. State-of-the-Art Mutation Operators*

This experiment aims to study the performance of the state-of-the-art grouping mutation operators on the *R*||*Cmax* problem. As described in Section 3, this study comprises four operators: Swap, Insertion, Elimination, and Merge and Split, since this work focuses on the best state-of-the-art mutation operators that apply to the *R*||*Cmax* problem. However, the specialized literature contains other mutation operators applied to various grouping problems with different constraints and conditions [32]. Next, the procedures of the four mutation operators, adapted to work with the constraints and conditions of the *R*||*Cmax* problem, are presented. This information is reinforced by Algorithms 4–7 and Figure 5, which includes an example for each operator.

Algorithm 4 contains the procedure of the Swap mutation operator. First, it selects two machines *iA* and *iB* (lines 1 and 2). We denote *i* = PickMachine(*S*) the machine *i* randomly selected from the solution *S* with a uniform distribution. Later, this operator selects one job from each chosen machine (lines 3 and 4). We denote *j* = PickJob(*i*) the job *j* randomly selected from the machine *i* with a uniform distribution. Finally, the operator interchanges the two picked jobs (line 5). We denote *S′* = Interchange(*S*, *iA*, *iB*, *jA*, *jB*) the solution derived by interchanging the jobs *jA* and *jB* in machines *iA* and *iB*. Figure 5a explains the mutation process of the Swap operator adapted to solve the *R*||*Cmax* problem with an example in which the jobs *jA* = *j*1 and *jB* = *j*7, selected from machines *iA* = *i*1 and *iB* = *i*4, respectively, are exchanged. In the initial individual (*Solution*), the machines *iA* and *iB*, outlined in bold, and the jobs *jA* and *jB*, in bold, depict the selected machines and jobs, respectively; the final individual (*Mutation*) shows the jobs in their new positions.
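The Swap procedure can be sketched in Python as follows. This is a minimal sketch under assumptions: solutions are lists of job lists, the name `swap_mutation` is hypothetical, and the two machines are assumed to be distinct and non-empty, a detail that the description does not state.

```python
import random

def swap_mutation(sol, rng):
    """Exchange one random job between two randomly chosen machines."""
    new = [list(g) for g in sol]                      # work on a copy
    # Assumption: only machines holding at least one job can take part.
    candidates = [i for i, g in enumerate(new) if g]
    if len(candidates) < 2:
        return new
    ia, ib = rng.sample(candidates, 2)                # PickMachine(), twice
    a = rng.randrange(len(new[ia]))                   # PickJob(iA)
    b = rng.randrange(len(new[ib]))                   # PickJob(iB)
    new[ia][a], new[ib][b] = new[ib][b], new[ia][a]   # Interchange()
    return new
```

The number of jobs on each machine is unchanged by this operator; only the processing times *Ci* of the two selected machines can change.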







Similarly, Algorithm 5 includes the procedure of the Insertion mutation operator. First, it uses the aforementioned PickMachine() function to select two machines *iA* and *iB* (lines 1 and 2). Next, it employs the PickJob() function described above to select a job *jA* from the first selected machine *iA* (line 3). Finally, this operator inserts the job *jA* into the second selected machine *iB* (line 4). We denote *S′* = Insertion(*S*, *iA*, *iB*, *jA*) the solution derived by moving the job *jA* from machine *iA* to machine *iB*. Figure 5b describes the mutation process of the Insertion operator implemented to solve the *R*||*Cmax* problem with an example, where the job *jA* = *j*7, selected from machine *iA* = *i*4, is inserted into machine *iB* = *i*1. For a clear explanation, the example outlines in bold the selected machines *iA* and *iB* and highlights in bold the inserted job *jA* in the initial individual (*Solution*). The final individual (*Mutation*) shows the picked job *jA* in its new position.
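A corresponding Python sketch of the Insertion procedure is given below. As before, it rests on assumptions rather than on the original code: the name `insertion_mutation` is hypothetical, solutions are lists of job lists, the source machine is assumed to hold at least one job, and the target machine is assumed to differ from the source.

```python
import random

def insertion_mutation(sol, rng):
    """Move one random job from a machine iA to a different machine iB."""
    new = [list(g) for g in sol]                      # work on a copy
    sources = [i for i, g in enumerate(new) if g]     # assumption: iA is not empty
    if not sources or len(new) < 2:
        return new
    ia = rng.choice(sources)                                   # PickMachine()
    ib = rng.choice([i for i in range(len(new)) if i != ia])   # PickMachine()
    job = new[ia].pop(rng.randrange(len(new[ia])))             # PickJob(iA)
    new[ib].append(job)                                        # Insertion()
    return new
```

Unlike Swap, this operator changes the number of jobs on both selected machines by one.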


On the other hand, Algorithm 6 describes the procedure of the Elimination operator. Like the Swap and the Insertion operators, the Elimination process starts by picking two machines *iA* and *iB* with the PickMachine() function (lines 1 and 2). Next, it places all the jobs of both machines in the set of released jobs *RJ*, employing the before-mentioned Download() function (line 3). It is important to remark that this release is performed instead of an actual elimination, since the machines cannot be removed due to the characteristics of the problem. Subsequently, the order of the jobs in *RJ* is modified by using the Permute() function (line 4). Finally, the permuted jobs in *RJ*[] are re-inserted with the Min() heuristic (lines 5–7). Figure 5c illustrates the mutation process of the Elimination operator adapted to solve the problem *R*||*Cmax* with an example, where the machines outlined in bold, *iA* = *i*3 and *iB* = *i*4, are the machines selected to release their jobs *j*3, *j*5, *j*6, *j*7, and *j*8, highlighted in bold in the initial individual (*Solution*). The *Incomplete Solution* shows the chromosome without the released jobs, which are placed in the box *RJ*. Lastly, the box *Permutation* represents the jobs in *RJ* reordered randomly, and the final solution *Mutation* depicts the chromosome generated by assigning the jobs in the box *Permutation* with the problem-domain heuristic Min().
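The release, permute, and re-insert stages of the Elimination operator might look as follows in Python. The Min() heuristic is defined elsewhere in the paper; the greedy rule used here (place each job on the machine whose load after receiving it is smallest) is an assumption standing in for it, as are the matrix `p` (with `p[i][j]` the processing time of job *j* on machine *i*) and all function names.

```python
import random

def elimination_mutation(solution, p, rng):
    """Release every job of two random machines and greedily re-insert them.

    Assumed layout: solution[i] lists the jobs of machine i; p[i][j] is the
    processing time of job j on machine i.
    """
    mutated = [jobs[:] for jobs in solution]
    i_a, i_b = rng.sample(range(len(mutated)), 2)  # PickMachine, twice
    released = mutated[i_a] + mutated[i_b]         # Download into RJ
    mutated[i_a], mutated[i_b] = [], []
    rng.shuffle(released)                          # Permute
    loads = [sum(p[i][j] for j in jobs) for i, jobs in enumerate(mutated)]
    for job in released:
        # Assumed stand-in for Min(): smallest resulting load over all machines.
        best = min(range(len(mutated)), key=lambda i: loads[i] + p[i][job])
        mutated[best].append(job)
        loads[best] += p[best][job]
    return mutated

p = [[3, 7, 2, 9, 4], [5, 2, 8, 3, 6], [6, 4, 5, 2, 7]]
parent = [[0, 2], [1, 3], [4]]
child = elimination_mutation(parent, p, random.Random(11))
```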


Lastly, Algorithm 7 contains the procedure of the Merge and Split operator. Like the previously described mutation operators, Merge and Split begins by choosing two machines *iA* and *iB* at random with the PickMachine() function (lines 1 and 2). Next, it places the jobs of the selected machines in the set of released jobs *RJ* with the Download() function (line 3). As can be seen, this is a case similar to the elimination, since the machines cannot be joined or split. Hence, the operator uses the Permute() function to modify the order of the jobs in *RJ* (line 4), and later, it simulates the splitting part by re-inserting the permuted jobs in *RJ*[] among the two selected machines *iA* and *iB* using the above-described heuristic Best() (lines 5–7). Figure 5d includes the mutation process of the Merge and Split operator with an example that contains an initial individual (*Solution*) with the two selected machines *iA* and *iB* outlined in bold and the released jobs *j*3, *j*5, *j*6, *j*7, and *j*8 highlighted in bold. Moreover, the example contains the *Incomplete Solution* without the jobs in *iA* ∪ *iB*, which are placed in a box with the same name (*iA* ∪ *iB*). Lastly, this figure includes the final solution *Mutation* that depicts the chromosome resulting from the allocation of the jobs in *Permutation* (a box with the jobs in *iA* ∪ *iB* reordered randomly) by applying the problem-domain heuristic Best().

Table 1 shows the results obtained from the computational experiments. For a comprehensive study, the performance of the operators was analyzed considering the number of jobs *n*, the number of machines *m*, the distribution of the processing times *pij*, and the 1400 instances together. In this way, the first column indicates the criterion used to study the performance of the operators, the second one contains the classes covered for each grouping criterion, and the following columns represent the average *RPD* (Relative Percentage Deviation) achieved by each operator: Swap, Insertion, Merge and Split, and Elimination, respectively. Finally, this table highlights in bold the results obtained by the best operator for each group of instances. From Table 1, it can be observed that the Elimination operator excelled in all the criteria used to distribute the instances. It is important to note that the four operators had a similar performance since their average *RPD* differs only by hundredths.

Moreover, it is remarkable that the Download mutation operator procedure of the studied GGA is quite similar to the state-of-the-art mutation operator Merge and Split, since although the operations merge and split cannot be applied to groups explicitly due to the characteristics and conditions of the problem, they can be emulated by considering the jobs. In this way, the first stage of the Download mutation operator represents the combination of the groups, where the jobs of the two selected machines are released and placed in a single set. Similarly, the second stage depicts the split operation, where the jobs are redistributed among the selected machines. Finally, it is also important to mention that the only difference between the Merge and Split operator and the Elimination operator (the two operators with the best performance) is the job reassignment strategy they work with, since Merge and Split re-inserts the jobs only on the two selected machines, while the Elimination operator tries to re-insert the jobs on all the machines.
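The difference in reassignment scope noted above can be made concrete with a small sketch in which the only thing that changes between the two operators is the set of target machines. This is an illustration under assumptions: the same greedy rule (smallest resulting load) stands in for both Min() and Best(), whose exact definitions are given elsewhere in the paper, and the names `release_and_reinsert`, `p`, and `targets` are hypothetical.

```python
import random

def release_and_reinsert(solution, p, i_a, i_b, targets, rng):
    """Release all jobs of machines i_a and i_b, then re-insert them greedily
    on the machines listed in `targets` (assumed greedy stand-in heuristic)."""
    mutated = [jobs[:] for jobs in solution]
    released = mutated[i_a] + mutated[i_b]  # merge: one pooled set of jobs
    mutated[i_a], mutated[i_b] = [], []
    rng.shuffle(released)
    loads = [sum(p[i][j] for j in jobs) for i, jobs in enumerate(mutated)]
    for job in released:                    # split: redistribute the pool
        best = min(targets, key=lambda i: loads[i] + p[i][job])
        mutated[best].append(job)
        loads[best] += p[best][job]
    return mutated

p = [[3, 7, 2, 9, 4], [5, 2, 8, 3, 6], [6, 4, 5, 2, 7]]
parent = [[0, 2], [1, 3], [4]]
rng = random.Random(1)
# Elimination-style: released jobs may land on any machine.
elimination_like = release_and_reinsert(parent, p, 0, 1, range(3), rng)
# Merge-and-Split-style: released jobs return only to the two selected machines.
merge_split_like = release_and_reinsert(parent, p, 0, 1, (0, 1), rng)
```

With the restricted target set, every machine other than the two selected ones is guaranteed to keep its jobs.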

The following stages of this experimental study contain the analysis of different aspects involved in the mutation operator with the reassignment heuristic that considers all the machines, such as the number of machines to handle, the number of jobs to remove, the machine selection strategy, and the rearrangement heuristics.


**Table 1.** Comparison of Swap, Insertion, Merge and Split, and Elimination mutation operators using RPD.

### *4.2. Handled Machines and Removed Jobs*

After observing that the four state-of-the-art operators showed quite similar performance and that the Elimination operator slightly excelled, the second phase of the experimental study focused on analyzing how the number of handled machines and removed jobs impacts the performance of the mutation operator. To analyze this phenomenon, we explored thirty-five variants of the operator. This study consists of evaluating the suitability of removing 1, 2, 3, 4, 6, 8, and 10 jobs from 2, 4, 6, 8, and 10 different machines, where each combination of removed jobs and managed machines results in an operator. For this study, we designed an enhanced version of the Elimination operator, called Elimination operator-v2. Algorithm 8 contains the procedure of this version, which adapts to the number of machines *f* and jobs *h* to handle. Therefore, this version receives the solution and the number of machines and jobs to consider. It starts by using a loop to select the machines with the PickMachine() function (lines 1 and 2). For each machine, it employs another loop to choose the *h* jobs with the PickJob() function and place them in the set of released jobs *RJ* (lines 3–5). It is important to highlight that if a machine has fewer than *h* jobs, all of them are released and placed in *RJ*. Finally, the functions Permute() and Min() are used to modify the order of the jobs and re-insert them, respectively (lines 7–10).
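A parameterized version of the operator, together with the grid of thirty-five (*f*, *h*) combinations, could be sketched as follows. The greedy re-insertion rule is again an assumed stand-in for Min(), the machines are drawn without repetition because the text speaks of different machines, and the names `elimination_v2`, `p`, and `variants` are hypothetical.

```python
import random

def elimination_v2(solution, p, f, h, rng):
    """Release up to h random jobs from each of f distinct random machines,
    then permute and greedily re-insert them (assumed stand-in for Min())."""
    mutated = [jobs[:] for jobs in solution]
    machines = rng.sample(range(len(mutated)), min(f, len(mutated)))
    released = []
    for i in machines:
        take = min(h, len(mutated[i]))  # fewer than h jobs: release them all
        for _ in range(take):
            released.append(mutated[i].pop(rng.randrange(len(mutated[i]))))
    rng.shuffle(released)               # Permute
    loads = [sum(p[i][j] for j in jobs) for i, jobs in enumerate(mutated)]
    for job in released:
        best = min(range(len(mutated)), key=lambda i: loads[i] + p[i][job])
        mutated[best].append(job)
        loads[best] += p[best][job]
    return mutated

# The 5 x 7 grid of operator variants evaluated in this stage.
variants = [(f, h) for f in (2, 4, 6, 8, 10) for h in (1, 2, 3, 4, 6, 8, 10)]

p = [[3, 7, 2, 9, 4], [5, 2, 8, 3, 6], [6, 4, 5, 2, 7]]
parent = [[0, 2], [1, 3], [4]]
child = elimination_v2(parent, p, f=2, h=1, rng=random.Random(5))
```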

For a fair comparison, all the operators use randomness to select the machines and the jobs that intervene in their mutation process. Thus, each operator releases *h* jobs from each of the *f* selected machines and then re-inserts them with the heuristic Min(). As in the first phase, for each operator, 100 individuals were generated and mutated during 500 generations using the same seed.

Table 2 shows the experimental results of the thirty-five variants of the mutation operator. The first column indicates the number of machines that each operator manages, the second one represents the number of jobs removed from each of the handled machines, and the last column contains the average *RPD* of each operator for the 1400 test instances, highlighting in bold the result obtained by the best variant of the thirty-five mutation operators.

### **Algorithm 8** Elimination operator-v2

**Input:** A solution *S*, number of machines *f* and jobs *h* to handle.

**Output:** A mutated solution *S*′.

1: **for each** machine *i* from 1 to *f* **do**
2: &nbsp;&nbsp;*i* = PickMachine(*S*);
3: &nbsp;&nbsp;**for each** job *j* from 1 to *h* **do**
4: &nbsp;&nbsp;&nbsp;&nbsp;*RJ*[] = PickJob(*i*);
5: &nbsp;&nbsp;**end for**
6: **end for**
7: *RJ*[] = Permute(*RJ*[]);
8: **for all** job *j* ∈ *RJ*[] **do**


**Table 2.** Comparison of handled machines and removed jobs using *RPD*.


It appears from Table 2 that the operators that release only one job from each machine perform better than those that release more and that the best option is to consider only two machines. Moreover, to graphically observe the behavior of the 35 designed operators, the 1400 instances were grouped into 20 groups, one for each combination of jobs (100, 200, 500, and 1000) and machines (10, 20, 30, 40, and 50), to calculate the average *RPD* of each group and analyze the impact of each operator in more detail, e.g., the group where *m* = 10 and *n* = 100, the group where *m* = 10 and *n* = 200, and so on. Figures 6 and 7 contain two representative graphs of the behavior presented by the thirty-five mutation operator variants, which allow observing the impact of the two evaluated features, i.e., the number of machines to be handled and the number of jobs to be removed from each machine.

**Figure 6.** Behavior of the mutation operators grouped by the number of handled machines.

**Figure 7.** Behavior of the mutation operators grouped by the number of removed jobs from each handled machine.

Figure 6 shows the performance of the operators grouped according to the number of machines that they handle for all instances with 200 jobs and 30 machines. The *x*-axis of this figure indicates the number of machines handled, and the *y*-axis contains the average *RPD* reached by each operator. On the other hand, Figure 7 groups the operators according to the number of jobs removed from each machine in instances with 500 jobs and 20 machines. The *x*-axis contains the operators grouped according to the number of jobs that they remove, and the *y*-axis contains the average *RPD* reached by each operator. In this way, Figure 6 shows graphically that the performance of the operators improves as the number of handled machines decreases, while Figure 7 shows that the operators removing fewer jobs have better performance. In this fashion, the analysis suggests that the operators handling fewer machines and releasing fewer jobs are more suitable.

### *4.3. Machines Selection Strategy*

Having identified that the variant that considers two machines and releases one job from each machine has the best performance, in this stage we evaluate the performance of four machine selection strategies, Random, Worst, Worst Best, and Worst Random, to analyze how they affect the performance of the mutation operators. Given a solution to be mutated, these strategies work as follows. Random chooses the two machines randomly. Worst selects the two machines with the worst *Ci* values (i.e., the machines with the highest loads). Worst Best picks the worst and the best machine (i.e., the machines with the highest and the lowest loads). If there are several machines with the lowest or highest load, they are first identified, and then one of them is selected randomly using a uniform distribution. Finally, Worst Random divides the machines into two groups (*W* and *O*) in such a way that *W* contains the machines with *Ci* = *Cmax* and *O* the remaining machines. Next, it randomly selects the machines *w* and *o* from sets *W* and *O*, respectively. It is important to note that for each machine selection strategy, the two released jobs are selected randomly using a uniform distribution and later re-inserted employing the heuristic Min().
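The four strategies can be expressed as one small selector over the vector of machine loads *Ci*. The following is a sketch under assumptions: the function name `select_machines`, the string labels, the random tie-breaking in Worst, and the fallback used when the second pool is empty (for example, when all loads are equal) are choices made here rather than details taken from the paper.

```python
import random

def select_machines(loads, strategy, rng):
    """Return a pair of machine indices chosen by the given strategy.

    loads[i] is the completion time C_i of machine i.
    """
    machines = range(len(loads))
    c_max, c_min = max(loads), min(loads)
    worst = [i for i in machines if loads[i] == c_max]  # set W
    if strategy == "random":
        return tuple(rng.sample(machines, 2))
    if strategy == "worst":
        # Two highest loads; ties are broken at random (assumption).
        order = sorted(machines, key=lambda i: (-loads[i], rng.random()))
        return order[0], order[1]
    w = rng.choice(worst)
    if strategy == "worst_best":
        pool = [i for i in machines if loads[i] == c_min and i != w]
    elif strategy == "worst_random":
        pool = [i for i in machines if loads[i] != c_max]  # set O
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    if not pool:  # assumed fallback, e.g. when every machine has the same load
        pool = [i for i in machines if i != w]
    return w, rng.choice(pool)

loads = [5, 9, 3, 9, 7]
rng = random.Random(0)
pair_worst = select_machines(loads, "worst", rng)
pair_worst_best = select_machines(loads, "worst_best", rng)
pair_worst_random = select_machines(loads, "worst_random", rng)
pair_random = select_machines(loads, "random", rng)
```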

Table 3 shows the experimental results of the operators with the four machine selection strategies. As can be seen, this table has the same structure as Table 1. That is, it clusters the instances according to the number of jobs *n*, the number of machines *m*, the distribution of the processing times *pij* of the instances, and the 1400 test instances together. Therefore, the first column indicates the criterion used to study the performance of the operators, the second one contains the classes covered for each grouping criterion, and the following columns represent the average *RPD* (Relative Percentage Deviation) achieved by the operators with each machine selection strategy: Random, Worst, Worst Best, and Worst Random. Finally, this table highlights in bold the results obtained by the best mutation operator for each group of instances. The experimental results in Table 3 suggest that the most suitable machine selection strategy is Worst Random, with an average *RPD* of 0.0674 since the other approaches (Random, Worst, and Worst Best) reached higher *RPD* averages of 0.0913, 0.0875, and 0.0912, respectively.

**Table 3.** Comparison of mutation operators with selection strategies Random, Worst, Worst Best, and Worst Random using *RPD*.


### *4.4. Rearrangement Heuristics*

After identifying the machine selection strategy that provides the best performance to the mutation operator, we noted that there is a high probability that the genetic material of many solutions does not undergo any alteration during the mutation process. Such a phenomenon can occur because the two released jobs are likely to be re-inserted in the same machines to which they belonged. In order to analyze this, we evaluated the success rate (i.e., the number of alterations in the genetic material divided by the number of mutation attempts) of the mutation operator with the best properties identified in the two previous stages. The experimental results revealed that only about 42% of the mutation attempts are successful.
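As a small illustration of how such a success rate can be measured, the sketch below compares each parent with its mutated offspring as a grouping, so that the order of jobs inside a machine does not count as an alteration. The helper names and the toy solutions are hypothetical and are not data from the study.

```python
def same_grouping(a, b):
    """True if both solutions assign the same set of jobs to every machine."""
    return all(sorted(x) == sorted(y) for x, y in zip(a, b))

def success_rate(parents, children):
    """Fraction of mutation attempts that altered the grouping."""
    altered = sum(not same_grouping(q, c) for q, c in zip(parents, children))
    return altered / len(parents)

# Toy data (hypothetical): four parents and their mutated offspring.
parents = [[[0, 1], [2]], [[0], [1, 2]], [[0, 1, 2], []], [[2], [0, 1]]]
children = [[[1, 0], [2]], [[0, 1], [2]], [[0, 1, 2], []], [[], [0, 1, 2]]]
rate = success_rate(parents, children)
```

In this toy case two of the four attempts change the grouping, so `rate` is 0.5.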

The above motivates this stage of the experimental study that consists of evaluating the utility of incorporating two rearrangement heuristics, called Insertion and Assemble, to increase the operator's success rate and improve its performance. These heuristics are only used if, after releasing and reinserting the jobs, the genetic material of the mutated solution has not been altered.

The rearrangement heuristic Insertion seeks to reduce the number of jobs in one of the two selected machines by trying to insert each of their jobs into the other ones. Algorithm 9 has the procedure of the rearrangement heuristic Insertion. We denote *S* = Insertion(*S*, *jsm*, *sm*, *i*) the solution derived from *S* by inserting job *jsm* (*jw* or *jo*) from the selected machine *sm* (*w* or *o*) into machine *i*. As can be seen, this heuristic goes through the jobs *jw* and *jo* of the machines *w* and *o* selected with the machine selection strategy Worst Random (line 1). Thus, for each pair of jobs (*jw* and *jo*), this algorithm traverses the *m* machines (line 2). In this way, for each machine *i* different from machine *w* and *o* (line 3 and line 9), it tries to insert the job *jw* of the worst machine *w* (line 3) and then the job *jo* from the other machine *o* (line 7) following two conditions, denoted as Cnd\_1 and Cnd\_2.


Cnd\_1(*S*, *jsm*, *sm*, *i*) (line 4 and line 10) verifies that the mutated solution (*S*′) will have equal or better quality than the initial solution (*S*). In this way, Cnd\_1 checks that the sum of the processing times resulting from the insertion in the intervened machines *i* and *sm* (*w* or *o*) will be less than or equal to the sum of their processing times without performing the insertion. Hence, for each job *jw*, Cnd\_1(*S*, *jw*, *w*, *i*) returns TRUE if *Cw* − *pwjw* + *Ci* + *pijw* ≤ *Cw* + *Ci*, where *Cw* and *Ci* represent the time that machines *w* and *i* require to process their assigned jobs, respectively, while *pwjw* and *pijw* depict the processing time that machines *w* and *i* require to process job *jw*, respectively. Otherwise, it returns FALSE. In the same way, for each job *jo*, Cnd\_1(*S*, *jo*, *o*, *i*) returns TRUE if *Co* − *pojo* + *Ci* + *pijo* ≤ *Co* + *Ci*, where *Co* and *Ci* represent the time that machines *o* and *i* require to process their assigned jobs, respectively, while *pojo* and *pijo* depict the processing time that machines *o* and *i* require to process job *jo*, respectively. Otherwise, it returns FALSE.

On the other hand, Cnd\_2(*S*, *jsm*, *sm*, *i*) (line 4 and line 10) checks that the mutated solution (*S*′) will have equal or better quality than the initial solution (*S*). Cnd\_2 verifies that the processing time *Ci* of the machine *i* with the new job, either *jw* or *jo*, will be less than or equal to the current makespan *Cmax*. Therefore, for each job *jw*, Cnd\_2(*S*, *jw*, *w*, *i*) returns TRUE if *Ci* + *pijw* ≤ *Cmax*. Otherwise, it returns FALSE. Similarly, for each job *jo*, Cnd\_2(*S*, *jo*, *o*, *i*) returns TRUE if *Ci* + *pijo* ≤ *Cmax*. Otherwise, it returns FALSE.

In this way, the function Insertion(*S*, *jsm*, *sm*, *i*) (lines 5 and 11) is applied to *S* if and only if a job *j* (*jw* or *jo*) satisfies the two conditions (Cnd\_1 and Cnd\_2). The rearrangement process ends once an insertion is performed (lines 6 and 12), but if none of the jobs satisfies the two conditions, the mutated solution keeps its genetic material without any modification.
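The two conditions and the first-fit insertion they guard can be sketched as follows. This is an illustrative reading rather than the authors' code: the data layout, the function names, and in particular the way the jobs *jw* and *jo* are walked in parallel (here with `zip_longest`) are assumptions.

```python
from itertools import zip_longest

def cnd_1(p, loads, job, src, dst):
    """Combined load of src and dst does not grow after moving `job`."""
    after = (loads[src] - p[src][job]) + (loads[dst] + p[dst][job])
    return after <= loads[src] + loads[dst]

def cnd_2(p, loads, job, dst, c_max):
    """Receiving machine does not exceed the current makespan."""
    return loads[dst] + p[dst][job] <= c_max

def insertion_rearrangement(solution, p, w, o):
    """Apply the first insertion from machine w or o that meets both conditions."""
    loads = [sum(p[i][j] for j in jobs) for i, jobs in enumerate(solution)]
    c_max = max(loads)
    # Assumption: jobs of w and o are visited in parallel, position by position.
    for j_w, j_o in zip_longest(solution[w], solution[o]):
        for i in range(len(solution)):
            if i in (w, o):
                continue
            for src, job in ((w, j_w), (o, j_o)):
                if job is None:
                    continue
                if cnd_1(p, loads, job, src, i) and cnd_2(p, loads, job, i, c_max):
                    mutated = [jobs[:] for jobs in solution]
                    mutated[src].remove(job)
                    mutated[i].append(job)
                    return mutated  # stop after the first successful insertion
    return [jobs[:] for jobs in solution]  # nothing qualified: unchanged copy

p = [[4, 5, 9], [6, 6, 3], [3, 9, 9]]
solution = [[0, 1], [2], []]  # loads: 9, 3, 0, so C_max = 9
result = insertion_rearrangement(solution, p, w=0, o=1)
```

In this toy instance, job 0 moves from machine 0 to machine 2 because 3 ≤ 4 (Cnd\_1 reduces to *pijw* ≤ *pwjw*) and 0 + 3 ≤ 9 (Cnd\_2), giving `[[1], [2], [0]]`.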

On the other hand, the rearrangement heuristic Assemble uses two functions. The first one is Insertion(*S*, *jsm*, *sm*, *i*), which works as in the rearrangement heuristic above. Additionally, it incorporates a second function called Interchange that seeks to exchange each job of the selected machines with each job of the other machines in an attempt to reduce the processing time of the selected machines. Algorithm 10 contains the procedure of the rearrangement heuristic Assemble. We denote *S* = Interchange(*S*, *jsm*, *sm*, *ji*, *i*) the solution derived from *S* by exchanging job *jsm* (*jw* or *jo*) from the selected machine *sm* (*w* or *o*) with a job *ji* in machine *i*. Like the Insertion rearrangement heuristic, Assemble loops through the jobs *jw* and *jo* of the machines *w* and *o* selected with the machine selection strategy Worst Random (line 1). Thus, for each pair of jobs (*jw* and *jo*), this algorithm goes through the *m* machines (line 2). In this fashion, first, it tries to insert the jobs *jw* of the worst machine *w* and *jo* of the other machine *o* into every machine *i* different from machines *w* and *o* (line 3 and line 9) according to the two conditions described in Algorithm 9: Cnd\_1 and Cnd\_2 (line 4 and line 10). Next, it attempts to interchange the same jobs *jw* and *jo* with each job *ji* in every machine *i* (line 15) different from machines *w* and *o* (line 16 and line 22), validating two conditions: Cnd\_3 and Cnd\_4 (line 17 and line 23).

Cnd\_3(*S*, *jsm*, *sm*, *ji*, *i*) (line 17 and line 23) verifies that the mutated solution (*S*′) will have equal or better quality than the initial solution (*S*). In this way, Cnd\_3 checks that the sum of the processing times resulting from the exchange in the intervened machines *i* and *sm* (*w* or *o*) will be less than or equal to the sum of their processing times without swapping their jobs. Hence, for each job *jw*, Cnd\_3(*S*, *jw*, *w*, *ji*, *i*) returns TRUE if (*Cw* − *pwjw* + *pwji*) + (*Ci* − *piji* + *pijw*) ≤ *Cw* + *Ci*, where *Cw* and *Ci* represent the time that machines *w* and *i* require to process their assigned jobs, respectively; *pwjw* and *piji* depict the processing time that machines *w* and *i* require to process jobs *jw* and *ji*, respectively; and *pwji* and *pijw* indicate the processing time that machines *w* and *i* require to process jobs *ji* and *jw*, respectively. Otherwise, it returns FALSE. In the same way, for each job *jo*, Cnd\_3(*S*, *jo*, *o*, *ji*, *i*) returns TRUE if (*Co* − *pojo* + *poji*) + (*Ci* − *piji* + *pijo*) ≤ *Co* + *Ci*, where *Co* and *Ci* represent the time that machines *o* and *i* require to process their assigned jobs, respectively; *pojo* and *piji* depict the processing time that machines *o* and *i* require to process jobs *jo* and *ji*, respectively; and *poji* and *pijo* indicate the processing time that machines *o* and *i* require to process jobs *ji* and *jo*, respectively. Otherwise, it returns FALSE.

On the other hand, the condition Cnd\_4(*S*, *jsm*, *sm*, *ji*, *i*) (line 17 and line 23) validates that the processing time resulting from the interchange in the intervened machines *i* and *sm* (*w* or *o*) will be less than or equal to the current makespan (*Cmax*) of the initial solution *S*. Hence, for each job *jw*, Cnd\_4(*S*, *jw*, *w*, *ji*, *i*) returns TRUE if (*Cw* − *pwjw* + *pwji* ≤ *Cmax*) and (*Ci* − *piji* + *pijw* ≤ *Cmax*). Otherwise, it returns FALSE. Similarly, for each job *jo*, Cnd\_4(*S*, *jo*, *o*, *ji*, *i*) returns TRUE if (*Co* − *pojo* + *poji* ≤ *Cmax*) and (*Ci* − *piji* + *pijo* ≤ *Cmax*). Otherwise, it returns FALSE.


The Assemble process ends once an operation, either the insertion or the interchange, is accomplished (lines 6, 12, 19, and 25). If none of the jobs meets the conditions of either operation, the mutated solution keeps its genetic material without any modification.
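The interchange half of Assemble, guarded by Cnd\_3 and Cnd\_4, can be sketched in the same style. This covers only the exchange step (in Assemble, the insertion is attempted first); the data layout, function names, and the parallel traversal of the jobs of *w* and *o* are assumptions made for illustration.

```python
from itertools import zip_longest

def loads_after_swap(p, loads, j_s, s, j_i, i):
    """Loads of machines s and i after exchanging job j_s (on s) with j_i (on i)."""
    new_s = loads[s] - p[s][j_s] + p[s][j_i]
    new_i = loads[i] - p[i][j_i] + p[i][j_s]
    return new_s, new_i

def cnd_3(p, loads, j_s, s, j_i, i):
    """Combined load of both machines does not grow."""
    new_s, new_i = loads_after_swap(p, loads, j_s, s, j_i, i)
    return new_s + new_i <= loads[s] + loads[i]

def cnd_4(p, loads, j_s, s, j_i, i, c_max):
    """Neither machine exceeds the current makespan."""
    new_s, new_i = loads_after_swap(p, loads, j_s, s, j_i, i)
    return new_s <= c_max and new_i <= c_max

def interchange_rearrangement(solution, p, w, o):
    """Apply the first exchange involving machine w or o that meets both conditions."""
    loads = [sum(p[i][j] for j in jobs) for i, jobs in enumerate(solution)]
    c_max = max(loads)
    for j_w, j_o in zip_longest(solution[w], solution[o]):
        for i in range(len(solution)):
            if i in (w, o):
                continue
            for j_i in solution[i]:
                for s, j_s in ((w, j_w), (o, j_o)):
                    if j_s is None:
                        continue
                    if cnd_3(p, loads, j_s, s, j_i, i) and cnd_4(p, loads, j_s, s, j_i, i, c_max):
                        mutated = [jobs[:] for jobs in solution]
                        mutated[s][mutated[s].index(j_s)] = j_i
                        mutated[i][mutated[i].index(j_i)] = j_s
                        return mutated
    return [jobs[:] for jobs in solution]

p = [[8, 2, 3], [9, 4, 9], [2, 9, 5]]
solution = [[0], [1], [2]]  # loads: 8, 4, 5, so C_max = 8
result = interchange_rearrangement(solution, p, w=0, o=1)
```

Here job 0 (machine 0) and job 2 (machine 2) trade places: the new loads are 3 and 2, whose sum 5 does not exceed 13 and which both stay below the makespan 8.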

In this way, two variants of the operator with the best characteristics identified in the two previous stages (i.e., removing one job from each of two machines selected with the strategy Worst Random and re-inserting such jobs with the Min() heuristic) were created, one for each rearrangement heuristic presented in this section: Insertion and Assemble. The performance of the two variants, called Insertion and Assemble, was evaluated using the methodology mentioned above, i.e., starting from an initial population of 100 individuals that are subsequently mutated during 500 generations and using the same seed. Table 4 holds the experimental results obtained by the two mutation operators generated in this phase. Moreover, Table 4 includes the performance of the Download mutation operator, the original GGA operator described in Section 2.5, to compare the degree of improvement provided by the variants of the operator proposed in this section. For a comprehensive analysis, the performance of the operators was analyzed by clustering the instances with the criteria used in the previous stages: number of jobs *n*, number of machines *m*, distribution of processing times *pij*, and the 1400 instances together. Thus, each column shows the performance of each assessed operator for the different criteria used to group the instances, highlighting in bold the results obtained by the best mutation operator.

As can be observed in Table 4, the best variant is the one with the rearrangement heuristic Assemble, which for each pair of jobs first tries the insertion and then the interchange. The variants with the rearrangement heuristics Insertion and Assemble reached an average *RPD* of 0.0552 and 0.0395, respectively. Moreover, it is important to note that the two versions of the mutation operator presented in this section outperformed the original Download mutation operator of the studied GGA, which reached an average *RPD* of 0.1139, as well as the four state-of-the-art operators, which had an average *RPD* above 0.1.


**Table 4.** Comparison of mutation operators with the rearrangement heuristics Insertion and Assemble and the Download operator using *RPD*.

### **5. Comparing GGA with the Old and the New Mutation Operators**

Given the knowledge gained from the experimental study, we propose a mutation operator called 2-Items Reinsertion. This operator randomly chooses two jobs from two different machines selected with the strategy Worst Random to release them and later reinsert them with the allocation heuristic Min(). Furthermore, it employs the rearrangement heuristic Assemble, based on insertion and interchange operations. The rearrangement process is only applied if, after releasing and reinserting the jobs, the genetic material of the mutated solution has not been modified.
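Putting the pieces together, the operator described above might be sketched as follows. This is a hedged illustration, not the published implementation: the greedy rule stands in for Min(), the optional `rearrange` callback stands in for the Assemble heuristic, and the restriction of the second machine to non-empty machines is an assumption introduced so that a job can always be released.

```python
import random

def two_items_reinsertion(solution, p, rng, rearrange=None):
    """Release one job from a worst machine and one from another machine,
    re-insert both greedily, and fall back to `rearrange` if nothing changed."""
    m = len(solution)
    loads = [sum(p[i][j] for j in jobs) for i, jobs in enumerate(solution)]
    c_max = max(loads)
    # Worst Random: w from the machines at C_max, o from the remaining ones.
    w = rng.choice([i for i in range(m) if loads[i] == c_max])
    others = [i for i in range(m) if loads[i] != c_max and solution[i]]
    if not solution[w] or not others:
        return [jobs[:] for jobs in solution]
    o = rng.choice(others)
    mutated = [jobs[:] for jobs in solution]
    released = []
    for s in (w, o):
        job = mutated[s].pop(rng.randrange(len(mutated[s])))
        loads[s] -= p[s][job]
        released.append(job)
    for job in released:
        # Assumed stand-in for Min(): smallest resulting load over all machines.
        best = min(range(m), key=lambda i: loads[i] + p[i][job])
        mutated[best].append(job)
        loads[best] += p[best][job]
    unchanged = all(sorted(a) == sorted(b) for a, b in zip(solution, mutated))
    if unchanged and rearrange is not None:
        return rearrange(solution, p, w, o)  # e.g. an Assemble-style heuristic
    return mutated

p = [[3, 7, 2, 9, 4], [5, 2, 8, 3, 6], [6, 4, 5, 2, 7]]
parent = [[0, 2], [1, 3], [4]]
child = two_items_reinsertion(parent, p, random.Random(0))
```

Passing the rearrangement step as a callback keeps the sketch short and makes it easy to compare the operator with and without the rearrangement heuristic.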

To assess the performance of the 2-Items Reinsertion mutation operator, we ran two variants of the state-of-the-art GGA for *R*||*Cmax* [1] over the 1400 benchmark instances: one with the old mutation operator (the Download mutation operator), i.e., the state-of-the-art GGA, and the Enhanced GGA (EGGA), which uses the 2-Items Reinsertion mutation instead of the Download operator. For an equivalent comparison, the effectiveness and efficiency of both GGA variants were compared by using the same parameter configuration, i.e., the one proposed by Ramos-Figueroa et al. [1]. Table 5 contains the parameter values utilized for the population size |*P*|, number of individuals selected for the crossover *nc*, number of individuals selected for the mutation *nm*, elite population size |*B*|, and maximal number of generations *max*\_*gen*. In this way, we analyze the strengths and weaknesses of the 2-Items Reinsertion mutation operator, distinguishing the quality of the solutions found by each GGA variant, their search time, as well as their ability to escape from local optima.

**Table 5.** Parameter configuration.


For a fair comparison, both algorithms were programmed in the C++ language and compiled using Visual Studio in 64-bit mode. The experiments were performed on a computer with an Intel Core i5 (3.10 GHz) and 16 GB of RAM. Similar to Ramos-Figueroa et al. [1], for each instance, a single execution of each algorithm was run, with the same initial seed for the random number generation.

### *5.1. Comparing the Effectiveness of GGA with the Old and the New Mutation Operators*

To measure the effectiveness of the designed 2-Items Reinsertion mutation operator, we applied the two GGA variants to the 1400 test instances and measured the improvement degree in the quality of the solutions found by each algorithm based on the *RPD*. Table 6 contains the experimental results. The first and second columns indicate the criteria used to group the test instance based on the number of jobs *n*, the number of machines *m*, the processing time distribution *pij*, and the 1400 instances together. On the other hand, the remaining columns contain the average *RPD* obtained by each metaheuristic algorithm for the four grouping criteria, respectively. Finally, this table highlights in bold the results obtained by the best GGA for each group of instances.


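| Criterion | Class | GGA | EGGA |
|---|---|---|---|
| *pij* | *U*(100, 200) | 0.0820 | **0.0229** |
| | *U*(1000, 1100) | 0.0131 | **0.0036** |
| | *JobsCorr* | 0.0522 | **0.0380** |
| | *MacsCorr* | 0.0780 | **0.0419** |
| 1400 instances | | 0.0586 | **0.0283** |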

**Table 6.** Comparison of the state-of-the-art GGA and the EGGA presented in this work using *RPD*.

Table 6 illustrates that the EGGA showed a better performance than the state-of-the-art GGA under every criterion used to group the test instances. Furthermore, it is worth noting that, over the 1400 test instances, the EGGA reaches an average *RPD* considerably lower than that of the state-of-the-art GGA, with 0.028 and 0.059, respectively. Additionally, we applied the Wilcoxon rank-sum test to assess whether the differences in the *RPD* achieved by both GGAs for the 1400 test instances are statistically significant. The Wilcoxon rank-sum test is a non-parametric test that compares two algorithms without assuming a normal distribution, even for small sample sizes [44]. Table 7 presents the results obtained by the Wilcoxon rank-sum test for the *RPD* values reached by both algorithms in the benchmark considered, with a 95% confidence level. For a comprehensive comparison, we generated a hypothesis test for the *RPD* achieved by both GGAs in groups of instances sorted according to the number of jobs *n*, the number of machines *m*, the distribution of the processing times *pij* of the instances, and the complete benchmark (1400 instances). In this way, the first column indicates the criterion used to compare the algorithms, the second one contains the classes covered for each grouping criterion, and the last column indicates the *p*-values obtained by the Wilcoxon test.


**Table 7.** *p*-values for the Wilcoxon test for GGA and EGGA.

Table 7 indicates that the EGGA is indeed statistically better than the state-of-the-art GGA considering the *RPD* that they reached for the test benchmark for all the groups of instances considered since all *p*-values are less than the level of significance *α* = 0.05.
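For readers who wish to reproduce this kind of comparison, a two-sided Wilcoxon rank-sum test can be sketched with the standard library alone. The version below uses the usual normal approximation with average ranks for ties and no tie correction of the variance; the function name and the sample values are hypothetical and are not the *RPD* values of the study.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p_value); ties receive the average of the ranks they span.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    k = 0
    while k < len(pooled):
        e = k
        while e < len(pooled) and pooled[e][0] == pooled[k][0]:
            e += 1
        avg_rank = (k + 1 + e) / 2  # mean of the ranks k+1 .. e
        rank_sum_x += avg_rank * sum(1 for _, g in pooled[k:e] if g == 0)
        k = e
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical RPD samples for two algorithms (illustration only).
rpd_a = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11]
rpd_b = [0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21]
z, p_value = rank_sum_test(rpd_a, rpd_b)
significant = p_value < 0.05  # significance level alpha = 0.05
```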

Finally, in order to graphically show the suitability of the designed mutation operator, the experimental study presented in Section 2.5 was repeated, this time to analyze the impact of the crossover and mutation rates on the EGGA. In this way, the EGGA that incorporates the 2-Items Reinsertion mutation operator was run with the same 9 configurations, i.e., *Conf*1: *nc* = 20, *nm* = 20, *Conf*2: *nc* = 20, *nm* = 40, ... *Conf*9: *nc* = 60, *nm* = 60. Figure 8 presents a bar graph with the results obtained from this study, where each bar depicts one of the 9 configurations grouped according to the number of mutated solutions (*nm*), and each pattern indicates the number of selected individuals for the crossover process (*nc*): squares = 20, waves = 40, and circles = 60. As Figure 8 indicates, the EGGA performance is mainly related to the number of individuals considered for the mutation process *nm*, in such a way that the performance of the EGGA improves (lower *RPD*) as the number of mutated solutions increases. Similarly, as the number of selected individuals for the crossover process *nc* increases, the EGGA performance improves but to a lesser degree.


**Figure 8.** Impact analysis of the parameters: number of individuals selected for crossover *nc* and number of mutated solutions *nm* in the EGGA final performance.

The behavior mentioned above shows the suitability of the 2-Items Reinsertion mutation, which is the operator with the biggest impact on EGGA final performance and improves it considerably. Thus, the EGGA behavior is quite similar to the one presented by the GGA-CGT [28], where the mutation operator has the greatest positive impact on the final performance of this algorithm.

### *5.2. Comparing the Efficiency of GGA with the Old and the New Mutation Operators*

After analyzing the effectiveness of the EGGA, we evaluate the implications associated with the computational time of using the 2-Items Reinsertion mutation operator. Table 8 includes the experimental results. Like Table 6, the first and second columns describe the characteristics used to cluster the instances: the number of jobs *n* and machines *m*, the processing time distribution *pij*, and the 1400 instances together. Thus, the following columns contain the average time in seconds obtained by the state-of-the-art GGA and the EGGA for each instance set, respectively.


**Table 8.** Comparison of the state-of-the-art GGA and the EGGA based on the time (time in seconds).

Table 8 shows that the 2-Items Reinsertion mutation operator causes the EGGA to be much slower, mainly in the instances with processing times generated in the ranges *U*(1, 100) and *U*(10, 100), where the average times increased from 1.25 to 34.09 and 14.04 seconds, respectively. This computational cost is closely related to the rearrangement strategy Assemble, incorporated to avoid, as far as possible, becoming stuck in local optima. Although the computational cost of this strategy is high, it is also very useful, since the properties and characteristics of the addressed problem make the mutation operator by itself incapable of avoiding local optima. To review such algorithmic behavior, we analyzed the average generation in which the state-of-the-art GGA and the EGGA find the best solution for each test instance.

Table 9 shows that the GGA quickly becomes trapped in local optima, in generation 16 on average, while the EGGA shows a better ability to deal with the landscape characteristics of the *R*||*Cmax* search space, finding its best solutions in generation 362 on average. In this way, Table 9 shows the importance of incorporating the 2-Items Reinsertion mutation operator into the GGA since, although the computational cost is high, it provides the EGGA with a better exploration capability during the search process.


**Table 9.** Comparison of the state-of-the-art GGA and the EGGA based on the generation in which the best solution in the population is improved.

From this study, we can conclude that it is still necessary to improve the performance of the EGGA and study its other operators, evaluation function, and stop criteria in order to better explore the search space, since it also becomes stuck in local optima, although not as soon as the original GGA. Additionally, we will focus on analyzing the properties and characteristics of the instances in the sets *U*(1, 100) and *U*(10, 100), where the EGGA stagnates sooner and requires a longer processing time since the rearrangement heuristic is used more times during the solution process of instances with those characteristics.

### **6. Conclusions and Paths of Work**

The GGA has become one of the most outstanding metaheuristics for the solution of combinatorial optimization problems related to the partition of a set of items into different subsets. The development of a GGA involves the definition of variation operators adapted to work at the group level. The main goal of this paper was to promote the design of intelligent operators for GGAs as a more suitable way to obtain high-performance GGAs that incorporate knowledge of the problem-domain.

We present a systematic experimental examination to gain insights into the importance of each phase involved in the mutation operator of a GGA designed to solve the Parallel-Machine scheduling problem with unrelated machines and makespan minimization (*R*||*Cmax*), analyzing whether different strategies actually contribute to the performance of the operator. The overall procedure of a grouping mutation operator for *R*||*Cmax* comprises: (1) selecting one or more machines; (2) selecting one or more jobs from each of the selected machines; and (3) reinserting the selected jobs in some of the machines. In order to learn something about each of these three algorithmic components, this work covered the analysis of each component in isolation by evaluating distinct strategies to deal with it. In this way, the study covered the evaluation of four state-of-the-art grouping mutation operators, thirty-five operators with different numbers of machines and jobs handled, four machine selection strategies, and two rearrangement heuristics for the reinsertion of the selected jobs. The experimental results suggested that the mutation operator with the best performance: (1) selects two machines, one of the machines with the worst *Ci* value and one random machine; (2) selects one random job from each of the selected machines; and (3) reinserts the selected jobs in two stages. First, for each job, each machine is checked in an attempt to insert the job in the machine with the lowest *Ci* value. Second, if the first stage yields the original solution, a rearrangement heuristic is applied to attempt to reduce the processing time of the selected machines by trying to insert one of their jobs into the other machines or to exchange one of their jobs with one job of the other machines. The knowledge gained from the systematic study was used to design a new grouping mutation operator, called 2-Items Reinsertion. 
The new operator was incorporated into the state-of-the-art GGA (replacing the original mutation operator) to solve 1400 benchmark instances, showing significant differences with an improvement rate of 52%. These results underline the importance of evaluating the performance of the different components of the GGA operators.
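The three steps of the best-performing operator summarized above can be sketched in Python as follows. This is only a minimal illustration and not the authors' implementation: the function names, the list-of-lists encoding of a schedule, the toy instance, and the choice to insert each job where the resulting completion time (*Ci* plus the processing time of the job) is smallest are all assumptions. The second-stage rearrangement heuristic is deliberately left out.

```python
import random

def completion_times(assign, p):
    """C_i for every machine i: total processing time of its jobs."""
    return [sum(p[i][j] for j in jobs) for i, jobs in enumerate(assign)]

def reinsert_two_items(assign, p, rng):
    """First reinsertion stage only (the rearrangement heuristic is omitted)."""
    new = [jobs[:] for jobs in assign]           # work on a copy of the parent
    C = completion_times(new, p)
    # (1) select the machine with the worst (largest) C_i and one other random machine
    worst = max(range(len(new)), key=lambda i: C[i])
    other = rng.choice([i for i in range(len(new)) if i != worst])
    # (2) remove one random job from each selected machine
    removed = []
    for i in (worst, other):
        if new[i]:
            removed.append(new[i].pop(rng.randrange(len(new[i]))))
    # (3) reinsert each job where the resulting completion time is lowest (assumption)
    for j in removed:
        C = completion_times(new, p)
        best = min(range(len(new)), key=lambda i: C[i] + p[i][j])
        new[best].append(j)
    return new

# Hypothetical instance: 3 machines, 6 jobs, p[i][j] = time of job j on machine i.
p = [[4, 9, 3, 7, 5, 8],
     [6, 2, 8, 3, 9, 4],
     [5, 7, 2, 6, 3, 9]]
parent = [[0, 1, 3], [2, 4], [5]]
child = reinsert_two_items(parent, p, random.Random(0))
```

The sketch keeps every job assigned to exactly one machine, which is the invariant any grouping mutation operator for *R*||*Cmax* has to preserve.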

We are aware that the current performance of the Enhanced GGA (EGGA) is still far from reaching the performance of state-of-the-art algorithms for *R*||*Cmax*. However, the improvements achieved with the approach proposed in this work are quite promising. Therefore, we believe that with the design and implementation of experimental approaches such as the one presented in this paper we can further improve the performance of EGGA by studying the behavior of other genetic components, such as the population initialization strategy, the selection mechanism, the crossover operator, the replacement mechanism, and the objective function. In this order of ideas, the study of the final performance obtained by the EGGA for the *R*||*Cmax* problem revealed that there still are benchmark instances that show a high degree of difficulty; for these instances, the included strategies in the EGGA do not appear to lead to better solutions. Future work will consist of studying the different components of each operator and technique included in the EGGA, designing a better crossover operator, implementing an efficient reproduction technique, and analyzing the EGGA behavior to understand the impact of each strategy when solving different instances of the *R*||*Cmax* problem. We are also developing a new fitness function that will allow us to discriminate between solutions with the same *Cmax* value but with a different exploitation of the machine's processing time. The knowledge gained from the analysis of each component of the grouping mutation operator for the *R*||*Cmax* problem can help us gain a better understanding of the performance of other heuristics for this problem and opens up an interesting range of possibilities for future research on other Parallel-Machine Scheduling variants. It is expected that the study presented in this paper represents a guideline to carry out similar systematic experimental examinations to analyze the components of other GGAs. 
This knowledge can be used to develop new intelligent operators for solving NP-hard grouping problems.

**Author Contributions:** Conceptualization, O.R.-F., M.Q.-C., E.M.-M. and N.C.-R.; methodology, O.R.-F. and M.Q.-C.; software, O.R.-F.; validation, O.R.-F. and M.Q.-C.; formal analysis, M.Q.-C.; investigation, O.R.-F. and M.Q.-C.; resources, O.R.-F.; writing—original draft preparation, O.R.- F.; writing—review and editing, O.R.-F., M.Q.-C., E.M.-M. and N.C.-R.; visualization, O.R.-F. and M.Q.-C.; supervision, E.M.-M. and M.Q.-C.; project administration, M.Q.-C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **The Hypervolume Newton Method for Constrained Multi-Objective Optimization Problems**

**Hao Wang <sup>1,</sup>\*, Michael Emmerich <sup>1</sup>, André Deutz <sup>1</sup>, Víctor Adrián Sosa Hernández <sup>2</sup> and Oliver Schütze <sup>3</sup>**

	- Atizapán de Zaragoza, Mexico City 52926, Mexico
	- <sup>3</sup> Computer Science Department, Cinvestav-IPN, Mexico City 07360, Mexico
	- **\*** Correspondence: h.wang@liacs.leidenuniv.nl

**Abstract:** Recently, the Hypervolume Newton Method (HVN) has been proposed as a fast and precise indicator-based method for solving unconstrained bi-objective optimization problems with objective functions. The HVN is defined on the space of (vectorized) fixed cardinality sets of decision space vectors for a given multi-objective optimization problem (MOP) and seeks to maximize the hypervolume indicator adopting the Newton–Raphson method for deterministic numerical optimization. To extend its scope to non-convex optimization problems, the HVN method was hybridized with a multi-objective evolutionary algorithm (MOEA), which resulted in a competitive solver for continuous unconstrained bi-objective optimization problems. In this paper, we extend the HVN to constrained MOPs with in principle any number of objectives. Similar to the original variant, the first- and second-order derivatives of the involved functions have to be given either analytically or numerically. We demonstrate the applicability of the extended HVN on a set of challenging benchmark problems and show that the new method can be readily applied to solve equality constraints with high precision and to some extent also inequalities. We finally use HVN as a local search engine within an MOEA and show the benefit of this hybrid method on several benchmark problems.

**Keywords:** multi-objective optimization; hypervolume indicator; Newton method; evolutionary algorithms; constraint handling; hypervolume scalarization

### **1. Introduction**

Multi-objective optimization problems (MOPs), i.e., problems where several objectives have to be optimized concurrently, naturally arise in many applications (e.g., [1–4]). As an example, in many portfolio problems, one is interested in maximizing the expected return and social responsibility or sustainability while minimizing the risk to a financial portfolio ([5,6]). In multi-objective optimization, we distinguish between the decision space, which contains the vectors of decision variables, and the objective space, which consists of *k*-dimensional real vectors and comprises the images of the vector-valued objective function. A typical approach to the solution of MOPs is to compute or approximate the non-dominated (or efficient) set with respect to the Pareto dominance order (the image of which in the objective space is called the Pareto front). One important characteristic of (continuous) MOPs is that in regular cases, the Pareto front is a manifold of *k* − 1 dimensions, where *k* denotes the number of objective functions. In general, it is possible that parts of the Pareto front are of lower dimension, but the Pareto front never has more than *k* − 1 dimensions. Since, in the continuous case, the non-dominated set and the Pareto front can contain infinitely many points, they are usually approximated by a finite set of points. In particular, in the area of evolutionary multi-objective optimization (EMO), many performance indicators have been proposed that propagate optimal approximations of the Pareto front (e.g., [7–10]). While their definitions differ slightly, most aim to obtain (more or less) evenly spread solutions along the Pareto front.

**Citation:** Wang, H.; Emmerich, M.; Deutz, A.; Hernández, V.A.S.; Schütze, O. The Hypervolume Newton Method for Constrained Multi-Objective Optimization Problems. *Math. Comput. Appl.* **2023**, *28*, 10. https://doi.org/10.3390/mca28010010

Academic Editor: Efrén Mezura-Montes

Received: 1 November 2022 Revised: 24 December 2022 Accepted: 4 January 2023 Published: 9 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Interestingly, with the *hypervolume indicator* [10], there exists an indicator that does not require the knowledge of the location of the true Pareto front. Still, its maximization leads to well-distributed approximation sets consisting of only non-dominated solutions. In this work, by "well-distributed", we mean the objective points have good coverage of the Pareto front and are gap-free when the population size is large. At the maximum of the hypervolume indicator, the density of objective points is inversely proportional to the local curvature of the Pareto front [11]. Here is where the idea of set-scalarization comes into play. In set-scalarization methods, rather than focusing on the improvement of single points of the approximation set, the focus is on the optimization of a fixed cardinality set as an entity concerning a set-based indicator, e.g., the hypervolume indicator. The objective function of the set-scalarization method, in our case, the hypervolume indicator, provides a mapping from the set of fixed cardinality sets in the decision space to a scalar that has to be maximized. Due to the properties mentioned above of the hypervolume indicator, the resulting set will provide a well-distributed set of points on the Pareto front.

Multi-objective evolutionary algorithms (MOEAs) have long since adopted the idea of set-scalarization. The so-called indicator-based MOEAs (e.g., [12–14]) use performance indicators to guide the search, e.g., by indicator-based selection. In numerical methods, the set-scalarization approach was first addressed in gradient-based hypervolume maximization [15–19] and in the maximization of the Averaged Hausdorff Metric [14]. More recently, the approach was generalized to second-order methods with the Hypervolume Newton Method (HVN), a set-scalarization-based Newton–Raphson method for the maximization of the hypervolume indicator value of a given MOP (e.g., [20,21]). However, this method has only been discussed for unconstrained and bi-objective optimization problems, which limits its application.

In this paper, we extend the HVN for constrained MOPs with a general number of objectives. To this end, we present the HVN for equality-constrained problems and further discuss a straightforward active set method to handle inequalities. Since the HVN is highly local, we also discuss the hybridization of this method with an MOEA. Finally, we present numerical results indicating the strength of the novel approaches.

The remainder of this paper is organized as follows: in Section 2, we present the necessary background required for understanding the sequel, and we review the related work. In Section 3, we present the HVN for constrained multi-objective optimization problems. Section 4 presents the numerical results of the constrained HVN as a standalone algorithm and a local search strategy within a hybrid evolutionary algorithm. Finally, we conclude and give possible paths for future research in Section 5.

### **2. Background and Related Work**

### *2.1. Notations*

We will always denote a finite Pareto approximation set by $\mathbf{X} = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(\mu)}\} \subseteq \mathbb{R}^n$. When differentiating a set function, e.g., the hypervolume, over the input set, we often concatenate the points in $\mathbf{X}$ into a much longer vector, i.e., $\mathbf{X} = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(\mu)}] \in \mathbb{R}^{\mu n}$. To make our discussion less cumbersome, we abuse the notation $\mathbf{X}$ slightly such that it can be interpreted as a finite set in $\mathbb{R}^n$ or an $\mathbb{R}^{\mu n}$-vector, depending on the context. (See [16] for a detailed formal discussion of the mapping between fixed cardinality sets and vectors.) We will explain the meaning of $\mathbf{X}$ on the spot whenever it is unclear from the text. We will always denote by $\nabla$ and $\nabla^2$ the gradient/Jacobian and Hessian operators on real-valued functions, respectively, when the domain of such a function is clear from the text. Otherwise, we take the derivative operator $\partial/\partial \mathbf{X}$. When expressing the Hessian matrix, we will use the numerator layout for matrix calculus notations [22].

### *2.2. Multi-Objective Optimization*

A real-valued multi-objective optimization problem (MOP) involves minimizing multiple objective functions simultaneously, i.e., $F = (f\_1, \ldots, f\_k)$, $f\_i \colon \mathcal{X} \to \mathbb{R}$, $\mathcal{X} \subseteq \mathbb{R}^n$, $i \in \{1, \ldots, k\}$. For every $\mathbf{y}^{(1)}$ and $\mathbf{y}^{(2)} \in \mathbb{R}^k$, we say $\mathbf{y}^{(1)}$ weakly dominates $\mathbf{y}^{(2)}$ (written as $\mathbf{y}^{(1)} \preceq \mathbf{y}^{(2)}$) iff $y\_i^{(1)} \leq y\_i^{(2)}$ for all $i \in \{1, \ldots, k\}$. The Pareto order $\prec$ on $\mathbb{R}^k$ is defined as follows: $\mathbf{y}^{(1)} \prec \mathbf{y}^{(2)}$ iff $\mathbf{y}^{(1)} \preceq \mathbf{y}^{(2)}$ and $\mathbf{y}^{(1)} \neq \mathbf{y}^{(2)}$. A point $\mathbf{x} \in \mathcal{X}$ is called efficient or (Pareto) optimal iff $\nexists\, \mathbf{x}' \in \mathcal{X}\ (F(\mathbf{x}') \prec F(\mathbf{x}))$. The set $P\_Q$ of all Pareto optimal solutions of a MOP is called the Pareto set, and its image $F(P\_Q)$ is called the Pareto front. Typically, i.e., under certain (mild) assumptions on the model, one can assume that the Pareto set and front of a given continuous MOP form at least locally an object of dimension $k - 1$ ([23]).
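The dominance relations defined above translate directly into code. The following Python sketch is only an illustration under the minimization convention used here; the function names and the sample points are assumptions and do not come from the paper.

```python
def weakly_dominates(y1, y2):
    """y1 weakly dominates y2: y1 is no worse in every objective (minimization)."""
    return all(a <= b for a, b in zip(y1, y2))

def dominates(y1, y2):
    """Pareto order: y1 weakly dominates y2 and the two vectors differ."""
    return weakly_dominates(y1, y2) and tuple(y1) != tuple(y2)

def non_dominated(points):
    """Keep the points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors; (3, 3) is dominated by (2, 2).
Y = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(Y)
```

The quadratic-time filter is chosen for readability; it is sufficient for the small approximation sets considered in a sketch like this.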

The Pareto order can also be extended to the family of sets [10], i.e., we say $A \prec B$ iff $\forall \mathbf{y} \in B\ \exists \mathbf{y}' \in A\ (\mathbf{y}' \prec \mathbf{y})$. The set of all efficient points of $\mathcal{X}$ is called the *efficient set*. The image of the efficient set under $F$ is called the *Pareto front*. Multi-objective optimization algorithms (MOAs) often employ a finite multiset $\mathbf{X} = \{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(\mu)}\}$ to approximate the efficient set, whose image under $F$ is denoted by $\mathbf{Y}$. Multi-objective optimization is an active research field that has produced many algorithms for the approximation of the entire Pareto set/front of a given MOP. There exist, for instance, scalarization methods and mathematical programming techniques that transform the given MOP into an auxiliary scalar optimization problem (SOP) (e.g., [24]). By solving a suitably chosen sequence of such SOPs, one can in many cases obtain suitable Pareto front approximations (e.g., [25–28]). In [29], a Newton method is proposed for multi-objective optimization. Next to these point-wise iterative local search strategies, there exist global set-based algorithms such as cell-to-cell mapping techniques and subdivision techniques ([30–32]) as well as specialized evolutionary algorithms ([33–36]). In particular, there exist indicator-based evolutionary algorithms (IBEAs) that aim for Pareto front approximations that are good with respect to a given performance indicator (e.g., [12–14]). Widely used performance indicators are the Generational Distance (GD [7]), the Inverted Generational Distance and variants ([8,37,38]), the averaged Hausdorff distance $\Delta\_p$ ([9,39,40]), and the Hypervolume indicator, which we will use in this work and briefly review in the next section.

Finally, there exist multi-objective continuation methods that make use of the fact that the solution set forms at least locally a manifold (e.g., [23,41–46]).

### *2.3. Hypervolume Indicator and Its First-Order Derivatives*

The hypervolume indicator (HV) [10,47] is defined as the Lebesgue measure of the compact set dominated by a Pareto approximation set $\mathbf{Y} \subset \mathbb{R}^k$ and cut from above by a reference point $\mathbf{r}$:

$$\text{HV}(\mathbf{Y}; \mathbf{r}) = \lambda\_k(\{\mathbf{p} \colon \exists \mathbf{y} \in \mathbf{Y} (\mathbf{y} \prec \mathbf{p}) \land \mathbf{p} \prec \mathbf{r}\}),$$

where $\lambda\_k$ denotes the Lebesgue measure in $\mathbb{R}^k$. HV is Pareto compliant, i.e., for all $\mathbf{Y} \prec \mathbf{Y}'$, $\text{HV}(\mathbf{Y}; \mathbf{r}) > \text{HV}(\mathbf{Y}'; \mathbf{r})$, and is extensively used to assess the quality of approximation sets to the Pareto front, e.g., in SMS-EMOA [12] and multi-objective Bayesian optimization [48]. Being a set function, it is cumbersome to define the derivative of HV. (The derivative of a set function is not defined for an arbitrary family of sets. For some special cases, it can be defined directly, e.g., on Jordan-measurable sets [49].) Therefore, we follow the generic set-based approach for MOPs [16], which considers a finite approximation set of $\mu$ vectors as a point in $\mathbb{R}^{\mu n}$, i.e., $\mathbf{X} = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(\mu)}]^\top \in \mathbb{R}^{\mu n}$. Similarly, the image of $\mathbf{X}$ under $F$ can also be represented by an $\mathbb{R}^{\mu k}$-vector: $\mathbf{Y} = [F(\mathbf{x}^{(1)}), F(\mathbf{x}^{(2)}), \ldots, F(\mathbf{x}^{(\mu)})]^\top$. In this sense, the objective function $F$ is also extended as follows:

$$\mathbf{F} \colon \mathcal{X}^{\mu} \to \mathbb{R}^{\mu k}, \mathbf{X} \mapsto \left[ F(X\_1, \dots, X\_n), F(X\_{n+1}, \dots, X\_{2n}), \dots, F(X\_{(\mu-1)n+1}, \dots, X\_{\mu n}) \right]^\top.$$

Taking **F**, we can express the hypervolume indicator as a function on R*μn*:

$$\mathcal{H}\_{\mathbf{F}} \colon \mathbb{R}^{\mu n} \to \mathbb{R}\_{\geq 0}, \quad \mathbf{X} \mapsto \mathrm{HV}(\mathbf{F}(\mathbf{X}); \mathbf{r}).$$

We will henceforth omit the reference point $\mathbf{r}$ in $\mathcal{H}\_{\mathbf{F}}$ for simplicity. It is straightforward to express the gradient of $\mathcal{H}\_{\mathbf{F}}$ with respect to $\mathbf{X}$ using the chain rule, $\nabla \mathcal{H}\_{\mathbf{F}}(\mathbf{X}) = (\partial \mathcal{H}\_{\mathbf{F}}/\partial \mathbf{F})(\partial \mathbf{F}/\partial \mathbf{X})$, as reported in our previous works [16,19], in which we also discussed the time complexity of computing the hypervolume gradient. Note that, as an alternative to computing the gradient of the entire set, it has also been suggested to compute only the gradient of a single point with respect to the hypervolume indicator; this approach is referred to as hypervolume scalarization [50].
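For two objectives, both the hypervolume and its derivative with respect to the objective points (the factor $\partial \mathcal{H}\_{\mathbf{F}}/\partial \mathbf{F}$ in the chain rule) have simple closed forms. The sketch below is a minimal illustration, assuming minimization, mutually non-dominated points that all dominate the reference point, and hypothetical function names; multiplying the result by the Jacobian of $\mathbf{F}$ would give the gradient with respect to $\mathbf{X}$.

```python
def hv_2d(Y, r):
    """Hypervolume of mutually non-dominated 2-D points Y w.r.t. reference point r."""
    total, upper = 0.0, r[1]
    for y1, y2 in sorted(Y):          # ascending f1, hence descending f2
        total += (r[0] - y1) * (upper - y2)
        upper = y2
    return total

def hv_2d_gradient(Y, r):
    """Partial derivatives (dHV/dy1, dHV/dy2) for each point, in sorted order."""
    pts = sorted(Y)
    grad = []
    for i, (y1, y2) in enumerate(pts):
        upper = r[1] if i == 0 else pts[i - 1][1]             # f2 of left neighbour
        right = r[0] if i == len(pts) - 1 else pts[i + 1][0]  # f1 of right neighbour
        grad.append((-(upper - y2), -(right - y1)))
    return grad

# Hypothetical approximation set and reference point.
Y = [(2.0, 2.0), (1.0, 3.0), (3.0, 1.0)]
r = (4.0, 4.0)
volume = hv_2d(Y, r)
gradient = hv_2d_gradient(Y, r)
```

Each partial derivative is the negative length of the segment of the attainment curve adjacent to the point, which is why decreasing an objective value always increases the hypervolume.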

### *2.4. Hypervolume Hessian and Hypervolume Newton Method*

Here, we assume *F* is at least twice continuously differentiable. In general, the Hessian matrix of the hypervolume indicator can be expressed as follows:

$$
\begin{split}
\nabla^2 \mathcal{H}\_{\mathbf{F}} &= \frac{\partial}{\partial \mathbf{X}} \left(\frac{\partial \mathcal{H}\_{\mathbf{F}}}{\partial \mathbf{F}} \frac{\partial \mathbf{F}}{\partial \mathbf{X}}\right) = \left[\frac{\partial}{\partial \mathbf{X}} \left(\frac{\partial \mathcal{H}\_{\mathbf{F}}}{\partial \mathbf{F}}\right)\right]^\top \frac{\partial \mathbf{F}}{\partial \mathbf{X}} + \frac{\partial \mathcal{H}\_{\mathbf{F}}}{\partial \mathbf{F}} \frac{\partial^2 \mathbf{F}}{\partial \mathbf{X} \partial \mathbf{X}^\top} \\&= \nabla \mathbf{F}^\top \frac{\partial^2 \mathcal{H}\_{\mathbf{F}}}{\partial \mathbf{F} \partial \mathbf{F}^\top} \nabla \mathbf{F} + \frac{\partial \mathcal{H}\_{\mathbf{F}}}{\partial \mathbf{F}} \frac{\partial^2 \mathbf{F}}{\partial \mathbf{X} \partial \mathbf{X}^\top}.
\end{split}
\tag{1}
$$

Note that in the above expression, $\partial^2 \mathcal{H}\_{\mathbf{F}}/\partial \mathbf{F} \partial \mathbf{F}^\top$ and $\partial^2 \mathbf{F}/\partial \mathbf{X} \partial \mathbf{X}^\top$ denote the Hessian matrix of the hypervolume indicator with respect to objective points and of the objective function $\mathbf{F}$, respectively. In our previous work [21], we derived the analytical expression of $\nabla^2 \mathcal{H}\_{\mathbf{F}}$ for bi-objective cases and analyzed the structure and properties of the hypervolume Hessian matrix. In addition, we implemented a standalone hypervolume Newton (HVN) algorithm for unconstrained MOPs. Moreover, we have shown that the Hessian $\nabla^2 \mathcal{H}\_{\mathbf{F}}$ is a tridiagonal block matrix in bi-objective cases and provided the non-singularity condition thereof, which states that the Hessian is only singular on a null subset of $\mathbb{R}^{\mu n}$ [21], thereby ascertaining the safety of applying the HVN method.

The analytical expression of the Hessian matrix for higher dimensions contains the derivatives $\partial^2 \mathcal{H}\_{\mathbf{F}}/\partial x\_i^{(\ell)} \partial x\_j^{(m)}$, $m = 1, \ldots, \mu$, $\ell = 1, \ldots, \mu$, $i = 1, \ldots, n$, $j = 1, \ldots, n$. To compute these derivatives analytically, the chain rule can be applied (see [21]). In [21], however, the Hessian matrix of the second mapping (from the points in the objective space $(\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(\mu)})$ to the hypervolume indicator) was only given analytically for two dimensions. The Hessian matrix of this second mapping can be generalized to $k$-dimensional objective spaces, and it is continuous in regular cases. Here, we will only sketch the construction of this matrix and leave the detailed analysis for future research. It is known that in the $k$-dimensional case, the first derivatives $\partial \text{HV}/\partial y\_i$ are given by the $(k - 1)$-dimensional Lebesgue measure of the $(k - 1)$-dimensional faces of the attainment surface that separates the dominated space from the non-dominated space (see Figure 1, $\partial \text{HV}/\partial y\_3^{(1)}$). These faces themselves have a derivative that is given by the $(k - 2)$-dimensional Lebesgue measure of the $(k - 2)$-dimensional segments (or patches) at the boundary of these faces, which also change continuously with $y\_i$ (see Figure 1, examples $\partial^2 \text{HV}/\partial y\_1^{(1)} \partial y\_3^{(1)}$ and $\partial^2 \text{HV}/\partial y\_2^{(2)} \partial y\_3^{(1)}$). Note that points in the objective space need to be in a general position to guarantee differentiability; otherwise, only one-sided derivatives exist, and the one-sided derivative for which the perturbed coordinate falls into the dominated subspace is always zero [16].

In this work, however, rather than investigating in detail the analytical and computational properties of the Hessian for more than two objective functions, we compute the second-order derivative $\partial^2 \mathcal{H}\_{\mathbf{F}}/\partial \mathbf{F} \partial \mathbf{F}^\top$ with the automatic differentiation (AD) method [51] and focus on solving equality-constrained MOPs using the Hessian matrix of the hypervolume indicator.
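To make the two terms of Equation (1) concrete, the following Python sketch evaluates them for the smallest possible case: a single point ($\mu = 1$) with two decision variables and two objectives. Everything specific here is an assumption for illustration (the quadratic toy map `F`, the reference point, and all function names), and the derivatives are written by hand instead of being obtained by automatic differentiation as in the paper. A finite-difference Hessian is used only as a sanity check.

```python
def F(x):
    """Hypothetical bi-objective map with two quadratic objectives."""
    return (x[0] ** 2 + x[1] ** 2, (x[0] - 1.0) ** 2 + x[1] ** 2)

def jacobian_F(x):
    return [[2 * x[0], 2 * x[1]], [2 * (x[0] - 1.0), 2 * x[1]]]

# Hessians of f1 and f2 (both constant and equal to 2*I for this toy map)
HESS_F = [[[2.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 2.0]]]

def hv(x, r):
    """Hypervolume of the single point F(x) w.r.t. reference point r."""
    y = F(x)
    return (r[0] - y[0]) * (r[1] - y[1])

def hv_hessian(x, r):
    """Equation (1): J^T (d2H/dF dF^T) J + sum_i (dH/df_i) * Hessian(f_i)."""
    y, J = F(x), jacobian_F(x)
    dH = (-(r[1] - y[1]), -(r[0] - y[0]))      # dH/dF
    d2H = [[0.0, 1.0], [1.0, 0.0]]             # d2H/dF dF^T
    H = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(2):
        for b in range(2):
            first = sum(J[i][a] * d2H[i][j] * J[j][b]
                        for i in range(2) for j in range(2))
            second = sum(dH[i] * HESS_F[i][a][b] for i in range(2))
            H[a][b] = first + second
    return H

def fd_hessian(x, r, eps=1e-3):
    """Central finite-difference Hessian, used only as a sanity check."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(2):
        for b in range(2):
            def shifted(sa, sb):
                z = list(x)
                z[a] += sa * eps
                z[b] += sb * eps
                return hv(z, r)
            H[a][b] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps ** 2)
    return H

x0, r = (0.3, 0.4), (2.0, 2.0)
analytic, numeric = hv_hessian(x0, r), fd_hessian(x0, r)
```

For larger sets the same structure applies blockwise, with the objective-space Hessian coupling neighbouring points.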

**Figure 1.** Example of a hypervolume indicator Hessian computation in three-dimensional objective space with a collection of points $\{\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \mathbf{y}^{(3)}\}$ and reference point $\mathbf{r}$.

### **3. Hypervolume Newton Method for Constrained MOPs**

In this section, we first describe the base method of HVN for the treatment of equality constrained MOPs and will then discuss how to deal with inequalities and with dominated points that may be computed during the run of the Newton method.

### *3.1. Handling Equalities*

Consider a continuous equality-constrained MOP of the form

$$\begin{array}{ll}\underset{\mathbf{x}\in\mathcal{X}}{\min} & F(\mathbf{x}),\\\text{s.t.} & h(\mathbf{x}) = 0,\end{array} \tag{2}$$

where $h(\mathbf{x}) = (h\_1(\mathbf{x}), \ldots, h\_p(\mathbf{x}))$, with $h\_i \colon \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, p$, being the $i$-th equality constraint. The objective map is defined by $F \colon \mathcal{X} \subset \mathbb{R}^n \to \mathbb{R}^k$, where $f\_i \colon \mathcal{X} \subset \mathbb{R}^n \to \mathbb{R}$ is the $i$-th individual objective to be considered in the MOP. The feasible set is given by:

$$Q = \{ \mathbf{x} \in \mathcal{X} \; : \; h(\mathbf{x}) = 0 \}. \tag{3}$$

The set (population) based hypervolume optimization problem we are considering in this work is the following one:

$$\max\_{\substack{X \subset Q \\ |X| = \mu}} \text{HV}(F(X)), \tag{4}$$

where $\text{HV}(F(X))$ denotes the value of the hypervolume for a given set $X = \{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(\mu)}\}$ of cardinality $\mu \in \mathbb{N}$, where each $\mathbf{x}^{(i)} \in \mathbb{R}^n$. Note that the set $X \subset Q$ can be interpreted as a point in $\mathbb{R}^{\mu n}$ (via considering $\mathbf{X} = (x\_1^{(1)}, \ldots, x\_n^{(1)}, x\_1^{(2)}, \ldots, x\_n^{(2)}, \ldots, x\_1^{(\mu)}, \ldots, x\_n^{(\mu)})$), and hence, problem (4) can be identified with a scalar objective optimization problem of dimension $\mu n$.

The feasibility of *X* (i.e., *X* ⊂ *Q*) is identical to

$$h\_i(\mathbf{x}^{(j)}) = 0, \qquad i = 1, \ldots, p, \ j = 1, \ldots, \mu. \tag{5}$$

For the related set-based equality constraints, we define for *i* ∈ {1, ... , *p*} and *j* ∈ {1, . . . , *μ*}

$$h\_{i,j}: \mathbb{R}^{\mu n} \to \mathbb{R}, \qquad h\_{i,j}(\mathbf{X}) = h\_i(\mathbf{x}^{(j)}).\tag{6}$$

For checking the feasibility of all decision points, we define $\bar{h} \colon \mathbb{R}^{\mu n} \to \mathbb{R}^{\mu p}$ via

$$
\bar{h}(\mathbf{X}) = \begin{pmatrix} h\_{1,1}(\mathbf{X}) \\ h\_{2,1}(\mathbf{X}) \\ \vdots \\ h\_{p,1}(\mathbf{X}) \\ h\_{1,2}(\mathbf{X}) \\ \vdots \\ h\_{p,2}(\mathbf{X}) \\ \vdots \\ h\_{p,\mu}(\mathbf{X}) \end{pmatrix} =: \begin{pmatrix} \bar{h}\_{1}(\mathbf{X}) \\ \bar{h}\_{2}(\mathbf{X}) \\ \vdots \\ \bar{h}\_{p}(\mathbf{X}) \\ \bar{h}\_{p+1}(\mathbf{X}) \\ \vdots \\ \bar{h}\_{2p}(\mathbf{X}) \\ \vdots \\ \bar{h}\_{\mu p}(\mathbf{X}) \end{pmatrix}, \tag{7}
$$

then its Jacobian is given by

$$\bar{H} := \nabla \bar{h}(\mathbf{X}) = \text{diag}\left(H(\mathbf{x}^{(1)}), \dots, H(\mathbf{x}^{(\mu)})\right) \in \mathbb{R}^{\mu p \times \mu n},\tag{8}$$

where

$$H(\mathbf{x}^{(i)}) = \begin{pmatrix} \nabla h\_1(\mathbf{x}^{(i)})^\top \\ \vdots \\ \nabla h\_p(\mathbf{x}^{(i)})^\top \end{pmatrix} \in \mathbb{R}^{p \times n}. \tag{9}$$
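The stacked constraint vector of Equation (7) and its block-diagonal Jacobian of Equations (8) and (9) can be assembled as in the following Python sketch. The single constraint (the unit circle), the sample points, and all function names are assumptions chosen for illustration, and plain nested lists are used to keep the example free of dependencies.

```python
def h(x):
    """Hypothetical single equality constraint (p = 1): the unit circle."""
    return [x[0] ** 2 + x[1] ** 2 - 1.0]

def jac_h(x):
    """H(x): the p x n Jacobian of h at one decision point, cf. (9)."""
    return [[2.0 * x[0], 2.0 * x[1]]]

def split(X, n):
    """Cut the concatenated vector X into its mu decision points."""
    return [X[j:j + n] for j in range(0, len(X), n)]

def h_bar(X, n):
    """Stacked constraint values of length mu * p, cf. (7)."""
    out = []
    for x in split(X, n):
        out.extend(h(x))
    return out

def H_bar(X, n):
    """Block-diagonal Jacobian of h_bar with shape (mu * p) x (mu * n), cf. (8)."""
    blocks = [jac_h(x) for x in split(X, n)]
    mu, p = len(blocks), len(blocks[0])
    M = [[0.0] * (mu * n) for _ in range(mu * p)]
    for j, block in enumerate(blocks):
        for i in range(p):
            for c in range(n):
                M[j * p + i][j * n + c] = block[i][c]
    return M

X = [1.0, 0.0, 0.6, 0.8, 0.0, 2.0]   # mu = 3 points in R^2, concatenated
values = h_bar(X, n=2)
J = H_bar(X, n=2)
```

The block-diagonal structure reflects that each constraint $h\_{i,j}$ depends only on the $j$-th decision point.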

The Karush-Kuhn-Tucker (KKT) equations of the problem (4) hence read as

$$\begin{aligned} \nabla \mathcal{H}\_{\mathbf{F}}(\mathbf{X}) + \bar{H}^{\top} \lambda &= 0 \\ \bar{h}(\mathbf{X}) &= 0, \end{aligned} \tag{10}$$

for a Lagrange multiplier (or the dual variable) $\lambda \in \mathbb{R}^{\mu p}$, which directly leads to the root finding problem

$$\begin{aligned} G \colon \mathbb{R}^{\mu(n+p)} &\to \mathbb{R}^{\mu(n+p)}\\ G(\mathbf{X}, \lambda) &= \begin{pmatrix} \nabla \mathcal{H}\_{\mathbf{F}}(\mathbf{X}) + \bar{H}^{\top} \lambda \\ \bar{h}(\mathbf{X}) \end{pmatrix} = 0, \end{aligned} \tag{11}$$

where $\lambda \in \mathbb{R}^{\mu p}$. The Jacobian of $G$ at $(\mathbf{X}, \lambda)^\top$ is given by

$$DG(\mathbf{X}, \lambda) = \begin{pmatrix} \nabla^2 \mathcal{H}\_\mathbf{F}(\mathbf{X}) + \mathbf{M} & \bar{H}^\top \\ \bar{H} & 0 \end{pmatrix} \in \mathbb{R}^{\mu(n+p)\times\mu(n+p)},\tag{12}$$

where

$$\mathbf{M} = \sum\_{j=1}^{\mu p} \lambda\_j \nabla^2 \bar{h}\_j(\mathbf{X}) \in \mathbb{R}^{\mu n \times \mu n}. \tag{13}$$

Denoting by $\mathbf{X}\_t \in \mathbb{R}^{\mu n}$ and $\lambda\_t \in \mathbb{R}^{\mu p}$ the variables in iteration $t$, a Newton step for problem (11) is given by

$$
\begin{pmatrix} \mathbf{X}\_{t+1} \\ \lambda\_{t+1} \end{pmatrix} = \begin{pmatrix} \mathbf{X}\_t \\ \lambda\_t \end{pmatrix} - DG(\mathbf{X}\_t, \lambda\_t)^{-1}G(\mathbf{X}\_t, \lambda\_t). \tag{14}
$$

In our computations, we have omitted $\mathbf{M}$ in $DG$. A Newton step is hence obtained by solving

$$
\begin{pmatrix} \nabla^2 \mathcal{H}\_\mathbf{F}(\mathbf{X}\_t) & \bar{H}^\top \\ \bar{H} & 0 \end{pmatrix} \begin{pmatrix} \mathbf{X}\_{t+1} - \mathbf{X}\_t \\ \lambda\_{t+1} - \lambda\_t \end{pmatrix} = - \begin{pmatrix} \nabla \mathcal{H}\_\mathbf{F}(\mathbf{X}\_t) + \bar{H}^\top \lambda\_t \\ \bar{h}(\mathbf{X}\_t) \end{pmatrix}. \tag{15}
$$
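A single Newton step of the form (15) can be illustrated on a toy problem that is small enough to check by hand. The sketch below assumes two points ($\mu = 2$) in $\mathbb{R}^2$, the identity objective map $F(\mathbf{x}) = \mathbf{x}$, the linear constraint $x\_1 + x\_2 = 1$, and the reference point $(1, 1)$; none of these choices come from the paper, and all names are hypothetical. The small Gaussian elimination routine only keeps the example dependency-free; in practice a library routine such as `numpy.linalg.solve` would be the natural choice. Because the constraint is linear, the term $\mathbf{M}$ vanishes, and because the hypervolume is quadratic in $\mathbf{X}$ here, one step reaches the KKT point.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(c + 1, n):
            f = M[k][c] / M[c][c]
            for col in range(c, n + 1):
                M[k][col] -= f * M[c][col]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):
        z[k] = (M[k][n] - sum(M[k][col] * z[col] for col in range(k + 1, n))) / M[k][k]
    return z

R = (1.0, 1.0)                                  # reference point (assumption)
# X = (a1, b1, a2, b2) with a1 < a2 and b1 > b2; HV = (R1-a1)(R2-b1) + (R1-a2)(b1-b2)
HV_HESS = [[0, 1, 0, 0], [1, 0, -1, 0], [0, -1, 0, 1], [0, 0, 1, 0]]
H_BAR = [[1, 1, 0, 0], [0, 0, 1, 1]]            # Jacobian of the stacked constraints

def hv_grad(X):
    a1, b1, a2, b2 = X
    return [-(R[1] - b1), a1 - a2, -(b1 - b2), -(R[0] - a2)]

def h_bar(X):
    return [X[0] + X[1] - 1.0, X[2] + X[3] - 1.0]

def newton_step(X, lam):
    """Solve the linear system (15) and return the updated (X, lambda)."""
    m, q = len(X), len(lam)
    K = [HV_HESS[i][:] + [H_BAR[j][i] for j in range(q)] for i in range(m)]
    K += [H_BAR[j][:] + [0.0] * q for j in range(q)]
    g = hv_grad(X)
    rhs = [-(g[i] + sum(H_BAR[j][i] * lam[j] for j in range(q))) for i in range(m)]
    rhs += [-v for v in h_bar(X)]
    d = solve(K, rhs)
    return ([x + dx for x, dx in zip(X, d[:m])],
            [l + dl for l, dl in zip(lam, d[m:])])

X1, lam1 = newton_step([0.2, 0.9, 0.7, 0.2], [0.0, 0.0])
```

Starting from an infeasible set, the step lands on the points $(1/3, 2/3)$ and $(2/3, 1/3)$, which are feasible and evenly spaced on the linear front, with both multipliers equal to $1/3$.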

### *3.2. Handling Inequalities*

In order to handle inequalities, we have chosen an active set approach, which we discuss in the following. Although this approach is straightforward, it has led to satisfactory results in our computations, in particular when the initial candidate set was computed by the evolutionary algorithm.

Assume problem (2) contains inequalities of the form

$$g(\mathbf{x}) \le 0,\tag{16}$$

where $g(\mathbf{x}) = (g\_1(\mathbf{x}), \ldots, g\_m(\mathbf{x}))$ and $g\_i \colon \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, is the $i$-th inequality constraint. Analogous to the equality-constrained case, we define the feasibility of $\mathbf{X} = (x\_1^{(1)}, \ldots, x\_n^{(1)}, x\_1^{(2)}, \ldots, x\_n^{(2)}, \ldots, x\_1^{(\mu)}, \ldots, x\_n^{(\mu)})$ by

$$g\_i(\mathbf{x}^{(j)}) \le 0, \qquad i = 1, \dots, m, \ j = 1, \dots, \mu. \tag{17}$$

Define for *i* ∈ {1, . . . , *m*} and *j* ∈ {1, . . . , *μ*}

$$g\_{i,j}: \mathbb{R}^{\mu n} \to \mathbb{R}, \qquad g\_{i,j}(\mathbf{X}) = g\_i(\mathbf{x}^{(j)}) \tag{18}$$

and *ḡ* : ℝ<sup>*μn*</sup> → ℝ<sup>*mμ*</sup> by

$$\bar{g}(\mathbf{X}) = \begin{pmatrix} g\_{1,1}(\mathbf{X}) \\ g\_{2,1}(\mathbf{X}) \\ \vdots \\ g\_{m,1}(\mathbf{X}) \\ g\_{1,2}(\mathbf{X}) \\ \vdots \\ g\_{m,2}(\mathbf{X}) \\ \vdots \\ g\_{m,\mu}(\mathbf{X}) \end{pmatrix} =: \begin{pmatrix} \bar{g}\_{1}(\mathbf{X}) \\ \bar{g}\_{2}(\mathbf{X}) \\ \vdots \\ \bar{g}\_{m}(\mathbf{X}) \\ \bar{g}\_{m+1}(\mathbf{X}) \\ \vdots \\ \bar{g}\_{2m}(\mathbf{X}) \\ \vdots \\ \bar{g}\_{\mu m}(\mathbf{X}) \end{pmatrix}. \tag{19}$$

The active set we use is as follows: if an inequality constraint satisfies

$$
\bar{g}\_l(\mathbf{X}) > -\text{tol} \tag{20}
$$

for a given tolerance tol > 0 at **X**, then we impose the equality

$$
\overline{g}\_l(\mathbf{X}) = 0,\tag{21}
$$

(i.e., it will be added to the set of equalities) while all other inequalities are disregarded at **X**.
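The selection rule of Equations (20) and (21) amounts to a simple threshold test on the stacked constraint values. The following is a minimal sketch; the function name `split_active_set`, the default tolerance, and the example values are assumptions made for illustration only.

```python
import numpy as np


def split_active_set(g_values, tol=1e-4):
    """Return a boolean mask of stacked inequality constraints treated as equalities.

    g_values : (m * mu,) stacked constraint values, ordered as in Equation (19)
    tol      : positive tolerance from Equation (20); the default is an assumption
    """
    g_values = np.asarray(g_values, dtype=float)
    # Constraints that are violated or nearly active are imposed as equalities;
    # all others are disregarded at the current X.
    return g_values > -tol


# Example: m = 2 constraints evaluated at mu = 3 points (values are made up).
g_bar = np.array([-0.5, -1e-6, 0.0, -2.0, 0.3, -1e-3])
active = split_active_set(g_bar, tol=1e-4)
```

The entries flagged in `active` would be appended to the equality constraints before solving the Newton system, while the remaining entries play no role in that iteration.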

### *3.3. Handling Dominated Points*

Since Newton's method tends to take relatively long steps, some decision points are often dominated after a Newton step. It is therefore necessary to discuss how the equality-constrained HVN method behaves in this case. For reasons that will become clear during the discussion, we investigate two scenarios: (1) *infeasible and dominated points* and (2) *feasible but dominated points*.

For the first scenario, we consider the simplest case, where *p* = 1 and there is only one dominated point. Without loss of generality, we can assume that for an approximation set **X** = {**x**<sup>(1)</sup>, **x**<sup>(2)</sup>, ..., **x**<sup>(*μ*)</sup>} ⊆ 𝒳, **x**<sup>(1)</sup> is dominated by at least one of the remaining *μ* − 1 points (as the indices are assigned to **X** arbitrarily). Denoting by **X**<sup>(−1)</sup> the approximation set after removing **x**<sup>(1)</sup>, we can express the constraint function on **X**<sup>(−1)</sup> as:

$$\begin{split} & \bar{h}^\*(\mathbf{X}^{(-1)}) \colon \mathbb{R}^{(\mu-1)n} \to \mathbb{R}^{\mu-1}, \quad \mathbf{X}^{(-1)} \mapsto \left( \bar{h}^\*\_2(\mathbf{X}^{(-1)}), \bar{h}^\*\_3(\mathbf{X}^{(-1)}), \dots, \bar{h}^\*\_{\mu}(\mathbf{X}^{(-1)}) \right)^\top, \\ & \bar{h}^\*\_j(\mathbf{X}^{(-1)}) \colon \mathbb{R}^{(\mu-1)n} \to \mathbb{R}, \qquad \mathbf{X}^{(-1)} \mapsto h(\mathbf{x}^{(j)}), \quad j \in [2 \dots \mu]. \end{split}$$

Note that we are only considering the special case of one constraint, i.e., *p* = 1. The root-finding problem *G* can be re-expressed in the following form, equivalent to Equation (11):

$$G(\mathbf{X}, \boldsymbol{\lambda}) = \begin{pmatrix} \lambda\_1 \nabla h(\mathbf{x}^{(1)}) \\ \nabla \mathcal{H}\_\mathbf{F} \Big( \mathbf{X}^{(-1)} \Big) + \sum\_{j=2}^\mu \lambda\_j \nabla \bar{h}\_j^\*(\mathbf{X}^{(-1)}) \\ h(\mathbf{x}^{(1)}) \\ \bar{h}^\*(\mathbf{X}^{(-1)}) \end{pmatrix}.$$

Letting *μ*′ = *μ* − 1 and **H**(**X**<sup>(−1)</sup>) = [∇*h̄*<sub>2</sub><sup>∗</sup>(**X**<sup>(−1)</sup>), ..., ∇*h̄*<sub>*μ*</sub><sup>∗</sup>(**X**<sup>(−1)</sup>)] ∈ ℝ<sup>*μ*′*n*×*μ*′</sup>, we express the derivative of *G* as a block matrix:

$$DG(\mathbf{X},\boldsymbol{\lambda}) = \left[\begin{array}{cc|cc} \lambda\_{1}\nabla^{2}h(\mathbf{x}^{(1)}) & \mathbf{0}\_{n\times\mu^{\prime}n} & \nabla h(\mathbf{x}^{(1)}) & \mathbf{0}\_{n\times\mu^{\prime}} \\ \mathbf{0}\_{\mu^{\prime}n\times n} & \nabla^{2}\mathcal{H}\_{\mathbf{F}}\Big(\mathbf{X}^{(-1)}\Big) + \sum\_{j=2}^{\mu}\nabla^{2}\bar{h}\_{j}^{\*}\Big(\mathbf{X}^{(-1)}\Big) & \mathbf{0}\_{\mu^{\prime}n\times 1} & \mathbf{H}\Big(\mathbf{X}^{(-1)}\Big) \\ \hline \nabla h(\mathbf{x}^{(1)})^{\top} & \mathbf{0}\_{1\times\mu^{\prime}n} & 0 & \mathbf{0}\_{1\times\mu^{\prime}} \\ \mathbf{0}\_{\mu^{\prime}\times n} & \mathbf{H}\Big(\mathbf{X}^{(-1)}\Big)^{\top} & \mathbf{0}\_{\mu^{\prime}\times 1} & \mathbf{0}\_{\mu^{\prime}\times\mu^{\prime}} \end{array}\right].$$

Note that the upper left 2 × 2 block equals ∇<sup>2</sup>ℋ<sub>**F**</sub>(**X**) + ∑<sub>*i*=1</sub><sup>*μ*</sup> ∇<sup>2</sup>*h̄*<sub>*i*</sub>(**X**). The inverse of *DG* can be obtained by applying the Schur complement recursively (first consider the block partition indicated above and then apply it again to each partition), provided that both ∇<sup>2</sup>*h*(**x**<sup>(1)</sup>) and ∇<sup>2</sup>ℋ<sub>**F**</sub>(**X**<sup>(−1)</sup>) are non-singular.

After simplification, the inverse of *DG* admits the following form:

$$[DG(\mathbf{X},\boldsymbol{\lambda})]^{-1} = \begin{bmatrix} \Big(\mathbf{I}\_{n\times n} - \left(\mathbf{g}^{\top}\mathbf{A}\mathbf{g}\right)^{-1}\mathbf{A}\mathbf{g}\mathbf{g}^{\top}\Big)\mathbf{A} & \mathbf{0}\_{n\times\mu^{\prime}n} & \left(\mathbf{g}^{\top}\mathbf{A}\mathbf{g}\right)^{-1}\mathbf{A}\mathbf{g} & \mathbf{0}\_{n\times\mu^{\prime}} \\ \mathbf{0}\_{\mu^{\prime}n\times n} & \mathbf{B}\Big(\mathbf{I} - \mathbf{H}(\mathbf{H}^{\top}\mathbf{B}\mathbf{H})^{-1}\mathbf{H}^{\top}\mathbf{B}\Big) & \mathbf{0}\_{\mu^{\prime}n\times 1} & \mathbf{B}\mathbf{H}(\mathbf{H}^{\top}\mathbf{B}\mathbf{H})^{-1} \\ \left(\mathbf{g}^{\top}\mathbf{A}\mathbf{g}\right)^{-1}(\mathbf{A}\mathbf{g})^{\top} & \mathbf{0}\_{1\times\mu^{\prime}n} & -\left(\mathbf{g}^{\top}\mathbf{A}\mathbf{g}\right)^{-1} & \mathbf{0}\_{1\times\mu^{\prime}} \\ \mathbf{0}\_{\mu^{\prime}\times n} & (\mathbf{H}^{\top}\mathbf{B}\mathbf{H})^{-1}\mathbf{H}^{\top}\mathbf{B} & \mathbf{0}\_{\mu^{\prime}\times 1} & -(\mathbf{H}^{\top}\mathbf{B}\mathbf{H})^{-1} \end{bmatrix},$$

where **g** = ∇*h*(**x**<sup>(1)</sup>), **A** = [*λ*<sub>1</sub>∇<sup>2</sup>*h*(**x**<sup>(1)</sup>)]<sup>−1</sup>, **H** = **H**(**X**<sup>(−1)</sup>), and

$$\mathbf{B} = \left[ \nabla^2 \mathcal{H}\_\mathbf{F} \left( \mathbf{X}^{(-1)} \right) + \sum\_{j=2}^{\mu} \nabla^2 \bar{h}\_j^\* \left( \mathbf{X}^{(-1)} \right) \right]^{-1}.$$

The first row of blocks is of particular interest to us since it determines the search step of **x**<sup>(1)</sup>. Multiplying it by *G*, the term proportional to *λ*<sub>1</sub> cancels and we obtain

$$\begin{split} \Delta \mathbf{x}^{(1)} &= -\Big( [DG(\mathbf{X}, \boldsymbol{\lambda})]^{-1} \Big)\_{[1:n, 1: \mu(n+1)]} G(\mathbf{X}, \boldsymbol{\lambda}) \\ &= -\Big( \lambda\_1 \Big( \mathbf{I}\_{n \times n} - \left( \mathbf{g}^{\top} \mathbf{A} \mathbf{g} \right)^{-1} \mathbf{A} \mathbf{g} \mathbf{g}^{\top} \Big) \mathbf{A} \mathbf{g} + h(\mathbf{x}^{(1)}) \Big( \mathbf{g}^{\top} \mathbf{A} \mathbf{g} \Big)^{-1} \mathbf{A} \mathbf{g} \Big) \\ &= -\frac{h(\mathbf{x}^{(1)})}{\nabla h(\mathbf{x}^{(1)})^{\top} [\nabla^{2} h(\mathbf{x}^{(1)})]^{-1} \nabla h(\mathbf{x}^{(1)})} [\nabla^{2} h(\mathbf{x}^{(1)})]^{-1} \nabla h(\mathbf{x}^{(1)}), \end{split} \tag{22}$$

where the notation (**M**)<sub>[1:*n*, 1:*μ*(*n*+1)]</sub> takes rows 1 to *n* and columns 1 to *μ*(*n* + 1) of the matrix **M**. Similarly, the search step of the dual variable is:

$$\begin{split} \Delta\lambda\_{1} &= -\left( \left[ DG(\mathbf{X},\boldsymbol{\lambda}) \right]^{-1} \right)\_{\left[ \mu n+1, 1:\mu(n+1) \right]} G(\mathbf{X},\boldsymbol{\lambda}) \\ &= -\lambda\_{1} \left( \mathbf{g}^{\top} \mathbf{A} \mathbf{g} \right)^{-1} (\mathbf{A} \mathbf{g})^{\top} \mathbf{g} + \left( \mathbf{g}^{\top} \mathbf{A} \mathbf{g} \right)^{-1} h(\mathbf{x}^{(1)}) \\ &= \lambda\_{1} \left( \frac{h(\mathbf{x}^{(1)})}{\nabla h(\mathbf{x}^{(1)})^{\top} [\nabla^{2} h(\mathbf{x}^{(1)})]^{-1} \nabla h(\mathbf{x}^{(1)})} - 1 \right). \end{split} \tag{23}$$

Now, consider the function *ĥ*(**x**) = *h*<sup>2</sup>(**x**)/2, whose first- and second-order derivatives are:

$$
\nabla \hat{h}(\mathbf{x}) = h(\mathbf{x}) \nabla h(\mathbf{x}), \quad \nabla^2 \hat{h}(\mathbf{x}) = h(\mathbf{x}) \nabla^2 h(\mathbf{x}) + \nabla h(\mathbf{x}) \nabla h(\mathbf{x})^\top.
$$

The global minimum of *ĥ* (with value zero) is attained exactly on the feasible set, i.e., where *h*(**x**) = 0. Hence, Newton iterations that minimize *ĥ* will equivalently find the feasible set. Computing the Newton direction of *ĥ* with the Sherman–Morrison formula, we have:

$$\begin{split} & - \left[ \nabla^2 \hat{h}(\mathbf{x}) \right]^{-1} \nabla \hat{h}(\mathbf{x}) \\ &= - \left[ h(\mathbf{x}) \nabla^2 h(\mathbf{x}) \right]^{-1} \left( \mathbf{I}\_{n \times n} - \frac{\nabla h(\mathbf{x}) \nabla h(\mathbf{x})^\top \left[ h(\mathbf{x}) \nabla^2 h(\mathbf{x}) \right]^{-1}}{1 + \nabla h(\mathbf{x})^\top \left[ h(\mathbf{x}) \nabla^2 h(\mathbf{x}) \right]^{-1} \nabla h(\mathbf{x})} \right) h(\mathbf{x}) \nabla h(\mathbf{x}) \\ &= - \frac{h(\mathbf{x})}{h(\mathbf{x}) + \nabla h(\mathbf{x})^\top \left[ \nabla^2 h(\mathbf{x}) \right]^{-1} \nabla h(\mathbf{x})} [\nabla^2 h(\mathbf{x})]^{-1} \nabla h(\mathbf{x}). \end{split} \tag{24}$$

Setting **x** = **x**<sup>(1)</sup> in the above equation and comparing it to Equation (22), we notice that the Newton direction of *ĥ* and the hypervolume Newton step Δ**x**<sup>(1)</sup> differ only by a scalar factor, which can be neglected in practice since we implement a step-size control to re-scale the search step (see the following sub-section). Therefore, we conclude that for infeasible and dominated points, our HVN method (Equation (15)) only considers the constraint function and moves such decision points to the feasible set rapidly (ideally at quadratic speed when the point is close to the feasible set). This satisfactory property allows for handling infeasible and dominated points without modifying our HVN method.
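The claim that the two directions are parallel can be checked numerically. The sketch below evaluates the closed form of Equation (22) and the Newton direction of *ĥ* from the left-hand side of Equation (24) at one point and compares them; the quadratic constraint `h`, the matrix `Q`, and the test point are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical constraint for illustration: h(x) = x^T Q x - 1 (an ellipse).
Q = np.array([[2.0, 0.3],
              [0.3, 1.0]])


def h(x):
    return x @ Q @ x - 1.0


def grad_h(x):
    return 2.0 * Q @ x


def hess_h(x):
    return 2.0 * Q


def hvn_step_dominated(x):
    """Search step of Equation (22) for an infeasible, dominated point."""
    g = grad_h(x)
    hess_inv_g = np.linalg.solve(hess_h(x), g)
    return -h(x) / (g @ hess_inv_g) * hess_inv_g


def newton_step_squared_constraint(x):
    """Newton direction of h_hat = h^2 / 2 (left-hand side of Equation (24))."""
    g = grad_h(x)
    hess_hat = h(x) * hess_h(x) + np.outer(g, g)
    return -np.linalg.solve(hess_hat, h(x) * g)


x1 = np.array([1.5, 0.5])
d22 = hvn_step_dominated(x1)
d24 = newton_step_squared_constraint(x1)

# Cosine of the angle between the two directions; a value of 1 means parallel.
cosine = d22 @ d24 / (np.linalg.norm(d22) * np.linalg.norm(d24))
```

At this test point the cosine equals one up to rounding, and the two vectors differ only in length, which is the quantity the step-size control rescales.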

In addition, due to this property, an infeasible point will eventually lie on the feasible set, where it can still be dominated if other feasible points exist. This is precisely the second scenario of our discussion, in which the hypervolume contribution of feasible but dominated points is zero. To move such points, we propose to employ the well-known *non-dominated sorting* [36] procedure, where we partition all feasible points into "layers" of mutually non-dominated ones (formally, anti-chains of the Pareto order) and compute the Newton direction for each layer (using Equation (15)) regardless of other dominating layers. In this manner, the HVN method can move all feasible points along the feasible set to achieve a good distribution.
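The layering can be illustrated with a naive peel-off procedure: repeatedly extract the points that no remaining point dominates. This is a sketch under the assumption of minimization, with hypothetical function names and made-up objective values; it does not use the faster bookkeeping of the procedure in [36].

```python
import numpy as np


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation assumed)."""
    return bool(np.all(a <= b) and np.any(a < b))


def nondominated_layers(objectives):
    """Partition row indices of `objectives` into layers of mutually non-dominated points."""
    remaining = list(range(len(objectives)))
    layers = []
    while remaining:
        # A point belongs to the current layer if no remaining point dominates it.
        layer = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        layers.append(layer)
        remaining = [i for i in remaining if i not in layer]
    return layers


# Made-up bi-objective values for five feasible points.
Y = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0], [5.0, 5.0]])
layers = nondominated_layers(Y)
```

Each index list in `layers` would then receive its own Newton direction, computed as if the dominating layers were absent.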

### *3.4. The HVN Method for Constrained MOPs*

Taking the above considerations into account, in this section we devise and implement a standalone HVN algorithm, which is outlined in Algorithm 1. First, we check if any decision point is feasible (i.e., **h**(**x**) = **0** for some **x**), where feasibility is tested numerically with a pre-defined small threshold (e.g., 10<sup>−4</sup> in this work) for the equality constraints. Then, we employ the non-dominated sorting procedure [36] to partition the feasible points **X**<sub>f</sub> into "layers" of mutually non-dominated ones, where the Newton direction (Equation (15)) is calculated separately on each layer. Taking *L* for the indices of points in a layer and **X**<sub>f</sub>[*L*] for the subset indexed by *L*, we express this partitioning as **X**<sub>f</sub>[*L*<sub>1</sub>] ≺ **X**<sub>f</sub>[*L*<sub>2</sub>] ≺ ··· ≺ **X**<sub>f</sub>[*L*<sub>*q*</sub>], ∀*i* ≠ *j* (*L*<sub>*i*</sub> ∩ *L*<sub>*j*</sub> = ∅), ∪<sub>*i*</sub>*L*<sub>*i*</sub> ⊆ [1 ... *μ*]. Note that the dominance relation for the remaining infeasible and dominated points is not well-defined under the equality constraints, since they are incomparable to the feasible ones (and also among themselves). In this case, we simply merge them into the first layer *L*<sub>1</sub> and compute the Newton direction thereof, which can be justified by the observation in Equations (22) and (24): the resulting search direction of the infeasible and dominated points is a Newton direction of the function *h*<sup>2</sup>/2. In this treatment, a special case arises when there are no feasible points, usually in the first several iterations of the algorithm.

**Algorithm 1:** Standalone hypervolume Newton algorithm for equality-constrained MOPs


Finally, another important aspect is the step-size control for each Newton step. We propose maintaining an individual step-size for each partition, which is determined using the well-known Armijo backtracking line search [52]. In detail, this method starts with an initial step-size *σ*<sub>0</sub> and tests whether the Euclidean norm of *G*(**X**, *λ*) has sufficiently decreased after applying the Newton step to the primal-dual pair (**X**, *λ*). Since the Newton direction for equality-constrained problems (Equation (15)) is not necessarily an ascent direction for the hypervolume, we take the Euclidean norm ||*G*(**X**, *λ*)|| as the convergence measure because (1) the optimality condition is *G*(**X**, *λ*) = 0 (Equation (11)) and (2) the Newton step is always a descent direction of ||*G*(**X**, *λ*)||: letting **Z** = (**X**, *λ*) be the primal-dual variable and Δ**Z** = −[*DG*(**Z**)]<sup>−1</sup>*G*(**Z**), we have (d/d*σ* ||*G*(**Z** + *σ*Δ**Z**)||)|<sub>*σ*=0</sub> = −||*G*(**Z**)|| ≤ 0. If the test fails, we halve the step-size and repeat the test. Notably, for infeasible and dominated points, the test checks whether the squared constraint value has sufficiently decreased, as the HVN method computes the Newton direction of *h*<sup>2</sup>/2 for those points. In our implementation, we use at most six iterations of such tests, resulting in a minimal step-size of *σ*<sub>0</sub>/64. As for the initial step-size *σ*<sub>0</sub>, the commonly used value *σ*<sub>0</sub> = 1 often leads to Newton steps that jump out of the decision space when the Newton direction is large or the point is in the vicinity of the decision boundary. Therefore, we set it to the minimum of one and the maximal step-size that the primal vector **X** can take without leaving the decision space, i.e., *σ*<sub>0</sub> = min{1, *σ*<sub>max</sub>}. The value of *σ*<sub>max</sub> can be calculated in a straightforward way when the decision space is a convex and compact subset of ℝ<sup>*n*</sup>, e.g., a hyperbox.
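The step-size control described above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names, the Armijo constant `c`, and the scalar test problem *G*(*z*) = arctan(*z*) are assumptions. The sufficient-decrease test uses the directional derivative −||*G*(**Z**)|| derived above, and `max_step_in_box` computes *σ*<sub>max</sub> for a hyperbox.

```python
import numpy as np


def max_step_in_box(x, dx, lower, upper):
    """Largest sigma >= 0 such that x + sigma * dx stays inside [lower, upper]."""
    with np.errstate(divide="ignore", invalid="ignore"):
        to_upper = np.where(dx > 0, (upper - x) / dx, np.inf)
        to_lower = np.where(dx < 0, (lower - x) / dx, np.inf)
    return float(min(to_upper.min(), to_lower.min()))


def backtracking_step(G, z, dz, sigma0=1.0, c=1e-4, max_tests=6):
    """Halve sigma until ||G|| decreases sufficiently (Armijo-type test)."""
    norm0 = np.linalg.norm(G(z))
    sigma = sigma0
    for _ in range(max_tests):
        # Sufficient decrease: the slope of ||G|| along dz at sigma = 0 is -norm0.
        if np.linalg.norm(G(z + sigma * dz)) <= (1.0 - c * sigma) * norm0:
            return sigma
        sigma *= 0.5
    return sigma  # equals sigma0 / 2**max_tests if every test failed


# Initial step-size for a point near the boundary of the box [-2, 2]^2.
x = np.array([1.5, 0.0])
dx = np.array([1.0, -1.0])
sigma_max = max_step_in_box(x, dx, lower=-2.0, upper=2.0)
sigma0 = min(1.0, sigma_max)

# Scalar root-finding example where the full Newton step overshoots.
G = lambda z: np.arctan(z)
z = np.array([3.0])
dz = -G(z) * (1.0 + z**2)  # Newton step: -G(z) / G'(z)
sigma = backtracking_step(G, z, dz)
z_new = z + sigma * dz
```

With `max_tests=6` the smallest step the loop can return is `sigma0 / 64`, matching the bound stated in the text.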

### *3.5. Computational Cost*

The above method requires the Jacobian and the Hessian of both the objective and the constraint functions. In this work, we have used automatic differentiation (AD) techniques [53]. Note that finite differences can also be utilized when AD is not applicable. An AD computation takes at most four times the number of multiply–add operations needed to evaluate the function value [54]. Hence, to make a fair comparison between HVN and MOEA methods, we count 4 function evaluations (FEs) and 4 + 6*n* FEs as the computational cost of each AD-computed Jacobian and Hessian, respectively. In total, the number of FEs consumed in each iteration comprises:

$$\#\text{FEs}: \underbrace{\mu}\_{F} + \underbrace{\mu}\_{h} + \underbrace{4\mu}\_{\nabla F} + \underbrace{4\mu}\_{\nabla h} + \underbrace{(4+6n)\mu}\_{\nabla^2 F} + \underbrace{(4+6n)\mu}\_{\nabla^2 h} + \underbrace{6(4\mu+4\mu)}\_{\text{step-size control}} = (66+12n)\mu, \quad \mu = 1, 2, 3, \dots$$

which amounts to the function evaluation, the constraint evaluation, the Jacobians of the objective function and the constraint, the Hessians of the objective function and the constraint, and the backtracking line search of the step-size. Computing the hypervolume Hessian takes Θ((*μn*)<sup>3</sup>) time in addition to the AD-computation of derivatives in Equation (1). For solving Equation (15), we use Cholesky decomposition, which has a computational complexity of *O*((*μ*(*n* + *p*))<sup>3</sup>). It would certainly be desirable either to have an analytic expression of the HV Hessian or to exploit in the AD computation the block-diagonal structure that this matrix will have, which we, however, have to leave for future research.

We have implemented the standalone algorithm in Python, which is accessible at https://github.com/wangronin/HypervolumeDerivatives (accessed on 1 November 2022).

### **4. Numerical Results**

In this section, we present numerical results of the HVN both as a standalone algorithm and as a local search engine within the NSGA-III algorithm.

### *4.1. HVN as Standalone Algorithm*

We showcase the behavior of the proposed Newton method as a standalone method on three example problems:

$$\begin{split} \text{(P1)}: \quad F(\mathbf{x}) &= \left[ (\mathbf{x} - \mathbf{1})^2, (\mathbf{x} + \mathbf{1})^2 \right]^\top, \\ h(\mathbf{x}) &= \mathbf{x}^2 - \mathbf{1}, \quad \mathcal{X} = [-2, 2]^2, \quad \mathbf{r} = [20, 20]^\top. \end{split}$$

$$\begin{split} \text{(P2)}: \quad F(\mathbf{x}) &= \left[ (\mathbf{x} - (1, 1, 0)^\top)^2, (\mathbf{x} - (1, -1, 0)^\top)^2, (\mathbf{x} - (-1, 1, 0)^\top)^2 \right]^\top, \\ h(\mathbf{x}) &= \left( \mathbf{x} - \left( \frac{2\sqrt{3}}{3} - 1, 0, -1.5 \right)^\top \right)^2 - 1, \quad \mathcal{X} = [-2, 2]^3, \quad \mathbf{r} = [38, 38, 38]^\top. \end{split}$$

$$\begin{split} \text{(P3)}: \quad F(\mathbf{x}) &= \left[ (\mathbf{x} + (1, 1, 1)^\top)^2, (\mathbf{x} + (1, 0, 0)^\top)^2, (\mathbf{x} + (2, 2, -4)^\top)^2 \right]^\top, \\ g(\mathbf{x}) &= -\mathbf{x}\_0, \quad \mathcal{X} = [-4, 4]^3, \quad \mathbf{r} = [90, 90, 90]^\top. \end{split}$$

Importantly, we will use different initializations of the decision points that are specific to each problem in order to investigate the behavior of the standalone HVN with respect to the characteristics of each problem; we do not aim to provide a unified and systematic initialization method for the standalone HVN in this section. Note that problem P3 defines an inequality constraint on the first component *x*<sub>0</sub> of the decision point, where the feasible set is {**x** ∈ 𝒳 : *x*<sub>0</sub> ≥ 0}, and the optimum lies on the active set of *g*, i.e., *x*<sub>0</sub> = 0. This problem is meant to test whether the proposed HVN algorithm can solve inequality-constrained problems where the optimum is on the active set of the constraint. To measure the empirical performance of the HVN algorithm, we take the Euclidean norm ||*G*(**X**, *λ*)|| since the Newton direction is not necessarily an ascent direction for the hypervolume.

Moreover, since it is well-known that Newton-like methods can be affected by the choice of initial solutions, we investigate the performance of the HVN algorithm on problem P1 with three different initializations. Specifically, in the two-dimensional decision space, we create *μ* = 50 initial decision points on the line segment *x*<sub>2</sub> = *x*<sub>1</sub> − 2, *x*<sub>1</sub> ∈ [0, 2], where we determine the value of *x*<sub>1</sub> by (i) taking evenly spaced points (linear), (ii) logistic-transformed evenly spaced points (which makes the points denser around the tails of the line segment), or (iii) logit-transformed evenly spaced points (higher density of points in the middle). The results are illustrated in Figure 2 and Table 1, which show a set of well-distributed points on the feasible set in the objective space (the red dashed sphere) for all three initializations. In addition, the empirical convergence rate is quadratic regardless of the choice of initialization method, as reported in Table 1.
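The three spacings can be reproduced along the following lines. The sketch is a plausible reading of the description, not the authors' code: the function name `initial_points_p1`, the grid range `scale` fed to the logistic map, and the interval [0.01, 0.99] used for the logit map are all assumptions.

```python
import numpy as np


def initial_points_p1(mu=50, spacing="linear", scale=6.0):
    """Place mu points on the segment x2 = x1 - 2, x1 in [0, 2].

    `scale` and the logit clipping interval are illustrative assumptions.
    """
    if spacing == "linear":
        t = np.linspace(0.0, 1.0, mu)
    elif spacing == "logistic":
        # The logistic map of an even grid saturates, so points crowd near the ends.
        u = np.linspace(-scale, scale, mu)
        t = 1.0 / (1.0 + np.exp(-u))
    elif spacing == "logit":
        # The logit map of an even grid stretches the ends, so points crowd in the middle.
        p = np.linspace(0.01, 0.99, mu)
        t = np.log(p / (1.0 - p))
    else:
        raise ValueError(f"unknown spacing: {spacing}")
    t = (t - t.min()) / (t.max() - t.min())  # rescale to [0, 1]
    x1 = 2.0 * t
    return np.column_stack([x1, x1 - 2.0])


X_linear = initial_points_p1(50, "linear")
X_logistic = initial_points_p1(50, "logistic")
X_logit = initial_points_p1(50, "logit")
```

Comparing the gaps between consecutive `x1` values shows the intended effect: the logistic variant has its smallest gaps at the two ends of the segment, the logit variant in the middle.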


**Table 1.** The evolution of *G*(**X**, *λ*) on problem P1 with three different initialization strategies.

The results on problem P2 are depicted in Figure 3 and Table 2 for three different sizes *μ* ∈ {20, 40, 60} of the approximation set. The initial decision points are sampled uniformly at random in the convex hull of three points (1, 1, 0), (1, −1, 0), and (−1, 0, 0). Whereas the final approximation set is well-distributed in the objective space, we observe that the empirical convergence of ||*G*(**X**, *λ*)|| is considerably rugged in the first 20–25 iterations, after which quadratic convergence appears. This is attributed to the fact that decision points often become dominated in the first couple of iterations on this problem, resulting in a zero hypervolume gradient thereof and hence quite a large value of ||*G*(**X**, *λ*)||. Nevertheless, the proposed treatment of those dominated points (Algorithm 1), which is based on the non-dominated sorting procedure, is capable of bringing the dominated points to the active set at quadratic speed. The same ruggedness is seen in the convergence chart of problem P3 (shown in Figure 4). On this problem, we again take the setting *μ* ∈ {20, 40, 60}, and the initial decision points are sampled uniformly at random in the feasible space [0, 4] × [−4, 4]<sup>2</sup>. We extend the HVN algorithm slightly for this inequality-constrained problem in the following way: whenever the decision points are feasible, i.e., *g*(**x**) ≤ 0, and quite distant from the active set (*g*(**x**) = 0, shown as the red plane in Figure 4), we ignore the constraint function when computing the Newton step. When the feasible decision points are sufficiently close to the active set (the distance is less than 10<sup>−4</sup> in our implementation), we consider *g*(**x**) an equality constraint and utilize Equation (15) to compute the Newton step.

**Figure 2.** On problem P1, the convergence of the HVN method is shown for three different initializations of the starting approximation set (*μ* = 50)—linear (**top row**), logistic (**middle**), and logit spacing (**bottom**). We depict the final approximation set (**left column**; green stars), the corresponding objective points (**middle column**; green stars), and the evolution of the HV value and *G*(**X**, *λ*) (**right column**).

Moreover, we test the standalone HVN method on large-scale, complicated MOPs. We choose the well-known DTLZ problems with one spherical constraint [55,56] with *μ* = 200 decision points, resulting in a relatively large Hessian matrix (for an 11-dimensional decision space and one constraint, the *DG*(**X**, *λ*) object is of size 2400 × 2400). In this case, we use sparse matrix operations for computational efficiency, exploiting the sparsity of the Hessian. Since the DTLZ problems are highly multi-modal, the initial approximation set is generated in a local vicinity of the Pareto set, i.e., **X**<sup>∗</sup> + 0.02 U(0, 1), where **X**<sup>∗</sup> is sampled uniformly at random on the Pareto set. We execute the standalone HVN method for 15 iterations and illustrate the result in Figure 5. In the plot, we observe well-distributed final points (green dots) in contrast to the non-uniform initial ones (black crosses), showing that the standalone HVN works properly as a local method for large-scale problems.

**Figure 3.** On problem P2 with a spherical constraint, we depict for three sizes of the approximation set (*μ* ∈ {20, 40, 60}; from **top** to **bottom**), the final approximation set (**left column**; green stars), the corresponding objective points (**middle column**; green stars), and the evolution of the HV value and *G*(**X**, *λ*) (**right column**). The initial points are sampled uniformly at random in the convex hull of three points (1, 1, 0),(1, −1, 0), and (−1, 0, 0).





**Figure 4.** On problem P3 with an inequality constraint, we depict for three sizes of the initial approximation set (*μ* ∈ {20, 40, 60}; from **top** to **bottom**), the final approximation set (**left column**; green stars), the corresponding objective points (**middle column**; green stars), and the evolution of the HV value and *G*(**X**, *λ*) (**right column**). The initial decision points are sampled uniformly at random in the feasible space [0, 4] × [−4, 4]<sup>2</sup>.

**Figure 5.** On Eq-DTLZ1-3 problems, the HVN method starts from a small local perturbation (black crosses) of the Pareto set (sphere in the decision space), i.e., **X**<sup>∗</sup> + 0.02 U(0, 1), where **X**<sup>∗</sup> (of size 200) is sampled uniformly at random on the Pareto set. The final approximation set of the HVN method is depicted as green points. Only the first three search dimensions are shown for the decision space.

### *4.2. HVN within NSGA-III*

In this section, we investigate the empirical performance of the HVN algorithm on more complicated, equality-constrained DTLZ (Eq-DTLZ) problems [55,56] and their inverted counterparts (Eq-IDTLZ). As Newton-like algorithms are local methods, running the standalone algorithm (Algorithm 1) will stagnate at local Pareto sets. Therefore, we hybridize the HVN algorithm with an MOEA, in which we first execute the MOEA for a pre-defined budget to overcome the local optimum and get close to the global Pareto set, and then initialize the HVN algorithm from the final approximation set of the MOEA to make local refinements. We summarize this hybrid approach in Algorithm 2. Notably, in line 3, we transfer the whole approximation set (rather than only the non-dominated points) to HVN upon the termination of MOEA since the standalone HVN method is able to move dominated points towards the feasible set at quadratic speed, as proven in Section 3.3.


The following empirical study aims to check whether the hybridization approach can achieve a better final approximation set/front than an MOEA alone under the same computation budget. As for the test problem, a single spherical constraint *h*(**x**) = (*x*<sub>1</sub> − 0.5)<sup>2</sup> + (*x*<sub>2</sub> − 0.5)<sup>2</sup> − 0.16 is imposed on problems DTLZ1–4. The decision space is [0, 1]<sup>11</sup>, the reference point is **r** = (1, 1, 1) for HVN, and the approximation set is of size 200. Here, we choose the well-known NSGA-III algorithm [34,35], where the equality constraints are handled using the adaptive *ε*-constraint handling technique. We utilize the implementation in the Pymoo library: https://pymoo.org/constraints/eps.html (accessed on 1 November 2022). The method considers a solution feasible subject to a small *ε* threshold, which decreases linearly to zero. The initial value of *ε* is set to the average constraint value of the initial population. In our experiment, we let *ε* decrease to zero after 50% of the iterations of NSGA-III. In addition, we use Das and Dennis's approach [28] to generate well-spaced reference directions (18 partitions, which lead to 190 directions) for NSGA-III. As for its hyperparameters, we use the default setting: *η* = 30 and *p* = 1 for simulated binary crossover and *η* = 20 for polynomial mutation. Furthermore, the hybrid algorithm first executes NSGA-III with *μ* = 200 for 1000 iterations and then runs the HVN method for 10 iterations. In HVN, the total function evaluations and AD operations take ca. 270 s CPU time on an Intel(R) Core(TM) i5-8257U CPU. Considering the CPU time of a single function evaluation, which is on average ca. 5.6 × 10<sup>−5</sup> s measured on the same hardware, the total function evaluations plus the AD operations are equivalent to roughly 270/5.6 × 10<sup>−5</sup> ≈ 4.8 × 10<sup>5</sup> FEs. Therefore, the total budget of the hybrid algorithm is roughly 4.8 × 10<sup>5</sup>/200 + 1000 ≈ 3400 iterations. We will execute the standalone NSGA-III algorithm for the same number of iterations to keep the comparison fair.
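The figure of 190 reference directions follows from the simplex-lattice construction of Das and Dennis: with 3 objectives and 18 partitions there are C(20, 2) = 190 weight vectors. The sketch below enumerates them with a stars-and-bars argument; it is written independently of Pymoo, and the function name `das_dennis` is an assumption made here.

```python
from itertools import combinations
from math import comb

import numpy as np


def das_dennis(n_obj, n_partitions):
    """All weight vectors with entries k / n_partitions that sum to one."""
    n_slots = n_partitions + n_obj - 1
    directions = []
    # Stars and bars: choose the positions of the n_obj - 1 bars among the slots;
    # the numbers of stars between consecutive bars give the integer weights.
    for bars in combinations(range(n_slots), n_obj - 1):
        edges = (-1,) + bars + (n_slots,)
        counts = [edges[i + 1] - edges[i] - 1 for i in range(n_obj)]
        directions.append(np.array(counts) / n_partitions)
    return np.array(directions)


ref_dirs = das_dennis(n_obj=3, n_partitions=18)
expected_count = comb(18 + 3 - 1, 3 - 1)
```

Every row of `ref_dirs` is non-negative and sums to one, so the rows lie on the unit simplex with a uniform spacing of 1/18.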

We first depict one example of the final approximation set (only the non-dominated subset is shown) in Figure 6 for both methods, where we clearly observe that the hybridization achieves many more non-dominated points than NSGA-III. Second, we show, in Table 3, the hypervolume indicator value and the number of final non-dominated points for both algorithms obtained from 15 independent runs. In addition, we compute the above metrics for the hybrid algorithm right before the HVN phase starts (NSGA-III (1000) in the table), showing the progress that HVN manages to make. From the results, we conclude that the hybrid algorithm significantly improves upon the hypervolume metric and outputs substantially more non-dominated points than NSGA-III alone. We conjecture that the observed advantage of the hybrid algorithm is very likely attributable to HVN's ability to move dominated points to the feasible set with quadratic convergence (see Section 3), which disregards the objective function and thereby its multi-modal landscape.

**Table 3.** On Eq-DTLZ1-4 and Eq-IDTLZ1-4 problems, the sample mean and standard error of the hypervolume (HV) value and the number of final non-dominated (ND) points over 15 independent runs for each algorithm. The hypervolume values are computed with reference point (1, 1, 1) for all problems except Eq-DTLZ4, Eq-IDTLZ3, and Eq-IDTLZ4, for which we use (1.2, 5 × 10<sup>−3</sup>, 5 × 10<sup>−4</sup>), (800, 800, 700), and (−0.4, 0.6, 0.6), respectively. The initial population size is *μ* = 200 for all algorithms. Hybridization = NSGA-III (iter = 1000) + HVN (iter = 10), which consumes roughly the same CPU time on function evaluations as NSGA-III for 3400 iterations (see the caption of Figure 6 for details).


**Figure 6.** On the Eq-DTLZ2 (**a**) and the Eq-IDTLZ1 (**b**) problem, we compare the hybridization of HVN and NSGA-III to NSGA-III with roughly the same budget: for the former, the hybrid algorithm first executes NSGA-III with *μ* = 200 for 1000 iterations and then runs the HVN method for 10 iterations. In HVN, the total function evaluations and AD take ca. 270 s CPU time on an Intel(R) Core(TM) i5-8257U CPU, which corresponds to ca. 4.8 × 10<sup>5</sup> FEs. Hence, for the latter, we set 3400 (= 4.8 × 10<sup>5</sup>/200 + 1000) iterations in total for *μ* = 200. We use the same hyperparameter setting for the standalone NSGA-III and the one used in the hybridization. The decision space is [0, 1]<sup>11</sup>, and the reference point is (1, 1, 1) for HVN.

### **5. Conclusions**

In this paper, we propose a hypervolume Newton method for equality-constrained multi-objective optimization problems (MOPs) under the assumption that both the objective and the constraint functions are twice continuously differentiable. Based on previous works on the set-oriented hypervolume Hessian matrix and the hypervolume Newton (HVN) method for unconstrained MOPs, we generalize the HVN to equality-constrained problems and also elaborate a treatment for inequality-constrained problems based on an active set approach, which regards an inequality constraint as an equality if the constraint value is within some small tolerance. In addition, we devised and tested two resulting algorithms: the standalone HVN method as an efficient local optimizer and a hybridization of the HVN and an MOEA for solving complicated and multi-modal MOPs. Moreover, we discuss in detail the search direction for dominated points obtained from the set-oriented Newton step, in which we prove that for dominated and infeasible points, the computed search step is, up to a scalar factor, the Newton step of the squared equality constraint function. Therefore, our HVN method can efficiently steer both the non-dominated and the dominated decision points.

We first illustrate the empirical behavior of the standalone algorithm on three simple MOPs, where we observe quadratic convergence of the two-norm of the root finding problem *G*. Then, on highly multi-modal DTLZ problems with one spherical constraint (Eq-DTLZ), we tested the local convergence of the standalone HVN algorithm with a relatively large approximation set (*μ* = 200) by initializing the approximation set in the neighborhood around the Pareto set, which shows a fast convergence to well-distributed points on the feasible set. Finally, we benchmark the hybrid algorithm against NSGA-III on Eq-DTLZ1-4 and Eq-IDTLZ1-4 problems, in which we observe that with roughly the same computational budget, the hybrid algorithm achieves substantially more non-dominated points in the final population, which leads to significantly higher hypervolume values. We conjecture that such an advantage is attributed to (1) the fast local convergence of the HVN method and (2) HVN's ability to move infeasible and dominated points.

For future work, we contemplate (1) testing the hybridization of the HVN method with other EMOAs for more than three objectives, e.g., SMS-EMOA, to investigate the benefit of the HVN method in a broader setup; (2) comparing the hybrid HVN method to other state-of-the-art algorithms, e.g., MOEA/D (decomposition-based), EHVI-EGO (Bayesian optimization), or the average Hausdorff distance-based Newton method (mathematical optimization) on complex or even real-world MOPs with multiple non-linear constraint functions; (3) investigating the analytical expression (as sketched in Figure 1) and computation of the hypervolume Hessian matrix, which can reduce the computation cost of the HVN method; (4) devising generic methodologies to handle inequality constraints for the HVN method, which will make it more applicable in practice; (5) extending the HVN to methods that provide non-zero sub-gradients for dominated points as in [17,18]; and (6) incorporating a surrogate-assisted method for tackling high-dimensional and complex problems, e.g., as in [57].

**Author Contributions:** Conceptualization, O.S., H.W., A.D., M.E. and V.A.S.H.; methodology, O.S. and H.W.; software, M.E. geometrical analysis, and visualization, H.W. and V.A.S.H.; validation, H.W., M.E. and O.S.; formal analysis, H.W., O.S. and M.E.; investigation, H.W. and O.S.; resources, O.S.; data curation, H.W.; writing—original draft preparation, all; writing—review and editing, all; visualization, H.W.; supervision, O.S. and M.E.; project administration, O.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** We have hosted all the data sets of this work on Zenodo: https://doi.org/10.5281/zenodo.7509148.

**Acknowledgments:** We dedicate this work to Kalyanmoy Deb for his pioneering, inspiring, and fundamental contributions to the evolutionary multi-objective optimization (EMO) community, and particularly for his famous non-dominated sorting procedure, which plays a crucial role in this work in order to efficiently handle dominated points that can be generated throughout the Newton iteration.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **COVID-19 Data Analysis with a Multi-Objective Evolutionary Algorithm for Causal Association Rule Mining**

**Santiago Sinisterra-Sierra <sup>1</sup>, Salvador Godoy-Calderón <sup>1</sup> and Miriam Pescador-Rojas <sup>1,2,</sup>\***


**Abstract:** Association rule mining plays a crucial role in the medical area in discovering interesting relationships among the attributes of a data set. Traditional association rule mining algorithms such as *Apriori*, FP growth, or Eclat require considerable computational resources and generate large volumes of rules. Moreover, these techniques depend on user-defined thresholds which can inadvertently cause the algorithm to omit some interesting rules. In order to solve such challenges, we propose an evolutionary multi-objective algorithm based on NSGA-II to guide the mining process in a data set composed of 15.5 million records with official data describing the COVID-19 pandemic in Mexico. We tested different scenarios optimizing classical and causal estimation measures in four waves, defined as the periods of time where the number of people with COVID-19 increased. The proposed contributions generate, recombine, and evaluate patterns, focusing on recovering promising high-quality rules with actionable cause–effect relationships among the attributes to identify which groups are more susceptible to disease or what combinations of conditions are necessary to receive certain types of medical care.

**Keywords:** association rule mining; causality measures; multi-objective evolutionary algorithm; COVID-19 data

### **1. Introduction**

The coronavirus (COVID-19) pandemic has affected societies around the world for more than two years now since 11 March 2020, when the World Health Organization recognized the pandemic [1]. However, unlike similar phenomena experienced several times in human history, this pandemic has been meticulously documented, with millions of records about almost any conceivable aspect of the phenomenon's mechanics, including hospital occupation, infection and death rates, medical care protocols, and medication availability. Even government reactions, safety measures taken, social responsibility, and economic consequences have also been recorded [2]. The availability of this enormous amount of data poses an opportunity to test traditional data mining and knowledge discovery techniques and algorithms, as well as design and test new ones. Association rule mining is the most widely used technique when the goal is to reveal behavioral patterns in phenomena.

As is always the case, both private institutions and government agencies focus their attention only on mined information that is considered useful, namely behavioral patterns that can suggest some course of action to take. In that sense, traditional association rule mining is not enough, and causal rules are needed. Instead of discovering associations that have only strong statistical presence in the data set, causal rule mining aims to discover causality relations that hold in the studied phenomenon, particularly relations that can bring some degree of certainty about the future effects of the measures and regulations established to cope with a situation.

From the computational viewpoint, data sets consisting of thousands of millions of records are not the ideal scenario for performing data mining. Exhaustive search techniques are evidently not an option. More efficient ways to traverse the data and analyze huge search spaces must be selected. Fortunately, huge search spaces are the specialty of bio-inspired meta-heuristics, which in part explains why some recent papers have used diverse meta-heuristics as the guiding tool to perform data mining [3]. In this paper, we present a new evolutionary algorithm specifically designed to serve both traditional and causal association rule mining. This model allows a more focused data search and offers the user a set of parameters for increased flexibility over the intended mining process. We tested our model with the official COVID-19 pandemic database from the Mexican government [4].

**Citation:** Sinisterra-Sierra, S.; Godoy-Calderón, S.; Pescador-Rojas, M. COVID-19 Data Analysis with a Multi-Objective Evolutionary Algorithm for Causal Association Rule Mining. *Math. Comput. Appl.* **2023**, *28*, 12. https://doi.org/10.3390/mca28010012

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 2 November 2022; Revised: 7 January 2023; Accepted: 10 January 2023; Published: 13 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The authors of this article state that the application of artificial evolution processes in this work only partially falls under the field of medicine since no diagnosis, prescription, or treatment decisions are involved. Our mining process only analyzes data from previously treated patients, and an evolutionary algorithm is used as a dynamic model of the studied phenomenon. Moreover, neither the identification nor the interpretation of any rule mined from the database can modify the results of the real phenomenon.

The remainder of this paper is organized as follows. First, we state the conceptual and theoretical basis of the research in Section 2 (Background and Basic Concepts). These include basic concepts about association rule mining, causality relations described by mined rules, and some of the evaluation functions traditionally used to assess the nature and strength of identified causality relations. Then, in Section 3 (Related Previous Works), we briefly review some of the most relevant publications relating to association rule mining and evolutionary algorithms, both as a prediction tool and as a guide for the mining process. Section 4 (Proposal) describes the architecture, mathematical foundations, and implementation details of the proposed causal mining algorithm. This section dives into the artificial evolution process, recombination, and mutation operators, as well as the nuances of the mining process. Section 5 (Experiments and Results) shows the designs of different experimentation scenarios, the conditions of each experiment, the obtained results, and their interpretation. Finally, we draw some relevant conclusions in Section 6.

### **2. Background and Basic Concepts**

### *2.1. Association Rule Mining*

Association rule mining is a set of data analysis techniques aiming to discover the interesting but implicit relational patterns present in a data set. Usually, the data set is expressed in an attribute-value language, and the relations found are expressed as association rules. An *association rule* is a logical expression with the following structure:

$$A_1 \land A_2 \land \dots \land A_m \to C_1 \land C_2 \land \dots \land C_n$$

where both the antecedent (*A<sub>i</sub>*) and the consequent (*C<sub>j</sub>*) are conjunctive clauses with terms called selectors (item sets). Association rules can be read as "when *A*<sub>1</sub> and ... and *A<sub>m</sub>* occur in the data set, *C*<sub>1</sub> and ... and *C<sub>n</sub>* also occur".

In traditional association rule mining, a rule is considered interesting if it reveals an association between its antecedent and its consequent with a strong statistical presence in the source data set. Since interesting associations can occur in several different ways, evaluation functions are defined for each rule so that the evaluation obtained measures the strength of the association described by the rule. Consequently, there are several measures for assessing the rules discovered during a mining process, such as the classical functions of *support* (*supp*), *confidence* (*conf*), and *lift* defined by Equations (1)–(3), respectively [5]. Traditional association rule mining algorithms seek to find all rules that exceed certain user-defined thresholds for one or more of these functions:

$$\operatorname{supp}(A \to C) = \frac{|A \cap C|}{|U|} \tag{1}$$

$$\operatorname{conf}(A \to C) = \frac{\operatorname{supp}(A \to C)}{\operatorname{supp}(A)} = \frac{P(A \cap C)}{P(A)} \tag{2}$$

$$\operatorname{lift}(A \to C) = \frac{\operatorname{supp}(A \to C)}{\operatorname{supp}(A) \cdot \operatorname{supp}(C)} = \frac{\operatorname{conf}(A \to C)}{\operatorname{supp}(C)} \tag{3}$$

Here, the support (*supp*) function defined in the above equations computes the quotient of the number of records containing both the A and C item sets and the total number of records (*U*).
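As a minimal sketch of how these three measures can be evaluated on an attribute-value data set, the following Python snippet computes them for a single rule. The toy records, attribute names, and helper names (`matches`, `supp`, `conf`, `lift`) are hypothetical and are not taken from the COVID-19 database; confidence is taken in its standard form, supp(*A* → *C*)/supp(*A*), which is the form consistent with the lift identity in Equation (3).

```python
from typing import Dict, List

Record = Dict[str, str]      # one row of the data set: attribute -> value
Selectors = Dict[str, str]   # a conjunction of selectors: attribute -> required value


def matches(record: Record, selectors: Selectors) -> bool:
    """True when the record satisfies every selector of the conjunction."""
    return all(record.get(attr) == val for attr, val in selectors.items())


def supp(records: List[Record], selectors: Selectors) -> float:
    """Fraction of records that contain all the given selectors."""
    return sum(matches(r, selectors) for r in records) / len(records)


def conf(records: List[Record], antecedent: Selectors, consequent: Selectors) -> float:
    """supp(A -> C) / supp(A); undefined (division by zero) when A never occurs."""
    both = {**antecedent, **consequent}
    return supp(records, both) / supp(records, antecedent)


def lift(records: List[Record], antecedent: Selectors, consequent: Selectors) -> float:
    """conf(A -> C) / supp(C), as in Equation (3)."""
    return conf(records, antecedent, consequent) / supp(records, consequent)


# Hypothetical toy data: 8 records, 2 attributes.
records = [
    {"age": "60+", "diabetes": "yes"},
    {"age": "60+", "diabetes": "yes"},
    {"age": "60+", "diabetes": "no"},
    {"age": "<60", "diabetes": "yes"},
    {"age": "<60", "diabetes": "no"},
    {"age": "<60", "diabetes": "no"},
    {"age": "<60", "diabetes": "no"},
    {"age": "<60", "diabetes": "no"},
]
A = {"age": "60+"}
C = {"diabetes": "yes"}

print(round(supp(records, {**A, **C}), 3))  # 0.25
print(round(conf(records, A, C), 3))        # 0.667
print(round(lift(records, A, C), 3))        # 1.778
```

In this toy example the lift is above 1, meaning that the consequent occurs more often together with the antecedent than it would if the two were independent.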

A different scenario is found in causal association rule mining. Causal association rules can be read as "The simultaneous occurrence of *A*<sub>1</sub> and *A*<sub>2</sub> and ... and *A<sub>m</sub>* causes (is the cause for) the occurrence of *C*<sub>1</sub> and *C*<sub>2</sub> and ... and *C<sub>n</sub>*". In causal association rule mining, a rule is considered interesting when it reveals a cause–effect relation between its antecedent and its consequent. Additionally, a causal rule must offer a degree of actionability; that is, it should be possible to modify the situation modeled by the antecedent in order to obtain some specific and predictable effect on the situation modeled by the consequent. Therefore, a causal rule is interesting when it describes a strong causality relation and it has high actionability. However, evaluating those properties is not a trivial task, and that is why the causality relationship has always been elusive to modeling.

Causality has historically been studied from several different perspectives. Within the computational view, actionability is the most important property of a causal model [6]. From the artificial intelligence perspective, Judea Pearl [7] pointed out that an autonomous intelligent system trying to build a model of its environment cannot rely exclusively on preprogrammed causal knowledge. It must have the ability to transform perceptual observations into cause–effect relations. By describing causal relations among the variables considered, a causal model allows estimating new environment states as a result of specific modifications on the causal conditions. In this work, we apply the following causal models to help in the identification and magnitude estimation of the causal effects as well as preview possible actions that could modify the consequent by changing the antecedent.

### 2.1.1. Absolute Risk (*AR*)

In a control case study to verify the hypothesis that "*A* causes *C*", it must first be clear that both the presence and the absence of *A* have measurable effects on *C*. A balanced sample with two data groups is created: the first one, the experimental group with the causal conditions being studied (antecedent *A*), models the rule *A* → *C*, and the second one, the control group without the antecedent, models the rule ¬*A* → *C*. The sample must be balanced. For each observation within the experimental group, there must be another observation within the control group (i.e., both groups must have the same support). Once the control case sample is constructed, the occurrence of the consequent *C* is computed within both groups, and the *confidence* of *A* → *C* is used as the *Experimental Event Rate* (*EER* = *conf*(*A* → *C*)), while the *confidence* of ¬*A* → *C* is used as the *Control Event Rate* (*CER* = *conf*(¬*A* → *C*)). Both *event rates* must then be compared. When the comparison is measured as *EER* − *CER*, the result is labeled the *Absolute Risk* [8] (see Equation (4)). Its range is [−1, 1]. A value greater than zero indicates that the antecedent has a causal effect on the consequent:

$$AR(A \to C) = \operatorname{conf}(A \to C) - \operatorname{conf}(\neg A \to C) = \frac{\operatorname{supp}(A \to C) - \operatorname{supp}(\neg A \to C)}{\operatorname{supp}(A)} \tag{4}$$

### 2.1.2. Probability of Sufficiency (*PS*)

The probability of sufficiency (*PS*) measures the capacity of *A* to produce *C* in situations where *A* and *C* are both absent [7]. Equations (5) and (6) represent this measure:

$$PS = \frac{AR}{1 - CER} \tag{5}$$

$$PS = \frac{\operatorname{conf}(A \to C) - \operatorname{conf}(\neg A \to C)}{1 - \operatorname{conf}(\neg A \to C)} = \frac{\operatorname{supp}(A \to C) - \operatorname{supp}(\neg A \to C)}{\operatorname{supp}(A) - \operatorname{supp}(\neg A \to C)} \tag{6}$$

### 2.1.3. Population Attributable Fraction (*PAF*)

The population attributable fraction (*PAF*) or population impact is an evaluation measure used to study the impact of exposure to a specific variable in the population [9]. In data mining, the population refers to the total number of records that show the consequent *C*, the effect being studied. The formula to calculate the impact on the population, proposed by Miettinen [10], is given in Equation (7). The measure involves the support of *C* and the relative risk (*RR* = *EER*/*CER*). The population impact measure has a causal interpretation: it estimates the fraction of all observations of the consequent that would not have occurred if the antecedent had not occurred:

$$AF_p = \operatorname{supp}(C) \cdot \left(1 - \frac{1}{RR}\right) \tag{7}$$

$$AF_p = \operatorname{supp}(C) \cdot \left(1 - \frac{\operatorname{conf}(\neg A \to C)}{\operatorname{conf}(A \to C)}\right) = \operatorname{supp}(C) \cdot \left(1 - \frac{\operatorname{supp}(\neg A \to C)}{\operatorname{supp}(A \to C)}\right) \tag{8}$$
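The three causal measures above depend only on the two event rates (*EER* and *CER*) and, for the population attributable fraction, on the support of the consequent. The following Python sketch illustrates this under the assumption of a balanced case sample; the function names and the counts are hypothetical and serve only as a worked example.

```python
def event_rates(n_a_c: int, n_a: int, n_not_a_c: int, n_not_a: int) -> tuple:
    """Return (EER, CER) from raw counts.

    n_a_c:     records of the experimental group (A present) that show C
    n_a:       size of the experimental group
    n_not_a_c: records of the control group (A absent) that show C
    n_not_a:   size of the control group
    """
    return n_a_c / n_a, n_not_a_c / n_not_a


def absolute_risk(eer: float, cer: float) -> float:
    """AR = EER - CER, first equality of Equation (4)."""
    return eer - cer


def probability_of_sufficiency(eer: float, cer: float) -> float:
    """PS = AR / (1 - CER), Equation (5); undefined when CER equals 1."""
    return (eer - cer) / (1.0 - cer)


def population_attributable_fraction(supp_c: float, eer: float, cer: float) -> float:
    """AF_p = supp(C) * (1 - 1/RR) with RR = EER/CER, Equations (7) and (8)."""
    return supp_c * (1.0 - cer / eer)


# Hypothetical balanced sample: 100 records with A, 100 records without A.
eer, cer = event_rates(n_a_c=40, n_a=100, n_not_a_c=10, n_not_a=100)
supp_c = (40 + 10) / 200  # support of the consequent in the whole sample

print(round(absolute_risk(eer, cer), 4))                             # 0.3
print(round(probability_of_sufficiency(eer, cer), 4))                # 0.3333
print(round(population_attributable_fraction(supp_c, eer, cer), 4))  # 0.1875
```

Here the antecedent raises the event rate from 0.1 to 0.4, so all three measures are positive, which is the situation the constraint *AR* > 0 is meant to select.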

### *2.2. Discrete Multi-Objective Optimization Problems*

Consider a discrete multi-objective optimization problem (DMOP) with *m* objective functions (*f<sub>i</sub>*, *i* = 1, ..., *m*) and *n* decision variables (*x<sub>j</sub>*, *j* = 1, ..., *n*). The goal of multi-objective optimization is to minimize all objectives simultaneously. Mathematically, it can be described as follows:

$$\begin{aligned} \text{minimize } \vec{f}(\vec{x}) &= \left[ f_1(\vec{x}), f_2(\vec{x}), \dots, f_m(\vec{x}) \right]^T \\ \text{subject to } \vec{x} &\in \mathcal{S} \end{aligned} \tag{9}$$

where S ⊆ ℕ<sup>*n*</sup> is the feasible search space and *x* = [*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*]<sup>*T*</sup> ∈ S is the vector of the decision variables. Each *f<sub>i</sub>* : ℕ<sup>*n*</sup> → ℝ, *i* ∈ {1, ..., *m*}, is an objective function. Let us assume that we have two vectors *u*, *v* ∈ ℝ<sup>*m*</sup>. Then, we say that *u* *dominates* *v* (denoted by *u* ≺ *v*) if *u<sub>i</sub>* ≤ *v<sub>i</sub>* for every *i* ∈ {1, ..., *m*} and *u<sub>j</sub>* < *v<sub>j</sub>* for at least one index *j* ∈ {1, ..., *m*}. We say that a decision variable vector *x*<sup>∗</sup> ∈ S is *Pareto optimal* if no other *x* ∈ S exists such that *f*(*x*) ≺ *f*(*x*<sup>∗</sup>).

The *Pareto Optimal Set* (*POS*) is defined by *POS* = {*x*<sup>∗</sup> ∈ S | *x*<sup>∗</sup> is Pareto optimal}. Its elements *x*<sup>∗</sup> are the *non-dominated solutions*. The *Pareto Optimal Front* (*POF*) is defined by *POF* = {*f*(*x*) ∈ ℝ<sup>*m*</sup> | *x* ∈ *POS*}. We thus wish to determine the POS from the set S of all the decision variable vectors that satisfy Equation (9). The *dominance* phenomenon occurs in the decision variable (POS) and the objective function (POF) spaces. From here on, each time we mention *Pareto dominance*, we are referring to the same concept in both spaces.
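The dominance relation and the resulting non-dominated set can be illustrated with a short Python sketch. It assumes minimization, as in Equation (9), and uses the standard definition of dominance (no worse in every objective and strictly better in at least one); the function names and the sample objective vectors are hypothetical. Objectives that are to be maximized, such as support or absolute risk, can be handled by negating them first.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True when u dominates v under minimization."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors with two objectives to minimize.
points = [(1, 4), (2, 2), (3, 3), (4, 1)]
print(non_dominated(points))  # [(1, 4), (2, 2), (4, 1)]
```

The point (3, 3) is removed because (2, 2) is better in both objectives, while the remaining three points are mutually incomparable and together approximate the Pareto front of this toy set.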

### **3. Related Previous Works**

This section reviews some related previous works focused on association rule mining in COVID-19 data sets. Two groups were defined: (1) works related to traditional assessment measures (support, confidence, and lift) optimized by classical algorithms such as *Apriori*, FP growth, and Eclat and (2) works that simultaneously optimize more than one association measure function with evolutionary algorithms.

Recently, the work of Cortes et al. in [11] provided an extensive review of the state-of-the-art machine learning techniques and data mining algorithms for predicting the COVID-19 pandemic. Their paper analyzed the role of diverse data mining techniques in classification, regression, text analysis, clustering, and association. Another comprehensive study is the work of Flora et al. [12] with a review of machine learning modeling. There, association rule mining was used as a knowledge discovery tool in the analysis of vaccines and the identification of potential risk factors.

The work of Zicheng Shan and Wei Miao [13] proposed a data mining algorithm based on association rules for the diagnosis and treatment of COVID-19 patients. During the study, some disadvantages of the proposed algorithm were found because of the delicate data preprocessing required in order to improve the efficiency of the *Apriori* algorithm. Moreover, the authors reported notably low values in some association-measure functions such as *support* and *confidence*.

Wasiq et al. [14] proposed a framework for identifying patterns and class associations between demographic attributes and COVID-19 death rates across different regions of the world. Their approach suggested a workflow (pipeline) that includes data preprocessing, class association learning, clustering, and data analysis to discover significant association patterns.

In [15], Tandan et al. showed a comparative study of association rule mining works using the *Apriori*, FP growth, and Eclat algorithms to discover symptom patterns by age, gender, chronic condition, and mortality status among COVID-19 patients. Their study optimized the *support*, *confidence*, and *lift* measures one by one, in order to determine a ranking of symptoms and chronic conditions of COVID-19 patients.

In [16–18], a multi-objective genetic association rule mining algorithm based on NSGA-II was proposed. These pioneering works introduced new concepts such as *comprehensibility*, *surprise*, *interestingness*, and *confidence* as useful measures for extracting interesting rules. However, these studies were only tested on small data sets with categorical or numerical attributes and never with mixed attribute values.

The work of Luna et al. [19] introduced the first grammar-guided genetic programming approach for mining association rules from relational databases. The performance of this algorithm was checked using both synthetic relational data and a real-world database, but this work focused only on *support* and *confidence* measures.

In this paper, we propose a causal association rule mining process guided by an evolutionary algorithm with non-standard recombination and mutation operators. The proposed algorithm was designed specifically for use on an official COVID-19 database in order to learn the behavior of the contagion and hospitalization phenomena during the pandemic. The causal nature of the mined rules ensures a certain degree of *actionability* that decision makers can leverage while combating the COVID-19 pandemic.

### **4. Proposal**

In this section, we describe the mining methodology proposed to extract association rules in a COVID-19 data set. Figure 1 shows the principal steps for our proposal, while the following subsections describe the details for each one.

**Figure 1.** The rule mining process.

### *4.1. Data Preparation*

Experimentation was performed with an official pandemic database generated by the Mexican government [4]. At the moment of performing these experiments, the database was composed of records from 1 January 2020 to 1 April 2022. The total number of records within this period was 15,578,792 with 37 attributes.

Numerical data were discretized using the quintile-based technique [20] in order to provide the following properties in the information:

• Uniform support: Each selector had approximately the same support, which was 20%.
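The uniform-support property can be illustrated with a minimal Python sketch of quintile-based discretization. It uses `statistics.quantiles` and `bisect` from the standard library as a stand-in for the technique cited in [20]; the helper names and the synthetic age values are assumptions made for illustration, not the preprocessing code used in this work.

```python
import bisect
import statistics
from collections import Counter
from typing import List


def quintile_cuts(values: List[float]) -> List[float]:
    """Return the four cut points that split the values into five groups."""
    return statistics.quantiles(values, n=5)


def discretize(value: float, cuts: List[float]) -> int:
    """Map a numerical value to its quintile index (0 to 4)."""
    return bisect.bisect_right(cuts, value)


# Hypothetical numerical attribute: ages 1 to 100, one record each.
ages = list(range(1, 101))
cuts = quintile_cuts(ages)
labels = [discretize(a, cuts) for a in ages]

# Support of each resulting selector, e.g. "age in quintile 0".
share = {q: count / len(ages) for q, count in sorted(Counter(labels).items())}
print(share)  # each of the five quintiles holds 0.2 of the records
```

With real data, repeated values can fall on a cut point, so the supports are only approximately 20%, in line with the wording of the property above.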


### *4.2. Group Modeling*

The set of all attributes initially used to describe the data is manually clustered in order to define smaller groups of attributes with related semantics. The user can select any two attribute groups to be related as the antecedent and consequent, starting a causal mining process. This selection helps to narrow the mining process to causal rules with a specific kind of *actionability*. Table 1 shows the sets of attributes manually selected that define a semantic group. Here, the term comorbidities refers to the previous illnesses that a person has suffered, such as diabetes or hypertension.


**Table 1.** Sets of attributes for each semantic group.

Table 2 summarizes the number of attributes in each group, the number of possible selectors, and the total number of possible combinations or rules to estimate the search space size for each scenario. In Table 3, we consider three *scenarios*, with each one defined by a pair of related attribute groups and a target optimization function (used as fitness criteria). The search space contains all possible association rules according to the number of attributes and selectors. Moreover, we considered four periods of time called waves, with each one representing the increase in the number of people with COVID-19. Finally, we applied the process of association rule mining in the following intervals (see Table 4).

**Table 2.** Description of the number of attributes and selectors in COVID-19 data set.


**Table 3.** Experimentation scenarios with search space size.



**Table 4.** Periods of time in which the number of people with COVID-19 increased.

### *4.3. Query Definition by Optimization Problem*

We defined three discrete multi-objective optimization problems (DMOPs). In all three cases, the association-measure functions showed a conflict when optimizing them simultaneously. Additionally, we included three constraints to obtain correct and complete association rules:


**DMOP-1**. Classic association rule mining aims to obtain rules with the highest possible support, confidence, and lift. However, simultaneously optimizing these measures is impossible because a sustained increase or decrease in one does not guarantee behavior in the same direction in the other two. The formal definition of the optimization problem for this query is described by Equation (10):

$$\begin{aligned} \text{maximize } & \operatorname{supp}(A \to C) \\ \text{maximize } & \operatorname{conf}(A \to C) \\ \text{maximize } & \operatorname{lift}(A \to C) \\ \text{subject to } & \operatorname{supp}(A \to C) > 0, \\ & AR(A \to C) > 0, \\ & CI_{OR}^{\mathrm{inf}}(A \to C) \ge 1. \end{aligned} \tag{10}$$

**DMOP-2**. From a logical perspective, a biconditional expression (*A* ↔ *C*) can be interpreted as "A if and only if C" or "A is a necessary and sufficient condition for C". Its truth value is equivalent to the expression (*A* → *C*) ∧ (*C* → *A*). The sufficiency condition falls to the association rule *A* → *C*, interpreted as "A is a sufficient condition for C". The sufficiency condition is considered to be satisfied if the causal effect of *A* → *C* is large enough. On the other hand, to satisfy the necessity condition of the biconditional expression, the causal effect of *C* → *A* must be considered as well (see Equation (11)):

$$\begin{aligned} \text{maximize } & AR(A \to C) \\ \text{maximize } & AR(C \to A) \\ \text{subject to } & \operatorname{supp}(A \to C) > 0, \\ & AR(A \to C) > 0, \\ & CI_{OR}^{\mathrm{inf}}(A \to C) \ge 1. \end{aligned} \tag{11}$$
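Both objectives of DMOP-2 can be obtained from a single 2 × 2 contingency table of the antecedent and the consequent by swapping their roles. The Python sketch below illustrates this; the function names and counts are hypothetical, the event rates are computed directly from the table without the sample-balancing step described in Section 2.1.1, and the confidence-interval constraint on the odds ratio is not evaluated.

```python
from typing import Tuple


def absolute_risk(n_both: int, n_cause_only: int, n_effect_only: int, n_neither: int) -> float:
    """AR(cause -> effect) = P(effect | cause) - P(effect | not cause).

    Raises ZeroDivisionError when the cause is always or never present.
    """
    eer = n_both / (n_both + n_cause_only)
    cer = n_effect_only / (n_effect_only + n_neither)
    return eer - cer


def dmop2_objectives(n11: int, n10: int, n01: int, n00: int) -> Tuple[float, float]:
    """Return (AR(A -> C), AR(C -> A)).

    n11: records with A and C      n10: records with A but not C
    n01: records with C but not A  n00: records with neither
    """
    forward = absolute_risk(n11, n10, n01, n00)
    reverse = absolute_risk(n11, n01, n10, n00)  # roles of A and C swapped
    return forward, reverse


# Hypothetical counts for one candidate rule.
forward, reverse = dmop2_objectives(n11=30, n10=20, n01=10, n00=40)
print(round(forward, 4), round(reverse, 4))  # 0.4 0.4167
```

A rule scores well on both objectives only when *A* and *C* tend to appear together and to be absent together, which matches the biconditional reading of the problem.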

**DMOP-3**. In this problem, we seek to find the rules that maximize the probability of sufficiency, a measure that quantifies the capacity of the antecedent to produce the consequent, and the population attributable fraction, a measure that indicates the proportion of observations of the consequent that were caused by the antecedent (see Equation (12)):

$$\begin{aligned} \text{maximize } & PS(A \to C) \\ \text{maximize } & AF_p(A \to C) \\ \text{subject to } & \operatorname{supp}(A \to C) > 0, \\ & AR(A \to C) > 0, \\ & CI_{OR}^{\mathrm{inf}}(A \to C) \ge 1. \end{aligned} \tag{12}$$

### *4.4. Evolution Proposal and Heuristically Guided Mining*

Since direct exhaustive search strategies are not an option for mining a large data set, a heuristically guided mining mode is used. In this mode, previous knowledge about the structure of the data is fed to a meta-heuristic optimization which evolves a set of specific patterns with the adequate structure to be *Pareto front* elements in the process of optimizing the selected objective functions (e.g., *absolute risk*, *relative risk*, or any other).

The evolutionary algorithm proposed herein performs artificial evolution based on NSGA-II [22]. Some arguments for using NSGA-II include its mechanisms for solving combinatorial optimization problems with two and three objective functions [23], particularly the following:


Our proposal controls the selector structure of each pattern, allowing the system to answer specific user questions to discover association rules with particular semantics. In addition, the regulation of the search space *exploration/exploitation* process allows the generation of a wide range of causal rule complexities, from very simple rules with only one selector in the antecedent and consequent to more elaborate rules with the antecedent and consequent formed by several selectors. Once these patterns are known to have optimal structures and values, the mining system can directly search for these patterns in the actual data set. This has the effect of speeding up the mining process.

In order to guarantee the statistical significance of causal rules, two criteria are proposed. First, a *diversity preservation* criterion will be used as an essential evolution guide in the algorithm. Second, a statistical significance test on the set of causal rules mined is used.

### Rule Evolution

The proposed algorithm evolves a population of selector lists as any other artificial evolution process would. Each list represents a possible association rule in the data set. The structure of those lists is straightforward, as is the structure of association rules. Each list has two main sections representing the *antecedent* and *consequent* of the rule, and each section may have one or more subsections corresponding to the selectors that compose it. The label and domain of all attributes are considered background knowledge, so the proper validation restrictions can be applied every time a new selector enters the expression.

During successive generations, the algorithm selects individuals from its population based on their fitness and applies recombination and mutation operators to generate new individuals, which are also evaluated by their fitness. As the population size is fixed to *N* individuals, each new generation is selected from the best fit rules among previously known and newly generated rules:

• The stop criterion for the evolution process is triggered after 100 generations without improvement in the fitness value of the fittest rule.

	- 1. Interchange: The antecedent and consequent from the ancestor rules are interchanged. Two new individuals are created: *A*<sup>1</sup> → *C*<sup>2</sup> and *A*<sup>2</sup> → *C*1.
	- 2. Set operations: The *union* (∪), *intersection* (∩), and *symmetric difference* (△) operators are applied to the sets of selectors in the antecedent and consequent of the ancestor rules. For each one of those set operators, the antecedent of new rules results from applying the operator on sets *A*<sub>1</sub> and *A*<sub>2</sub>, and the consequent results from applying the same operator on sets *C*<sub>1</sub> and *C*<sub>2</sub>. Selectors with repeated attributes are pruned, as well as all cases that result in an empty antecedent or consequent. At most, three new individuals are created with this recombination process.
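The two recombination operators described above can be sketched in Python as follows. A rule is represented here as a pair of frozen sets of (attribute, value) selectors; this representation, the function names, and the pruning policy (keeping, for each repeated attribute, the selector that comes first in sorted order) are assumptions made for illustration, since the text does not specify how repeated attributes are resolved.

```python
from typing import FrozenSet, List, Tuple

Selector = Tuple[str, str]                             # (attribute, value)
Rule = Tuple[FrozenSet[Selector], FrozenSet[Selector]]  # (antecedent, consequent)


def prune_repeated(selectors: FrozenSet[Selector]) -> FrozenSet[Selector]:
    """Keep one selector per attribute (assumed policy: first in sorted order)."""
    kept = {}
    for attr, val in sorted(selectors):
        kept.setdefault(attr, val)
    return frozenset(kept.items())


def interchange(r1: Rule, r2: Rule) -> List[Rule]:
    """Swap consequents: A1 -> C2 and A2 -> C1."""
    (a1, c1), (a2, c2) = r1, r2
    return [(a1, c2), (a2, c1)]


def set_recombination(r1: Rule, r2: Rule) -> List[Rule]:
    """Apply union, intersection and symmetric difference to both rule sides."""
    (a1, c1), (a2, c2) = r1, r2
    children = []
    for op in (frozenset.union, frozenset.intersection, frozenset.symmetric_difference):
        antecedent = prune_repeated(op(a1, a2))
        consequent = prune_repeated(op(c1, c2))
        if antecedent and consequent:  # discard rules with an empty side
            children.append((antecedent, consequent))
    return children


# Hypothetical parent rules.
parent1 = (frozenset({("age", "60+"), ("sex", "F")}), frozenset({("diabetes", "yes")}))
parent2 = (frozenset({("age", "60+")}),
           frozenset({("diabetes", "yes"), ("hypertension", "yes")}))

print(len(interchange(parent1, parent2)))        # 2
print(len(set_recombination(parent1, parent2)))  # 3
```

When the two parents are identical, the symmetric difference is empty on both sides, so only the union and intersection children survive, which is why the operator yields at most three new individuals.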

### **5. Experiments and Results**

We designed two main experiments. The first one explored the association rules in the complete data set (15,578,792 records). Here, our data mining methodology described in the previous section was applied while considering the three scenarios (*A*, *B*, and *C*) illustrated in Table 3. In this experiment, we intended to solve a single-objective optimization problem to find the best (maximum) values for each association measure function (equations described in Section 2.1) and validate the convergence of our proposed algorithm. Table 5 shows each case's fitness mean (and standard deviation). The best values are shown in boldface. We considered 10 executions with different seeds for random generation. We established this number of executions for two reasons: the data mining process is computationally expensive, and we corroborated that after 10 executions, there was no variation in the results for the majority of the scenarios (standard deviation equals zero). In the classical measures, the best association rules reached a support function value that was low in each case's fitness means (rather than 0.6), and the lift function was variable in these three scenarios. In the causal association measures, the functions of the probability of sufficiency and attributable fraction reported low values in scenarios A and C, respectively. In general, scenario B reported the maximum values for the association measures.

The second experiment had two purposes: (1) solve the DMOPs described in Section 4.3 in order to compare the classic and causal association rule models as adequate and feasible tools for analyzing the COVID-19 pandemic phenomenon and (2) find association rules along four different and well-defined time periods (labeled as waves) to clearly characterize the behavior and tendencies of each contagion wave. We applied our genetic algorithm in the three scenarios and the four waves for each DMOP and then filtered the experimental results to keep only the non-dominated rules.


**Table 5.** The maximum mean values found by an evolutionary algorithm for each objective function.

Table 6 reports the mean and standard deviation (in parentheses) of the number of non-dominated rules found after 10 executions for each case. Scenarios A and B, related to the comorbidities, age, gender, medical care, and outcome, were very consistent in the non-dominated rules. In contrast, scenario C (location and comorbidities) showed more variation in the association rules found.


**Table 6.** The mean of the non-dominated rules found for each DMOP in all scenarios.

Figures 2–4 show the obtained Pareto fronts with the levels of the maximum values reached by interesting rules according to causal measures. For Scenario B, DMOP-1, and DMOP-2, the mined rules were very similar. We observed some differences in scenarios B and C, where the absolute risk function, reciprocal absolute risk, population attributable fraction, and probability of sufficiency reported low values in the last period, called wave 4. We interpret these results as a positive effect of the vaccine on the population.

Table 7 reports the non-dominated association rules that were discovered by both the classic and causal measures in scenarios A and B. Both logical models for the data mining process found the same patterns, demonstrating that causal measures can find interesting rules just as the classical model does. According to the results, the diseases with the greatest influence on the association rules found for the time periods called waves were diabetes, hypertension, pneumonia, and COPD. The ages of the patients were directly related to the diseases. Therefore, the majority of the population older than 53 had the highest comorbidity statistics and was the most vulnerable sector. From the viewpoint of the association measures, we observed that the numerical values for the causal measures were more pronounced. In contrast, the support and confidence numerical values were very small.
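To make the classical measures mentioned above concrete, the following minimal sketch computes support, confidence and lift for a single rule. It uses the standard textbook definitions, which are assumed here to agree with those of Section 2.1; the function name, the records and the item names are made-up toy values for illustration, not data from the study.

```python
def rule_measures(records, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    Textbook definitions (assumed, not copied from the paper):
      support    = P(antecedent and consequent)
      confidence = P(consequent | antecedent)
      lift       = confidence / P(consequent)
    """
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)   # records containing the antecedent
    n_c = sum(1 for r in records if consequent <= r)   # records containing the consequent
    n_ac = sum(1 for r in records if antecedent <= r and consequent <= r)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy records (each record is a set of items).
toy_records = [
    {"item_a", "item_c"},
    {"item_a", "item_c"},
    {"item_a"},
    {"item_c"},
    set(),
    set(),
    {"item_b"},
    {"item_b", "item_c"},
]

support, confidence, lift = rule_measures(toy_records, {"item_a"}, {"item_c"})
```

With very large data sets and specific antecedents, the joint count in the numerator of the support is small relative to the number of records, which is one reason why support values can be low even for rules with a lift above one.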

**Figure 2.** Obtained Pareto fronts of DMOP-2 (absolute risk and reciprocal absolute risk) and DMOP-3 (probability of sufficiency and population attributable fraction) for all waves in Scenario A. The antecedent group is age and gender, and the consequent group is comorbidities.

**Figure 3.** Obtained Pareto fronts of DMOP-2 (absolute risk and reciprocal absolute risk) and DMOP-3 (probability of sufficiency and population attributable fraction) for all waves in Scenario B. The antecedent group is comorbidities, and the consequent group is medical care.

**Figure 4.** Obtained Pareto fronts of DMOP-2 (absolute risk and reciprocal absolute risk) and DMOP-3 (probability of sufficiency and population attributable fraction) for all waves in Scenario C. The antecedent group is location, and the consequent group is comorbidities.

All supplementary material for this research can be found at https://github.com/sinisterra/mscgp (accessed on 1 November 2022). There, we provide the Python code used to generate all experiments.

**Table 7.** The best association rules were obtained in scenarios A (age and gender → comorbidities) and B (comorbidities and medical care). The evolutionary algorithm found these rules in the last population generated.




### **6. Conclusions**

In this research, we used NSGA-II mechanisms for guiding an association rule mining process to learn the behavior of the COVID-19 contagion phenomenon at a country-wide scale from an official government database in Mexico. Our mining algorithm includes nonclassical crossover and mutation operators that have shown certain reliability for optimizing both classical and causal rule evaluation measures. Using artificial evolution as a guide to the mining process, we designed three experimentation scenarios as multi-objective optimization problems and considered the four officially identified waves of contagion.

Each experiment correctly found the rules with the maximum values for *support*, *confidence*, *lift*, *absolute risk*, and *probability of sufficiency* in a DMOP context. Since all those values were obtained under the constraint of having a *confidence interval* greater than or equal to one, they all had a strong correspondence with the concept of *interesting* rules expressed at the end of the Introduction section. Therefore, all mined rules identified the strongest associations between the antecedent and consequent in the database. The rules mined in DMOP-1 experiment were *interesting* in the classic mining sense, while the rules mined in the DMOP-2 and DMOP-3 experiments were *interesting* in the causal mining sense. The set of all rules mined during each experiment constituted the *learned behavioral model* of the studied phenomenon and brought forth interesting information about the phenomenon's behavior.

The main contributions made by this work are the following:


Some of the next steps considered in this research include the following:


**Supplementary Materials:** The supporting information is shared and available online by visiting https://github.com/sinisterra/mscgp.

**Author Contributions:** S.S.-S., S.G.-C. and M.P.-R. have contributed equally in conceptualization, methodology, and formal analysis; software and validation S.S.-S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Secretaría de Investigación y Posgrado (SIP-IPN) through projects/grants SIP-20230252 and SIP-20230105.

**Acknowledgments:** The authors wish to acknowledge and gratefully thank Consejo Nacional de Ciencia y Tecnología (CONACyT) and Instituto Politécnico Nacional (IPN).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Knowledge Transfer Based on Particle Filters for Multi-Objective Optimization**

**Xilu Wang and Yaochu Jin \***

Faculty of Technology, Bielefeld University, 33619 Bielefeld, Germany

**\*** Correspondence: yaochu.jin@uni-bielefeld.de

**Abstract:** Particle filters, also known as sequential Monte Carlo (SMC) methods, constitute a class of importance sampling and resampling techniques designed to use simulations to perform on-line filtering. Recently, particle filters have been extended for optimization by utilizing the ability to track a sequence of distributions. In this work, we incorporate transfer learning capabilities into the optimizer by using particle filters. To achieve this, we propose a novel particle-filter-based multi-objective optimization algorithm (PF-MOA) by transferring knowledge acquired from the search experience. The key insight adopted here is that, if we can construct a sequence of target distributions that can balance the multiple objectives and make the degree of the balance controllable, we can approximate the Pareto optimal solutions by simulating each target distribution via particle filters. As the importance weight updating step takes the previous target distribution as the proposal distribution and takes the current target distribution as the target distribution, the knowledge acquired from the previous run can be utilized in the current run by carefully designing the set of target distributions. The experimental results on the DTLZ and WFG test suites show that the proposed PF-MOA achieves competitive performance compared with state-of-the-art multi-objective evolutionary algorithms on most test instances.

**Keywords:** particle filter; multi-objective optimization; transfer learning

### **1. Introduction**

Many real-world applications in economics, mechanics and engineering can be formulated as multi-objective optimization problems (MOPs) that simultaneously optimize two or more objective functions [1]. The basic statement of an MOP for a minimization task can be formulated as

$$\begin{array}{ll} \min & \mathbf{F}(\mathbf{x}) = \{f\_1(\mathbf{x}), f\_2(\mathbf{x}), \dots, f\_m(\mathbf{x})\} \\ \text{s.t.} & \mathbf{x} \in \Omega \end{array} \tag{1}$$

where $\Omega \subseteq \mathbb{R}^D$ is the decision space of decision variables, $\mathbf{x} = (x\_1, x\_2, \dots, x\_D)$ is a decision vector with $D$ denoting the number of decision variables, $\mathbf{F}(\mathbf{x})$ consists of $m$ objective functions, and $m$ is the number of objectives.

Usually, the objectives conflict with each other, which means that a decision vector that decreases the value of one objective $f\_i$ may increase that of another objective $f\_j$. As a result, no single solution can optimize all the objectives simultaneously; instead, there is a set of optimal solutions that trade off between the different objectives, known as Pareto optimal solutions. The whole set of Pareto optimal solutions in the decision space is called the Pareto set (PS), and the projection of the PS in the objective space is called the Pareto front. Various types of algorithms have been proposed for solving MOPs.
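The notion of Pareto optimality can be illustrated with a small sketch. Assuming minimization as in Equation (1), a point dominates another if it is no worse in every objective and strictly better in at least one; the non-dominated points of a finite sample approximate the Pareto front. The function names and the sample objective vectors below are illustrative assumptions.

```python
def dominates(a, b):
    """True if objective vector a dominates b under minimization."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical objective vectors (f1, f2) of four candidate solutions.
objective_vectors = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = nondominated(objective_vectors)
```

Here the vector (3.0, 3.0) is dominated by (2.0, 2.0), while the remaining three vectors are mutually non-dominated trade-offs.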

For example, scalarization is one of the most popular techniques and is used to convert an MOP into a single-objective optimization problem. Scalarization can be achieved by the global criterion method [2], the weighted min-max method [3,4], the *ε*-constraint method [5] and reference point methods [6].

**Citation:** Wang, X.; Jin, Y. Knowledge Transfer Based on Particle Filters for Multi-Objective Optimization. *Math. Comput. Appl.* **2023**, *28*, 14. https://doi.org/10.3390/mca28010014

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 1 November 2022 Revised: 21 December 2022 Accepted: 6 January 2023 Published: 18 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Another popular approach is based on evolutionary algorithms (EAs), which have been applied successfully to many real-world complex optimization problems [7,8]. Over the past decades, a large number of multi-objective evolutionary algorithms (MOEAs) have been proposed, such as nondominated sorting genetic algorithm II (NSGA-II) [9], multiobjective evolutionary algorithm based on decomposition (MOEA/D) [10], reference vector guided evolutionary algorithm (RVEA) [11] and strength Pareto evolutionary algorithm 2 (SPEA2) [12]. More recently, many variants have been proposed to further enhance the optimization performance of MOEAs and extend them to many-objective optimization problems, such as NSGA-III [13], *θ*-DEA [14] and MOEA/DD [15].

Particle filters (PFs), also known as sequential Monte Carlo (SMC) methods, are a class of importance sampling and resampling techniques designed to simulate from a sequence of probability distributions, and they have gained popularity over the last decade for solving sequential Bayesian inference problems. With the notable exception of linear-Gaussian signal-observation models, particle filtering has become the dominant approach to solving the state filtering problem in dynamic systems. Applications of particle filter theory have expanded to diverse fields, such as object tracking [16], navigation and guidance [17] and fault diagnosis [18].

Recently, particle filters have been extended for optimization [19,20] by utilizing the ability to track a sequence of distributions. In order to deal with a global optimization problem, a sequence of artificial dynamic distributions is generally designed so that the particle filter algorithm can be employed [21,22]. The crucial element in particle filter optimization (PFO) is how to design the system dynamic function by formulating the optimization problem as a filtering problem, which forces the set of particles to move toward the promising area containing optima.

Although PFO has shown promising performance in certain applications, current PFO methods only work for single-objective optimization problems [23]. As many real-world problems involve multiple objectives to be optimized simultaneously, it is of interest to extend PFO to MOPs. To fill this gap, we extend the scope of application of PFO to multi-objective cases. To achieve this, we propose a novel particle-filter-based multi-objective optimization algorithm (PF-MOA) by transferring knowledge acquired from the search experience.

The key insight adopted here is that, if we can construct a sequence of target distributions that can balance the multiple objectives and make the degree of the balance controllable, we can approximate the Pareto optimal solutions by simulating each target distribution via particle filters. Inspired by the ability of SMC samplers to sample sequentially from a sequence of probability distributions [24], we design a particle filter to perform the optimization. The method of importance updating in particle filters makes it possible to leverage the knowledge readily available for the previous subproblem to optimize the current subproblem, guiding the new particles to concentrate on the more promising area found thus far. As a result, PF-MOA offers an efficient solution to optimize MOPs by tracking the Pareto optimal solutions on the Pareto front via a particle filter.

The rest of this paper is organized as follows. Section 2 presents a brief introduction to particle filters and their application to single-objective optimization. In Section 3, a particle-filter-based multi-objective optimization method is proposed. Numerical simulations are conducted in Section 4, where the results are presented and discussed. Finally, our conclusions are drawn in Section 5.

### **2. Background**

### *2.1. Particle Filter*

Consider the discrete-time nonlinear state-space models relating a hidden state *xk* to the observations *yk*:

$$\begin{array}{l} \mathbf{x}\_{k} = \mathbf{g}(\mathbf{x}\_{k-1}, \mathbf{u}\_{k}), \quad k = 1, 2, \dots, \\ y\_{k} = h(\mathbf{x}\_{k}, v\_{k}), \quad k = 0, 1, \dots, \end{array} \tag{2}$$

where $k$ is the time index; $x\_k \in \mathbb{R}^{n\_x}$ is the state; $y\_k \in \mathbb{R}^{n\_y}$ is the observation; $u\_k \in \mathbb{R}^{n\_x}$ and $v\_k \in \mathbb{R}^{n\_y}$ are the system and observation noise, respectively; and $n\_x$ and $n\_y$ are the dimensions of $x\_k$ and $y\_k$, respectively. We assume that $u\_k$ and $v\_k$ are independent and identically distributed (i.i.d.) sequences, independent of each other and also independent of the initial state $x\_0$, which has the probability density function (p.d.f.) $p\_0$. Let $p(x\_k \mid x\_{k-1})$ denote the transition density, and $p(y\_k \mid x\_k)$ denote the likelihood function.

The goal of filtering is to estimate the conditional density,

$$b\_k(\mathbf{x}\_k) \stackrel{\triangle}{=} p(\mathbf{x}\_k \mid \mathbf{y}\_{0:k}), \quad k = 0, 1, \dots \tag{3}$$

where $y\_{0:k} = \{y\_0, \dots, y\_k\}$ denotes all the observations from time 0 to $k$. The conditional density $b\_k(x\_k)$ can be derived recursively via the Chapman–Kolmogorov equation and Bayes' rule as follows:

$$\begin{split} b\_{k}(\mathbf{x}\_{k}) &= \frac{p(\boldsymbol{y}\_{k} \mid \mathbf{x}\_{k}) p(\mathbf{x}\_{k} \mid \boldsymbol{y}\_{0:k-1})}{p(\boldsymbol{y}\_{k} \mid \boldsymbol{y}\_{0:k-1})} \\ &= \frac{p(\boldsymbol{y}\_{k} \mid \mathbf{x}\_{k}) \int p(\mathbf{x}\_{k} \mid \mathbf{x}\_{k-1}) b\_{k-1}(\mathbf{x}\_{k-1}) d\mathbf{x}\_{k-1}}{\int p(\boldsymbol{y}\_{k} \mid \mathbf{x}\_{k}) p(\mathbf{x}\_{k} \mid \boldsymbol{y}\_{0:k-1}) d\mathbf{x}\_{k}} \end{split} \tag{4}$$

Since $b\_k(x\_k)$ is unknown, we generate the particles by sampling from another known density $q(x\_k \mid y\_{0:k})$ and adjust the weights of the samples to obtain an estimate of $b\_k(x\_k)$. This approach is known as importance sampling, and the density $q(x\_k \mid y\_{0:k})$ is referred to as the importance density. Hence, in order to approximate $p(x\_k \mid y\_{0:k})$ with samples $\{x\_k^i, i = 1, \dots, N\}$ drawn i.i.d. from $q(x\_k \mid y\_{0:k})$, their weights should be

$$w\_k^i \propto \frac{p\left(\mathbf{x}\_k^i \mid y\_{0:k}\right)}{q\left(\mathbf{x}\_k^i \mid y\_{0:k}\right)}\tag{5}$$

where ∝ means proportional to, and the weights should be normalized.
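As a small illustration of Equation (5), the sketch below estimates the mean of a target density by drawing samples from a different importance density and weighting them by the density ratio. The Gaussian target and proposal, the sample size and the function names are assumptions chosen only for this example.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(n_samples, seed=0):
    """Self-normalized importance sampling estimate of the target mean.

    Target p: N(1, 1); importance density q: N(0, 2^2) (both assumed).
    """
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 2.0) for _ in range(n_samples)]
    # Unnormalized weights w^i = p(x^i) / q(x^i), as in Equation (5).
    raw = [normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0) for x in samples]
    total = sum(raw)
    weights = [w / total for w in raw]  # normalize so the weights sum to 1
    mean = sum(w * x for w, x in zip(weights, samples))
    return mean, weights


est_mean, est_weights = importance_estimate(20000)
```

The weighted average approximates the mean of the target (which is 1 in this setup), even though no sample was drawn from the target itself.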

To perform the estimation recursively, we use Bayes' rule to derive the following recursive equation for the conditional density:

$$\begin{split} b\_{k}(\mathbf{x}\_{k}) & \triangleq p(\mathbf{x}\_{k} \mid \mathbf{y}\_{0:k}) \\ &= \frac{p(\mathbf{x}\_{k}, \mathbf{y}\_{k} \mid \mathbf{y}\_{0:k-1})}{p(\mathbf{y}\_{k} \mid \mathbf{y}\_{0:k-1})} \\ & \propto p(\mathbf{y}\_{k} \mid \mathbf{x}\_{k}) \int p(\mathbf{x}\_{k} \mid \mathbf{x}\_{k-1}) p(\mathbf{x}\_{k-1} \mid \mathbf{y}\_{0:k-1}) d\mathbf{x}\_{k-1} \\ & \propto \int p(\mathbf{y}\_{k} \mid \mathbf{x}\_{k}) p(\mathbf{x}\_{k} \mid \mathbf{x}\_{k-1}) b\_{k-1}(\mathbf{x}\_{k-1}) d\mathbf{x}\_{k-1} \end{split} \tag{6}$$

where $p(y\_k \mid y\_{0:k-1}, x\_k) = p(y\_k \mid x\_k)$ and $p(x\_k \mid y\_{0:k-1}, x\_{k-1}) = p(x\_k \mid x\_{k-1})$ both follow from the Markovian property of the model in Equation (2), the denominator $p(y\_k \mid y\_{0:k-1})$ does not depend on $x\_k$, and $\propto$ means that $p(x\_k \mid y\_{0:k})$ is the normalized version of the right-hand side. The state transition density $p(x\_k \mid x\_{k-1})$ is induced from the state equation in Equation (2) and the distribution of the system noise $u\_k$, and the likelihood $p(y\_k \mid x\_k)$ is induced from the observation equation in Equation (2) and the distribution of the observation noise $v\_k$. Substituting Equation (6) into Equation (5), we find

$$w\_k^i \propto \frac{p\left(y\_k \mid \mathbf{x}\_k^i\right) p\left(\mathbf{x}\_k^i \mid \mathbf{x}\_{k-1}^i\right)}{q\left(\mathbf{x}\_k^i \mid y\_{0:k}\right)} p\left(\mathbf{x}\_{k-1}^i \mid y\_{0:k-1}\right),\tag{7}$$

If the importance density *q*(*xk* | *y*0:*k*) is chosen to be factored as

$$q(\mathbf{x}\_k \mid y\_{0:k}) = q(\mathbf{x}\_k \mid \mathbf{x}\_{k-1}, y\_k) q(\mathbf{x}\_{k-1} \mid y\_{0:k-1}) \tag{8}$$

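then the weights can be updated recursively as $w\_k^i \propto w\_{k-1}^i \, p(y\_k \mid x\_k^i) p(x\_k^i \mid x\_{k-1}^i) / q(x\_k^i \mid x\_{k-1}^i, y\_k)$. Moreover, to avoid sample degeneracy, new samples are resampled i.i.d. from the approximate conditional density $\hat{p}(x\_k \mid y\_{0:k})$ at each step; hence, the weights are reset to $w\_{k-1}^i = 1/N$, and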

$$w\_k^i \propto \frac{p\left(y\_k \mid \mathbf{x}\_k^i\right) p\left(\mathbf{x}\_k^i \mid \mathbf{x}\_{k-1}^i\right)}{q\left(\mathbf{x}\_k^i \mid \mathbf{x}\_{k-1}^i, y\_k\right)}, \quad i = 1, \dots, N \tag{9}$$

In the plain particle filter, the importance density $q(x\_k \mid x\_{k-1}^i, y\_k)$ is chosen to be the state transition density $p(x\_k \mid x\_{k-1}^i)$, which is independent of the current observation $y\_k$, yielding

$$w\_k^i \propto p\left(y\_k \mid \mathbf{x}\_k^i\right), i = 1, \dots, N \tag{10}$$
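The weight update in Equation (10), combined with propagation through the transition density and resampling, can be sketched in a few lines. The example below runs such a plain particle filter on a scalar linear-Gaussian model; the model, its parameters, the initial density and all function names are illustrative assumptions and do not come from the paper.

```python
import math
import random


def simulate(n_steps, a=0.9, q_std=0.5, r_std=1.0, seed=1):
    """Simulate x_k = a*x_{k-1} + u_k and y_k = x_k + v_k (assumed toy model)."""
    rng = random.Random(seed)
    x, states, observations = 0.0, [], []
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, q_std)
        states.append(x)
        observations.append(x + rng.gauss(0.0, r_std))
    return states, observations


def plain_particle_filter(observations, n_particles=500, a=0.9,
                          q_std=0.5, r_std=1.0, seed=0):
    """Return the filtered posterior-mean estimate for each observation."""
    rng = random.Random(seed)
    # Initialization: draw particles from an initial p.d.f. (standard normal here).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Importance sampling: propagate through the transition density.
        particles = [a * x + rng.gauss(0.0, q_std) for x in particles]
        # Bayes updating: weights proportional to the likelihood, Equation (10).
        raw = [math.exp(-0.5 * ((y - x) / r_std) ** 2) for x in particles]
        total = sum(raw)
        if total == 0.0:  # guard against numerical underflow
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in raw]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resampling: draw N particles i.i.d. from the weighted empirical density.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


def rmse(values, reference):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((v - r) ** 2 for v, r in zip(values, reference)) / len(values))


true_states, noisy_obs = simulate(200)
filtered = plain_particle_filter(noisy_obs)
```

In this setup the filtered estimates should track the hidden state more closely than the raw observations do, since the filter combines each observation with the information carried by the propagated particles.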

The plain particle filter recursively propagates the support points and updates the associated weights. The algorithm is as follows in Algorithm 1:

**Algorithm 1** General particle filter.

1: Initialization: Sample $\{x\_0^i\}\_{i=1}^N$ i.i.d. from an initial p.d.f. $p\_0$. Set $k = 1$.

2: Importance Sampling: Sample $x\_k^i$ from $p(x\_k \mid x\_{k-1}^i)$, $i = 1, \dots, N$.

3: Bayes Updating: Receive the new observation $y\_k$. Compute the weight $w\_k^i \propto p(y\_k \mid x\_k^i)$, $i = 1, \dots, N$, and normalize the weights such that they sum up to 1.

4: Resampling: Sample $\{x\_k^i\}\_{i=1}^N$ i.i.d. from $\hat{p}(x\_k \mid y\_{0:k})$.

5: $k \leftarrow k + 1$ and go to step 2.

### *2.2. Particle Filter Optimization for Global Optimization*

We consider the global optimization problem:

$$\mathbf{x}^\* \in \operatorname\*{arg\,max}\_{\mathbf{x} \in \mathcal{X}} H(\mathbf{x}) \tag{11}$$

where $x$ is a vector of $n$ decision variables, $\mathcal{X}$ is the search space, and the objective function $H$ is a bounded deterministic function. We denote the optimal function value as $H^\*$, i.e., there exists an $x^\*$ such that $H(x) \le H^\* \triangleq H(x^\*)$, $\forall x \in \mathcal{X}$.

Many of the simulation-based global optimization methods, such as the estimation of distribution algorithms (EDAs) [25,26], covariance matrix adaptation evolution strategy [27], cross-entropy (CE) method [28], model reference adaptive search (MRAS) method [29] and particle filter optimization (PFO), fall into the category of model-based methods. They share the similarity of iteratively repeating the following two steps: let $g\_k$ be a probability distribution on $x$ at the $k$-th iteration of an algorithm:


The underlying idea is to construct a sequence of iterates (probability distributions) $g\_k$ with the hope that $g\_k \to g^\*$ as $k \to \infty$, where $g^\*$ is a limiting distribution that assigns most of its probability mass to the set of optimal solutions. Thus, it is the probability distribution (as opposed to candidate solutions as in instance-based algorithms) that is propagated from one iteration to the next [30].

The main idea of PFO is to formulate the optimization problem as a filtering problem, so that the particle filter appears as a natural tool for solving it. More specifically, the optimization problem in Equation (11) can be formulated as a filtering problem by constructing an appropriate state-space model. Let the state-space model be

$$\begin{aligned} \mathbf{x}\_k &= \mathbf{x}\_{k-1}, \quad k = 1, 2, \dots, \\ y\_k &= H(\mathbf{x}\_k) - v\_k, \quad k = 0, 1, \dots, \end{aligned} \tag{12}$$

where the optimal solution is a static state to be estimated, *xk* is the unobserved state, *yk* is the observation, *vk* is the observation noise that is an i.i.d. sequence, and the conditional density of the state approaches a delta function concentrated on the optimal solution as the system evolves.

We assume that *vk* has a p.d.f. *ϕ*(·), and then the transition density is

$$p(\mathbf{x}\_k \mid \mathbf{x}\_{k-1}) = \delta(\mathbf{x}\_k - \mathbf{x}\_{k-1}) \tag{13}$$

where *δ* denotes the Dirac delta function. The likelihood function is

$$\begin{split}p(y\_k \mid \mathbf{x}\_k) &= \varphi(H(\mathbf{x}\_k) - y\_k) \\ &= \varphi(H(\mathbf{x}\_{k-1}) - y\_k) \end{split} \tag{14}$$

Substituting Equations (13) and (14) into the recursive equation of conditional density Equation (6), we obtain

$$b\_k(\mathbf{x}\_k) = \frac{\varphi(H(\mathbf{x}\_k) - y\_k)b\_{k-1}(\mathbf{x}\_k)}{\int \varphi(H(\mathbf{x}\_k) - y\_k)b\_{k-1}(\mathbf{x}\_k)d\mathbf{x}\_k} \tag{15}$$
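The substitution leading to Equation (15) can be spelled out as follows. This is a sketch of the intermediate step, assuming the model in Equation (12): the Dirac delta transition density in Equation (13) collapses the integral in Equation (6) by its sifting property,

$$\begin{aligned} b\_k(x\_k) &\propto p(y\_k \mid x\_k) \int \delta(x\_k - x\_{k-1}) \, b\_{k-1}(x\_{k-1}) \, dx\_{k-1} \\ &= \varphi(H(x\_k) - y\_k) \, b\_{k-1}(x\_k), \end{aligned}$$

and dividing the right-hand side by its integral over $x\_k$ gives the normalized density in Equation (15).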

The intuition of model Equation (12) is that the optimal solution *x*∗ is an unobserved static state, while we can only observe the optimal function values *y*∗ = *H*(*x*∗) with some noise. Equation (15) implies that, at each iteration, the conditional density (i.e., *bk*−1) is tuned by the performance of solutions to yield a new conditional density (i.e., *bk*) for drawing candidate solutions at the next iteration.

It should be expected that, if *yk* increases with *k*, the conditional density *bk* will come closer to the density of *xk*, i.e., a Dirac delta function concentrated on *x*∗. From the viewpoint of filtering, *bk* is the posterior density of *xk* that approaches the density of *xk*. From the optimization viewpoint, *bk* is a density defined on the solution space that becomes increasingly concentrated on the optimal solution as *k* increases. The framework of general particle filter optimization is given in Algorithm 2.

**Algorithm 2** General particle filter optimization framework.


3: Bayes Updating: Take $y\_k$ to be the sample function value of $H(x\_k^i)$ according to a certain rule. Compute the weight $w\_k^i$ for sample $x\_k^i$ according to $w\_k^i \propto \varphi(H(x\_k^i) - y\_k)$, $i = 1, 2, \dots, N\_k$, and normalize the weights such that they sum up to 1.

4: Resampling: Sample $\{x\_k^i\}\_{i=1}^N$ i.i.d. from $\hat{p}(x\_k \mid y\_{0:k})$.

5: *k* ← *k* + 1 and go to step 2.
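The loop of Algorithm 2 can be sketched for a one-dimensional problem as follows. Several specifics are assumptions made only for this illustration and are not stated in the text: $y\_k$ is taken as an elite quantile of the sampled function values, the observation noise is assumed uniform so that the weights reduce to an indicator of $H(x) \ge y\_k$, and a shrinking Gaussian perturbation is added after resampling so that the particles do not collapse onto a few duplicates. The function name `pfo_maximize` and its parameters are hypothetical.

```python
import random


def pfo_maximize(H, lower, upper, n_particles=200, n_iters=30,
                 elite_frac=0.2, seed=0):
    """Sketch of a particle-filter-style maximizer of H on [lower, upper]."""
    rng = random.Random(seed)
    # Initialization: sample the particles from a uniform initial density.
    particles = [rng.uniform(lower, upper) for _ in range(n_particles)]
    for k in range(n_iters):
        values = [H(x) for x in particles]
        # Assumed rule for y_k: the elite quantile of the sampled values.
        n_elite = max(1, int(elite_frac * n_particles))
        y_k = sorted(values, reverse=True)[n_elite - 1]
        # Bayes updating with (assumed) uniform noise: weight is 1 if H(x) >= y_k.
        raw = [1.0 if v >= y_k else 0.0 for v in values]
        total = sum(raw)
        weights = [w / total for w in raw]
        # Resampling from the weighted empirical density.
        resampled = rng.choices(particles, weights=weights, k=n_particles)
        # Assumed perturbation step with a geometrically shrinking scale.
        step = 0.05 * (upper - lower) * (0.8 ** k)
        particles = [min(upper, max(lower, x + rng.gauss(0.0, step)))
                     for x in resampled]
    return max(particles, key=H)


# Toy objective with its maximum at x = 2.
best_x = pfo_maximize(lambda x: -(x - 2.0) ** 2, -10.0, 10.0)
```

As the iterations proceed, the empirical density of the particles concentrates around the maximizer, which mirrors the behavior of $b\_k$ described above.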

Generally, PFO algorithms can be differentiated from each other by the definitions of the target p.d.f. and of the proposal p.d.f. A specific definition of the target and proposal p.d.f.s determines how the objective function is embedded in the sampling process and how the random samples (i.e., candidate solutions) are generated, respectively. For example, while a uniform distribution is adopted as the likelihood function $p(y\_k \mid x\_k)$ in [22,31], the Boltzmann distribution is another choice for defining the target distribution in PFO methods [21].

### **3. Proposed Algorithm**

### *3.1. Algorithm Framework*

As mentioned above, particle filter optimization methods have been applied to single-objective optimization problems by reformulating an optimization problem as a filtering problem. In this work, we extend the scope of application of PFO to multi-objective cases. It is well known that a Pareto optimal solution to an MOP, under mild conditions, could be an optimal solution of a scalar optimization problem in which the objective is an aggregation of all the objectives [10]. That is to say, an MOP can be formulated as the task of searching for a set of Pareto optimal solutions, each of which corresponds to a scalar optimization subproblem with a certain degree of tradeoff among the objectives of the MOP.

Given this insight into the decomposition strategy in the context of MOPs, it makes sense to construct a series of target distributions corresponding to a number of scalar optimization subproblems and then to adopt the particle filter to simulate these distributions, so that the Pareto optimal solutions can be obtained from the samples yielded by the simulations. There are two main issues: (1) how to design a series of proxy target p.d.f.s for MOPs and (2) how to effectively simulate these p.d.f.s via SMC. In the following, we seek answers to these questions and propose a particle filter optimization method for solving multi-objective optimization problems. The framework of the proposed PF-MOA is outlined in Algorithm 3.

### **Algorithm 3** Particle filter multiobjective optimization.

**Input:** $N$: the number of particles; $K$: the number of subproblems; set the maximum number of fitness evaluations $FE\_{max} = N \times K$;


20: Update $k = k + 1$, $FE = FE + N$;

21: **end while**

22: Return the particles in $D$;

### *3.2. The Design of Target Distribution*

Based on the theoretical foundation of sequential Monte Carlo samplers [24], SMC allows us to perform global optimization and sequential Bayesian estimation by sequentially sampling from a sequence of probability distributions that are defined on a common space. Specifically, similar to simulated annealing [32], we can move from a tractable distribution to a distribution of interest through a sequence of artificial intermediate distributions. Consequently, the convergence results are available for SMC samplers [33]. As two or more conflicting objectives are involved in an MOP in Equation (1), the design of the target p.d.f. is different from that in single-objective optimization problems.

To approach the Pareto optimal set, a set of proxy target p.d.f.s is needed, each of which corresponds to a specific balance among the objectives. To this end, we adopt a decomposition strategy to decompose an MOP into a number of scalar optimization subproblems, followed by designing a target p.d.f. for each single-objective subproblem. More specifically, let $\lambda^1, \dots, \lambda^K$ be a set of evenly spread weight vectors, and let $\mathbf{z}^\*$ be the reference point. An MOP with $m$ objectives, i.e., Equation (1), can be decomposed into $K$ scalar/single-objective optimization subproblems using the Tchebycheff (TCH) decomposition [10], and the objective function of the $j$th subproblem is

$$\min\_{\mathbf{x}\in\Omega} g^{\text{tch}}\left(\mathbf{x}\mid\lambda^{j},\mathbf{z}^{\*}\right) = \max\_{1\le i\le m} \left\{\lambda\_{i}^{j}(f\_{i}(\mathbf{x}) - z\_{i}^{\*})\right\}\tag{16}$$

where $m$ is the number of objectives, $\mathbf{z}^\* = (z\_1^\*, \dots, z\_m^\*)$ with $z\_i^\* = \min\{f\_i(\mathbf{x}) \mid \mathbf{x} \in \Omega\}$ is the reference point, $\lambda^j = (\lambda\_1^j, \dots, \lambda\_m^j)$ with $\sum\_{i=1}^m \lambda\_i^j = 1$ and $\lambda\_i^j \ge 0$ is the weight vector, and $f\_i$ and $\mathbf{x}$ are the objective function and decision vector, respectively.
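The decomposition in Equation (16) can be illustrated with a short sketch. The bi-objective toy problem, the grid search used as a stand-in optimizer, and all names below are assumptions for illustration only; they are not the test problems or the optimizer used in this work.

```python
def tchebycheff(objs, weights, ref):
    """Tchebycheff scalarization of Equation (16) for one weight vector."""
    return max(w * (f - z) for f, w, z in zip(objs, weights, ref))


def objectives(x):
    """Hypothetical bi-objective problem: f1 = x^2, f2 = (x - 2)^2."""
    return (x * x, (x - 2.0) ** 2)


K = 5                                   # number of subproblems
reference = (0.0, 0.0)                  # z*: the minimum of each objective
weight_vectors = [(j / (K - 1), 1.0 - j / (K - 1)) for j in range(K)]
grid = [i / 100 for i in range(201)]    # candidate solutions in [0, 2]

# Solve each scalar subproblem by exhaustive search over the grid.
solutions = [
    min(grid, key=lambda x, lam=lam: tchebycheff(objectives(x), lam, reference))
    for lam in weight_vectors
]
```

As the weight on the first objective grows, the minimizer of the subproblem moves from $x = 2$ (best for $f\_2$) toward $x = 0$ (best for $f\_1$), so neighboring weight vectors yield neighboring solutions.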

In this way, for each Pareto optimal solution $\mathbf{x}^\*$ of an MOP, there exists a weight vector $\lambda$ such that $\mathbf{x}^\*$ is the optimal solution of a subproblem (Equation (16)), and each optimal solution of the subproblem is Pareto optimal to the MOP. As a result, to obtain a set of different Pareto optimal solutions of an MOP, one can solve a set of single-objective optimization problems with different weight vectors defined by Equation (16) or any other decomposition approach. Note that, since $g^{\text{tch}}$ is continuous in $\lambda$, the optimal solution of $g^{\text{tch}}(\mathbf{x} \mid \lambda^i, \mathbf{z}^\*)$ should be close to that of $g^{\text{tch}}(\mathbf{x} \mid \lambda^j, \mathbf{z}^\*)$ if $\lambda^i$ and $\lambda^j$ are close to each other. Therefore, any information about the functions $g^{\text{tch}}$ with weight vectors close to $\lambda^i$ should be helpful for optimizing $g^{\text{tch}}(\mathbf{x} \mid \lambda^i, \mathbf{z}^\*)$.

Having obtained a set of single-objective subproblems, we construct a set of target p.d.f.s $\tilde{\pi}\_1(\mathbf{x}), \tilde{\pi}\_2(\mathbf{x}), \dots, \tilde{\pi}\_K(\mathbf{x})$ corresponding to the subproblems as follows:

$$\begin{aligned} \tilde{\pi}\_k(\mathbf{x}) & \triangleq \frac{\pi\_k(\mathbf{x})}{C\_k}, \quad k = 1, 2, \dots, K \\ \pi\_k(\mathbf{x}) &= \exp\left\{-g^{\text{tch}}\left(\mathbf{x} \mid \lambda^{k}, \mathbf{z}^{\*}\right)\right\} \end{aligned} \tag{17}$$

where $K$ is the number of target p.d.f.s (in our case, $K$ equals the number of weight vectors) and $C\_k$ is a normalizing constant that ensures that $\tilde{\pi}\_k(\mathbf{x})$ is a valid p.d.f. whose integral equals 1. According to Equations (16) and (17), each p.d.f. corresponds to a specific degree of balance among the objectives, determined by its weight vector.

### *3.3. The Sampling Procedure*

Given the target p.d.f.s, the particle filter appears as a natural candidate for the simulation of these target distributions. The first subproblem is optimized, and then the particle filter is used to track the sequence of target distributions that correspond to a set of scalar subproblems. This has three main steps: importance updating, resampling and particle move. The importance updating step takes the current distribution $\tilde{\pi}\_k$ (corresponding to a subproblem) as the target distribution and takes the previous distribution $\tilde{\pi}\_{k-1}$ (corresponding to the previous subproblem) as the proposal distribution.

Thus, since the empirical distribution formed by the previous samples is already distributed approximately according to $\tilde{\pi}\_{k-1}$, updating the weights of these samples in proportion to $\tilde{\pi}\_k(\cdot)/\tilde{\pi}\_{k-1}(\cdot)$ yields weighted samples that closely follow $\tilde{\pi}\_k$. The resampling step redistributes the samples such that they all have equal weights. The particle move step is performed on each particle to update its location towards the promising region so that we can follow the target distribution of each subproblem as closely as possible.

Note that, instead of updating particles according to a transition equation as in Equation (12), a Metropolis sampling method associated with genetic operators is adopted to sample new particles as the transition equation in the MOPs is unknown.

From the perspective of multi-objective optimization, the advantage of the proposed PF-MOA is that it tracks the Pareto optimal solutions along the Pareto front, which makes the search more efficient. The reason is that the importance weight of the particles in the proposed PF-MOA is updated according to the difference between the current and the previous distributions (which correspond to two related subproblems). As we mentioned in Section 3.2, any information about the *g*<sup>tch</sup> with weight vectors close to **λ**<sup>*i*</sup> should be helpful for optimizing *g*<sup>tch</sup>(**x** | **λ**<sup>*i*</sup>, **z**<sup>∗</sup>). The method of importance updating makes it possible to leverage the knowledge readily available for the previous subproblem to optimize the next subproblem, guiding the new particles to concentrate on the most promising area found thus far.

More specifically, while the normalization of the weights and the resampling of the particles are the typical operations in Algorithm 2, the calculation of the importance weight for the *i*-th particle according to the set of target distributions is as follows,

$$
\omega\_k^i = \begin{cases}
\ \pi\_k(\mathbf{x}^i), \text{ if } \quad k=1 \\
\ \pi\_k(\mathbf{x}^i) / \pi\_{k-1}(\mathbf{x}^i), \text{ otherwise.}
\end{cases}
\tag{18}
$$

Through the resampling step, we eliminate/duplicate samples with low/high importance weights, respectively, avoiding the issue of particle degeneracy.
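The importance update of Equation (18) and the subsequent resampling can be sketched in a few lines of Python. This is a minimal illustration and not the authors' MATLAB implementation: the function names, the zero-based subproblem index `k`, the standard Tchebycheff form max<sub>*j*</sub> *λ<sub>j</sub>*|*f<sub>j</sub>* − *z<sub>j</sub>*<sup>∗</sup>| assumed for Equation (16), and the choice of multinomial resampling are all assumptions introduced here.

```python
import math
import random

def tchebycheff(f, weights, z_star):
    """Assumed standard Tchebycheff scalarization: max_j w_j * |f_j - z*_j|."""
    return max(w * abs(fj - zj) for w, fj, zj in zip(weights, f, z_star))

def unnormalized_target(f, weights, z_star):
    """pi_k(x) = exp(-g_tch): the unnormalized target density of Equation (17)."""
    return math.exp(-tchebycheff(f, weights, z_star))

def importance_weights(objs, k, weight_vectors, z_star):
    """Equation (18): pi_k for the first subproblem (k == 0 here), else pi_k / pi_{k-1}.

    objs holds the objective vector f(x^i) of every particle.
    The returned weights are normalized to sum to one.
    """
    raw = []
    for f in objs:
        w = unnormalized_target(f, weight_vectors[k], z_star)
        if k > 0:
            w /= unnormalized_target(f, weight_vectors[k - 1], z_star)
        raw.append(w)
    total = sum(raw)
    return [w / total for w in raw]

def resample(particles, weights, rng):
    """Multinomial resampling: draw N particles with replacement, proportional to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))
```

Because only the ratio of unnormalized densities enters the update, the normalizing constants *C<sub>k</sub>* never have to be computed in this sketch; they cancel when the weights are normalized.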

### *3.4. Particle Move*

After the resampling step, a Metropolis sampling method based on genetic operators is applied to promote the diversity of the particles, as summarized in Algorithm 4. As discussed in Section 2.2, the state transition function is assumed to be *x<sub>k</sub>* = *x<sub>k−1</sub>* in the state space model when solving global optimization problems. If a particle filter is applied to this model directly, with no particle move, the resulting algorithm would be equivalent to importance sampling from the initial sampling distribution directly to the posterior in a single step. This would be problematic if the initial sampling distribution was located in a different region of parameter space entirely, particularly in the context of MOPs. Hence, a Metropolis sampling method for generating new particles is proposed to assist the particle filter in simulating these target p.d.f.s by exploiting the promising region.

The resampling step together with the Metropolis sampling step prevents sample degeneracy or, in other words, maintains the sample diversity and, thus, the exploration of the solution space. To make use of the search information obtained by the particle filter, the mean of the particles **x̄**<sub>*k*</sub> and the best particle obtained thus far are identified and assumed to be close to the optimum. The new (displacement) particles **x**′ are hence generated around the promising region using genetic operators, i.e., the typical mutation and crossover operators. Subsequently, each displacement is either accepted or rejected according to a dynamically calculated probability, called the acceptance probability. In the proposed PF-MOA, the acceptance probability for the displacement of the *i*-th particle **x**<sup>*i*</sup> is calculated by

$$\rho = \min \left\{ \pi\_k(\mathbf{x'}) / \pi\_k(\mathbf{x^i}), 1 \right\}. \tag{19}$$

### **Algorithm 4** A Metropolis sampling method based on genetic operators.

**Input:** the current particles {**x**<sup>*i*</sup><sub>*k*</sub>}<sup>*N*</sup><sub>*i*=1</sub>, the current target p.d.f. *π̃<sub>k</sub>*, the mean of the particles **x̄**<sub>*k*</sub> and the best particle obtained thus far.

**Output:** the new particles.

1: **for** *i* = 1 : *N* **do**

2: Generate a displacement **x**′ around the promising region using the genetic operators (crossover and mutation).

3: Calculate the acceptance probability *ρ* according to Equation (19) and update the particle:

$$\mathbf{x}\_k^i = \begin{cases} \mathbf{x}', & \text{with probability } \rho\\ \mathbf{x}\_k^i, & \text{with probability } 1 - \rho \end{cases} \tag{20}$$

4: Update the best particle to **x**′ if *π̃<sub>k</sub>*(**x**′) exceeds the value of *π̃<sub>k</sub>* at the current best particle.

5: **end for**

6: Return the updated particles.
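A minimal Python sketch of one pass of the Metropolis move of Algorithm 4 is given below. The accept/reject logic follows Equations (19) and (20); everything else is an assumption made for illustration: the hypothetical `propose` operator (an arithmetic crossover with either the mean or the best particle followed by Gaussian mutation), the step size `sigma`, and the [0, 1] box bounds are not specified in the text.

```python
import math
import random

def propose(x, mean, best, rng, sigma=0.1):
    """Hypothetical genetic proposal: blend x with a guide, then mutate.

    The guide is the best particle or the particle mean, chosen per
    coordinate with equal probability (an assumed rule).
    """
    child = []
    for xi, mi, bi in zip(x, mean, best):
        guide = bi if rng.random() < 0.5 else mi
        a = rng.random()                        # arithmetic crossover coefficient
        v = a * xi + (1.0 - a) * guide + rng.gauss(0.0, sigma)
        child.append(min(1.0, max(0.0, v)))     # clip to assumed [0, 1] bounds
    return child

def metropolis_move(particles, target, mean, best, rng):
    """One Metropolis sweep over all particles.

    target(x) must return a strictly positive (possibly unnormalized) density.
    Returns the moved particles and the updated best particle.
    """
    moved = []
    for x in particles:
        x_new = propose(x, mean, best, rng)
        rho = min(target(x_new) / target(x), 1.0)       # Equation (19)
        moved.append(x_new if rng.random() < rho else x)  # Equation (20)
        if target(x_new) > target(best):                # step 4 of Algorithm 4
            best = x_new
    return moved, best
```

Since the acceptance probability depends only on a ratio of densities, the unnormalized target exp(−*g*<sup>tch</sup>) can be passed as `target` without knowing its normalizing constant.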

### **4. Comparative Studies**

In this section, numerical experiments are conducted on three-objective benchmark problems taken from the DTLZ and WFG test suites. To examine the efficiency of the proposed strategies, the proposed PF-MOA is compared with six state-of-the-art multi-objective evolutionary algorithms: NSGA-II [9], RVEA [11], MOEA/D [10], NSGA-III [13], MOEA/DD [15] and *θ*-DEA [14]. Our code is available at https://github.com/xw00616/PF-MOA (accessed on 1 November 2022).

In the following, we begin by briefly introducing the test problems and performance metrics adopted in our paper. Afterwards, the details of the experimental settings concerning the compared algorithms are described. Lastly, the experimental results together with the Wilcoxon rank sum test are presented and discussed.

### *4.1. Test Problems*

In our experiments, the proposed algorithm is compared with state-of-the-art multi-objective optimization algorithms on the DTLZ [34] and WFG [35] test suites with three objectives. The number of decision variables for the DTLZ test instances is set to *D* = *M* + *K* − 1, where *M* is the number of objectives (here, *M* = 3), *K* = 5 is adopted for DTLZ1, *K* = 10 is used for DTLZ2 to DTLZ6, and *K* = 20 is employed in DTLZ7. The number of decision variables for the WFG test instances is set to 12.
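For concreteness, substituting *M* = 3 and the stated values of *K* into *D* = *M* + *K* − 1 gives the following numbers of decision variables for the DTLZ instances:

$$D\_{\text{DTLZ1}} = 3 + 5 - 1 = 7, \qquad D\_{\text{DTLZ2--6}} = 3 + 10 - 1 = 12, \qquad D\_{\text{DTLZ7}} = 3 + 20 - 1 = 22.$$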

### *4.2. Performance Metrics*

The inverted generational distance (IGD) [36] metric and the hypervolume (HV) [37] metric are adopted to assess the performance of the algorithms. IGD and HV provide combined information on the convergence and diversity of the obtained set of solutions. The PlatEMO toolbox [38] is used to calculate the values of the performance metrics in our experiments. Let *P*<sup>∗</sup> be a set of uniformly distributed solutions sampled from the objective space along the theoretical Pareto front, and let *P* be an obtained approximation to the Pareto front. IGD measures the inverted generational distance from *P*<sup>∗</sup> to *P*, defined as

$$IGD(P^\*, P) = \frac{\sum\_{\upsilon \in P^\*} d(\upsilon, P)}{|P^\*|} \tag{21}$$

where *d*(*υ*, *P*) is the minimum Euclidean distance between *υ* and all points in *P*. The smaller the IGD value, the better the achieved solution set.
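Equation (21) translates directly into code. The sketch below is a plain-Python illustration (the experiments in the paper use the PlatEMO toolbox instead); the function name `igd` and the representation of points as tuples are assumptions.

```python
import math

def igd(reference, approximation):
    """Inverted generational distance of Equation (21).

    reference:      points sampled along the true Pareto front (P*)
    approximation:  obtained non-dominated points (P)
    """
    total = 0.0
    for v in reference:
        # d(v, P): distance from v to its nearest neighbour in P
        total += min(math.dist(v, p) for p in approximation)
    return total / len(reference)
```

For example, with *P*<sup>∗</sup> = {(0, 1), (1, 0)} and *P* = {(0, 1)}, the two nearest-neighbour distances are 0 and √2, so the IGD is √2/2.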

HV calculates the volume of the objective space that is dominated by an approximation set *P* and bounded by the reference points:

$$HV = \text{volume}\left(\bigcup\_{i=1}^{j} \vartheta\_{i}\right) \tag{22}$$

where *ϑ<sub>i</sub>* represents the hypervolume contribution of the *i*-th solution relative to the reference points. All HV values presented in this paper are normalized to [0, 1]. Algorithms achieving a larger HV value are better.
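As a rough illustration of what the hypervolume measures, the following Python sketch estimates it by Monte Carlo sampling. It is not the exact procedure used by PlatEMO: it assumes minimization, a single reference point `ref`, objective values already scaled to lie between the origin and `ref`, and a fixed sample budget; the function names are hypothetical.

```python
import random

def dominated(point, solutions):
    """True if some solution is no worse than point in every objective (minimization)."""
    return any(all(s <= p for s, p in zip(sol, point)) for sol in solutions)

def hypervolume_mc(solutions, ref, n_samples=20000, seed=0):
    """Monte Carlo estimate of the volume dominated by solutions within [0, ref]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.uniform(0.0, r) for r in ref]   # uniform point in the box
        if dominated(sample, solutions):
            hits += 1
    box_volume = 1.0
    for r in ref:
        box_volume *= r
    return box_volume * hits / n_samples
```

A single solution at (0.5, 0.5, 0.5) with reference point (1, 1, 1) dominates a cube of volume 0.125, which the estimate should approach as the number of samples grows.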

### *4.3. Experimental Settings*

We ran each algorithm on each benchmark problem 20 independent times, and the Wilcoxon rank sum test was used to compare the results of the 20 runs obtained by PF-MOA and by each compared algorithm at a significance level of 0.05. The symbols "(–)", "(+)" and "(≈)" indicate that the proposed algorithm shows significantly better, worse and similar performance relative to the compared algorithm, respectively.
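The statistical comparison can be illustrated with a small, self-contained rank sum test. The sketch below uses the large-sample normal approximation without tie or continuity correction, which is an assumption made here for brevity; library routines such as SciPy's `scipy.stats.ranksums` provide the same kind of test and would normally be preferred. The function name `rank_sum_test` is hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns the z statistic of sample x and the two-sided p-value.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # extend j to the end of the current group of tied values
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1      # ranks are 1-based
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided tail probability
    return z, p
```

With the IGD values of 20 runs of two algorithms as `x` and `y`, a p-value below 0.05 together with a negative `z` would indicate that the first algorithm obtained significantly smaller (better) IGD values.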

The PF-MOA was implemented in MATLAB R2019a on an Intel Core i7 with 2.21 GHz CPU, and the compared algorithms were implemented in the PlatEMO toolbox [38]. The general parameter settings in the experiments are given as follows: (1) The maximum number of function evaluations was *FE*<sub>max</sub> = 10,000. (2) For PF-MOA, the population size was set to 100 and the maximum number of generations was set to 100. (3) For the compared multi-objective evolutionary algorithms, the population size was set to 100 and the maximum number of generations was set to 100. The specific parameter settings for each compared algorithm were the same as recommended in their original papers.

### *4.4. Experimental Results*

The statistical results in terms of the IGD and HV values obtained by the algorithms under comparison are summarized in Tables 1 and 2, respectively. For the DTLZ test problems, the proposed PF-MOA achieved the best approximate Pareto front on all test problems except for DTLZ6 and DTLZ7, on which NSGA-II obtained the best IGD values. The reason behind this may be that DTLZ6 has many disconnected Pareto optimal regions in the decision space, and DTLZ7 has a discontinuous Pareto front. Hence, it is challenging to design proper target distributions in PF-MOA, which degrades PF-MOA's performance.

According to the Wilcoxon rank sum test, the proposed algorithm significantly outperformed the compared algorithms on most of the test problems. For the WFG test instances, PF-MOA showed significantly better performance than the algorithms under comparison on six out of nine test instances, confirming the promising performance of the proposed PF-MOA. An exception is WFG5, in which objective multimodality is combined with landscape deception; here, the proposed PF-MOA showed the worst performance compared with the other algorithms.

A possible explanation for this is that the deceptive objectives may impact the design of the target distributions, so that the information from the previous subproblem is not sufficient to help the algorithm generate good trade-off solutions for the current subproblem. Moreover, similar observations can be made from Table 2.

To further illustrate the performance of the proposed algorithm, the obtained Pareto front for each algorithm is shown in Figure 1. We observed that the proposed method can find a set of well-converged and diverse Pareto optimal solutions, thereby confirming the effectiveness of the particle filter in the PF-MOA.


**Table 1.** Statistical results of the IGD values obtained by NSGA-II, RVEA, MOEA/D, MOEA/DD, NSGA-III, *θ*-DEA and PF-MOA with the same number of real function evaluations.

**Table 2.** Statistical results of the HV values obtained by NSGA-II, RVEA, MOEA/D, MOEA/DD, NSGA-III, *θ*-DEA and PF-MOA with the same number of real function evaluations.


**Figure 1.** The Pareto front obtained by the compared algorithms on DTLZ5.

### **5. Conclusions**

In this paper, we extended the particle filter optimization method from single-objective optimization to multi-objective optimization. The Tchebycheff decomposition was used to decompose a multi-objective optimization problem into a set of single-objective problems so that a sequence of target distributions was defined. Subsequently, the particle filter was adopted to simulate these target distributions by using its tracking ability, and genetic operators were employed to perform the particle move. The experimental results on the DTLZ and WFG test suites showed the promising performance of PF-MOA compared with state-of-the-art multi-objective evolutionary algorithms.

However, PF-MOA cannot effectively solve certain problems with disconnected or discontinuous Pareto optimal regions, such as DTLZ6 and DTLZ7. The reason may be that PF-MOA always searches around the best particle, thereby reducing the diversity of the particles; this lack of diversity cannot be addressed by the resampling step and should be considered in future work. Moreover, for real-world multi-objective optimization problems, uncertainty is an unavoidable issue, and it directly affects the optimization performance. As filtering methods have been successfully applied to noisy MOPs, the particle filter may benefit MOEAs for solving MOPs with uncertainty.

**Author Contributions:** Conceptualization, X.W. and Y.J.; methodology, X.W.; software, X.W.; validation, X.W.; formal analysis, X.W.; investigation, X.W.; resources, X.W.; data curation, X.W.; writing original draft preparation, X.W.; writing—review and editing, X.W. and Y.J.; visualization, X.W.; supervision, Y.J.; project administration, Y.J.; funding acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Many-Objectives Optimization: A Machine Learning Approach for Reducing the Number of Objectives**

**António Gaspar-Cunha 1,\*, Paulo Costa 1, Francisco Monaco <sup>2</sup> and Alexandre Delbem <sup>2</sup>**

	- **\*** Correspondence: agc@dep.uminho.pt

**Abstract:** Solving real-world multi-objective optimization problems using Multi-Objective Optimization Algorithms becomes difficult when the number of objectives is high since the types of algorithms generally used to solve these problems are based on the concept of non-dominance, which ceases to work as the number of objectives grows. This problem is known as the curse of dimensionality. Simultaneously, the existence of many objectives, a characteristic of practical optimization problems, makes choosing a solution to the problem very difficult. Different approaches are being used in the literature to reduce the number of objectives required for optimization. This work proposes a machine learning methodology, designated FS-OPA, to tackle this problem. The proposed methodology was assessed using DTLZ benchmark problems suggested in the literature and compared with similar algorithms, showing good performance. Finally, the methodology was applied to a difficult real problem in polymer processing, showing its effectiveness. The proposed algorithm has some advantages when compared with a similar machine-learning-based algorithm from the literature (NL-MVU-PCA), namely, the possibility of establishing variable–variable and objective–variable relations (not only objective–objective), and the elimination of the need to define/choose a kernel or to optimize algorithm parameters. The collaboration with the DM(s) allows explainable solutions to be obtained.

**Keywords:** objectives reduction; data mining; multi-objective optimization; many objectives

### **1. Introduction**

Real-world optimization problems are usually multi-objective, in which multiple conflicting objectives must be taken into account simultaneously. Mainly, there are two ways to tackle these types of problems: scalarization functions and population-based algorithms. The use of scalarization functions presented some drawbacks, which led to the development of population-based metaheuristics that use the concepts of Pareto-dominance and niching to evolve a population of solutions in the direction of the Pareto-optimal front [1,2].

There are at least three basic types of population-based algorithms commonly employed to solve Multi-objective Optimization Problems (MOPs), namely, evolutionary algorithms, swarm-based methods, and colony-based algorithms, which can use the dominance concept, metric indicators, or a decomposition strategy [3]. In most of these algorithms, a random initial population of solutions is generated and new populations are consecutively obtained by selection and variation strategies until a stop criterion is met. It is expected from this procedure that the successive populations evolve towards, or to a good approximation of, the Pareto-optimal frontier. In each one of these populations, complex relations exist between the Decision Variables (DVs) and the objectives, as well as among the DVs themselves and among the objectives themselves.

**Citation:** Gaspar-Cunha, A.; Costa, P.; Monaco, F.; Delbem, A. Many-Objectives Optimization: A Machine Learning Approach for Reducing the Number of Objectives. *Math. Comput. Appl.* **2023**, *28*, 17. https://doi.org/10.3390/mca28010017

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 6 November 2022 Revised: 14 January 2023 Accepted: 25 January 2023 Published: 30 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

These algorithms work well when the number of objectives is low; however, as the number of objectives grows, the percentage of non-dominated solutions increases, making it difficult for an algorithm based on Pareto-dominance to work effectively, a problem that is known as the curse of dimensionality. There is no consensus on the number of objectives for which this problem occurs; some authors indicate this number as ten [4] and others as four [5], but in reality, these difficulties arise when the number of objectives is four or more.
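The loss of selection pressure can be checked with a small experiment: draw random points in objective space and count how many of them are not dominated by any other point. The Python sketch below assumes minimization and uniformly random objective vectors, which is a simplification of a real optimizer's population; the function names are hypothetical.

```python
import random

def dominates(a, b):
    """a dominates b (minimization): no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fraction(n_points, n_objectives, seed=0):
    """Share of uniformly random points that no other point dominates."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_objectives)] for _ in range(n_points)]
    nondominated = sum(
        1 for p in points if not any(dominates(q, p) for q in points)
    )
    return nondominated / n_points
```

Comparing, for instance, `nondominated_fraction(200, 2)` with `nondominated_fraction(200, 10)` shows a much larger share of mutually non-dominated points in the ten-objective case, so a dominance-based ranking can barely distinguish between solutions.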

Two different methods are used to deal with this problem, either using relaxed forms of Pareto optimality or reducing the number of objectives [5]. The reduction of the number of objectives is useful either for the search process or for the decision-making process during and/or at the end of the optimization.

In previous years, some work related to objective reduction for many-objective optimization was proposed in the literature, which can be sub-divided into five different categories: (i) methods in which the aim is to maintain the dominance relation for the non-dominated solutions [6,7]; (ii) methods based on unsupervised feature selection [8]; (iii) methods based on a comparative analysis of the results obtained when the number of objectives is reduced [9]; (iv) methods based on data mining [5,10–12]; and (v) methods based on the use of multi-objective formulations [13]. These approaches are presented in more detail below.

Brockhoff and Zitzler [6,7] suggested the use of two different approaches for objectives reduction, which are based on the definition of two types of problems. The first problem aims to obtain the minimum objective subset that produces a certain error (*δ*), designated the *δ*-MOSS problem (*δ*-Minimum Objective Subset problem), and the second problem aims to obtain an objective subset of a predefined size (*k*) with the minimum possible error, designated the *k*-EMOSS problem. For each one of these cases, two algorithms were presented, an exact and a greedy algorithm, characterized by maintaining the dominance relation. They were tested using different knapsack problems and the DTLZ2, DTLZ5, and DTLZ7 benchmark problems for different numbers of objectives.

In López et al. [8], a methodology based on unsupervised feature selection was proposed to address the *δ*-MOSS and *k*-EMOSS problems. A correlation matrix obtained from the non-dominated set is used to divide the objective set into homogeneous neighbourhoods. The method is based on the idea that a greater distance between objectives signifies that those objectives are more conflicting. Thus, only the objectives in the centre of those neighbourhoods are chosen and the others are discarded. The algorithms were validated by comparing the results obtained with those of reference [7].

Singh et al. [9] proposed an algorithm, designated by the Pareto Corner Search Evolutionary Algorithm (PCSEA) that, instead of searching for the complete Pareto front, searches for the corners of the Pareto front based on a ranking scheme. Those solutions are used to identify the relevant objectives and the others are discarded. Some benchmark problems and two engineering problems were used to show the performance of the methodology proposed.

Deb and Saxena [10] suggested an approach based on Principal Component Analysis (PCA) for the same purpose of objectives reduction, considering the hypothesis that if two objectives are negatively correlated, they are conflicting. In this way, they maintain the objectives that can explain most of the variance in the objective space, which are the most positive and the most negative of the eigenvectors of the correlation matrix. The authors designated this method as PCA-NSGAII. Afterwards, due to the problem of misinterpreting the data when it lies in sub-manifolds, a new proposal is made based on nonlinear dimensionality reduction [11]. For that purpose, the authors developed two new algorithms to replace the linear PCA, one based on correntropy [14] and the other on Maximum Variance Unfolding (MVU). However, the method lacks information on the means by which objective reduction alters the dominance structure, cannot guarantee the preservation of the dominance relation and provides no measure to specify how much the dominance relation changes when objectives are disregarded. The different procedures proposed were applied to solve DTLZ2 and DTLZ5 benchmark problems for different numbers of objectives.
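The hypothesis underlying this family of methods, that negatively correlated objectives are in conflict, can be illustrated with a short Python sketch. It only computes the correlation matrix of the objective values and lists the negatively correlated pairs; it is not the PCA-NSGAII procedure itself, and the function names and the zero threshold are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(objective_values):
    """objective_values: one row per solution, one column per objective."""
    columns = list(zip(*objective_values))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

def conflicting_pairs(corr, threshold=0.0):
    """Index pairs (i, j), i < j, whose correlation is below the threshold."""
    m = len(corr)
    return [(i, j) for i in range(m) for j in range(i + 1, m) if corr[i][j] < threshold]
```

In the PCA-based approach, the eigenvectors of such a correlation matrix are then analysed to decide which objectives to keep; that step is omitted here.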

Later, the same group, Saxena et al. [5], proposed a framework for using linear and nonlinear objective reduction algorithms, namely, L-PCA and NL-MVU-PCA, which are based on machine learning techniques, PCA and MVU, to remove the secondary higher-order dependencies in the non-dominated solutions. The idea was very similar to that given in previous work by the same authors [10,11], but this time, they proposed a reduction of the number of algorithm parameters and an error measure. The algorithms were tested on a broad range of problems and the results were compared with others in the literature. Based on the same methodology, Sinha et al. [15] proposed an iterative procedure to reduce the objectives in which a Decision Maker (DM) chose the best solutions. The methodology was applied to solve some real-world problems, namely storm drainage and car-side impact. Finally, Duro et al. [12] proposed to extend the methodology presented in reference [5] to rank all objectives by a preference order, as well as to solve the *δ*-MOSS and *k*-EMOSS problems, i.e., to obtain the smallest set of objectives that can originate the same POF, the smallest objective set corresponding to a pre-defined error, and the objective sets of a certain size that originate a minimum error.

The main drawback of all these methodologies based on PCA is that they need to use a kernel and, as a consequence, to optimize the kernel parameters. The characteristics of this methodology, NL-MVU-PCA, are compared with those of the one proposed in the present paper at the end of the next section.

Yuan et al. [13] proposed a methodology based on the use of multi-objective evolutionary algorithms to solve a MOOP formulation. The authors applied this approach to some benchmark problems and two real optimization problems. In both cases, the calculation of the objective functions is based on simple analytical equations where the computational cost is not relevant when compared with the problems that we intend to solve here, which are based on numerical calculation. Therefore, besides performance, this type of methodology will not be explored in the present work.

The present paper aims to propose a method for objectives reduction based on data mining that:


The central aim of the works cited above was to find a reduced set of objectives that could exactly reproduce the results from the original set. Thus, only the redundant objectives could be discarded after a reduction process. That is not the aim of the present work, since our purpose is to apply the proposed methodology to real-world and complex problems where the relations between DVs and the objectives are complex, and the objectives are, in general, partially redundant. Thus, redundancy is not a helpful criterion to eliminate an objective.

For that purpose, a methodology was developed to capture those complex relations and define the relative importance of the objectives based on the determination of the objective–objective relations. Doing this makes it possible to determine objectives that can be discarded, but with a certain error. In other words, the approximation of the Pareto-optimal front found with the reduced number of objectives has some error when compared with the approximation found when using all the objectives. Simultaneously, the redundant objectives are also eliminated. Such an approach has at least two significant advantages. First, it aids an optimization algorithm in finding a POF estimate; second, it makes it easier to explain the results found to the DM.

The contents of the paper are as follows: in Section 2, the concepts of machine learning and the proposed methodology are presented; in Section 3, the methodology is tested using some benchmarks; in Section 4, the methodology is applied to a real polymer extrusion problem and the results obtained are discussed; finally, the conclusions are stated in Section 5.

### **2. Machine Learning Approach**

### *2.1. Concepts*

Bandaru et al. [4] reviewed several proposals from Statistics, Data Mining, and Machine Learning to improve optimization techniques for MaOPs. The approaches usually apply data-driven methods to the solutions in a non-dominated set. The authors arranged the proposals based on the knowledge representation and summarized them into three main classes: (i) Descriptive Statistics, (ii) Visual Data Mining and (iii) Machine Learning itself. Those methods have an origin outside the MOO literature. Thus, they usually are not applied to find properties between variables, objectives, and the non-dominated set. In general, the relatively complex nature of those relations makes their performance inadequate for MaOPs. Other drawbacks relate to some classes of real-world MaOPs that require interactions with a practitioner due to the complexity of the system modelled or for a stakeholder making decisions. Usually, such classes of problems also involve raw or observed data or small datasets (due to the expensiveness of generating, collecting, or simulating samples) involving different data types, varying from continuous to nominal variables. This way, methods that produce explainable models and work with distinct data types are essential for those real-world problems. The strategies proposed by Duro et al. [12] and Bandaru et al. [16] have overcome some of those challenges, including an interactive approach for dealing with two and three objectives and pattern recognition from nominal variables. Another proposal facing those challenges is FS-OPA, initially designed for multidimensional analysis focused on MaOPs. FS-OPA generates explainable (explicit) models, has a relatively low computational cost (aiming at working with high dimensional decisions and objective spaces), and can deal with different data types and their mixtures.

First, this paper compares the principal features of an extension of FS-OPA to the NL-MVU-PCA approach (Duro et al. [12]) for determining the essential objective set. NL-MVU-PCA learns a kernel matrix by unfolding a high-dimensional data manifold subject to local constraints that preserve the local isometry. Then, eigenvalues are used to identify the principal dimensions that should correspond to a set of conflicting objectives. On the other hand, FS-OPA uses no manifold learning; it maps the problem's fundamental structures into one or more phylograms (not a Cartesian graphical representation). FS-OPA employs data clustering, but not in the usual way, since it instantiates DAMICORE [17], a pipeline with Normalized Compression Distance (NCD), Neighbor-Joining (NJ), and the Fast Newman algorithm, which produces intermediate representations enabling the detection of the strongest associations of dimensions. The embedding produced by FS-OPA does not focus on reducing the decision (or objective) space; instead, it augments the space by adding new variables, the internal nodes of the phylogram (while the terminal nodes correspond to the original variables). The phylogram construction also seeks to preserve the isometry for different neighborhood sizes. Finally, FS-OPA can obtain results similar to those of manifold learning (i.e., the determination of the essential dimensions) by finding the closest common ancestors in a phylogram (a clade) and the frequency of common ancestors between clades (obtained from several phylograms by data resampling). Such ancestors highlight the principal relationships between variables and/or objectives.

Second, this paper applies the extension of FS-OPA to the MOO of extruders, which requires dealing with the relatively poor data from initial populations of an MOEA. In other words, there is an assumption that the solutions belonging to a specific Pareto-optimal front have some characteristics that identify the optimal behavior of the process considered. The critical question is whether it is possible, from a set of random solutions such as the initial population of an MOEA, to extract information about the complex relationships between the DVs and the objectives and among the objectives themselves. Therefore, the idea is to capture this type of information using data mining methods from multivariate data, independently of its location in the objective or decision variable spaces, i.e., regardless of whether the data represent optimal (or near-optimal) solutions. Moreover, no distinction between DVs and objectives will be made.

### *2.2. FS-OPA*

The foundations of FS-OPA are based on two methodologies that deal with large-scale and multidimensional data of any type, named DAMICORE [17] and FS-OPA [18]. DAMICORE is a pipeline involving methods from Information Theory, Complex Networks, and Phylogenetic Inference, aiming at revealing hidden relationships of objects from an unstructured (raw) dataset. It runs in three main steps: (S1) given a metric of similarity, build a distance matrix comparing every two objects; (S2) convert the matrix into a phylogenetic tree by connecting close objects according to hierarchical levels of similarity; (S3) apply a community detection process to group near subtrees into clusters. Figure 1 shows a set of generic objects *x<sub>i</sub>*. The elements *d<sub>ij</sub>* of the distance matrix correspond to a measure of dissimilarity between objects *x<sub>i</sub>* and *x<sub>j</sub>*, according to some given metric. The matrix is broken down into a tree, where the distance between any two objects (leaves) corresponds to the sum of the lengths of the branches connecting them. Finally, the third step merges objects strongly connected (according to the tree topology) into a community, generating a set of different similarity clusters.

**Figure 1.** The three steps of the DAMICORE pipeline (reproduced from [18]).

The first implementation of DAMICORE used three specific algorithms for S1, S2, and S3 (Figure 1), respectively: Normalized Compression Distance (NCD) [19], as it works with any data type and with mixed types; Neighbor-Joining (NJ) [20], widely employed in bioinformatics; and Fast Newman (FN) [21], which constructs a graph partition using a greedy algorithm based on a bottom-up strategy for maximizing the graph modularity function [22]. The pipeline with NCD, NJ, and FN possesses some distinctive properties. NCD makes DAMICORE a data-type agnostic method, in the sense that it works with any object (continuous, discrete, categorical-ordinal, and nominal variables, texts, images, audio, etc.) and with a mixture of data types.
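Step S1 can be illustrated with a minimal Python sketch of the Normalized Compression Distance, using the commonly cited definition NCD(*x*, *y*) = (C(*xy*) − min{C(*x*), C(*y*)}) / max{C(*x*), C(*y*)}, where C(·) is the compressed size. The choice of `zlib` as the compressor, the byte-string encoding of the objects, and the function names are assumptions made here; the DAMICORE implementation may use a different compressor and encoding.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed representation."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def distance_matrix(objects):
    """Pairwise NCD matrix d_ij for a list of byte strings (step S1)."""
    return [[ncd(a, b) for b in objects] for a in objects]
```

Objects that share structure compress well together and therefore obtain a small distance, whereas unrelated objects obtain a distance close to one. With a real compressor the matrix is only approximately symmetric and its diagonal is close to, but not exactly, zero.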

DAMICORE has some properties that make it proper for dealing with problems with a low level of previous knowledge, carried out by non-experts, or that would require a large multidisciplinary team of experts. First, it can run without any data pre-processing (such as filtering, outlier detection, feature extraction, parameter setup, and knowledge of the problem domain). Second, it requires no parameters setup to run and is therefore not biased toward arbitrary tuning constants. Naturally, pre-processing steps and some execution options may improve the DAMICORE performance. Its success in such a challenge has been checked for problems in a variety of fields, such as software-hardware co-design [23–25], compiler optimization [26], student profiling in e-learning environments [27,28], identification of phytopathology from sensor data [29], systematic literature review, identification of cross-cut concerns [30], and electrical distribution systems [31].

A Feature Sensitivity (FS) analysis aims to make salient the principal features of a problem (that may differ from selecting the main components), facing common challenges in some classes of real-world problems. For example, the quality of observed data, the database consistency and representativeness, and the discovery of interactions between features and their contributions to each target or objective are hard to check from a raw dataset with low previous domain knowledge. Thus, such a scope differs from those where the standard feature selection algorithms have usually succeeded. Moreover, an FS strategy is expected to aid in learning the fundamental structures of a complex problem from scratch. The learned structures can induce a probabilistic model used by optimization algorithms, such as in the Estimation of Distribution Algorithms [32]. In this research, we use phylogram-based models since they can work with small datasets, they are computationally efficient, and there is an optimization approach designed to use such models: Optimization based on Phylogram Analysis (OPA).

Figure 2 shows a diagram summarizing OPA and its use of the FS analysis. Such a combination is called FS-OPA. The two main FS steps are (A) "Salienting Samples (SS) according to a criterion" and (B) applying DAMICORE to construct a phylogram-based model. SS ranks the samples according to each of the *M* criteria (or non-dominated fronts), producing the sets of selected samples (Figure 3), denoted BC<sub>1</sub> (the samples in the best quantile according to Criterion 1), BC<sub>2</sub>, ..., BC<sub>*M*</sub>. DAMICORE constructs a phylogram (a rough model) from BC<sub>*i*</sub>, *i* = 1, ..., *M*, generating *M* models (BC<sub>1</sub>-based model, ..., BC<sub>*M*</sub>-based model). Then, a consensus strategy produces a unified phylogram-based model. An OPA cycle completes when the unified model generates new samples.
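The SS step can be sketched as follows, under simplifying assumptions: each criterion is a single objective to be minimized or maximized, and the "best quantile" is a fixed fraction of the population. The function name, the `(index, minimize)` encoding of a criterion, and the quantile value are hypothetical; ranking by non-dominated fronts, which the text also mentions, is not covered.

```python
def salient_samples(samples, criteria, quantile=0.25):
    """Return, for each criterion, the indices of the best-ranked samples.

    samples  : list of objective vectors, one per sample
    criteria : list of (objective_index, minimize) pairs (hypothetical encoding)
    quantile : fraction of the population kept per criterion (assumed value)
    """
    k = max(1, int(len(samples) * quantile))
    selected = []
    for index, minimize in criteria:
        ranked = sorted(
            range(len(samples)),
            key=lambda s: samples[s][index],
            reverse=not minimize,
        )
        selected.append(ranked[:k])  # BC_i: best samples for criterion i
    return selected


# Four toy samples with two objectives: minimize the first, maximize the second.
samples = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 1.0)]
criteria = [(0, True), (1, False)]
BC = salient_samples(samples, criteria, quantile=0.5)
```

Each list in `BC` would then be passed separately to DAMICORE, giving one phylogram per criterion before the consensus step.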

**Figure 2.** Diagram of the Optimization based on Phylogram Analysis—OPA.

**Figure 3.** SS procedure that obtains the selected samples shown in Figure 2.

OPA performance has been verified for relatively complex combinatorial mono- and multi-objective optimization problems [32]. Basic proofs concerning (stochastic) convergence to optima and time–space complexity have been provided [32,33].

### *2.3. Comparison of FS-OPA with NL-MVU-PCA for MaOPs Data-Driven Structural Learning*

NL-MVU-PCA is the primary method used by Duro et al. [12] for finding the essential objective set in MaOPs. Such a scheme also runs PCA based on the objective–function correlation matrix, aiming to improve objectives' preference ranking. On the other hand, NL-MVU-PCA maximizes the variance in objective space while preserving the local isometry (a common property in dimensionality reduction through embeddings). NL-MVU-PCA is computationally more complex than PCA since the former solves an optimization problem. The non-linear (NL) approach performs the optimization of the Kernel (Gram) matrix values by minimizing the Maximum Variance Unfolding (MVU) to find the best mapping that preserves the geometric properties of each neighbourhood.

Table 1 synthesizes some relevant properties of NL-MVU-PCA and FS-OPA for MaOPs. The latter analyses three types of associations: variable–variable (producing results similar to the Gibbs measure for Ising Models or Markov Random fields [34]), objective–objective (the dissimilarities, when found, can favour the construction of (non-dominated) front distributions [35]), and variable–objective (that may benefit inference as Markov Blankets [36]). The former works on the objective space for space reduction to determine the essential objective set [12]. FS-OPA also has other properties that are relevant for some classes of real-world problems: (i) it preserves the original variable space, which favours interpretability by non-experts; (ii) it works with any data type (continuous, discrete, and categorical, covering not only ordinal but also nominal data, addressed by Bandaru et al. [4]) and mixed types (suitable for multiple heterogeneous databases with observed data); (iii) it has a relatively low time complexity; and (iv) it has generated applicable models when applied to learn from small datasets [17,23–31].


**Table 1.** NL-MVU-PCA and FS-OPA for multidimensional data-driven structural learning applied to real-world MaOPs.

\* *M* is the number of objectives, and *q* is the number of clusters; \*\* *l* is the number of variables and objectives, and *n* is the number of data resamples; # Reference [5] shows that one can avoid parameter optimization for a new problem by choosing *q* = *M* − 1 for NL-MVU-PCA.

Reference [5] shows the use of NL-MVU-PCA for a mixed-variable problem, the gearbox problem (with continuous and discrete variables and continuous objectives). NL-MVU-PCA works on the (continuous) objective vectors for the gearbox problem. It differs from the meaning of mixed in Table 1, which relates to both the variable and objective representation (important for the "explicit explainability"), i.e., the mixture may include data vectors simultaneously from both spaces with different types. Moreover, FS-OPA can naturally work with any number of combinations of data types due to its foundation on NCD.

Concerning Explainability, "Explicit" means to provide a knowledge representation (with clues for "The Why" as the potential influence of variables on objectives) that benefits decision-maker interaction, while "Implicit" refers to the capacity to reveal the objectives' relative importance for an optimization problem, e.g., by ranking them.

The Feature Sensitivity (FS) analysis of FS-OPA aims at finding the variable and/or objective data-driven interactions to construct structural (graph-based) and probabilistic modelling. Probabilistic results are fundamental when dealing with the odds of bias in observed data or small-data sampling. Explainability is also essential for some classes of real-world problems, mainly those concerning decisions by stakeholders. Moreover, a user-friendly tool (instantiating the FS-OPA methodology) is relevant for real-world applications involving practitioners or stakeholders who are not optimization or artificial intelligence experts. Variable–variable and variable–objective interactions can also benefit practitioners' comprehension (The Why), increasing their confidence. Finally, the phylogram-based representation of those interactions has scaled up the understanding of results for some problems with dozens of variables or objectives (note that the interactive data mining approach proposed by Bandaru et al. [4] works with two or three objectives).

Table 1 also shows the time complexity for usual cases and the worst case to estimate the overhead of both procedures. The number of clusters in NL-MVU-PCA relates to the number of constraints needed to maintain the local isometry (*Mq*; in the worst case *q* = *M* − 1, resulting in *M*<sup>2</sup>) [12]. FS-OPA with usual resampling is O(*l*<sup>3</sup>) since *n* ≤ *l* (as in leave-one-out resampling) [34]. Moreover, *l* = *M* when the analysis uses only the objectives. Thus, for *l* = *M*, the running times of FS-OPA and NL-MVU-PCA have a ratio of (*n* + *M*)/*M*<sup>4</sup> in the worst case and *l*/*q*<sup>3</sup> in the usual case.

Another relevant factor is the minimal samples required to ensure reliable findings. Usually, the sample size for the PCA-based approach is empirically determined. FS-OPA has a theoretical model to decide the minimal amount of samples that guarantees high confidence in the results, which has been empirically corroborated for relatively complex problems in the decision space of binary variables [32].

### *2.4. FS-OPA Framework*

Figure 4 shows a flowchart of the global procedure of FS-OPA to reduce the number of objectives. Two options exist: (i) an automatic procedure, and (ii) a procedure with the intervention of the DM(s). In the first case, the number of objectives to be used in the optimization is selected automatically by the program, using the table of distances between objectives and applying the following rules:


In the second case, the selection is made by the DM(s), using both the phylogram and the table with the distance of objectives–objectives, as follows:


**Figure 4.** The general procedure of FS-OPA for the reduction of the number of objectives.

The reasons for rules 1 and 2 are different: the less distant cluster is the one that transports more information concerning the entire process, since it is near most of the decision variables, while the more distant cluster, besides everything, also has some information about the process that cannot be lost. The idea is that the intermediate clusters, selected by rule 3, have some information regarding the process that is already present in the objectives selected by rules 1 and 2, and thus, the objectives that can be discarded are those that belong to these clusters.

Both cases will be illustrated in the next section using a practical example. However, there are advantages and disadvantages to using one or the other. The first procedure provides the final solution directly, but the DM(s) does not take part in the process, which can imply some discomfort and distrust with the solution found. This does not happen when, after the analysis of the initial population of solutions, the DM(s) is confronted with relevant information about the process and, given these intermediate results, is asked about a possible way to advance. We are facing a situation in which the results may be explainable to the DM(s).

### **3. Examples of Application: DTLZ Benchmark Problems**

A strategy to deal with many-objective real-world complex optimization problems (e.g., those with no explicit objective functions) is prioritizing objectives. In the case of unknown priorities, their relative importance can be estimated from samples of the decision space, as proposed in this paper. Such prioritization has a certain resemblance to the problem of determining the essential objective set, since a redundant objective has low priority.

The DTLZ problems (with and without redundant objectives) have been used to test the method's capacity to find such a set and to evaluate algorithms for many-objective optimization.

Some algorithms have succeeded in finding the set from samples in the POF, near the POF, or, for example, from the last generation of an NSGA-II run, although more recently some of them failed on new challenging problems with other types of redundancies, as shown in [37]. Therefore, evaluating how well FS-OPA can estimate the relevance of objectives for the DTLZ problems from a random population (or from its first fronts) may be useful, since they are well-known problems.

Figure 5A illustrates an FS-OPA output for unconstrained DTLZ5, also used by Duro et al. [12] for explaining the capacity of their method to find redundant objectives (objectives *f*<sub>1</sub>, *f*<sub>2</sub>, *f*<sub>3</sub>, *f*<sub>4</sub>, *f*<sub>5</sub>, *f*<sub>6</sub>, *f*<sub>7</sub>, *f*<sub>8</sub>, and *f*<sub>9</sub> are linearly correlated in DTLZ5). A random population of size 31, with normalized samples and the Euclidean distance, was used to obtain a distance matrix. The SS procedure in Figure 3 was not applied. The output of Figure 5A shows variables and objectives arranged into a phylogram with leaf nodes (the objects under analysis) composing clusters (similarly to the end of the pipeline in Figure 1); they are identified by the same color.
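The construction of a distance matrix between variables and objectives from a random population can be sketched as below. The sketch assumes min-max normalization of each column and treats every variable or objective as one object described by its values over the population; the normalization scheme, the function names, and the two toy objectives (which are not the DTLZ5 functions) are assumptions for illustration only.

```python
import math
import random


def min_max_normalize(column):
    """Rescale a list of values to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def object_distance_matrix(columns):
    """Euclidean distances between normalized columns (variables, objectives)."""
    names = list(columns)
    norm = {name: min_max_normalize(columns[name]) for name in names}
    dist = {a: {b: math.dist(norm[a], norm[b]) for b in names} for a in names}
    return names, dist


random.seed(0)
population_size = 31  # same population size as in the text
x1 = [random.random() for _ in range(population_size)]
x2 = [random.random() for _ in range(population_size)]
columns = {
    "x1": x1,
    "x2": x2,
    "f1": [2.0 * a + 1.0 for a in x1],   # toy objective, linear in x1
    "f2": [(1.0 - b) ** 2 for b in x2],  # toy objective, depends only on x2
}
names, dist = object_distance_matrix(columns)
```

In this toy setting `f1` is at (numerically) zero distance from `x1` after normalization, so a tree built from `dist` would place them as neighbors, which is the kind of variable–objective proximity the phylogram is meant to expose.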

**Figure 5.** Phylogram and the clusters found: (**A**) for unconstrained DTLZ5 with 10 objectives and (**B**) for constrained DTLZ5 (2,10).

Objective functions *f*<sub>1</sub>, ..., *f*<sub>9</sub> are partitioned into three neighboring clusters ({*f*<sub>1</sub>, *f*<sub>2</sub>}, {*f*<sub>3</sub>, *f*<sub>4</sub>, *f*<sub>5</sub>, *f*<sub>6</sub>}, and {*f*<sub>7</sub>, *f*<sub>8</sub>, *f*<sub>9</sub>}) in the phylogram structure, while *f*<sub>10</sub> is grouped with the leaf nodes corresponding to variables. The phylogram structure aggregates *f*<sub>1</sub>, ..., *f*<sub>9</sub> into the same subtree, while *f*<sub>10</sub> is isolated from the other objectives in the complementary subtree. The unique node with the label "100" (another type of result from a tree consensus) splits the phylogram into those two subtrees. Such a label ("100") means that the leaf nodes *x*<sub>1</sub>, ..., *x*<sub>10</sub> and *f*<sub>10</sub> were in the same subtree (with the remaining leaf nodes in the complementary subtree) in 100% of all the constructed phylograms, independently of each subtree topology in a phylogram. Such an interpretation suggests a hypothesis: *f*<sub>10</sub> is weakly correlated with the other objectives, which are significantly associated with one another. Thus, *f*<sub>10</sub> and one of the other objectives could compose an essential objective set; this result is consistent with the DTLZ5 problem structure.
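The meaning of a consensus label such as "100" can be made concrete with a small sketch: the label is read here as the percentage of constructed trees in which a given group of leaves is separated from all the others by some internal edge, whatever the topology inside each side. The encoding of a tree as a set of leaf groups, the function name, and the three toy trees are hypothetical simplifications.

```python
def split_support(trees, leaves, group):
    """Percentage of trees in which `group` is split from the remaining leaves.

    trees  : list of trees; each tree is a set of frozensets, every frozenset
             holding the leaves on one side of an internal edge
    leaves : all leaf labels
    group  : the leaves whose joint separation is being counted
    """
    group = frozenset(group)
    complement = frozenset(leaves) - group
    hits = sum(1 for splits in trees if group in splits or complement in splits)
    return 100.0 * hits / len(trees)


leaves = {"x1", "x2", "f1", "f2", "f10"}
# Three toy trees with different internal topologies.
trees = [
    {frozenset({"f1", "f2"}), frozenset({"x1", "x2"})},
    {frozenset({"f1", "f2"}), frozenset({"x1", "f10"})},
    {frozenset({"x1", "x2", "f10"}), frozenset({"x2", "f10"})},
]
support_objectives = split_support(trees, leaves, {"f1", "f2"})
support_variables = split_support(trees, leaves, {"x1", "x2"})
```

In the toy trees, `f1` and `f2` are separated from `x1`, `x2`, and `f10` in every tree, so that split would carry the label "100", while the pairing of `x1` with `x2` appears in only one of the three trees.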

Figure 5B shows the proposed phylogram for DTLZ5(2,10) with constraints (Saxena et al. [5]). It requires an additional variable, *x*<sub>11</sub>, to generate samples outside the POF, like the samples used to construct the phylogram in Figure 5A. The phylogram in Figure 5B shows that *f*<sub>10</sub> is isolated in a subtree, while *f*<sub>1</sub>, ..., *f*<sub>9</sub> are in the complementary subtrees. Such a result suggests that *f*<sub>10</sub> and *f*<sub>1</sub> (for example) would enable proper POF estimates; this result agrees with the DTLZ5(2,10) problem structure.

Figure 6 shows the phylograms obtained by FS-OPA for DTLZ1 to DTLZ4 from random populations of size 31, as a way to check whether the FS-OPA clues about the objective relationships are plausible.

**Figure 6.** Phylogram and the clusters found for DTLZ1 to DTLZ4 with 10 objectives.

Given that these problems do not have redundant objectives, the only possibility is to present some clue about the prioritization of objectives, considering that a reduction in the number of objectives can only be made with a certain error, as explained before. For example, the behaviours of functions DTLZ1, DTLZ2, and DTLZ3 are very similar. The simultaneous analysis of the clusters found and of the distances between objectives and the decision variables shows that the objectives can be partitioned into the following sets:


This means that a possible hierarchization of the objectives for these problems can be made by selecting, in the first step, a single objective from each of the groups identified above and then assigning all the others to a second level.

In addition, all the objectives in the phylograms found for DTLZ2 and DTLZ4 are not in a subtree without a variable. That may mean that the disagreement of objectives of those two problems is more salient from an initial random sampling.

However, the objective of this paper is not only to define the minimum number of objectives that can be used without error but also to identify the situations where the reduction can be done with a certain error. Anyway, a deep analysis will be necessary here, which is outside of the scope of the present paper.

FS-OPA also produces other outputs (useful for human comprehension of some classes of real-world problems), which are explored in the sections related to the extrusion problem.

### **4. Polymer Extrusion Problem**

### *4.1. The Problem to Solve*

To demonstrate the complexity of this system regarding the modelling program and the interrelations between the decision variables and the objectives, some details are given here. However, the system is much more complex, as can be seen in the following references [38–41].

Figure 7A shows an axial cut of the extruder and die fitted with a barrier screw. The sequence of the physical phenomena developing typically along the screw is also represented, and comprises [38–40]: (i) gravity conveying of the solid material in the hopper; (ii) drag solids conveying in the first screw turns; (iii) development of a thin film of melted material separating the solids from the surrounding metallic walls; (iv) melting of the solid plug, with physical separation of the solid plug from the melt pool; (v) melt conveying following a relatively complex regular helical flow pattern; (vi) pressure flow through the die. Figure 7B shows the complex flow pattern quantified by the velocity fields and the temperature profile in the Conventional Screw (CS) and Maillefer Barrier Screw (MBS), while Figure 7C shows the complete system geometry used in the calculations.

**Figure 7.** Single screw extrusion: (**A**) plasticating phases; (**B**) melting mechanism in CS (left) and MBS (right); (**C**) specific system geometry used in the calculations.

The aim is to determine if the best solution is to use a CS or an MBS for fixed and/or for changing operating conditions and, simultaneously, to optimize the corresponding geometry.

The following equations represent the momentum and energy equations for the melted region of the channel (melting and melt conveying in Figure 7), which resulted from some specific simplifications of the general tri-dimensional (3D) set of equations. These equations were solved numerically, considering a 2D space representing the cross-screw channel (X and Y directions) for small increments along the channel (Z direction). However, it is necessary to note that all the regions identified above and in Figure 7 have different thermomechanical models that must be put together using the appropriate boundary conditions. This is a very complex system in which the polymer properties, the operating conditions of the machine, and the screw geometry contribute in a complex way to measure the process performance quantified by the objectives (see Table 2).

**Table 2.** Optimization objectives, aim of optimization and range of variation.


For example, just to illustrate the complexity of this process and of the corresponding numerical modelling, Equation (3) gives the melt rate per unit of channel length (Φ), which quantifies the amount of solid material that changes to the molten state in each of the increments along the screw channel. However, it must be taken into account that this is an analytical model that resulted from further simplifications of Equations (1) and (2). For more details of the model used, the reader is referred to references [41–43].

$$\frac{\partial P}{\partial z} = \frac{\partial}{\partial x} \left( \eta \frac{\partial V_z}{\partial x} \right) + \frac{\partial}{\partial y} \left( \eta \frac{\partial V_z}{\partial y} \right) \tag{1}$$

$$\rho_m C_s V_z(y) \frac{\partial T}{\partial z} = k_m \left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} \right) + \eta \, \dot{\gamma}^2 \tag{2}$$

$$\Phi = \left\{ \frac{V_{bx} \, \rho_m \left[ k_m (T_b - T_m) + \frac{\eta}{2} V_j^2 \right]}{2 \left[ C_s (T_m - T_{s0}) + C_m \left( T_{avg} - T_m \right) + h \right]} \right\}^{1/2} \tag{3}$$

The variables in these equations represent the polymer properties, operating conditions and flow variables: *ρ<sub>m</sub>* is the melt density, *k<sub>m</sub>* is the melt thermal conductivity, *h* is the melting enthalpy, *C<sub>m</sub>* and *C<sub>s</sub>* are the specific heats of the melt and the solids, respectively, *T<sub>m</sub>* is the melting temperature, *η* is the melt viscosity, *T<sub>s0</sub>* and *T<sub>b</sub>* are the solids and the barrel temperatures, *γ̇* is the shear rate, *T* is the melt temperature in each node of the mesh, *T<sub>avg</sub>* is the average temperature of the melt, *V<sub>z</sub>* is the melt velocity in the Z direction, *V<sub>s</sub>* is the solid velocity in the Y direction, and *V<sub>bx</sub>* is the barrel velocity in the X direction.
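As a worked illustration of Equation (3), the following sketch evaluates the melt rate from the quantities defined above, reading the symbols of the equation as the barrel velocity *V<sub>bx</sub>*, the barrel temperature *T<sub>b</sub>*, and the average melt temperature *T<sub>avg</sub>*. The function name and argument names are hypothetical, and the numerical values are illustrative placeholders in SI-like units, not LDPE data or operating conditions from this study.

```python
import math


def melt_rate(v_bx, rho_m, k_m, t_b, t_m, eta, v_j, c_s, t_s0, c_m, t_avg, h):
    """Melt rate per unit of channel length, following the form of Equation (3)."""
    numerator = v_bx * rho_m * (k_m * (t_b - t_m) + 0.5 * eta * v_j ** 2)
    denominator = 2.0 * (c_s * (t_m - t_s0) + c_m * (t_avg - t_m) + h)
    return math.sqrt(numerator / denominator)


# Placeholder values chosen only to exercise the formula (assumptions).
phi = melt_rate(
    v_bx=0.05,    # barrel velocity in the X direction
    rho_m=800.0,  # melt density
    k_m=0.2,      # melt thermal conductivity
    t_b=190.0,    # barrel temperature
    t_m=110.0,    # melting temperature
    eta=1000.0,   # melt viscosity
    v_j=0.1,      # velocity V_j appearing in Equation (3)
    c_s=2500.0,   # specific heat of the solids
    t_s0=25.0,    # initial solids temperature
    c_m=2500.0,   # specific heat of the melt
    t_avg=150.0,  # average melt temperature
    h=130000.0,   # melting enthalpy
)
```

The numerator collects the heat conducted from the barrel and the viscous dissipation in the melt film, while the denominator collects the energy needed to heat the solids, melt them, and heat the melt, so a hotter barrel or a higher viscosity increases Φ in this expression.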

Therefore, the performance of the process depends on the polymer properties, machine operating conditions and geometry. In the present example, a Low-Density Polyethylene (LDPE) is used, and for the operating conditions, two situations are considered, as shown in Table 3, i.e., in some cases they are fixed, and in one of the cases they are also considered as decision variables (DVs). The DVs are the operating conditions and the geometrical parameters, as identified in Tables 3 and 4, respectively.

**Table 3.** Cases studied for LDPE; only in Case 7 are the operating conditions used as decision variables.



**Table 4.** Geometrical parameters of both CS and MBS screws.

The performance of the machine was quantified using six objectives, two to maximize (output and degree of mixing) and four to minimize (length of screw required to melt the polymer, melt temperature at the exit, mechanical power consumption required to rotate the screw, and viscous dissipation quantified as the ratio between the melt temperature and the fixed barrel temperature), as shown in Table 2.

The geometrical parameters involved in the description of both types of screws are shown in Table 4. Since only one screw can be used each time in the machine, an additional decision variable was added, identified as "case," to trigger the decision variables corresponding to one of the types of screws, i.e., when case ranges in the interval [0.0, 0.5] the decision variables of the conventional screw are used, while when case ranges in the interval [0.5, 1.0], the other screw is considered. Consequently, the total number of decision variables is 15.
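The role of the additional "case" variable can be sketched as a simple decoder. The two intervals given above share the endpoint 0.5, so assigning it to the conventional screw is an arbitrary choice made here; the function names and the `cs_`/`mbs_` key prefixes are hypothetical and do not reproduce the parameter names of Table 4.

```python
def screw_type(case: float) -> str:
    """Map the 'case' decision variable to a screw type."""
    if not 0.0 <= case <= 1.0:
        raise ValueError("case must lie in [0, 1]")
    # The endpoint 0.5 belongs to both intervals in the text; it is assigned
    # to the conventional screw here (an assumption).
    return "CS" if case <= 0.5 else "MBS"


def active_variables(decision_vector: dict) -> dict:
    """Keep only the geometry entries of the screw selected by 'case'."""
    prefix = "cs_" if screw_type(decision_vector["case"]) == "CS" else "mbs_"
    return {k: v for k, v in decision_vector.items() if k.startswith(prefix)}


# Hypothetical decision vector with one geometry entry per screw type.
candidate = {"case": 0.8, "cs_channel_depth": 3.0, "mbs_barrier_pitch": 40.0}
geometry = active_variables(candidate)
```

With this encoding every individual carries the variables of both screws, but only those of the selected screw influence the objectives, which is why a single population can compare CS and MBS designs.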

For each case studied (Table 3), 11 optimization runs are made for statistical comparison using the hypervolume (HV) and the Inverted Generational Distance (IGD).

### *4.2. Results and Discussion*

The FS-OPA analyses for Cases 1 and 4 are presented in Figure 8 and Tables 5 and 6. The results were very similar, generating the same three groups of objectives, (Q, L), (Power, WATS), and (T, TTb). The application of the methodology defined in Section 2.4 allows for identifying the objectives Q, Power, WATS, and T to be used in the optimization after reduction (see Tables 5 and 6): (i) the objectives with lower distance, Power and WATS; (ii) one objective of the cluster with higher distance, T; and (iii) one objective of the remaining cluster, Q. It is also clear that TTb could be selected instead of T, and L instead of Q.
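The selection pattern just described can be sketched as follows. This is only one reading of how the rules are applied in this example: every objective of the closest cluster is kept, together with one objective of the most distant cluster and one objective of each remaining cluster. Summarizing each cluster by a single distance value is an assumption, the function name is hypothetical, and the numbers are made up; they are not the entries of Tables 5 and 6.

```python
def select_objectives(clusters):
    """Pick objectives from clusters given as {label: (distance, [objectives])}.

    Assumes at least two clusters. Taking the first member of a cluster
    stands in for a choice that the DM could make differently.
    """
    ordered = sorted(clusters.values(), key=lambda c: c[0])
    nearest, farthest = ordered[0], ordered[-1]
    selected = list(nearest[1])        # (i) all objectives of the closest cluster
    selected.append(farthest[1][0])    # (ii) one objective of the farthest cluster
    for _, members in ordered[1:-1]:   # (iii) one objective per remaining cluster
        selected.append(members[0])
    return selected


# Made-up distances, used only to reproduce the ordering discussed in the text.
clusters = {
    "A": (0.40, ["Q", "L"]),
    "B": (0.10, ["Power", "WATS"]),
    "C": (0.90, ["T", "TTb"]),
}
chosen = select_objectives(clusters)
```

With these placeholder distances the sketch returns Power, WATS, T, and Q, matching the four objectives reported above. Replacing `members[0]` with a DM-supplied choice would cover the interactive option.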

**Figure 8.** Phylograms for Cases 1 and 4 (Table 3).


**Table 5.** Distances between the objectives for Case 1.

**Table 6.** Distances between the objectives for Case 4.


To assess the adequacy of using only the four selected objectives, the optimization results obtained using SMS-EMOA with these four objectives are compared with those obtained with the initial six objectives, using the Pareto-optimal fronts obtained after 100 generations with a population of 100 individuals per generation; 11 runs with different seed values are made for statistical comparison. Additionally, the comparison includes a situation with three objectives, one from each of the clusters found, specifically Q, WATS, and T.

Figures 9 and 10 show the Pareto-optimal fronts found in each one of the cases (Case 1 and Case 2) using the three sets of objectives: (i) all objectives; (ii) objectives Q, Power, WATS, and T; (iii) objectives Q, WATS, and T. The results are, apparently, very similar when comparing the cases with six and four objectives. In the other situation, with three objectives, the multi-objective optimization algorithm is clearly lost, since the final solution found alternates in the different runs between one type of screw and the other (i.e., between the CS and the MBS). The results for Cases 2 and 3 are very similar to those presented here and, thus, no specific discussion is made here.

By using the 11 runs performed for each case studied, the Hypervolume (HV) and the Inverted Generational Distance (IGD) were applied and the results are presented in Table 7, where it is possible to see the average and the percentage of losses when the number of objectives is reduced [42–44]. To calculate IGD, all Pareto-optimal solutions found in each run were put together in a pool and the non-dominated solutions of this pool were used for comparison.
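The pooled-reference IGD described above can be sketched as follows, assuming that all objectives are expressed as minimization (objectives to be maximized would first be negated) and using the common definition of IGD as the mean distance from each reference point to its nearest point in the evaluated front. The function names and the two toy fronts are illustrative only.

```python
import math


def dominates(a, b):
    """True if a dominates b when all objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Points of the pool that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def igd(reference, approximation):
    """Mean distance from each reference point to the closest approximation point."""
    total = sum(min(math.dist(r, a) for a in approximation) for r in reference)
    return total / len(reference)


# Two toy runs, each giving a small front in a two-objective space.
runs = [
    [(0.0, 1.0), (0.5, 0.6), (1.0, 0.1)],
    [(0.1, 0.9), (0.5, 0.5), (1.0, 0.0)],
]
pool = [p for run in runs for p in run]
reference = non_dominated(pool)  # pooled non-dominated solutions
scores = [igd(reference, run) for run in runs]
```

A lower score means the front of a run lies closer to the pooled reference set. In this toy example the second run contributes most of the reference points and therefore obtains the smaller IGD.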

As shown in Table 7, the use of four objectives (Q, Power, WATS, and T) does not significantly deteriorate the final solutions found. The maximum difference found is 11.6%, which, for a process like extrusion and taking into account a final population of 100 solutions, is not substantial. Additionally, the differences in the IGD values are very small, indicating that the solutions found with four objectives are close to the best solutions found in the 11 runs. The results found for the situation with three objectives corroborate the results shown in Figures 9 and 10.

**Figure 9.** Pareto-optimal fronts after 100 generations for the pair of objectives identified in Figure 8 for Case 1.

**Figure 10.** Pareto-optimal fronts after 100 generations for the pair of objectives identified in Figure 8 for Case 7.


**Table 7.** Performance comparison using Hypervolume and IGD for the total number of objectives and the automatic reduction to four and three objectives (between brackets the standard deviation, and loss percentage relative to six objectives) for the four cases studied.

Finally, it is important to point out that the DM(s) play an important role in this procedure, since they intervene when selecting the objectives. For example, it is necessary to opt for Q or L, two objectives from the same cluster that apparently have the same importance in the process. In this case, an informed DM will opt for Q because this objective is the output of the machine and is directly linked to the economics of the problem, while L is the length required for melting, which is related to the quality of the product obtained; however, this quality is also quantified by WATS, which was already selected by the algorithm. This example shows the importance of the DM(s), who simultaneously help the optimization process and are informed about how the results are obtained.

### **5. Conclusions**

A methodology for reducing the number of objectives for many-objective optimization problems using population-based algorithms is proposed. This approach, based on machine learning, is an improvement over similar state-of-the-art methodologies: it allows analysis of variable–variable and variable–objective relations (and not only objective–objective ones), does not need a kernel function choice or parameter optimization, yields explainable solutions that assist the decision maker in interpreting the results, has low time complexity, and supports theoretical and empirical sample sizes.

The approach showed its potential to reduce the number of objectives by capturing the complex relations between the different objectives with an additional possibility, which is to capture the objective-variable relations. This is done by applying the methodology to a set of benchmark and real-world problems. The comparison of the Pareto-optimal fronts obtained with another machine learning approach in the literature allows for the conclusion that its performance is very competitive, but with the great advantage of being much easier to use. Additionally, there is the possibility of strong interaction with the DM(s).

The application of the proposed approach to a difficult real-world problem has proven that it is automatically possible to reduce the number of objectives by losing only around ten percent of the Pareto-optimal frontier obtained, for the case of 100 individuals in the population. The use of a second possibility, which is to require the intervention of the decision maker during the process, e.g., when selecting the objectives to be considered in the optimization, can be very useful because the person interested can see how the process works and interpret the results obtained. Finally, an important characteristic of the method proposed is the capacity to explain the solutions found.

**Author Contributions:** Conceptualization, A.G.-C., F.M. and A.D.; methodology, A.G.-C., F.M. and A.D.; software, P.C.; investigation, A.G.-C. and A.D.; resources, A.G.-C.; data curation, A.G.-C.; writing—original draft preparation, A.G.-C. and A.D.; writing—review and editing, A.G.-C.; visualization; supervision, A.G.-C. and A.D.; project administration, A.G.-C. and A.D.; funding acquisition, A.G.-C., F.M. and A.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by POR Norte under the PhD Grant PRT/BD/152192/2021. The authors also acknowledge the funding by FEDER funds through the COMPETE 2020 Programme and National Funds through FCT (Portuguese Foundation for Science and Technology) under the projects UID-B/05256/2020 and UID-P/05256/2020, the Center for Mathematical Sciences Applied to Industry (CeMEAI) and the support from the São Paulo Research Foundation (FAPESP grant No 2013/07375-0), the Center for Artificial Intelligence (C4AI-USP), the support from the São Paulo Research Foundation (FAPESP grant No 2019/07665-4) and the IBM Corporation.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Single-Loop Multi-Objective Reliability-Based Design Optimization Using Chaos Control Theory and Shifting Vector with Differential Evolution**

**Raktim Biswas † and Deepak Sharma \*,†**

Department of Mechanical Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India

**\*** Correspondence: dsharma@iitg.ac.in; Tel.: +91-361-2582661

† These authors contributed equally to this work.

**Abstract:** Multi-objective reliability-based design optimization (MORBDO) is an efficient tool for generating reliable Pareto-optimal (PO) solutions. However, generating such PO solutions requires many function evaluations for reliability analysis, thereby increasing the computational cost. In this paper, a single-loop multi-objective reliability-based design optimization formulation is proposed that approximates reliability analysis using Karush–Kuhn–Tucker (KKT) optimality conditions. Further, chaos control theory is used for updating the point that is estimated through KKT conditions to avoid convergence issues. In order to generate the reliable point in the feasible region, the proposed formulation also incorporates the shifting vector approach. The proposed MORBDO formulation is solved using differential evolution (DE) that uses a heuristic convergence parameter based on the hypervolume indicator for performing different mutation operators. DE incorporating the proposed formulation is tested on two mathematical examples and one engineering example. The results demonstrate the generation of a better set of reliable PO solutions using the proposed method over the double-loop variant of multi-objective DE. Moreover, the proposed method requires 6×–377× fewer function evaluations than the double-loop-based DE.

**Keywords:** multi-objective reliability-based design optimization; shifting vector approach; reliability analysis; chaos control theory; differential evolution

### **1. Introduction**

Design optimization mostly keeps design variables and parameters deterministic. It ignores the fact that uncertainties can arise owing to manufacturing variations, dimensional inaccuracy, boundary conditions, material properties, and improper loading conditions, which can lead to the infeasibility of the solution obtained through deterministic optimization. Therefore, it is necessary to consider these uncertainties in the design process to maintain the safety and quality of the solution. Reliability-based design optimization (RBDO) [1,2] is a mathematical tool that is used for obtaining such reliable optimal solutions for problems involving uncertainties. It also enables engineers to identify solutions effectively for complex applications in the automotive, civil, mechanical, and aerospace industries [3,4]. In RBDO, the uncertainties are manifested by converting the deterministic constraints to probabilistic constraints. This is accomplished by applying a probability operator to the performance functions, also referred to as limit-state functions in the literature. A generalized single-objective RBDO formulation is given in Equation (1).

$$\begin{array}{ll}\text{Minimize} & f(\mu\_{\mathbf{X}}), \\ \text{subject to} & P[G\_i(\mathbf{X}) \ge 0] \le P\_{f\_i}^T = \Phi(-\beta\_i^T), \quad i = 1, \ldots, I, \\ & \mu\_{\mathbf{X}}^{(\text{L})} \le \mu\_{\mathbf{X}} \le \mu\_{\mathbf{X}}^{(\text{U})}, \end{array} \tag{1}$$

**Citation:** Biswas, R.; Sharma, D. Single-Loop Multi-Objective Reliability-Based Design Optimization Using Chaos Control Theory and Shifting Vector with Differential Evolution. *Math. Comput. Appl.* **2023**, *28*, 26. https://doi.org/10.3390/mca28010026

Academic Editors: Carlos Coello, Erik Goodman, Kaisa Miettinen, Dhish Saxena, Oliver Schütze and Lothar Thiele

Received: 3 July 2022 Revised: 30 January 2023 Accepted: 15 February 2023 Published: 17 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

where *f*(*μ*<sub>**X**</sub>) is the objective function, *G<sub>i</sub>*(**X**) is the *i*-th performance/constraint function, and *μ*<sub>**X**</sub> is the mean value vector of the random variable vector **X** ∈ ℝ<sup>*n*</sup>, where *n* is the number of random design variables. The superscripts (L) and (U) on *μ*<sub>**X**</sub> denote the lower and upper limits of the vector. Φ(·) represents the standard normal cumulative distribution function, *β*<sup>*T*</sup><sub>*i*</sub> is the target reliability index of the *i*-th performance function, and *P*[·] is the probability operator. Here, *P*[*G<sub>i</sub>*(**X**) ≥ 0] is the failure probability of the *i*-th performance function, which must not exceed the target failure probability *P*<sup>*T*</sup><sub>*f<sub>i</sub>*</sub>.
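The relation between the target reliability index and the target failure probability in Equation (1) can be checked numerically. The following minimal sketch uses only the Python standard library; the function names are illustrative and do not come from the paper.

```python
from statistics import NormalDist


def target_failure_probability(beta_t: float) -> float:
    """Return P_f^T = Phi(-beta^T) for a target reliability index beta^T."""
    return NormalDist().cdf(-beta_t)


def reliability_index(p_f: float) -> float:
    """Invert the relation: beta = -Phi^{-1}(P_f), valid for 0 < P_f < 1."""
    return -NormalDist().inv_cdf(p_f)


if __name__ == "__main__":
    # Illustrative target reliability indices (not taken from the paper's tables).
    for beta_t in (1.0, 2.0, 3.0):
        print(beta_t, target_failure_probability(beta_t))
```

A larger target reliability index corresponds to a smaller admissible failure probability, which is why the probabilistic constraint becomes more restrictive as the index grows.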

Equation (1) demonstrates that solving a single-objective RBDO requires a nested-loop procedure [2], where the outer optimization loop contains an inner loop for reliability analysis. The reliability analysis can be performed on the probabilistic performance function using simulation-based methods [5] or analytical methods [6] to obtain its failure probability. Simulation-based methods, such as Monte Carlo simulation (MCS) [5], subset simulation [8], importance sampling [9], and Latin-hypercube sampling [9], show better accuracy at the expense of computational cost [7]. On the other hand, analytical methods, such as most-probable point (MPP)-based methods, are known for their computational efficiency; in these methods, a sub-optimization problem is solved for each performance function to obtain its MPP. The MPP-based methods can be broadly divided into the performance measure approach (PMA) [10] and the reliability index approach (RIA) [6]. The optimum solution obtained using PMA and RIA is known as the most probable target point (MPTP) and the most probable failure point (MPFP), respectively. Many advanced methods have been developed to estimate the MPTP and MPFP of performance functions, and they are categorized as double-loop methods, decoupled-loop methods, and single-loop methods.

The classical double-loop methods [11,12] involve a nested optimization loop, where the inner loop performs reliability analysis and the outer loop is used for obtaining design solutions. All the random variables are transformed to standard normal variables [13] for performing reliability analysis. Since the nested optimization loop is computationally expensive, the reliability analysis loop (inner loop) is decoupled and performed separately in decoupled-loop methods [14–17]. Some advanced and efficient reliability-based frameworks were also proposed based on isogeometric analysis [18,19]. The reliability analysis itself is considered a computationally expensive procedure. Therefore, single-loop methods [20] have been proposed, in which approximate reliability analysis is performed. Different concepts, such as the Karush-Kuhn-Tucker (KKT) conditions and quantile approximation, are used to approximate the MPTP, which can eliminate the reliability analysis loop. The adaptive conjugate single-loop approach (AC-SLA) [21], the enhanced single-loop method (ESM) [22], the chaotic single-loop approach (CSLA) [23], the single-loop shifting vector method (SLShV-CG) [24], the sequential single-loop reliability optimization and confidence analysis method (SROCA) [25], and the approximate single-loop chaos control method (ASLCC) [26] are a few recently developed single-loop methods. Recently, some efficient evolutionary RBDO methods have also been proposed to obtain the global reliable solution [27,28].

It has been found that many real-world engineering problems consist of more than one objective, which are conflicting in nature [29], and can also have uncertainties. Evolutionary algorithms are found to be promising for solving deterministic multi-objective optimization problems (MOOPs) because they can generate Pareto-optimal (PO) solutions in one run. However, these evolutionary algorithms need to be modified for generating reliable PO solutions for multi-objective reliability-based design optimization (MORBDO) problems. To address uncertainty in MORBDO, Deb et al. [3] used a non-dominated sorting genetic algorithm (NSGA-II) [30] for design optimization, and Fast RIA for reliability analysis. A multi-objective differential evolution (MODE) [31] was also implemented as a design optimization algorithm, and inverse reliability was performed. Simulation-based techniques are also used for reliability analysis and are coupled with double-loop methods. For example, a radial basis function was used for approximating the responses of the performance function and was coupled with MCS to implement reliability analysis. NSGA-II was used to obtain PO solutions for solving the multi-objective and multi-case [32] RBDO problem. In another study, MCS and NSGA-II were coupled with entropy weighted grey relational analysis for design optimization [33] to solve the control arm problem. The multi-objective optimization design of the control arm was carried out using the Kriging surrogate model. Sun et al. [34] proposed a radial basis function-based surrogate modeling that was implemented with Latin-hypercube sampling for sensitivity analysis. MCS and multi-objective particle swarm optimization (PSO) were coupled for obtaining the reliable PO solutions. In another study, a multiple response surface method-based artificial neural network was implemented for reliability analysis [35], and a dynamic multi-objective particle swarm optimization algorithm was proposed for obtaining PO solutions. A worst-case scenario was used with fuzzy sets for reliability analysis, and a real-coded population-based incremental learning [36] was implemented with DE for obtaining the PO solutions. A multi-objective robust optimization [37] was proposed, in which the design problems consisted of parametric uncertainties involving both random and interval variables. NSGA-II was implemented to generate robust PO solutions, and MCS was performed to evaluate the impact responses of the mixed uncertainties. Constrained NSGA-II was also implemented to solve the MORBDO problem [38]. It was coupled with the hybrid method using the Kriging surrogate metamodel for reliability analysis.

A time-dependent reliability-based robust design optimization (TRBRDO) problem [39] was solved using NSGA-III [40] and the dimension reduction method. It was developed by constructing an extreme value model using the sparse grid-based stochastic collocation method for time-dependent reliability analysis. A Bayesian multi-objective RBDO [41] was proposed to solve problems involving aleatory and epistemic uncertainties. Multi-objective PSO was implemented for obtaining PO solutions, and Bayesian inference was used for reliability analysis. Another method using a nested loop was proposed to solve RBDO problems [42], in which the outer loop was performed using multi-objective PSO, and the inner loop was solved using surrogate modeling with MCS sampling. A two-layer nested optimization problem was proposed based on a decoupling strategy. The inter-generation projection genetic algorithm was employed in the inner loop, and the multi-objective genetic algorithm [43] was implemented in the outer loop for solving the MORBDO problem. Another multi-objective RBDO [44] was solved by converting it into a single-objective RBDO problem. This was achieved by assigning weights to the objectives based on quantitative analysis and evidence theory. The reliability analysis was performed using the PMA method.

From the literature, it can be seen that most of the MORBDO methods focus on PMA, RIA, MCS, or surrogate modeling for reliability analysis, and they are based on double-loop or decoupled-loop methods, which makes them computationally expensive. Since evolutionary algorithms are population-based methods and require many function evaluations, a single-loop method for solving MORBDO can improve the computational efficiency. Moreover, single-loop methods that use the steepest descent search to estimate the MPTP often become stuck in periodic oscillation [26,45] for highly nonlinear functions. This motivates the present paper, in which a new MORBDO formulation is proposed, based on adaptive multi-objective DE. An adaptive mutation scheme is used for selecting different variants of mutation for exploration in the search space. Both trial and target vectors take part in the MORBDO formulation to estimate the reliable PO solutions. The following are the contributions of the paper.


• The formulation is further developed by incorporating target and trial vectors of differential evolution for better exploration of the search space.

The proposed method is tested on three benchmark examples from the literature. The results are compared with a double-loop variant of multi-objective differential evolution using PMA for reliability analysis.

The organization of the paper is as follows. In Section 2, a brief discussion of multi-objective RBDO, PMA, the chaos control method, the single-loop method, and the shifting vector approach is presented. The proposed single-loop multi-objective reliability-based design optimization method is discussed in Section 3, along with its implementation. The adaptive mutation scheme and the detailed steps of multi-objective differential evolution are also discussed in this section. Numerical examples are solved and discussed in Section 4. Finally, the paper is concluded in Section 5 with a note on future work.

### **2. Preliminaries**

*2.1. Multi-Objective Reliability-Based Design Optimization*

A generalized MORBDO formulation can be written as

$$\begin{array}{ll}\text{Minimize} & f\_{m}(\mu\_{\mathbf{X}}), \quad m = 1, \ldots, M, \\ \text{subject to} & P[G\_i(\mathbf{X}) \ge 0] \le P\_{f\_i}^T = \Phi(-\beta\_i^T), \quad i = 1, \ldots, I, \\ & \mu\_{\mathbf{X}}^{(\text{L})} \le \mu\_{\mathbf{X}} \le \mu\_{\mathbf{X}}^{(\text{U})}, \quad \mathbf{X}^{(\text{L})} \le \mathbf{X} \le \mathbf{X}^{(\text{U})}, \end{array} \tag{2}$$

where *f<sub>m</sub>*(·) is the *m*-th conflicting objective function that is written using the mean value (*μ*<sub>**X**</sub>) of the random variable (**X**). **X**<sup>(L)</sup> and **X**<sup>(U)</sup> are the lower and upper limits on **X**. Solving Equation (2) generates a set of reliable PO solutions in the design space. The reliability analysis is performed on the probabilistic performance function to estimate the failure probability by solving a multidimensional integral, as given in Equation (3).

$$P\_{f\_i} = P[G\_i(\mathbf{X}) \ge 0] = \int \cdots \int\_{G\_i(\mathbf{X}) \ge 0} f\_{\mathbf{X}}(\mathbf{X}) \, d\mathbf{X}, \tag{3}$$

where *f*<sub>**X**</sub>(**X**) is the joint probability density function of **X**. Solving this multidimensional integral is difficult, and therefore, it is approximated with reliability analysis [7]. The first-order reliability method (FORM) [6] and second-order reliability method (SORM) [46] are analytical methods for reliability analysis. Both FORM and SORM estimate the reliability index *β* that represents the minimum distance from the origin to the performance function in the standard normal space. The reliability index *β* can be obtained by solving a sub-optimization problem, and the reliability (R) can be estimated using Φ(*β*) (R = 1 − *P<sub>f</sub>* = 1 − Φ(−*β*) = Φ(*β*)). Due to its computational efficiency and stability in generating a reliable solution, PMA is widely used to solve the sub-optimization problem [47].
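To illustrate why an analytical estimate is attractive, the sketch below compares a crude Monte Carlo estimate of the integral in Equation (3) with the first-order result Φ(−*β*) for a hypothetical linear limit state *G*(**U**) = **a**·**U** − *b* in the standard normal space, for which the first-order result is exact. The limit state, sample size, and seed are assumptions chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def failure_probability_mcs(a, b, n_samples=100_000, seed=0):
    """Estimate P[G(U) >= 0] for G(U) = a.U - b with U standard normal."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        g = sum(ai * rng.gauss(0.0, 1.0) for ai in a) - b
        if g >= 0.0:          # failure event, following the convention of Equation (3)
            failures += 1
    return failures / n_samples


def failure_probability_form(a, b):
    """Linear limit state: beta = b / ||a|| and P_f = Phi(-beta)."""
    beta = b / math.sqrt(sum(ai * ai for ai in a))
    return NormalDist().cdf(-beta)


if __name__ == "__main__":
    a, b = [3.0, 4.0], 7.5    # hypothetical coefficients, giving beta = 1.5
    print(failure_probability_mcs(a, b), failure_probability_form(a, b))
```

The sampling estimate needs one evaluation of the performance function per sample, whereas the analytical estimate needs only the distance *β*; this difference is the cost that MPP-based methods aim to avoid.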

### *2.2. Performance Measure Approach (PMA)*

PMA estimates the failure probability of performance function *G*(**X**) by finding MPTP in the standard normal space (*U*-space). After transforming *G*(**X**) to the *U*-space using the Rosenblatt transformation [13], the MPTP can be estimated using the steepest descent direction. When all the random variables are independent, the joint cumulative distribution function (CDF) is calculated via the product of the marginal CDFs. The Rosenblatt transformation is given as

$$\Phi(u\_i) = F\_{X\_i}(x\_i) \implies u\_i = \Phi^{-1}(F\_{X\_i}(x\_i)),\tag{4}$$

where *F<sub>X<sub>i</sub></sub>*(*x<sub>i</sub>*) is the marginal CDF of *X<sub>i</sub>* and Φ(·) is the CDF of the standard normal random variable. After transforming variables to the standard normal space by using Equation (4), the MPTP is calculated by solving the following sub-optimization problem.

$$\begin{array}{ll}\text{Minimize} & G(\mathbf{U}),\\\text{subject to} & ||\mathbf{U}|| = \boldsymbol{\beta}^T.\end{array} \tag{5}$$

where **U** is the random variable in the standard normal space, and *β<sup>T</sup>* is the target reliability index for the performance function *G*(**U**). To efficiently obtain the optimum solution of Equation (5), the advanced mean value algorithm is used and the expression is presented in Equation (6).

$$\mathbf{U}^{(k+1)} = \boldsymbol{\beta}^T \frac{\nabla G(\mathbf{U})}{||\nabla G(\mathbf{U})||}. \tag{6}$$

If the performance function value at the MPTP is less than or equal to zero, the probabilistic constraint of Equation (2) is satisfied for the given target reliability.
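The update of Equation (6) can be sketched as a short fixed-point iteration. The code below follows Equation (6) literally; the starting point at the origin, the stopping rule, and the example gradient are assumptions made for illustration and are not taken from the paper.

```python
import math


def amv_mptp(grad_g, beta_t, n_dim, max_iter=100, tol=1e-10):
    """Iterate U <- beta^T * grad G(U) / ||grad G(U)|| as in Equation (6).

    grad_g must return a non-zero gradient; a zero gradient would make the
    normalisation undefined.
    """
    u = [0.0] * n_dim                      # assumed start: origin of the U-space
    for _ in range(max_iter):
        g = grad_g(u)
        norm = math.sqrt(sum(gi * gi for gi in g))
        u_new = [beta_t * gi / norm for gi in g]
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_new, u)))
        u = u_new
        if step < tol:                     # assumed stopping rule
            break
    return u


def example_grad(u):
    """Gradient of a hypothetical G(U) = 0.2*u1^2 + u1 + u2 - 4."""
    return [0.4 * u[0] + 1.0, 1.0]


if __name__ == "__main__":
    print(amv_mptp(example_grad, beta_t=2.0, n_dim=2))
```

For this mildly nonlinear function the iteration settles quickly. For strongly nonlinear functions the same update may oscillate between points on the *β*-sphere.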

### *2.3. The Chaos Control Method*

It has been observed that PMA performs well for simple nonlinear performance functions, but it fails to converge for highly nonlinear performance functions. To overcome this issue, chaos control theory [45] was proposed based on a stability transformation method [48]. The modification is achieved while updating the iterative point **U**<sup>(*k*+1)</sup> of Equation (6). The formulation for estimating the iterative point via the chaos control (CC) method is as follows.

$$\begin{aligned} \mathbf{U}\_{\mathbf{CC}}^{(k+1)} &= \mathbf{U}\_{\mathbf{CC}}^{(k)} + \lambda \mathbf{C} [\mathbf{F}(\mathbf{u}^{(k)}) - \mathbf{U}\_{\mathbf{CC}}^{(k)}], \\ \mathbf{F}(\mathbf{u}^{(k)}) &= \mathbf{U}^{(k+1)} = \boldsymbol{\beta}^T \frac{\nabla G(\mathbf{U})}{||\nabla G(\mathbf{U})||}, \end{aligned} \tag{7}$$

where **U**<sub>CC</sub><sup>(*k*)</sup> is the MPTP calculated using the CC method in the *k*-th iteration; **C** is the involutory matrix with only one nonzero element in each row and is taken as the identity matrix **I** for simplicity. The matrix **C** is usually selected to stabilize the unstable fixed point of the chaotic dynamical system in Equation (7). The chaos control factor *λ* is determined according to the eigenvalues of the original system's Jacobian matrix, and its value lies within the interval [0, 1]. When *λ* is set to one, the CC update reduces to Equation (6) and can have the same issue as discussed earlier. Therefore, a small value of *λ* is used for stable convergence. **F** is the vector of the response function that is estimated via nonlinear mapping with respect to the iterative values of **U**<sup>(*k*+1)</sup>, as shown in Equation (7). Although the CC method eliminates the issue of oscillation in the convergence of the MPTP, it is considered to be an inefficient process. Therefore, a modified chaos control (MCC) [12] was proposed. The modification is achieved by extending the iterative search to the *β*-hypersphere that is at the constraint boundary in the standard normal space. Thus, the MPTP is located on the constraint boundary, and convergence is improved by controlling the tangential step size instead of the radial step size, which was the case for the CC method. The formulation of MCC is given as

$$\begin{aligned} \tilde{\mathbf{n}}^{(k+1)} &= \mathbf{U}\_{\text{CC}}^{(k)} + \lambda \mathbf{C} [\mathbf{F}(\mathbf{u}^{(k)}) - \mathbf{U}\_{\text{CC}}^{(k)}], \\ \mathbf{U}\_{\text{MCC}}^{(k+1)} &= \beta^{T} \frac{\tilde{\mathbf{n}}^{(k+1)}}{||\tilde{\mathbf{n}}^{(k+1)}||}, \end{aligned} \tag{8}$$

where **ñ**<sup>(*k*+1)</sup> is the modified search direction updated using **U**<sub>CC</sub><sup>(*k*+1)</sup> of Equation (7), and **U**<sub>MCC</sub><sup>(*k*+1)</sup> is the MPTP evaluated using the MCC method.
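A single step of Equations (7) and (8) can be sketched as follows, with **C** taken as the identity matrix as in the text. The function names and the numerical values in the usage lines are hypothetical.

```python
import math


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def amv_point(grad, beta_t):
    """F(u) in Equation (7): the point beta^T * grad / ||grad||."""
    n = _norm(grad)
    return [beta_t * g / n for g in grad]


def cc_step(u_prev, grad, beta_t, lam=0.2):
    """Chaos control update of Equation (7) with C = I."""
    f = amv_point(grad, beta_t)
    return [u + lam * (fi - u) for u, fi in zip(u_prev, f)]


def mcc_step(u_prev, grad, beta_t, lam=0.2):
    """Modified chaos control, Equation (8): rescale onto the beta-sphere."""
    n_tilde = cc_step(u_prev, grad, beta_t, lam)
    scale = beta_t / _norm(n_tilde)
    return [scale * ni for ni in n_tilde]


if __name__ == "__main__":
    # Hypothetical previous point and gradient, for illustration only.
    print(cc_step([2.0, 0.0], [0.0, 1.0], beta_t=2.0))
    print(mcc_step([2.0, 0.0], [0.0, 1.0], beta_t=2.0))
```

With `lam=1.0` the CC step returns the point of Equation (6) unchanged. Smaller values damp the move, and the MCC step then projects the damped point back onto the sphere of radius *β<sup>T</sup>*.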

### *2.4. Single-Loop Method*

The single-loop method (SLM) [20] has been proposed to approximate the reliability analysis of the double-loop method, and establish an equivalent deterministic performance function that is computationally efficient. The approximate MPTP is estimated by using the KKT optimality conditions of Equation (5), and is given in Equation (9).

$$
\nabla G(\mathbf{U}) - \hat{\lambda}\nabla H(\mathbf{U}) = 0,\tag{9}
$$

where *λ̂* is the Lagrange multiplier, and *H*(**U**) = ‖**U**‖<sup>2</sup> − (*β<sup>T</sup>*)<sup>2</sup> is obtained by squaring both sides of the equality constraint of Equation (5). Using Equation (9) and ∇*H*(**U**) = 2**U** yields ∇*G*(**U**) − 2*λ̂***U** = 0. After simplification, **U** can be written as ∇*G*(**U**)/(2*λ̂*), and multiplying the numerator and denominator by ‖∇*G*(**U**)‖ and simplifying further, we obtain

$$\mathbf{U} = \frac{||\nabla G(\mathbf{U})||}{2\hat{\lambda}} \frac{\nabla G(\mathbf{U})}{||\nabla G(\mathbf{U})||} = \beta^T \boldsymbol{\alpha}, \tag{10}$$

where ***α*** = ∇*G*(**U**)/‖∇*G*(**U**)‖ is the unit gradient direction, and *β<sup>T</sup>* = ‖∇*G*(**U**)‖/(2*λ̂*) is a constant at the optimal solution **U**<sup>∗</sup>. The gradient is calculated in the *U*-space, whereas the random design variables lie in the *X*-space. Therefore, the transformation from *X*-space to *U*-space is used for the evaluation of the approximate MPTP, using the following relationship.

$$
\mathbf{X} = \mu\_{\mathbf{X}} + \sigma\_{\mathbf{X}} \mathbf{U},
\tag{11}
$$

where *σ*<sub>**X**</sub> is the standard deviation of **X**. Substituting **U** from Equation (10) in Equation (11) and using the chain rule, we obtain the MPTP in the *X*-space as

$$\mathbf{X}\_{\text{MPTP}} = \mu\_{\mathbf{X}} + \sigma\_{\mathbf{X}} \beta^T \boldsymbol{\alpha} = \mu\_{\mathbf{X}} + \sigma\_{\mathbf{X}} \beta^T \frac{\sigma\_{\mathbf{X}} \nabla\_{\mathbf{X}} G(\mathbf{X})}{||\sigma\_{\mathbf{X}} \nabla\_{\mathbf{X}} G(\mathbf{X})||}, \tag{12}$$

where **X**<sub>MPTP</sub> is the MPTP of the performance function *G*(**X**).
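Equation (12) replaces the inner reliability loop with a closed-form expression, which can be sketched in a few lines. The example below assumes independent normal variables and a hypothetical linear performance function *G*(**X**) = 2*x*<sub>1</sub> + *x*<sub>2</sub> − 3, so that the gradient is constant; the means and standard deviations are illustrative values.

```python
import math


def single_loop_mptp(mu, sigma, grad_x, beta_t):
    """Approximate MPTP of Equation (12) for independent normal variables.

    mu, sigma : mean and standard deviation of each random variable
    grad_x    : gradient of G with respect to X (non-zero)
    beta_t    : target reliability index
    """
    scaled = [s * g for s, g in zip(sigma, grad_x)]        # sigma_X * grad_X G
    norm = math.sqrt(sum(v * v for v in scaled))
    alpha = [v / norm for v in scaled]                     # unit direction
    return [m + s * beta_t * a for m, s, a in zip(mu, sigma, alpha)]


if __name__ == "__main__":
    mu, sigma = [1.0, 0.5], [0.1, 0.2]                     # hypothetical values
    x_mptp = single_loop_mptp(mu, sigma, grad_x=[2.0, 1.0], beta_t=2.0)
    g_at_mptp = 2.0 * x_mptp[0] + x_mptp[1] - 3.0
    print(x_mptp, g_at_mptp)    # the constraint is met when g_at_mptp <= 0
```

Whatever the gradient, the returned point lies at a distance *β<sup>T</sup>* from the mean when measured in the standard normal space, which is the equality constraint of Equation (5).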

### *2.5. Shifting Vector Approach*

The concept of the shifting vector (**S**<sub>*i*</sub><sup>(*k*)</sup>) has been proposed [14] to decouple the double-loop structure of the RBDO problem. It separates the optimization and reliability analysis loops and performs them sequentially in the sequential optimization and reliability assessment (SORA) [14] method. Using this process, the computational efficiency of SORA has been improved as compared to the double-loop method. The concept of the shifting vector is used to shift the violated performance function towards the feasible direction. It is given as

$$\mathbf{S}\_{i}^{(k)} = \mu\_{\mathbf{X}}^{(k-1)} - \mathbf{X}\_{i, \text{MPTP}}^{(k-1)} \tag{13}$$

where **S**<sub>*i*</sub><sup>(*k*)</sup> is the shifting vector at the *k*-th iteration, **X**<sub>*i*,MPTP</sub><sup>(*k*−1)</sup> is the MPTP for the *i*-th constraint, and *μ*<sub>**X**</sub><sup>(*k*−1)</sup> is the mean of the random variable **X** in the (*k* − 1)-th iteration. Figure 1 shows the schematic diagram of the shifted constraint based on the MPTP. It can be seen that **S**<sub>*i*</sub><sup>(1)</sup> is estimated based on **X**<sub>*i*,MPTP</sub><sup>(1)</sup> and *μ*<sub>**X**</sub><sup>(1)</sup>, and the shifted constraint is evaluated at *μ*<sub>**X**</sub><sup>(1)</sup> − **S**<sub>*i*</sub><sup>(1)</sup> until the reliability of the constraint is achieved. Here, the shifting vector **S**<sub>*i*</sub><sup>(*k*)</sup> is generated via an iterative process that helps to estimate the feasibility of the performance function until its reliability is satisfied.

**Figure 1.** Shifting vector approach.
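The bookkeeping behind Equation (13) is small enough to sketch directly. In the snippet below, the shifting vector is computed from the previous mean and MPTP, and a constraint is then evaluated at the shifted point; the function names and the numerical values are hypothetical.

```python
def shifting_vector(mu_prev, x_mptp_prev):
    """Equation (13): S = mu_X^(k-1) - X_MPTP^(k-1), component-wise."""
    return [m - x for m, x in zip(mu_prev, x_mptp_prev)]


def shifted_point(mu, shift):
    """Point at which the shifted constraint is evaluated: mu_X - S."""
    return [m - s for m, s in zip(mu, shift)]


if __name__ == "__main__":
    mu_prev = [1.0, 2.0]          # hypothetical mean from the previous iteration
    x_mptp_prev = [1.3, 2.1]      # hypothetical MPTP from the previous iteration
    s = shifting_vector(mu_prev, x_mptp_prev)

    mu_new = [0.9, 1.8]           # hypothetical new design point
    print(shifted_point(mu_new, s))
```

If the design point does not move between iterations, the shifted point coincides with the previous MPTP, so the deterministic constraint is checked exactly where the reliability analysis located the critical point.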

### **3. The Proposed Method and Its Implementation**

*3.1. Single-Loop MORBDO Formulation Using Chaos Control and the Shifting Vector Approach*

The single-loop MORBDO formulation can be written using the approximate MPTP given in Equation (12) as

$$\begin{aligned} \text{Min.} \; & f\_{m}(\mu\_{\mathbf{X}}), \quad m = 1, \dots, M, \\ \text{s.t.:} \; & G\_{i}(\mathbf{X}\_{i, \text{MPTP}}^{(k)}) \le 0, \quad i = 1, \dots, I, \\ \text{where } & \mathbf{X}\_{i, \text{MPTP}}^{(k)} = \mu\_{\mathbf{X}}^{(k)} + \beta\_{i}^{T} \sigma\_{\mathbf{X}} \boldsymbol{\alpha}\_{i, \mathbf{X}}^{(k)}, \\ & \boldsymbol{\alpha}\_{i, \mathbf{X}}^{(k)} = \frac{\sigma\_{\mathbf{X}} \, \nabla G\_{i, \mathbf{X}}(\mathbf{X}\_{i, \text{MPTP}}^{(k-1)})}{||\sigma\_{\mathbf{X}} \, \nabla G\_{i, \mathbf{X}}(\mathbf{X}\_{i, \text{MPTP}}^{(k-1)})||}, \\ & \mu\_{\mathbf{X}}^{(\text{L})} \le \mu\_{\mathbf{X}} \le \mu\_{\mathbf{X}}^{(\text{U})}, \end{aligned} \tag{14}$$

where **X**<sub>*i*,MPTP</sub><sup>(*k*)</sup> is the approximate MPTP of the *i*-th performance function at the *k*-th iteration, and ***α***<sub>*i*,**X**</sub><sup>(*k*)</sup> is the unit gradient vector of the *i*-th performance function with respect to the random variable (**X**). In Equation (14), the probabilistic performance functions of Equation (2) are converted into deterministic performance functions, which eliminates the MPTP search of the inner loop at every iteration. Thus, the computational efficiency can be improved significantly. It is to be noted that the steepest descent search is used to evaluate the approximate MPTP, which has a tendency to oscillate during convergence [45].

In the proposed formulation, chaos control theory replaces the steepest descent search for approximating MPTP. The concept of the shifting vector approach is incorporated to formulate a novel single-loop MORBDO formulation, as shown in Equation (15).

$$\begin{aligned} \text{Min.} \; & f\_{m}(\mu\_{\mathbf{X}}), \quad m = 1, \ldots, M, \\ \text{s.t.:} \; & G\_{i}(\boldsymbol{\Psi}^{(k)}) \le 0, \quad i = 1, \ldots, I, \\ \text{where } & \boldsymbol{\Psi}^{(k)} = \begin{cases} \mathbf{X}\_{i, \text{MPTP}}^{(k)}, & \forall \text{ target vectors,} \\ \mu\_{\mathbf{U}}^{(k+1)} - \mathbf{S}\_{i}^{(k+1)}, & \forall \text{ trial vectors,} \end{cases} \\ & \mathbf{S}\_{i}^{(k+1)} = \mu\_{\mathbf{X}}^{(k)} - \mathbf{X}\_{i, \text{MPTP}}^{(k)}, \\ & \mathbf{X}\_{i, \text{MPTP}}^{(k)} = \mathbf{T}^{-1}(\mathbf{U}) = \mu\_{\mathbf{X}}^{(k)} + \sigma\_{\mathbf{X}} \mathbf{U}\_{i, SLCC}^{(k)}, \\ & \mu\_{\mathbf{X}}^{(\text{L})} \le \mu\_{\mathbf{X}} \le \mu\_{\mathbf{X}}^{(\text{U})}, \end{aligned} \tag{15}$$

where **U**<sub>*i*,*SLCC*</sub><sup>(*k*)</sup> is the approximate MPTP in the *U*-space that is estimated using the MCC method, and *μ*<sub>**U**</sub><sup>(*k*+1)</sup> is the trial vector of differential evolution in the *U*-space in the (*k* + 1)-th iteration. In the proposed formulation, the performance function *G<sub>i</sub>*(**Ψ**<sup>(*k*)</sup>) includes both **X**<sub>*i*,MPTP</sub><sup>(*k*)</sup> and (*μ*<sub>**U**</sub><sup>(*k*+1)</sup> − **S**<sub>*i*</sub><sup>(*k*+1)</sup>), which are used for evaluating the performance function for each target vector and trial vector, respectively. The vector (*μ*<sub>**U**</sub><sup>(*k*+1)</sup> − **S**<sub>*i*</sub><sup>(*k*+1)</sup>) shifts the violated performance function towards a feasible direction for the population of trial vectors. **U**<sub>*i*,*SLCC*</sub><sup>(*k*)</sup> in the standard normal space is given in Equation (16).

$$\mathbf{U}\_{i,SLCC}^{(k)} = \beta\_i^T \frac{\mathbf{U}\_i^{(k-1)} + \lambda\_i^{(k)} \mathbf{C}[\mathbf{U}\_i^{(k)} - \mathbf{U}\_i^{(k-1)}]}{||\mathbf{U}\_i^{(k-1)} + \lambda\_i^{(k)} \mathbf{C}[\mathbf{U}\_i^{(k)} - \mathbf{U}\_i^{(k-1)}]||} \tag{16}$$

where **U**<sub>*i*</sub><sup>(*k*)</sup> and **U**<sub>*i*</sub><sup>(*k*−1)</sup> are the MPTPs estimated for the *i*-th constraint in the *k*-th and (*k* − 1)-th generations, respectively. The value of **U**<sub>*i*,*SLCC*</sub><sup>(*k*)</sup> is calculated after the transformation, as given in Equation (17).

$$\mathbf{U}\_{i}^{(k)} = \mathbf{T}(\mathbf{X}\_{i}^{(k)}) = (\mathbf{X}\_{i}^{(k)} - \mu\_{\mathbf{X}}) / \sigma\_{\mathbf{X}}.\tag{17}$$

The proposed single-loop MORBDO formulation given in Equation (15) is developed based on a single-loop methodology that eliminates the integrated reliability analysis involved in double-loop formulation, as given in Equation (2). The approximated formulation for reliability analysis is established through KKT optimality conditions, where the search direction is calculated by using modified chaos control theory. Furthermore, the shifting vector is integrated with the single-loop MORBDO formulation that uniquely involves the target and trial vectors of differential evolution.
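One possible reading of Equations (15)–(17) is sketched below: two successive MPTP estimates are mapped to the *U*-space, combined with the chaos control factor, rescaled onto the *β*-sphere, and mapped back to the *X*-space. The choice of the current mean for both transformations, the identity matrix for **C**, and all names and values are assumptions of this sketch rather than details confirmed by the text.

```python
import math


def to_u_space(x, mu, sigma):
    """Equation (17): U = (X - mu_X) / sigma_X, component-wise."""
    return [(xi - m) / s for xi, m, s in zip(x, mu, sigma)]


def slcc_mptp(mu, sigma, x_prev, x_curr, beta_t, lam=0.2):
    """Equation (16) followed by the back-transformation in Equation (15)."""
    u_prev = to_u_space(x_prev, mu, sigma)
    u_curr = to_u_space(x_curr, mu, sigma)
    n = [up + lam * (uc - up) for up, uc in zip(u_prev, u_curr)]   # C = I
    norm = math.sqrt(sum(v * v for v in n))
    u_slcc = [beta_t * v / norm for v in n]
    return [m + s * u for m, s, u in zip(mu, sigma, u_slcc)]       # X = mu + sigma*U


def constraint_point(is_trial, x_mptp, mu_trial=None, shift=None):
    """Psi in Equation (15): the MPTP for target vectors, the shifted trial
    vector (mu_U - S) for trial vectors."""
    if not is_trial:
        return list(x_mptp)
    return [m - s for m, s in zip(mu_trial, shift)]


if __name__ == "__main__":
    mu, sigma = [1.0, 2.0], [0.1, 0.2]                  # hypothetical values
    x_mptp = slcc_mptp(mu, sigma, [1.2, 2.0], [1.0, 2.4], beta_t=2.0)
    shift = [m - x for m, x in zip(mu, x_mptp)]         # S^(k+1) = mu^(k) - X_MPTP^(k)
    print(constraint_point(False, x_mptp))
    print(constraint_point(True, x_mptp, mu_trial=[1.05, 1.9], shift=shift))
```

In this reading, a trial vector reuses the shifting vector of its parent target vector, so no additional MPTP search is needed when the trial vector's constraints are evaluated.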

### *3.2. Multi-Objective Differential Evolution with Adaptive Mutation Scheme*

Differential evolution (DE) [49] is a population-based meta-heuristic algorithm that works with a set of vectors and solves an optimization problem by iteratively improving each vector based on an evolutionary process. It explores the design space by maintaining a population of vectors and creating new vectors by combining existing ones. It starts with a random generation of vectors, which are referred to as target vectors, *μ*<sub>**X**</sub><sup>(*k*)</sup>(*t*), in which *t* represents the *t*-th target vector, and *k* represents the generation counter. Since DE is used for solving the MORBDO problem, the notation for a vector is kept the same as the mean value of the random variable. Each target vector *μ*<sub>**X**</sub><sup>(*k*)</sup>(*t*) is transformed to the mutant vector *μ*<sub>**V**</sub><sup>(*k*+1)</sup>(*t*) using the randomly chosen vectors *μ*<sub>**r1**</sub><sup>(*k*)</sup>(*t*), *μ*<sub>**r2**</sub><sup>(*k*)</sup>(*t*), and *μ*<sub>**r3**</sub><sup>(*k*)</sup>(*t*). In this paper, an adaptive mutation scheme is used, in which the mutant vector *μ*<sub>**V**</sub><sup>(*k*+1)</sup>(*t*) is generated either by using a random vector or the best vector. The scheme for generating *μ*<sub>**V**</sub><sup>(*k*+1)</sup>(*t*) is given in Equation (18).

$$\mu\_{\mathbf{V}}^{(k+1)}(t) = \begin{cases} \mu\_{\mathbf{r1}}^{(k)}(t) + \hat{F} \times (\mu\_{\mathbf{r2}}^{(k)}(t) - \mu\_{\mathbf{r3}}^{(k)}(t)), & \zeta > \epsilon, \\\\ \mu\_{\mathbf{best}}^{(k)}(t) + \hat{F} \times (\mu\_{\mathbf{r2}}^{(k)}(t) - \mu\_{\mathbf{r3}}^{(k)}(t)), & \text{otherwise,} \end{cases} \tag{18}$$

where **r1** ≠ **r2** ≠ **r3** are the three randomly chosen vectors from the current population, and *F̂* is the scaling factor. The variant "DE/rand/1/bin" is found to be effective in exploring the search space during the initial generations because the mutant vector is generated using a random vector. When DE starts converging towards the Pareto-optimal front, the "DE/best/1/bin" variant, which replaces *μ*<sub>**r1**</sub><sup>(*k*)</sup>(*t*) with *μ*<sub>**best**</sub><sup>(*k*)</sup>(*t*), can improve the convergence. The *μ*<sub>**best**</sub><sup>(*k*)</sup>(*t*) vector for each target vector is found by calculating the Euclidean distance of the *t*-th target vector with respect to all non-dominated target vectors in the objective space. The closest non-dominated target vector is selected as *μ*<sub>**best**</sub><sup>(*k*)</sup>(*t*) for the *t*-th target vector. Since both variants have their own merits, a heuristic convergence parameter (*ζ*) is proposed that lets DE use either of these variants, depending on the user-defined parameter *ε*. The parameter *ζ* is calculated using the hypervolume (HV) performance indicator [50] and is given as

$$\zeta = I\_H^{(k)} - I\_H^{(k-1)}, \tag{19}$$

where *I*<sub>*H*</sub><sup>(*k*)</sup> and *I*<sub>*H*</sub><sup>(*k*−1)</sup> are the hypervolumes calculated with respect to the non-dominated target vectors in the (*k*)-th and (*k* − 1)-th generations. It is noted that the non-dominated target vectors in the (*k* − 1)-th and (*k*)-th generations are normalized together for estimating the hypervolume with respect to the dominated point. Thereafter, the trial vector *μ*<sub>**U**</sub><sup>(*k*+1)</sup>(*t*) is created for each target vector *μ*<sub>**X**</sub><sup>(*k*)</sup>(*t*), which is given as

$$\mu\_{\mathbf{U}}^{(k+1)}(t\_{j}) = \begin{cases} \mu\_{\mathbf{V}}^{(k+1)}(t\_{j}) & \text{if } r \le p\_{c} \text{ or } j = rnbr(i), \\ \mu\_{\mathbf{X}}^{(k)}(t\_{j}) & \text{if } r > p\_{c} \text{ and } j \ne rnbr(i), \end{cases} \tag{20}$$

where the subscript *j* on *t* in *μ*<sub>**X**</sub><sup>(*k*)</sup>(*t<sub>j</sub>*), *μ*<sub>**V**</sub><sup>(*k*+1)</sup>(*t<sub>j</sub>*), and *μ*<sub>**U**</sub><sup>(*k*+1)</sup>(*t<sub>j</sub>*) represents the *j*-th component of the target, mutant, and trial vectors, respectively. *r* is a random number between 0 and 1, *p<sub>c</sub>* is the crossover rate, and *rnbr*(*i*) is a randomly chosen index ∈ {1, 2, ..., *n*}, which ensures that the trial vector obtains at least one component from the mutant vector. Thereafter, all target vectors and trial vectors are combined (*μ*<sub>**X**</sub><sup>(*k*)</sup> ∪ *μ*<sub>**U**</sub><sup>(*k*+1)</sup>) to find the rank of the combined population using the non-dominated sorting [30] of NSGA-II. The crowding distance is also calculated to maintain diversity in the selection of the next generation of target vectors. The best *N* target vectors for the next generation are selected by using the environmental selection scheme of NSGA-II [30]. Multi-objective DE is terminated if the generation counter (*k*) exceeds the total number of generations (*K*). Otherwise, the generation loop continues until the termination condition is satisfied.
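The variation operators of Equations (18) and (20) can be sketched as follows. This is a minimal illustration under stated assumptions: the random indices are drawn so that they differ from each other and from the target index (the text only states that the three vectors are chosen at random), and all function names are hypothetical. Non-dominated sorting and environmental selection are not shown.

```python
import math
import random


def closest_nondominated(f_target, nd_objectives):
    """Index of the non-dominated vector nearest to the target in objective space."""
    return min(range(len(nd_objectives)),
               key=lambda i: math.dist(f_target, nd_objectives[i]))


def mutate(pop, t, best, f_hat, zeta, eps, rng):
    """Equation (18): random base vector if zeta > eps, best vector otherwise."""
    candidates = [i for i in range(len(pop)) if i != t]   # assumption: exclude target
    r1, r2, r3 = rng.sample(candidates, 3)                # needs at least 4 vectors
    base = pop[r1] if zeta > eps else best
    return [b + f_hat * (x2 - x3) for b, x2, x3 in zip(base, pop[r2], pop[r3])]


def crossover(target, mutant, p_c, rng):
    """Equation (20): binomial crossover with one guaranteed mutant component."""
    j_rand = rng.randrange(len(target))
    return [mutant[j] if (rng.random() <= p_c or j == j_rand) else target[j]
            for j in range(len(target))]


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(6)]
    best = pop[2]                                         # stand-in for mu_best(t)
    v = mutate(pop, t=0, best=best, f_hat=0.3, zeta=0.0, eps=1e-3, rng=rng)
    print(crossover(pop[0], v, p_c=0.9, rng=rng))
```

Bound handling for mutant components that leave the variable limits is also omitted here and would be needed in practice.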

### *3.3. Steps for Implementation*

In this section, the steps for implementing DE with an adaptive mutation scheme for the proposed MORBDO formulation are presented, which are as follows.

	- 3.1 Calculate the objective function values, *f<sub>m</sub>*(*μ*<sub>**X**</sub><sup>(*k*)</sup>(*t*)).
	- 3.2 Calculate the MPTP for each performance function (*i*) using Equations (15) and (16), and estimate the shifting vector **S**<sub>*i*,*μ*<sub>**X**</sub></sub><sup>(*k*+1)</sup> = *μ*<sub>**X**</sub><sup>(*k*)</sup> − **X**<sub>*i*,MPTP</sub><sup>(*k*)</sup>.
	- 3.3 Calculate the constraint violation of each performance function using the MPTP that is estimated through the chaos control theory given in Equation (15).
	- 7.1 Calculate the objective function, *f<sub>m</sub>*(*μ*<sub>**U**</sub><sup>(*k*+1)</sup>(*t*)).

### **4. Numerical Examples**

In this section, two mathematical examples and one engineering example are solved to demonstrate the performance of the proposed method. All the examples consist of two objective functions, along with the nonlinear performance functions. The proposed method is abbreviated as SLMDE since it is developed via a single-loop method using multi-objective DE. The results of SLMDE are compared with double-loop multi-objective differential evolution (DLMDE). It is noted that PMA is used with DLMDE for reliability analysis. The reliable PO solutions are generated via both methods for different values of the target reliability index (*β<sup>T</sup>*). HV performance indicator values and the number of function evaluations are used to compare the outcome. Both methods are run 30 times with different initial populations. The standard deviation (SD) is also evaluated to see the dispersion of HV values. The Wilcoxon signed-rank test at a 5% significance level is also used to test whether the difference between SLMDE and DLMDE is statistically significant. The parameters of SLMDE and DLMDE are as follows: the scaling factor (*F̂*) is taken as 0.3, the crossover probability (*p<sub>c</sub>*) is 0.9, the population size (*N*) is 200, and the total number of generations (*K*) is 100 for the first example, 250 for the second example, and 200 for the car side impact example. The chaos control factor (*λ*) is set to 0.2 [26]. The user-defined parameter (*ε*) in Equation (18) is set to 10<sup>−3</sup>. The MATLAB R2016b platform is used for developing both methods.
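Since the examples are bi-objective, the HV indicator used for the comparison, and for the convergence parameter of Equation (19), reduces to an area computation. The sketch below assumes minimization of both objectives, objective values that have already been normalized, and a reference point dominated by the front; the function names and the sample fronts are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives minimized) up to `ref`."""
    # Keep only points that are strictly better than the reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:                       # ascending in the first objective
        if f2 < best_f2:                     # skip points dominated by earlier ones
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


def convergence_parameter(front_k, front_k_minus_1, ref):
    """Equation (19): zeta = I_H^(k) - I_H^(k-1)."""
    return hypervolume_2d(front_k, ref) - hypervolume_2d(front_k_minus_1, ref)


if __name__ == "__main__":
    ref = (4.0, 4.0)                                     # hypothetical reference point
    front_prev = [(2.0, 3.0), (3.0, 2.0)]                # hypothetical fronts
    front_curr = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    zeta = convergence_parameter(front_curr, front_prev, ref)
    print(zeta, zeta > 1e-3)     # True selects the random-base mutation variant
```

A large positive value indicates that the front is still improving between generations, whereas a value near zero indicates that the search has started to converge.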

### *4.1. Example 1*

The first MORBDO example [3] has two objectives defined over two independent normal random variables, each with a standard deviation of 0.03. It is subject to the two linear performance functions given in Equation (21).

$$\begin{aligned} \min:\ & f_1(\mu_\mathbf{X}) = \mu_{x_1}, \\ \min:\ & f_2(\mu_\mathbf{X}) = \frac{1 + \mu_{x_2}}{\mu_{x_1}}, \\ \text{s.t.:}\ & P[G_i(\mathbf{X}) > 0] \le \phi(-\beta_i^T), \quad i = 1, 2, \\ & G_1(\mathbf{X}) = x_2 + 9x_1 - 6, \\ & G_2(\mathbf{X}) = -x_2 + 9x_1 - 1, \\ & 0.1 \le \mu_{x_1} \le 1, \; 0 \le \mu_{x_2} \le 5. \end{aligned} \tag{21}$$
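
Because both performance functions in Equation (21) are linear in independent normal variables, each *G<sub>i</sub>* is itself normally distributed and the probabilistic constraint can be checked in closed form: it holds exactly when the mean of *G<sub>i</sub>* plus *β<sup>T</sup>* times its standard deviation is not positive. The sketch below illustrates this for *G*<sub>1</sub> as written in Equation (21). The function names and the mean point are hypothetical, and the check is a textbook identity rather than the paper's implementation.

```python
import math
from statistics import NormalDist

def linear_g_moments(coeffs, const, mu, sigma):
    """Mean and standard deviation of G(x) = sum(c_j * x_j) + const."""
    mean_g = sum(c * m for c, m in zip(coeffs, mu)) + const
    std_g = math.sqrt(sum((c * s) ** 2 for c, s in zip(coeffs, sigma)))
    return mean_g, std_g

def failure_probability(coeffs, const, mu, sigma):
    """P[G(X) > 0] for independent normal x_j."""
    mean_g, std_g = linear_g_moments(coeffs, const, mu, sigma)
    return NormalDist().cdf(mean_g / std_g)

def satisfies_target(coeffs, const, mu, sigma, beta_t):
    """True when P[G(X) > 0] <= Phi(-beta_t)."""
    mean_g, std_g = linear_g_moments(coeffs, const, mu, sigma)
    return mean_g + beta_t * std_g <= 0.0

mu = [0.5, 1.0]           # hypothetical mean point
sigma = [0.03, 0.03]      # standard deviations of Example 1
g1_coeffs, g1_const = [9.0, 1.0], -6.0    # G1 = 9*x1 + x2 - 6

p_fail = failure_probability(g1_coeffs, g1_const, mu, sigma)
ok_beta1 = satisfies_target(g1_coeffs, g1_const, mu, sigma, beta_t=1.0)   # True
ok_beta2 = satisfies_target(g1_coeffs, g1_const, mu, sigma, beta_t=2.0)   # False
```

At this mean point the constraint is met for a target index of 1 but not for 2, which shows how a larger target index shrinks the set of admissible mean points.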

Table 1 presents the best, median, and worst HV values obtained via SLMDE and DLMDE. SLMDE converges to better HV values for the different *β<sup>T</sup>* values, which indicates that it generates a better set of PO solutions for this example. Note that the HV value decreases as *β<sup>T</sup>* increases. This is because a larger *β<sup>T</sup>* demands a higher degree of reliability, which makes the obtained PO solutions more conservative and pushes them away from the deterministic PO front and into the feasible region.


**Table 1.** Best, median, and worst HV values obtained by both methods for Example 1 for different values of *β<sup>T</sup>*. The best performances are highlighted in bold font.

Figure 2 shows the PO solutions obtained by both methods for different values of *β<sup>T</sup>*. The reliable PO solutions shown in Figure 2a,b correspond to the runs with the median HV values in Table 1. It can be seen that for larger values of *β<sup>T</sup>*, the PO solutions become more conservative and move into the feasible region. The same figure also shows that some solutions coincide with the deterministic PO front, which is located at the bottom right. This is because, for those solutions, the target reliability is already satisfied for the performance function *G*<sub>1</sub>(**X**).

**Figure 2.** The PO solutions obtained by both methods for Example 1 for different *β<sup>T</sup>* values.

The computational efficiency of both methods is measured by the number of function evaluations, presented in Table 2. The proposed method requires 202,000 function evaluations, which is only 14.85% of the number required by DLMDE. This is because SLMDE is based on a single-loop method, in which the reliability of each performance function is estimated using the KKT optimality conditions. DLMDE, on the other hand, performs PMA for reliability estimation, which requires many function evaluations. Since the number of PMA iterations is kept fixed, the number of function evaluations of DLMDE is the same for all values of *β<sup>T</sup>*.

**Table 2.** Number of function evaluations required by both methods for Example 1.


The Wilcoxon test results are shown in the same table with symbols (+, =, −). The symbol '+' indicates that SLMDE performs significantly better than DLMDE, whereas '−' and '=' indicate that SLMDE performs significantly worse than and equivalently to DLMDE, respectively. It can be seen from the table that SLMDE performs significantly better than DLMDE.

The progress of HV and the heuristic convergence parameter *ζ* with respect to the number of generations is shown in Figure 3. There are some initial fluctuations in both HV and *ζ*, which subside after 10 generations and stabilize after 50 generations.

**Figure 3.** Progress of hypervolume and *ζ* of SLMDE with respect to the number of generations for Example 1.
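
The mapping from paired HV values to the symbols (+, =, −) can be sketched as follows. This is a plain-Python two-sided Wilcoxon signed-rank test using the normal approximation without continuity correction, which is an assumption; the paper does not state which variant was used. The function names and the 30 paired HV values are hypothetical, and the code uses an ASCII `-` for the '−' symbol.

```python
import math
from statistics import NormalDist

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied absolute differences share their
    average rank. Returns (w_plus, w_minus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t                  # tie correction for the variance
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4.0
    var_w = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (min(w_plus, w_minus) - mean_w) / math.sqrt(var_w)
    return w_plus, w_minus, min(1.0, 2.0 * NormalDist().cdf(z))

def compare(hv_a, hv_b, alpha=0.05):
    """'+' if method A has significantly larger HV, '-' if smaller, '=' otherwise."""
    w_plus, w_minus, p_value = wilcoxon_signed_rank(hv_a, hv_b)
    if p_value >= alpha:
        return "="
    return "+" if w_plus > w_minus else "-"

# Hypothetical HV values from 30 paired runs (not taken from the tables).
hv_dlmde = [0.50 + 0.001 * i for i in range(30)]
hv_slmde = [b + 0.01 + 0.0005 * i for i, b in enumerate(hv_dlmde)]
symbol = compare(hv_slmde, hv_dlmde)     # "+"
```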

### *4.2. Example 2*

The second example [51] consists of two objective functions which are highly nonlinear. It has four linear and two nonlinear performance functions that are developed using two independent random normal variables, each with a standard deviation of 0.3. The RBDO formulation of this example is given in Equation (22).

$$\begin{aligned} \min:\ & f_1(\mu_\mathbf{X}) = -[25(\mu_{x_1}-2)^2 + (\mu_{x_2}-2)^2 + (\mu_{x_3}-1)^2 + (\mu_{x_4}-4)^2 + (\mu_{x_5}-1)^2], \\ \min:\ & f_2(\mu_\mathbf{X}) = \mu_{x_1}^2 + \mu_{x_2}^2 + \mu_{x_3}^2 + \mu_{x_4}^2 + \mu_{x_5}^2 + \mu_{x_6}^2, \\ \text{s.t.:}\ & P[G_i(\mathbf{X}) > 0] \le \phi(-\beta_i^T), \quad i = 1, \dots, 6, \\ & G_1(\mathbf{X}) = x_1 + x_2 - 2, \\ & G_2(\mathbf{X}) = 6 - x_1 - x_2, \\ & G_3(\mathbf{X}) = 2 - x_2 + x_1, \\ & G_4(\mathbf{X}) = 2 - x_1 + 3x_2, \\ & G_5(\mathbf{X}) = 4 - (x_3 - 3)^2 - x_4, \\ & G_6(\mathbf{X}) = (x_5 - 3)^2 + x_6 - 4, \\ & 0 \le \mu_{x_1}, \mu_{x_2}, \mu_{x_6} \le 10, \; 1 \le \mu_{x_3}, \mu_{x_5} \le 5, \; 0 \le \mu_{x_4} \le 6. \end{aligned} \tag{22}$$

Table 3 presents the statistical values of HV obtained via both methods. It can be seen that SLMDE converges to better HV values for the different *β<sup>T</sup>* values, which indicates that it generates a better set of PO solutions for this example. As in Example 1, the HV values decrease for larger values of *β<sup>T</sup>*, because a larger *β<sup>T</sup>* demands a higher degree of reliability and therefore leads to more conservative PO solutions. The Wilcoxon test results are shown in the same table with symbols (+, =, −). It can be seen from the table that SLMDE performs significantly better than DLMDE.

Figure 4 shows the reliable PO solutions generated in the run corresponding to the median HV value in Table 3. For larger *β<sup>T</sup>*, the PO solutions move into the feasible region and away from the deterministic PO front. The spread of solutions is smaller for SLMDE at *β<sup>T</sup>* = 1.0, whereas the solutions of DLMDE are well distributed. The shift of the solutions grows with *β<sup>T</sup>*, which leads to the smaller HV values seen in Table 3.

**Table 3.** Best, median, and worst HV values obtained by both methods for Example 2 for different values of *β<sup>T</sup>*. The best performances are highlighted in bold font.


**Figure 4.** The PO solutions obtained via both methods for Example 2 for different *β<sup>T</sup>* values.

Table 4 presents the computational efficiency of both methods. The proposed method requires only 3,915,600 function evaluations, which is only 2.7–3.5% of the number required by DLMDE. DLMDE needs many more function evaluations because PMA is performed for reliability estimation.


**Table 4.** Number of function evaluations required by both methods for Example 2.

Figure 5 shows the progress of HV and *ζ* with respect to the number of generations. The HV fluctuates for all values of *β<sup>T</sup>* until the termination criterion is reached. Initial fluctuations can also be observed for *ζ*, which subside after 150 generations.

**Figure 5.** Progress of hypervolume and *ζ* of SLMDE with respect to the number of generations for Example 2.

### *4.3. Car Side Impact Example*

The car side impact example [3] is considered as an engineering RBDO example, formulated with 2 objectives and 10 performance functions. It involves 11 normally distributed random quantities, which are grouped into random design variables (*x*<sub>1</sub>, ... , *x*<sub>7</sub>) and random parameters (*x*<sub>8</sub>, ... , *x*<sub>11</sub>). The details of the variables and their standard deviation values are given in Table 5. The RBDO formulation is presented in Equation (23). The mathematical expressions for each function are given in Table 6.

$$\begin{aligned} \min:\ & f_1(\mu_\mathbf{X}) = \text{Structure weight}, \\ \min:\ & f_2(\mu_\mathbf{X}) = \text{Average rib deflection}, \\ \text{s.t.:}\ & P[G_i(\mathbf{X}) > 0] \le \phi(-\beta_i^T), \quad i = 1, \ldots, 10, \\ & G_1(\mathbf{X}) = \text{Abdomen load} \le 1\ \text{kN}, \\ & G_2(\mathbf{X}) = VC_{\text{upper}} \le 0.32\ \text{m/s}, \\ & G_3(\mathbf{X}) = VC_{\text{middle}} \le 0.32\ \text{m/s}, \\ & G_4(\mathbf{X}) = VC_{\text{lower}} \le 0.32\ \text{m/s}, \\ & G_5(\mathbf{X}) = \text{Upper rib deflection} \le 32\ \text{mm}, \\ & G_6(\mathbf{X}) = \text{Middle rib deflection} \le 32\ \text{mm}, \\ & G_7(\mathbf{X}) = \text{Lower rib deflection} \le 32\ \text{mm}, \\ & G_8(\mathbf{X}) = \text{Pubic force} \le 4\ \text{kN}, \\ & G_9(\mathbf{X}) = \text{Velocity of B-pillar} \le 9.9\ \text{mm/ms}, \\ & G_{10}(\mathbf{X}) = \text{Front door velocity at B-pillar} \le 15.7\ \text{mm/ms}, \\ & 0.5 \le \mu_{x_1}, \mu_{x_3}, \mu_{x_4} \le 1.5, \; 0.45 \le \mu_{x_2} \le 1.35, \; 0.875 \le \mu_{x_5} \le 2.625, \\ & 0.4 \le \mu_{x_6} \le 1.2, \; 0.4 \le \mu_{x_7} \le 1.2, \; 0.192 \le \mu_{x_8}, \mu_{x_9} \le 0.75. \end{aligned} \tag{23}$$

The statistical values of HV obtained from both methods for different *β<sup>T</sup>* are presented in Table 7. The proposed method has converged to larger values of HV for all *β<sup>T</sup>* values. This signifies a better distribution of PO solutions for SLMDE as compared to DLMDE. As before, the HV values decrease with larger *β<sup>T</sup>* values. The Wilcoxon test results are shown in the same table with symbols (+, =, −). It can be seen from the table that SLMDE performs significantly better than, significantly worse than, and equivalently to DLMDE for *β<sup>T</sup>* = 1, 2, and 3, respectively.

The reliable PO solutions obtained by both methods are shown in Figure 6. As observed in the previous examples, for larger values of *β<sup>T</sup>*, the PO solutions move away from the deterministic PO front and into the feasible region.

Table 8 presents the computational efficiency of both methods. In this example, SLMDE requires only 4,623,000 function evaluations, which is only 0.26–0.3% of the number required by DLMDE. Since SLMDE performs an approximate reliability estimation using the KKT optimality conditions, it saves many function evaluations compared to DLMDE. Figure 7 shows a similar progress for HV and *ζ* with respect to the number of generations. There are initial fluctuations for all values of *β<sup>T</sup>*, which subside after 80 generations.

**Table 5.** Details of design variables and their standard deviation values.


**Table 6.** The objectives and performance functions of Example 3.


**Table 7.** Best, median, and worst HV values obtained via both methods for the car side impact example for different values of *β<sup>T</sup>*. The best performances are highlighted in bold font.


**Table 8.** Number of function evaluations required by both methods for the car side impact example.


**Figure 6.** The PO solutions obtained by both methods for the car side impact example for different *β<sup>T</sup>* values.

**Figure 7.** Progress of hypervolume and *ζ* of SLMDE with respect to the number of generations for the car side impact example.

### *4.4. Example 4*

The fourth example [44] has two quadratic objective functions and a single linear performance function defined over three independent normal random variables. The RBDO formulation is given in Equation (24).

$$\begin{aligned} \min:\ & f_1(\mu_\mathbf{X}) = (\mu_{x_1} - 1)^2 + (\mu_{x_2} - 2)^2 + (\mu_{x_3} - 3)^2, \\ \min:\ & f_2(\mu_\mathbf{X}) = \mu_{x_1}^2 + 2\mu_{x_2}^2 + 3\mu_{x_3}^2, \\ \text{s.t.:}\ & P[G_i(\mathbf{X}) > 0] \le \phi(-\beta_i^T), \quad i = 1, \\ & G_1(\mathbf{X}) = x_1 + x_2 + x_3 - 1, \\ & 0.1 \le \mu_{x_i} \le 6, \; i = 1, 2, 3, \end{aligned} \tag{24}$$

where *x*<sub>1</sub> ∼ *N*(1, 0.05), *x*<sub>2</sub> ∼ *N*(2, 0.1), and *x*<sub>3</sub> ∼ *N*(3, 0.15).

Table 9 presents the statistical values of HV obtained via SLMDE and DLMDE. In most cases, SLMDE converged to better HV values for the different *β<sup>T</sup>* values. The HV values decrease with larger values of *β<sup>T</sup>*. The Wilcoxon test results are shown in the same table with symbols (+, =, −). It can be seen from the table that SLMDE performs equivalently to DLMDE for *β<sup>T</sup>* = 2 and 3, and significantly worse for *β<sup>T</sup>* = 1.


**Table 9.** Best, median, and worst HV values obtained by both methods for Example 4 for different values of *β<sup>T</sup>*. The best performances are highlighted in bold font.

Figure 8 shows the reliable PO solutions generated in the run corresponding to the median HV value obtained via both methods for different values of *β<sup>T</sup>*. As observed in the previous examples, for larger *β<sup>T</sup>*, the PO solutions move into the feasible region, away from the deterministic PO front.

**Figure 8.** The PO solutions obtained via both methods for Example 4 for different *β<sup>T</sup>* values.

Table 10 presents the computational efficiency of both methods. The proposed method requires only 50% of the function evaluations required by DLMDE. Figure 9 shows similar behavior of HV and *ζ* over the generations: there is an initial fluctuation, which diminishes after 20 generations.

**Table 10.** Number of function evaluations required by both methods for Example 4.


**Figure 9.** Progress of hypervolume and *ζ* of SLMDE with respect to the number of generations for Example 4.

### **5. Conclusions**

A single-loop multi-objective reliability-based design optimization method has been proposed for generating reliable PO solutions quickly. It was developed by applying the KKT optimality conditions to PMA to obtain an approximate expression for the MPTP. The search direction of the approximate MPTP was modified via chaos control theory. The shifting vector approach was implemented with a novel formulation that includes both target and trial vectors. DE was made adaptive by using a heuristic parameter that lets DE switch between different mutation operators. The proposed SLMDE was tested on three mathematical and one engineering bi-objective RBDO examples. It was found that SLMDE generated more reliable PO solutions for all examples compared to DLMDE. The results demonstrate that SLMDE requires fewer function evaluations than DLMDE to converge. For all four examples, SLMDE was able to generate better HV values. For Example 2, considerable fluctuations can be observed in the progress of the hypervolume, which stabilize gradually. The user-defined parameter *ζ* shows stable progress for all the examples. In the future, the proposed method can be modified for faster convergence by incorporating quantile approximation for reliability analysis. It can also be tested on other real-world examples with many nonlinear functions.

**Author Contributions:** Conceptualization: R.B. and D.S.; methodology: R.B.; validation: R.B.; formal analysis: R.B. and D.S.; investigation: R.B.; writing—original draft preparation: R.B.; writing—review and editing: R.B. and D.S.; visualization: R.B.; supervision: D.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflicts of interest.

### **Abbreviations**

The following abbreviations are used in this manuscript:


### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Mathematical and Computational Applications* Editorial Office E-mail: mca@mdpi.com www.mdpi.com/journal/mca

www.mdpi.com ISBN 978-3-0365-6981-9