#### **1. Introduction**

In robotics, exploration is the process of scanning an environment to produce a map that a robot or a group of robots can use for further work. Depending on the type of environment, exploration can be outdoor, indoor, or underwater, and can be carried out by a single mobile robot or a multi-robot system [1,2]. In this study, we focus on indoor exploration by robots equipped with ranging sensors. Such robots could, in principle, scan an environment simply by moving around at random. However, random motion is inefficient and can result in incomplete map coverage. To address this issue, this paper proposes an algorithm that enhances the efficiency of multi-robot exploration by using a multi-objective optimization strategy.

Naturally, real-world optimization problems in engineering pursue multiple goals. The goals differ from field to field, but in every case the problem is posed as the maximization or minimization of objective functions. In the past, owing to the lack of suitable solution methodologies, a multi-objective optimization problem (MOOP) was usually cast and solved as a single-objective optimization problem. With the development of evolutionary algorithms, new techniques that seek to optimize two or more conflicting objectives in one simulation run have been applied to MOOPs. This research area is known as multi-objective optimization (MOO) [3].

In robotics, optimization studies have been gaining wide attention [4]. In multi-robot systems [5], optimization is commonly applied to path planning [6], formation [7], exploration [8], and other areas where decision-making and control need to be optimized. Previous research conventionally treated the goals as separate single-objective tasks: shortest path, obstacle-free motion, smoothest route, and continual search of uncertain terrain. The recent impact of optimization in robotics is due to metaheuristics and their nature-inspired optimization techniques [9]. Nature-inspired algorithms are not restricted to robotics; they have significant applications in other fields as well, which is why they have attracted the attention of scholars [10,11].

Metaheuristic algorithms are optimization approaches that emulate the intelligence of various animal species in nature. By the number of agents, metaheuristic algorithms are classified into single-solution-based and population-based algorithms. In both classes, the solutions improve over the course of iterations, with a single agent or an entire swarm of agents, respectively. The main advantage of population-based approaches is their ability to avoid stagnation in local optima thanks to the number of agents: a swarm can explore the search space more widely and faster than a single agent. Even so, the benefit of one class over the other depends on the problem to which it is applied. It is also important to mention the No-Free-Lunch (NFL) theorem for optimization, which states that no algorithm is universally optimal by all criteria and in all domains [12].

Apart from the number of agents, metaheuristic algorithms can be classified into single-objective and multi-objective optimization techniques according to the number of objective functions. Multi-objective optimization extends single-objective optimization: it searches for solutions that are optimal with respect to two (or more) objectives simultaneously. Because such objectives typically conflict, there is usually no single best solution but a set of non-dominated solutions, and selecting just one of them requires a trade-off. The Pareto-optimal front helps to pick a suitable solution that satisfies both objective functions.
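The dominance relation behind the Pareto-optimal front can be illustrated with a minimal Python sketch. It assumes that both objectives are minimized and that each candidate solution is a tuple of objective values; the function names `dominates` and `pareto_front` are hypothetical and are not taken from any cited algorithm.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse in every objective and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative objective values (f1, f2), both to be minimized.
candidates = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(candidates)
```

Here `(3, 4)` is dominated by `(2, 3)` and `(5, 5)` by `(1, 5)`, so the front holds the three remaining trade-off solutions. Choosing one of them is the trade-off decision described above.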

Examples of single-objective algorithms include Particle Swarm Optimization (PSO) [13], the Genetic Algorithm (GA) [14], Ant Colony Optimization (ACO) [15], and the Grey Wolf Optimizer (GWO) [16,17], which have multi-objective extensions, namely MOPSO [18], MOGA [19], m-ACO [20], and MOGWO [21], respectively. In our previous study [8], we combined coordinated multi-robot exploration [22] and the GWO algorithm as a hybrid. In this study, we aim to solve the multi-robot exploration problem as a MOOP using the Multi-Objective Grey Wolf Optimizer (MOGWO).

In the MOGWO exploration, we define two objectives for optimization: maximizing the search for new area and minimizing the inaccuracy of the explored map. The search process is accordingly divided into two stages (Figure 1), which switch during the simulation run depending on the value of the GWO parameter. If the value is greater than one, the algorithm searches occluded space. If the value is less than one, it increases the map accuracy through repeated visits to the explored space. Note that an occupancy grid map with probabilistic values is used in this study [23].
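The stage switch can be sketched as follows. This paragraph does not name the GWO parameter, so the sketch assumes it is the control parameter a of the standard GWO, which decreases linearly from 2 to 0 over the run; the magnitude of the GWO coefficient A could be substituted without changing the structure. The function names and stage labels are hypothetical.

```python
def gwo_control_parameter(iteration: int, max_iterations: int) -> float:
    """Standard GWO control parameter a, decreasing linearly from 2 to 0."""
    return 2.0 * (1.0 - iteration / max_iterations)


def exploration_stage(value: float) -> str:
    """Map the parameter value to one of the two search stages.

    Assumption: a value of exactly one is assigned to the second stage;
    the text only specifies the cases above and below one.
    """
    if value > 1.0:
        return "search_unexplored"  # stage 1: look for new, occluded space
    return "refine_explored"        # stage 2: revisit explored space


# With a linear schedule, the first half of the run searches for new
# area and the second half improves the accuracy of the map.
stages = [exploration_stage(gwo_control_parameter(i, 100)) for i in range(100)]
```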

The MOGWO exploration employs static waypoints in the simulation, which promote efficient exploration of the simulated indoor environment. The waypoints belong to the programmed logic of the algorithm and are not meant to be placed in the real environment [24]. The waypoints are grey wolf agents whose costs are based on probability values. In each iteration, the robots search for the alpha, beta, and gamma waypoints and save them in an archive, which prevents several robots from selecting the same non-dominated waypoints. After the selection, each robot computes its next position: the frontier cell [25] that is closest to the average position of the alpha, beta, and gamma waypoints.
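The last step, choosing the next position among the frontier cells, can be sketched in Python. The sketch assumes 2D coordinates and Euclidean distance; the names `mean_position` and `next_position` are hypothetical, and the selection of the three waypoints themselves is outside its scope.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_position(waypoints: Sequence[Point]) -> Point:
    """Average position of the given waypoints."""
    n = len(waypoints)
    return (sum(x for x, _ in waypoints) / n, sum(y for _, y in waypoints) / n)


def next_position(frontier_cells: Sequence[Point],
                  alpha: Point, beta: Point, gamma: Point) -> Point:
    """Frontier cell closest to the mean of the three selected waypoints."""
    target = mean_position([alpha, beta, gamma])
    return min(frontier_cells, key=lambda cell: math.dist(cell, target))


# Illustrative values: the waypoints average to (2, 2).
chosen = next_position([(5, 5), (2, 3), (0, 0)],
                       alpha=(0, 0), beta=(6, 0), gamma=(0, 6))
```

In this example the frontier cell `(2, 3)` is selected because it lies one cell away from the averaged waypoint position, closer than the other two candidates.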

This paper is organized as follows: in Section 2, we briefly recall different algorithms of multi-robot exploration and evolutionary optimization techniques used in related works. In Section 3, the theory of GWO and MOGWO is presented. Sections 4 and 5 are dedicated to the proposed MOGWO exploration and its performance. Section 6 concludes the present study.

**Figure 1.** The proposed Multi-Objective Grey Wolf Optimizer (MOGWO) exploration in two stages: (**a**) selecting the waypoints in unexplored space with high probability values of occupancy grid map; (**b**) selecting the closest waypoints in explored space with high probability values proportional to the distance; (**c**) the next position is one of the frontier points, which is the nearest to the selected waypoint.

#### **2. Related Work**

In the last two decades, many techniques have been proposed for robot exploration. Among them, there are novel fundamental, hybridized, and modified methods. In this section, studies on the different algorithms and the impact of the evolutionary optimization techniques in exploration are discussed.

Within exploration as a branch of robotics, Yamauchi's frontier-based method is the pioneering work [26]. Since then, many frontier-based studies have appeared, most of which successfully hybridized or modified the original method.
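As an illustration of the frontier concept, a frontier is commonly defined as a free cell on the boundary with unexplored space. The sketch below assumes a three-state grid (unknown, free, occupied) and 4-connectivity; a probabilistic occupancy grid would first have to be thresholded into these states. The encoding and the function name are assumptions made for this example.

```python
from typing import List, Tuple

UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # assumed cell encoding


def frontier_cells(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of free cells with at least one unknown 4-neighbour."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers


grid = [
    [FREE, FREE,     UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE,     FREE],
]
cells = frontier_cells(grid)
```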

Coordinated multi-robot exploration (CME) is a frontier-based method that emphasizes cooperative work by a team of robots [22]. Each robot seeks the target with maximum utility and minimum cost, which steers the robots away from each other while keeping them directed toward unexplored space. An alternative coordinated method is the randomized graph approach [27,28], which builds a roadmap in the explored area that guides the robots along safe paths. Recently, Alfredo et al. [29] introduced an efficient backtracking concept to the random exploration graph, preventing a robot from visiting the same place more than once. The above-mentioned methods share the idea of frontier-based control.
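The utility-versus-cost trade-off can be sketched as a greedy assignment. This is a simplified illustration and not the exact procedure of [22]: it assumes Euclidean distance as the travel cost, a fixed processing order of the robots, and a crude discount that sets the utility of frontiers near an assigned target to zero. The names `assign_targets`, `sensor_range`, and `beta` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def assign_targets(robots: Dict[str, Point], frontiers: List[Point],
                   sensor_range: float = 3.0, beta: float = 0.1) -> Dict[str, Point]:
    """Greedily give each robot the frontier maximizing utility - beta * cost."""
    utility = {f: 1.0 for f in frontiers}  # every frontier starts equally useful
    assignment = {}
    for name, position in robots.items():
        best = max(frontiers,
                   key=lambda f: utility[f] - beta * math.dist(position, f))
        assignment[name] = best
        # Frontiers the assigned robot is expected to cover lose their
        # utility, which pushes the remaining robots elsewhere.
        for f in frontiers:
            if math.dist(f, best) <= sensor_range:
                utility[f] = 0.0
    return assignment


targets = assign_targets({"r1": (0, 0), "r2": (1, 0)}, [(2, 0), (10, 0)])
```

Although both robots start near the frontier at `(2, 0)`, the second robot is sent to `(10, 0)` because the utility of the nearby frontier has already been discounted.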

Another approach to exploration, completely different in theory and practice, comes from artificial intelligence (AI). Reinforcement learning (RL) and convolutional neural networks (CNNs) have been proposed for this purpose in previous studies [30–32]. Neural-network-based exploration differs considerably from the frontier-based approach in terms of environment perception and control. Visual sensors (cameras) scan a place, and the images are then processed by image-processing algorithms [33]; the output of the computation determines how the robot interacts with the environment. Tai et al. [34] surveyed leading studies in mobile robotics that use deep learning, from perception to control systems.

Recently, a novel branch of exploration that employs nature-inspired optimization techniques has appeared. These approaches seek to enhance existing solutions to exploration. Sharma et al. [35] applied clustering-based distribution and bio-inspired algorithms such as PSO, Bacteria Foraging Optimization, and the Bat algorithm. The clustering provides a direction for robot motion, while the nature-inspired approaches handle the exploration of the unknown area. The study of [36] applied a combination of PSO, fractional calculus, and a fuzzy inference system; its authors compared their results with six other PSO variants and reported effective multi-robot exploration. A waypoint concept similar to the one in our study was used in [37], where artificial pheromones and fuzzy controllers help the multi-robot system to navigate efficiently by distributing the search among the robots and avoiding repeated visits to explored regions.

The study of [38] is worth highlighting here because it involved more than one optimization objective in exploration. Its optimal solution minimizes two objective functions: the variance of the path lengths and the sum of the path lengths of all robots. In contrast to our research, it applied the K-Means clustering algorithm rather than a bio-inspired technique.
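For concreteness, these two objective values can be computed from the robots' paths as in the following sketch. It assumes that each path is a polyline of 2D points and uses the population variance; whether [38] uses the population or the sample variance is not stated here, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def path_length(path: Sequence[Point]) -> float:
    """Total Euclidean length of a polyline path."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def path_objectives(paths: List[Sequence[Point]]) -> Tuple[float, float]:
    """Return (variance of the path lengths, sum of the path lengths)."""
    lengths = [path_length(p) for p in paths]
    return statistics.pvariance(lengths), sum(lengths)


# Two illustrative paths with lengths 5 and 3.
variance, total = path_objectives([[(0, 0), (3, 4)], [(0, 0), (0, 3)]])
```

A low variance means the workload is balanced across the robots, while a low sum means the team travels little overall; the two can conflict, which is what makes the problem multi-objective.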

The study of [39] presented an auto-adaptive multi-objective strategy for multi-robot exploration, in which the multi-objective concept consisted of two missions: searching for uncertainties and maintaining stable communication. That work is closely related to the present one, but it focuses on assessing communication conditions to provide efficient map coverage, which is a different perspective from ours.

With regard to multi-objective optimization in multi-robot systems, MOPSO [40] and multi-ACO [41] have already been applied to the path planning problem. Broadly speaking, metaheuristic algorithms are applied to path planning more often than to other problems, mainly because optimization is central to finding a short and smooth path.

To the best of our knowledge, MOGWO has not previously been applied in mobile robotics studies.
