# **Advances in Industrial Robotics and Intelligent Systems**

Edited by António Paulo Moreira, Pedro Neto and Félix Vidal Printed Edition of the Special Issue Published in *Applied Sciences*

www.mdpi.com/journal/applsci

## **Advances in Industrial Robotics and Intelligent Systems**

## **Advances in Industrial Robotics and Intelligent Systems**

Editors

**Ant ´onio Paulo Moreira Pedro Neto F ´elix Vidal**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Antonio Paulo Moreira ´ University of Porto and INESC TEC— Technology and Science Portugal

Pedro Neto University of Coimbra Portugal

Felix Vidal ´ Asociacion de Investigaci ´ on´ Metalurgica del Noroeste ´ Spain

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Applied Sciences* (ISSN 2076-3417) (available at: https://www.mdpi.com/journal/applsci/special issues/Control Robotics).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-6554-5 (Hbk) ISBN 978-3-0365-6555-2 (PDF)**

Cover image courtesy of Felix Vidal ´

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**



## **About the Editors**

#### **Ant ´onio Paulo Moreira**

Antonio Paulo Moreira graduated with a degree in Electrical Engineering from the University of ´ Porto in 1986, M.Sc. degree in Electrical Engineering—Systems in 1991, and Ph.D. degree in Electrical Engineering in 1998. Associated Professor with Tenure at the Electrical Engineering Department of the University of Porto. Coordinator of the Centre for Robotic and Intelligent Systems Centre of INESC TEC—Institute for Systems and Computer Engineering, Technology and Science. Participation in 25 scientific projects; coordinator of 7 projects. The knowledge developed in these research projects generated 40 development contracts and technology transfer to companies, as coordinator of 18 projects. Has 308 scientific publications at present, 202 of which are in international conferences, books and thesis, and 78 are in international journals with a referee.

#### **Pedro Neto**

Pedro Neto, Ph.D., is an Associate Professor at the Mechanical Engineering Department and coordinator of the Collaborative Robotics Laboratory at the University of Coimbra. He is a two-time recipient of the Impact and International Publications Award at the Faculty of Science and Technology, received more than a dozen awards, and is in the World's Top 2% Scientists Rank from Elsevier. Pedro Neto served as vice president of the Portuguese Robotics Society, is a member of the IEEE Committee on Factory Automation, associate editor in refereed journals, and scientific committee member of flagship conferences. His current research interests include human–robot interaction and collaboration, machine learning, soft robots, and advanced robot applications in the manufacturing domain. He co-authored more than 100 papers and collaborated with more than 100 companies in projects and consulting, some of them granted by the European Commission in the prestigious HORIZON framework.

#### **F ´elix Vidal**

Felix Vidal has an M.Sc. in Industrial Engineering (Automation and electronics) and International ´ Welding Engineer (IWE). At present, he is working in AIMEN as head of Smart Systems and Smart Manufacturing, and project coordinator of several European R&D projects related with digital transformation. He has more than 20 years' experience as a researcher in European and National projects dealing with the digitisation of hte EU industry. He has authored papers in major conferences and journals related to smart manufacturing in recent years. He is a member of EFFRA and euRobotics and reviewer for different papers in international manufacturing and mechatronics conferences and journals.

## *Editorial* **Special Issue on Advances in Industrial Robotics and Intelligent Systems**

**António Paulo Moreira 1, Pedro Neto 2,\* and Félix Vidal <sup>3</sup>**


**\*** Correspondence: pedro.neto@dem.uc.pt; Tel.: +351-239790767

Robotics and intelligent systems are key technologies to promote efficient and innovative applications in the most diverse domains (industry, healthcare, agriculture, construction, mobility, etc.), performing and supporting activities that are not suitable to be performed by humans. Such activities are frequently time-consuming, repetitive tasks with low added value, physically demanding, and/or dangerous. Nevertheless, robotics and intelligent systems face several scientific and technological challenges related to their integration and interoperability with other systems, safety, flexibility, reconfigurability and autonomy. These challenges are especially relevant when robots operate in real unstructured environments and share the workspace with humans and other equipment.

This Special Issue collects research achievements, ideas, and applications of advanced intelligent robotic systems, covering diverse technologies and application domains. Generally, the contributions cover optimal path planning strategies and innovative designs for mobile manipulators, the integration of robotic and intelligent systems, grasping, manipulation, teleoperation, haptics, user experience approaches for collaborative robots, and multi-agent systems.

In the last few years, we addressed the emergence of mobile manipulators as versatile robotic machines, combining the best abilities of mobile robots and robotic manipulators. An interesting study reports a mobile manipulator unified framework for motion planning considering joint limits, joint velocity limits, self-collisions, and singularities [1]. A novel path planning strategy for the autonomous navigation of a mobile manipulator operating in inspection processes is proposed in [2]. A mobile manipulator, which operates as a healthcare and domestic assistant, demonstrated its capability to perform generic service tasks in non-standardized healthcare and domestic environments [3]. In [4], a robot-based framework is proposed to automatically plan trajectories designed for painting large objects, e.g., a car roof. A Simultaneous Localization and Mapping (SLAM) framework used to solve the problem of the poor positioning accuracy of mobile robots, by fusing horizontal and vertical lidar data with Inertial Measurement Unit (IMU) data, eliminates the motion distortion of the dual-lidar odometry [5]. A robot path planning method using midpoint interpolation increased the efficiency of optimization by minimizing the planning time [6]. An interesting study presents a design of a mobile robot with omnidirectional tracks, combining the advantages of a typical track drive with the omnidirectional Mecanum wheels [7].

A dual-arm robotic manipulator demonstrated the ability to manipulate large and heavy objects avoiding obstacles by using a hierarchical manipulation planner [8]. A position/force controller used to grasp objects through a robotic manipulator, find the position of the object to be grasped accurately, and apply the appropriate force to each finger to handle the object properly is proposed in [9]. In [10], a haptic teleoperation of impact hammers in mining operations is proposed, where the 3D model of the environment

**Citation:** Moreira, A.P.; Neto, P.; Vidal, F. Special Issue on Advances in Industrial Robotics and Intelligent Systems. *Appl. Sci.* **2023**, *13*, 1352. https://doi.org/10.3390/ app13031352

Received: 13 January 2023 Accepted: 17 January 2023 Published: 19 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

is used to estimate repulsion forces that are transferred to the operator via haptics so that the hammer does not collide with the structures of the mine. A multi-order attentional spatial interactive convolutional neural network for haptic recognition is detailed in [11]. It was validated on the recognition of letter shape (A–Z) with complex contours from a haptic acquisition platform based on three-scale pressure arrays. In [12], the optimization of robot placement was studied to reduce the overall robot joint wear, where a proper robot base placement results in an overall reduction in the wear of the joints of a robotic arm. An algorithm for verifying the correctness of multi-agent systems modeled as tracking bigraphical reactive systems and checking whether a behavior policy for the agents meets non-functional requirements is presented in [13]. A review of the recent literature on augmented reality-supported collaborative robotics from a human-centered perspective to solve issues related to operators' needs is proposed in [14]. The study elaborates on a structured framework driven by User eXperience approaches to design augmented reality interfaces.

**Funding:** This research received no external funding.

**Acknowledgments:** Thanks to all the authors and peer reviewers for their valuable contributions to the Special Issue 'Advances in Industrial Robotics and Intelligent Systems'.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Motion Planning of Nonholonomic Mobile Manipulators with Manipulability Maximization Considering Joints Physical Constraints and Self-Collision Avoidance**

**Josuet Leoro † and Tesheng Hsiao \*,†**

Department of Electrical and Computer Engineering, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan; jl2005ec.04g@g2.nctu.edu.tw

**\*** Correspondence: tshsiao@cn.nctu.edu.tw

† These authors contributed equally to this work.

**Abstract:** The motion of nonholonomic mobile manipulators (NMMs) is inherently constrained by joint limits, joint velocity limits, self-collisions and singularities. Most motion planning algorithms consider some of the aforementioned constraints, however, a unified framework to deal with all of them is lacking. This paper proposes a motion planning solution for the kinematic trajectory tracking of redundant NMMs that include all the constraints needed for practical implementation, which improves the manipulability of both the entire system and the manipulator to prevent singularities. Experiments using a 10-DOF NMM demonstrate the good performance of the proposed method for executing 6-DOF trajectories while satisfying all the constraints and simultaneously maximizing manipulability.

**Keywords:** motion planning; trajectory tracking; mobile manipulator; joint constraints; self-collision avoidance; manipulability

#### **1. Introduction**

A mobile manipulator is a robotic system that consists of a standard robot manipulator mounted on a mobile platform. This system integrates the dexterity provided by the manipulator with the extended workspace provided by the platform. Additionally, the combination of both subsystems usually introduces kinematic redundancy, which increases flexibility and dexterity. Therefore, mobile manipulators are suitable to perform delicate tasks over a large space, such as welding large parts or painting large, curvy surfaces.

The practical applications of robotic systems commonly define tasks by either a pointto-point movement or a continuous path of the end-effector in task space (also known as operational space). This paper aims to solve the latter, and in particular, focuses on the task space trajectory tracking problem. A trajectory is a path on which a timing law is specified, for instance in terms of velocities and/or accelerations [1]. In other words, not only is the end-effector's pose profile defined, but so is its velocity profile. To accomplish this, a motion planning algorithm that exploits the capabilities of both the manipulator and the mobile platform and that coordinates their movements is required. The redundancy of mobile manipulators can be used to perform additional subtasks or satisfy system constraints. These constraints include joint limits, joint velocity limits, joint velocity boundary constraints (i.e., the constraints on the initial and final joint velocities), and self-collision avoidance. Furthermore, for task space trajectory tracking to be achievable, it is important that the system is kept away from singularities. All these requirements make the motion planning for trajectory tracking a challenging problem.

There exist recent efforts in solving the motion planning of mobile manipulators [2,3]. Liao et al. [2] presented an optimization-based solution that not only handles constraints at the position level, but can also set a target joint configuration for the manipulator at

**Citation:** Leoro, J.; Hsiao, T. Motion Planning of Nonholonomic Mobile Manipulators with Manipulability Maximization Considering Joints Physical Constraints and Self-Collision Avoidance. *Appl. Sci.* **2021**, *11*, 6509. https://doi.org/ 10.3390/app11146509

Academic Editors: Pedro Neto, António Paulo Moreira and Félix Vidal

Received: 27 June 2021 Accepted: 14 July 2021 Published: 15 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the end of the trajectory. A heuristic approach was proposed by Santos et al. [3] that is simple to implement and can accomplish additional constraints such as joint limits and manipulability improvement. Nonetheless, these methods do not deal with constraints at the velocity level and are only applicable to mobile manipulators with omnidirectional platforms. The present work focuses on the motion planning of mobile manipulators with nonholonomic mobile platforms. The movement of this type of platform is constrained by the rolling without slipping condition, which inhibits the platform from instantly moving in any arbitrary direction [4].

The motion planning of NMMs has also been studied in the literature using different approaches [5–10]. De Luca et al. [7] implemented the reduced gradient method for NMMs. This method finds a permutation matrix that helps reduce the velocity input to the subspace of commands that satisfy the given task. The remaining velocity inputs are used to maximize an objective function. Even though this method is computationally more efficient than the projected gradient approach, it is difficult to find such a permutation matrix for highly redundant robots since the Jacobian must be pre-analyzed by hand. Jia et al. [9] studied an adaptive motion distribution and coordinated control between the manipulator and the mobile platform to minimize the end-effector's positioning error.

In task space trajectory tracking, it is important that the motion planning algorithm moves the system away from singularities. This is because the system is in kinematic singularity, and the dexterity of the structure is reduced because the robot's end-effector cannot be moved in a certain direction. In addition, when the system is in the neighborhood of a singularity, small velocities in the task space may cause large velocities in the joint space [1], which is unacceptable because this would result in the failure of the trajectory tracking task, and even damage the mobile manipulator. For these reasons, the manipulability maximization was been included in multiple motion planning methods for NMMs [5,6,8–10]. Bayle et al. [5,6] maximized the system's manipulability using the projected gradient method. Huang et al. [8] studied the coordination of the platform and the manipulator, simultaneously considering the mobile platform stability and the manipulator's manipulability. Although these techniques can successfully follow the end-effector path while considering additional criteria, none of these consider joint constraints.

Furthermore, the solution of the task space trajectory tracking problem must not only consider joint limits but also joint velocity limits. This is because if a joint reaches its velocity limit, the end-effector might not be able to comply with the desired velocity profile. To the authors' knowledge, the literature of motion planning for trajectory tracking in task space with NMMs that includes joint constraints is limited. Zhang et al. [10] proposed formulating the motion planning problem as an optimization problem where the manipulability is maximized and the joint limits and joint velocity limits are included as constraints. This optimization problem is reformulated as a quadratic programming problem and converted into a linear variational inequality problem, that can be solved by different numerical methods. This approach is effective but does not consider boundary constraints for joint velocities. These constraints are also relevant because, for a given task, zero joint velocities are expected at the start and end of the trajectory. Additionally, NMMs are not only subject to physical limits but also to self-collisions, especially between the manipulator and the mobile platform, which are not included in their work.

An important remark is that maximizing the manipulability of the whole system might result in poor manipulability for the arm alone, even though the manipulability for the whole system is preserved or improved [6,10]. Additionally, it is important that both the robot arm subsystem and the whole mobile manipulator system maintain a certain level of manipulability once a coordinated task is completed. After completing an arbitrary path, a subsequent task might only require the arm to manipulate an object. If the arm is in a configuration with low manipulability, executing such a task might not be feasible. For these reasons, in this work, we propose a new manipulability measure for mobile manipulators that, when maximized, and as demonstrated in our experiments, intrinsically improves the manipulability of the robot arm as well as the manipulability of the whole system.

The solution to the motion planning of NMMs for trajectory tracking presented in this work includes joint constraints (range and maximum velocities), self-collision avoidance and manipulation capability preservation. Figure 1 summarizes how all these constraints are included in our solution. Both the particular and homogeneous solutions of our proposed scheme are weighted to avoid joint limits and self-collisions while the trajectory is tracked. The homogeneous solution is used to maximize the manipulability by exploiting the redundancy of the system. To satisfy joint velocity boundary constraints, the step size for searching the maximum manipulability is varied. The joint velocity limits are satisfied by restraining the maximum step size based on the allowable self-motion.

**Figure 1.** Summary of the proposed motion planning approach.

Experimental results demonstrate that our method can successfully solve the motion planning problem of NMMs under all the mentioned constraints. This work focuses on task space trajectory tracking at kinematic level. In other words, the outputs of the motion planning algorithm are the joint positions and velocities that will be fed to a joint space dynamic controller for motion control. Then, the motion controller is responsible for suppressing the model uncertainties and external disturbances to guarantee that the actual joint positions follow the ones output by the motion planning algorithm. In the experiments shown in this paper, we use the built-in motion controller of the commercial manipulator and leave the design of our own motion controller for future research.

The contributions of this work are detailed as follows:


This paper is organized as follows. Section 2 describes the kinematic modeling of NMMs. Section 3 describes the motion planning problem for trajectory tracking and presents the proposed solution. In Section 4, the concepts that are employed to satisfy each of the mentioned constraints are described and included in the motion planning algorithm. Experimental results using 6-DOF tasks are presented in Section 5 to validate the efficacy of our approach. Finally, Section 6 concludes the paper.

#### **2. Kinematic Modeling**

The kinematic modeling described here follows the procedure of De Luca et al. [7] and Bayle et al. [6]. For a general procedure of kinematic modeling of wheeled mobile manipulators, please refer to Bayle et al. [5].

Let *<sup>r</sup>* <sup>∈</sup> <sup>R</sup>*<sup>m</sup>* be the end-effector's position and/or pose in the task space. The configuration vector *q*, also known as the generalized coordinates of the mobile manipulator, is given by the combination of the platform configuration vector *qp* and the robot arm configuration vector *qa*. Figure 2 illustrates these configuration vectors. A frame *x y* is attached to the mobile platform at the center of the wheels' axle (*xp*, *yp*), with respect to the world reference frame *xy*, with its *x* axis pointing in the forward direction and the *y* axis pointing in the direction parallel to the wheels' axle. The angle between the *x* axis of the world reference frame and *x* attached to the platform is denoted as *θp*. Then, the platform configuration is given by *qp* = - *xp yp θ<sup>p</sup> <sup>T</sup>* <sup>∈</sup> <sup>R</sup>3. The robot arm configuration is given by the position of its joints as *qa* = - *qa*<sup>1</sup> *qa*<sup>2</sup> ... *qna <sup>T</sup>* <sup>∈</sup> <sup>R</sup>*na* . Finally, the configuration vector of the mobile manipulator is *q* = - *qp <sup>T</sup> qa <sup>T</sup><sup>T</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* with *<sup>n</sup>* <sup>=</sup> <sup>3</sup> <sup>+</sup> *na*. The end-effector's position/pose is a function of the configuration vector defined by the kinematic map:

$$r = f(\mathfrak{q}\_p, \mathfrak{q}\_a),\tag{1}$$

**Figure 2.** Nonholonomic mobile manipulator schematic.

The wheeled platform movement is constrained under the rolling without a slipping assumption on both wheels, which can be expressed as the following nonholonomic constraint:

$$
\dot{q}\_p = G(q\_p) u\_{p\prime} \tag{2}
$$

where *up* = - *vp ω<sup>p</sup> <sup>T</sup>* <sup>∈</sup> <sup>R</sup><sup>2</sup> is the velocity input of the platform, consisting of the linear and angular velocities of the platform, which are known as pseudo velocities or quasivelocities. The columns of matrix *G*(*qp*) span the admissible velocity space for a given platform configuration. The matrix *G*(*qp*) for a differential-drive platform is defined as

$$G(q\_p) = \begin{bmatrix} \cos \theta\_p & 0\\ \sin \theta\_p & 0\\ 0 & 1 \end{bmatrix}'$$

On the other hand, the robotic arm kinematics at the velocity level is not constrained for any configuration, and therefore, the generalized velocities of the arm are equal to the velocity inputs of the arm:

$$
\dot{q}\_a = u\_a. \tag{3}
$$

The velocity input vector for the entire NMM can be constructed as *u* = - *up <sup>T</sup> ua <sup>T</sup><sup>T</sup>* <sup>∈</sup> R*δ<sup>m</sup>* , with *δ<sup>m</sup>* = 2 + *na*. The dimension *δ<sup>m</sup>* is called the mobility degree of the mobile manipulator and indicates the space dimension of the admissible generalized velocities [6]. Using (2) and (3), the map from the velocity input vector *u* to the generalized velocities *q***˙** = - *q***˙***p <sup>T</sup> q***˙***<sup>a</sup> <sup>T</sup><sup>T</sup>* can be written as

$$
\dot{q} = S(q)u,\tag{4}
$$

with:

$$S(q) = \begin{bmatrix} G(q\_p) & 0 \\ 0 & I\_{n\_a} \end{bmatrix}'$$

where *G*(*qp*) maps the platform's velocity input vector *up* = - *vp ω<sup>p</sup> <sup>T</sup>* to the platform's generalized velocities *q***˙***<sup>p</sup>* = - *x*˙ *<sup>p</sup> y*˙ *<sup>p</sup>* ˙ *θp T* , and matrix *Ina* is an identity matrix of size *na* that sets the arm's generalized velocities equal to its input velocities *q***˙***<sup>a</sup>* = *ua*.

The differential kinematics of the NMM is obtained by differentiating the relation (1) with respect to time:

$$\begin{aligned} \dot{r} &= \begin{bmatrix} J\_p(q\_p) & J\_a(q\_a) \end{bmatrix} \begin{bmatrix} \dot{q}\_p \\ \dot{q}\_a \end{bmatrix} \\ &= \begin{bmatrix} J\_p(q\_p) & J\_a(q\_a) \end{bmatrix} S(q)u \\ &= J(q)S(q)u = Ju, \end{aligned} \tag{5}$$

where the matrices *Jp* <sup>∈</sup> <sup>R</sup>*m*×<sup>3</sup> and *Ja* <sup>∈</sup> <sup>R</sup>*m*×*na* are the Jacobians of the platform and the arm, respectively. The matrix *<sup>J</sup>* <sup>∈</sup> <sup>R</sup>*m*×*<sup>n</sup>* is the Jacobian of the mobile manipulator, and ¯*<sup>J</sup>* <sup>∈</sup> <sup>R</sup>*m*×*δ<sup>m</sup>* is a reduced version of the Jacobian in which the admissible velocities of the platform under the nonholonomic constraint have been included. Equation (5) follows the same form as the differential kinematics of standard manipulators; therefore, the well-known methodologies for the motion planning of redundant manipulators can be extended to NMMs in terms of ¯*J*, including concepts for joint limits avoidance, self-collisions avoidance and manipulability maximization.

#### **3. Motion Planning Method**

The motion planning problem for task space trajectory tracking consists of finding the input velocities *<sup>u</sup>*, given the desired end-effector's position/pose *rd*(*t*) <sup>∈</sup> <sup>R</sup>*<sup>m</sup>* for *<sup>t</sup>* <sup>∈</sup> [*t*0, *tf* ], such that *r*(*t*) follows *rd*(*t*). If the task space velocity of the end-effector is set as

$$
\dot{\mathbf{r}} = \dot{\mathbf{r}}\_d + \mathbf{K}(\mathbf{r}\_d - \mathbf{r}),
\tag{6}
$$

where *<sup>K</sup>* <sup>∈</sup> <sup>R</sup>*m*×*<sup>m</sup>* is a positive definite matrix, then the convergence of *<sup>r</sup>*(*t*) to *rd*(*t*) is guaranteed. Consequently, the motion planning problem is turned into solving the input velocities *u* from the underdetermined linear equations (5) where *r***˙** is set according to (6).

During the execution of the trajectory, the movement of the joints that get closer to a position constraint (joint limit or self-collision) must be penalized (slowed down). This can be achieved by using a weighting matrix. In this work, we define the weighted input velocity and the reduced weighted Jacobian as follows:

$$J\_W = \text{JW}^{\frac{1}{2}}(q) \quad \text{and} \quad \mathfrak{u} = \mathcal{W}^{\frac{1}{2}}(q)\mathfrak{u}\_{W'} \tag{7}$$

where *<sup>W</sup>*(*q*) <sup>∈</sup> <sup>R</sup>*δm*×*δ<sup>m</sup>* is a configuration-dependent positive semidefinite matrix. It is constructed so that its elements penalize the motion of the joints based upon the system constraints. We choose *W*(*q*) as a diagonal matrix in this paper; furthermore, we drop the notational dependency of *W* on *q* to simplify the expression in the subsequent derivation.

Using (7), the system (5) can be rewritten as

$$
\dot{\mathbf{r}} = \overline{\mathbb{J}}\_{\mathbb{W}} \mathfrak{u}\_{\mathbb{W}}.\tag{8}
$$

Because of the kinematic redundancy, i.e., *m* < *δm*, infinite solutions to *uW* exist. The solution to (8) can be decomposed as a sum of a particular solution and a non-zero homogeneous solution. The particular solution satisfies the end-effector's task, while the homogeneous solution changes the manipulator configuration without changing the end-effector's position/pose.

One way to solve the redundant system (8) is to formulate the problem as a constrained optimization problem [1], where the goal is to minimize a cost function and satisfy the constraint (8). We propose the minimization of the quadratic cost function:

$$g(\boldsymbol{\mu}\_W) = \frac{1}{2} (\boldsymbol{\mu}\_W - \boldsymbol{W^{\frac{1}{2}}} \boldsymbol{\mu\_0})^T (\boldsymbol{\mu\_W} - \boldsymbol{W^{\frac{1}{2}}} \boldsymbol{\mu\_0})^T$$

this choice is aimed to find a vector of velocities *uW* that is as close as possible to *<sup>W</sup>* <sup>1</sup> <sup>2</sup> *u***0**, where *<sup>u</sup>***<sup>0</sup>** <sup>∈</sup> <sup>R</sup>*δ<sup>m</sup>* is a velocity vector that is used to satisfy an additional task such as maximizing the manipulability. The choice of *u***<sup>0</sup>** will be presented shortly. Notice that the physical constraints are also imposed to *u***<sup>0</sup>** by weighting it similarly to how ¯*JW* is obtained in (7). By using the method of Lagrange multipliers to minimize *g*(*uW* ) with the equality constraint (8), the following solution is obtained:

$$\mathfrak{u}\_{\mathsf{W}}{}^{\*} = \overline{f}\_{\mathsf{W}}^{\dagger} \dot{\mathsf{r}} + (I - \overline{f}\_{\mathsf{W}}^{\dagger} \overline{f}\_{\mathsf{W}}) \mathsf{W}^{\frac{1}{2}} \mathfrak{u}\_{\mathsf{0}}.\tag{9}$$

where ¯*J*† *<sup>W</sup>* is the Moore–Penrose pseudoinverse of the matrix ¯*JW* such that ¯*JW* ¯*J*† *<sup>W</sup>* = *I*, *<sup>I</sup>* <sup>−</sup> ¯*J*† *<sup>W</sup>* ¯*JW* is the orthogonal projection into the null space of ¯*JW*. The first term of (9) is the particular solution, and the second term is the homogeneous solution. It is trivial to show that *<sup>I</sup>* <sup>−</sup> ¯*J*† *<sup>W</sup>* ¯*JW* projects *<sup>u</sup>***<sup>0</sup>** onto the null space of ¯*JW* by multiplying both sides of (9) by ¯*JW*. The velocity input vector is then recovered using the second part of (7):

$$
\mu = \mathcal{W}^{\frac{1}{2}} \mathcal{J}\_{\mathcal{W}}^{\dagger} \dot{\mathbf{r}} + \mathcal{W}^{\frac{1}{2}} (I - \mathcal{J}\_{\mathcal{W}}^{\dagger} \mathcal{J}\_{\mathcal{W}}) \mathcal{W}^{\frac{1}{2}} u\_{\mathbf{0}}.\tag{10}
$$

The degree of kinematic redundancy at the velocity level is relevant for this solution to be feasible, since it defines the number of simultaneous constraints that can be satisfied in the differential kinematics inversion without ¯*JW* losing its rank. The degree of redundancy for NMMs is calculated as *<sup>R</sup>* <sup>=</sup> *<sup>δ</sup><sup>m</sup>* <sup>−</sup> *<sup>m</sup>*. Whilst analyzing the matrix *<sup>W</sup>* <sup>1</sup> <sup>2</sup> for three different cases, it is possible to understand the expected behavior of solution (10). Let *z* represent the number of elements that are zero in the diagonal of *W* <sup>1</sup> <sup>2</sup> , i.e., the number of joints that are forced to stop due to *z* simultaneously activated constraints. When *z* < *R*, both the particular and homogeneous solutions exist, and therefore, the secondary task will still be considered. When *z* = *R*, the system is not redundant anymore and only the particular solution exists. Finally, when *z* > *R*, the system (8) has no solution. Therefore, for the solution (10) to exist, the condition *z* ≤ *R* is necessary.

Our proposed solution (10) has an advantage over the projected gradient solution used in [5,6], because it includes the physical constraints and penalizes both the particular and homogeneous solutions. Furthermore, in contrast with the weighted least-norm method [11], this solution takes advantage of the redundant joints that have not been penalized, due to physical constraints, in attempt to satisfy the task defined by *u***0**.

In this paper, the matrix *W* is constructed to satisfy joint position constraints (joint limits and self-collision avoidance), which is discussed in Section 4.2. The vector *u***<sup>0</sup>** is designated to locally maximize a proposed new manipulability measure for mobile manipulators, as discussed in Section 4.1. To satisfy the joint velocity constraints, the particular and homogeneous solutions are analyzed. Because the particular solution is in charge of following the end-effector's task, it cannot be modified. On the other hand, the homogeneous solution can be varied to satisfy the joint velocity constraints. Furthermore, the homogeneous solution must also consider the admissible velocity commands with

respect to the nonholonomic constraints [7]. Taking these two points into account, we define vector *u***<sup>0</sup>** as

$$
\mu\_0 = \pm a(t)\beta(t)S^T(q)\nabla\_q F(q),
\tag{11}
$$

where *α*(*t*) is a scalar coefficient that is adjusted online to satisfy joint velocity limits (Section 4.3.2), *β*(*t*) is a positive variation factor used to satisfy joint velocity boundary constraints (Section 4.3.1), *S*(*q*) is defined in (4), and *∇qF*(*q*) is the gradient of the objective function *<sup>F</sup>*(*q*) : <sup>R</sup>3+*na* <sup>→</sup> <sup>R</sup>, which is set to a manipulability measure. The product *<sup>α</sup>*(*t*)*β*(*t*) is the step size of the gradient step ascent/descent, and the sign of *u***<sup>0</sup>** defines whether the objective function *F*(*q*) will be maximized (+) or minimized (−).

Replacing (11) in (10), the final form of the proposed solution for *u* is:

$$\begin{aligned} \boldsymbol{u} &= \boldsymbol{u}\_{\mathcal{P}} + \boldsymbol{a}(t)\boldsymbol{\beta}(t)\boldsymbol{u}\_{\mathcal{W}} \\ \text{with} \\ \boldsymbol{u}\_{\mathcal{P}} &= \mathcal{W}^{\frac{1}{2}}\boldsymbol{f}\_{\mathcal{W}}^{\dagger}\dot{\boldsymbol{r}}\_{\mathcal{r}} \\ \boldsymbol{u}\_{\mathcal{H}} &= \mathcal{W}^{\frac{1}{2}}(\boldsymbol{I} - \boldsymbol{f}\_{\mathcal{W}}^{\dagger}\boldsymbol{\bar{J}}\_{\mathcal{W}})\mathcal{W}^{\frac{1}{2}}\boldsymbol{S}^{T}(\boldsymbol{q})\nabla\_{\boldsymbol{q}}\boldsymbol{F}(\boldsymbol{q}), \end{aligned} \tag{12}$$

where the positive sign for the gradient step descent has been used because we are interested in maximizing the manipulability. Notice that when the sign of *α*(*t*) is positive, the objective function *F*(*q***)** is maximized. However, for some cases, *α*(*t*) could be negative for short periods of time to respect joint velocity limits, as explained in Section 4.3.

#### **4. Manipulability Maximization and Constraints Satisfaction**

In this section, the details for manipulability maximization are described, the weighting matrix *W* is defined so that it penalizes the joint movement to avoid both joint limits and self-collisions, and a scheme to satisfy joint velocity limits as well as joint velocity boundary constraints is presented.

#### *4.1. Manipulability Maximization*

The term manipulability, introduced by Yoshikawa [12], describes the ability of a robotic system to provide end-effector's velocities in any direction for a given configuration. The manipulability of wheeled mobile manipulators was studied in detail in [5]. There exist different algebraic measures to characterize the manipulability of a robotic system [5,12–14]. The most widely used measure, known as the manipulability measure and noted here as Ω, is given by Ω = *σ*1*σ*<sup>2</sup> ... *σm*, where *σ*1, *σ*2, ... , *σ<sup>m</sup>* are the singular values of the system Jacobian *J*(*q*). Therefore, Ω is the manipulability measure for system configuration *q*. The measure Ω is proportional to the volume of the manipulability ellipsoid [12]. Furthermore, it can be shown that Ω has the following property:

$$
\Omega = \sqrt{\det(J(q)J^T(q))}\tag{13}
$$

By definition, Ω ≥ 0, and Ω = 0 if and only if rank(*J*(*q*)) < *m*. The elements of *∇q***Ω** are given mathematically by

$$\begin{split} \frac{\partial \Omega}{\partial q\_{i}} &= -\frac{1}{2} \Omega \cdot \text{trace} \left( \left( \left( f \right)^{T} \right)^{-1} \left( \frac{\partial f}{\partial q\_{i}} f^{T} + f \left( \frac{\partial f}{\partial q\_{i}} \right)^{T} \right) \right), \\ \text{with } i &= 1, 2, \dots, n. \end{split} \tag{14}$$

As mentioned before, we are not only concerned with improving the manipulability of the whole system but also that of the arm alone. In this work, we present a new manipulability measure that better expresses the manipulability of a mobile manipulator, because both the manipulability of the arm and the whole system are intrinsically considered.

Let us define the manipulability of the whole system, denoted as Ω*p*+*a*, and the manipulability of the robot arm, denoted as Ω*a*, to be the measures obtained using Equation (13) with the Jacobians ¯*J* and *Ja* from Equation (5), respectively. Notice that by using the reduced Jacobian ¯*J*, the platform's nonholonomic constraints are included in the manipulability measure of the whole system. Our proposed manipulability measure for mobile manipulators is defined as

$$
\Omega\_{MM} = \mathring{\Omega}\_{p+a} \mathring{\Omega}\_{a\_{\prime}} \tag{15}
$$

where Ωˆ *<sup>p</sup>*+*<sup>a</sup>* and Ωˆ *<sup>a</sup>* are the normalized manipulabilities computed by dividing them by their respective maximum values over all the possible configurations of the system:

$$
\hat{\Omega}\_{p+a} = \frac{\Omega\_{p+a}}{\Omega\_{p+a\_{\max}}} \quad \text{and} \quad \hat{\Omega}\_{\mathfrak{d}} = \frac{\Omega\_{\mathfrak{d}}}{\Omega\_{\mathfrak{d}\_{\max}}}.
$$

The normalized values are used in our proposed measure to make sure that all its components are in the same scale. This normalization also makes the measures invariant to units and chosen reference frame [14,15].

Notice that ¯*J* = - *Jp Ja <sup>S</sup>*(*q*) <sup>∈</sup> <sup>R</sup>*m*×*δ<sup>m</sup>* in (5) indicates the relation between ¯*<sup>J</sup>* and *Ja*. Hence, the singular values of ¯*J* are not necessarily the same as the singular values of *Ja*. There are cases where *Ja* is rank-deficient while ¯*J* is not. For these cases, the whole system manipulability Ω*p*+*<sup>a</sup>* = 0 even though the arm is in a singular configuration. Therefore, the measure Ω*p*+*<sup>a</sup>* fails to express the ability of the arm alone to provide end-effector's velocities in any direction. As a result, maximizing the measure Ω*p*+*<sup>a</sup>* might result in the poor manipulability of the arm, even though the manipulability of the whole system is preserved or improved. This is an issue that has been discussed in the literature [6,10].

On the other hand, our proposed measure Ω*MM* is defined by the product of the singular values of ¯*J* and *Ja*, i.e., the product of the manipulability ellipsoids of ¯*J* and *Ja*. Therefore, it encapsulates the abilities of the whole system and the arm to provide end-effector's velocities in any direction. If any of the singular values of *Ja* or ¯*J* is zero, e.g., the arm is in a singularity, the measure Ω*MM* = 0. Hence, the measure Ω*MM* is a better representation of the manipulability of mobile manipulators because it intrinsically considers the manipulability of the arm and the whole system.

Using the product rule, the gradient of Ω*MM* is calculated as

$$
\nabla\_q \Omega\_{MM} = \frac{1}{\Omega\_{p+a\_{\max}} \Omega\_{a\_{\max}}} (\Omega\_a \nabla\_q \Omega\_{p+a} + \Omega\_{p+a} \nabla\_q \Omega\_a). \tag{16}
$$

Analyzing the right hand side of (16), it is clear that the manipulability of the arm subsystem affects the gradient of the whole system. Likewise, the manipulability of the whole system affects the gradient of the arm subsystem. Therefore, maximizing Ω*MM*, i.e., setting *∇qF*(*q*) = *∇q***Ω***MM* in our solution (12) will simultaneously improve the manipulabilities of the arm and the whole system. In the experiments section (Section 5), a comparison is performed among the different objective functions for manipulability maximization to demonstrate the advantages of the proposed manipulability measure.

The reader may have noticed that because our proposed motion planning solution (12) multiplies *W* <sup>1</sup> <sup>2</sup> with the gradient *∇qF*(*q*), the direction of the original gradient that maximizes the manipulability is changed. However, as our goal is to push the system away from singularities by gradually improving the manipulability rather than finding the shortest path towards its maximum value, it is sufficient to prove that the weighted vector *W* <sup>1</sup> <sup>2</sup> *u***<sup>0</sup>** also points in a direction that increases the manipulability as the unweighted gradient *<sup>u</sup>***<sup>0</sup>** does. Since *<sup>W</sup>* <sup>1</sup> <sup>2</sup> is positive semidefinite, we have *u<sup>T</sup>* **<sup>0</sup>** *<sup>W</sup>* <sup>1</sup> <sup>2</sup> *u***<sup>0</sup>** ≥ 0 for all *u***0**. This implies that the weighted gradient also points in a direction that increases the manipulability.

#### *4.2. Joint Position Constraints*

Joint position constraints are common requirements in the motion planning of robotic systems. In this work, joint limits and self-collision avoidance are considered. These constraints are included in the proposed solution (12) using matrix *W*, which we define as the product of three terms as follows:

$$\mathcal{W} = \mathcal{W}\_{film} \mathsf{W}\_{Col} \, T\_q^{-1} \mathsf{A}$$

where *WJlim* is the weighting matrix for joint limits constraints, *WCol* is the weighting matrix for self-collision avoidance, and the matrix *Tq* (known as the Jacobian normalization matrix), is used to normalize the velocity commands *u* as follows [12]:

$$T\_q = \text{diag}\left(\frac{1}{\mu\_{1\_{\text{max}}}}, \frac{1}{\mu\_{2\_{\text{max}}}}, \dots, \frac{1}{\mu\_{\delta\_{\text{max}}}}\right),$$

where *uimax* is the *i*th maximum velocity command. The weighting of this matrix handles the differences in units and scales among all the joints.

For mobile manipulators, matrices *WJlim* and *WCol* should be constructed only considering the arm. This is because there is no limit to the movement of the mobile platform along each degree of freedom. Likewise, the movement of the mobile platform cannot be used to avoid self-collisions because the platform moves the base of the arm. Therefore, in this work, the structure of both weighting matrices was designed to take this into account. To simplify the presentation, let *Wg* represent either *WJlim* or *WCol*. *Wg* is a diagonal *δ<sup>m</sup>* × *δ<sup>m</sup>* matrix with the following form:

$$\mathcal{W}\_{\mathcal{S}} = \begin{bmatrix} I\_2 & 0 \\ 0 & \mathcal{W}\_a \end{bmatrix}'$$

where *I*<sup>2</sup> is an identity matrix of size 2 (for the case of the differential drive) and *Wa* is a diagonal *na* × *na* matrix given by

$$\mathcal{W}\_{\mathfrak{a}} = \begin{bmatrix} w\_1 & 0 & 0 & \dots & 0 \\ 0 & w\_2 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & w\_{n\_{\mathfrak{a}}} \end{bmatrix}'$$

where each element of the diagonal matrix *Wa* is defined using a performance criterion *H*(*q*).

In the next two subsections, two separate criterion functions for joint limits and selfcollision avoidance are defined. These two criterion functions have common properties. The values of *H*(*q*) and |*∂H*(*q*)/*∂qi*| become very large as the constraints are violated. When constructing matrix *Wa* using these criterion functions, the joint movement towards or away from a constraint must be contemplated [11,16]. Under this consideration, the elements of *Wa* are given by

$$w\_i = \begin{cases} \frac{1}{1 + \left| \frac{\partial H(q)}{\partial q\_i} \right|} & \text{if } \Delta \left| \frac{\partial H(q)}{\partial q\_i} \right| > 0 \\\\ 1, & \text{if } \Delta \left| \frac{\partial H(q)}{\partial q\_i} \right| \le 0, \end{cases} \tag{17}$$
 
$$\text{with } i = 1, 2, \dots, n\_{a\_{\ell}}$$

where Δ|*∂H*(*q*)/*∂qi*| is the change rate of |*∂H*(*q*)/*∂qi*| with respect to time, and is numerically calculated during implementation. With this choice, a value of one is assigned to *wi*, indicating that no penalty is imposed on the *i*th joint, if the *i*th joint is not moving (Δ|*∂H*(*q*)/*∂qi*| = 0), or it moves away from a constraint (Δ|*∂H*(*q*)/*∂qi*| < 0). On the other hand, *wi* tends towards zero if the movement of the *i*th joint gets closer to a constraint. Hence, the element *wi* penalizes the movement of the *i*th joint by means of (12) if it moves towards a constraint and stops the joint if it is too close to it.

#### 4.2.1. Joint Limits Avoidance

To construct the weighting matrix for joint limits avoidance *WJlim*, a well-known criterion function [17] is used:

$$H\_{flin}(q) = \sum\_{i=1}^{n\_d} \frac{1}{4\gamma} \frac{(q\_i^+ - q\_i^-)^2}{(q\_i^+ - q\_i)(q\_i - q\_i^-)},$$

where *qi* is the *i*th joint angle, *q*<sup>−</sup> *<sup>i</sup>* and *<sup>q</sup>*<sup>+</sup> *<sup>i</sup>* are its lower and upper limits, respectively, and *γ* is a scalar constant that adjusts the rate of change of *HJlim*(*q*). This criterion function increases as the joint gets closer to its limits and goes to infinity at the joint bounds. The elements of the gradient of this function are given by

$$\frac{\partial H\_{\rm film}(q)}{\partial q\_{i}} = \frac{1}{4\gamma} \frac{(q\_{i}^{+} - q\_{i}^{-})^{2} (2q\_{i} - q\_{i}^{+} - q\_{i}^{-})}{(q\_{i}^{+} - q\_{i})^{2} (q\_{i} - q\_{i}^{-})^{2}}.\tag{18}$$

Each element *∂HJlim*(*q*)/*∂qi* is equal to zero if *qi* is at the middle of its joint range and goes to infinity at either limit.

#### 4.2.2. Self-Collision Avoidance

To construct the weighting matrix for self-collision avoidance *WCol*, an exponential function of the distance between two collision points is used as the criterion function [16]:

$$H\_{\mathrm{Col}}(\mathfrak{q}) = \rho \mathfrak{e}^{-c\_1 d(\mathfrak{q})} d(\mathfrak{q})^{-c\_2} \rho$$

where *ρ* > 0 controls the amplitude of *HCol*(*q*), and *c*1, *c*<sup>2</sup> > 0 control the rate of decay. This function has a maximum value when the distance *d* between two links is zero, and exponentially decreases as this distance increases. The distance between two collision points is defined as *d*(*q*) = *pl***<sup>1</sup>** − *pl***<sup>2</sup>** 2, where *pl***<sup>1</sup>** and *pl***<sup>2</sup>** represent the position vectors, referred to a common frame, of the collision points on two nonadjacent links. *pl*<sup>1</sup> and *pl*<sup>2</sup> can be calculated from the configuration vector *q* through forward kinematics.

The elements of the gradient of *HCol*(*q*) are given by

$$\frac{\partial H\_{\text{Col}}(q)}{\partial q} = \frac{\partial H\_{\text{Col}}(q)}{\partial d} \frac{\partial d}{\partial q'} \tag{19}$$

where each of the partial derivatives is given by

$$\frac{\partial H\_{\text{Col}}(\mathfrak{q})}{\partial d} = -\rho \mathfrak{e}^{-c\_1 d} d^{-c\_2} (c\_2 d^{-1} + c\_1),\tag{20}$$

$$\frac{\partial d}{\partial q} = \frac{1}{d} [J\_1^T (p\_{I1} - p\_{I2}) + J\_2^T (p\_{I2} - p\_{I1})]\_\prime \tag{21}$$

where *J*<sup>1</sup> and *J*<sup>2</sup> are the associated Jacobian matrices of *pl***<sup>1</sup>** and *pl***2**, respectively. The collision points are chosen from the surface of the links for which the collision distance is computed. For the case of potential collisions between the arm and the mobile platform, *pl***<sup>2</sup>** is picked as a point fixed on the surface of the mobile platform (*pl***<sup>2</sup>** does not move with respect to the arm). By using a frame attached to the mobile platform as the common frame, and selecting *pl***<sup>2</sup>** as a fixed point, (21) reduces to:

$$\frac{\partial d}{\partial q} = \frac{1}{d} [J\_1^T (p\_{l1} - p\_{l2})].\tag{22}$$

When constructing matrix *Wa*, the partial derivative *∂d***/***∂q* in (19) was calculated using (21) or (22) depending on whether the collision is evaluated between two moving links or a moving link and a fixed one, respectively. Each element *∂H*(*q*)/*∂qi* tends toward zero as the distance between two evaluated collision points increases, and tends towards infinity as the distance gets closer to zero. Because *d* = 0 is not admissible, i.e., the two links are in contact, the chosen points must contemplate a threshold.

Multiple pairs of potential collision points might exist in a robotic system. Let *Nc* be the number of potential collision point pairs that are evaluated. A self-collision matrix *WColj* is constructed for each distance *dj*, with *j* = 1, 2, ... , *Nc*. The weighting matrix that includes the contribution of each evaluated pair is finally obtained by

$$\mathcal{W}\_{Col} = \prod\_{j=1}^{N\_c} \mathcal{W}\_{Col\_j}. \tag{23}$$

With this combination, the *i*th diagonal element of the matrix *WCol* penalizes the movement of the *i*th joint. In contrast with the combination of collision weighting matrices proposed in [16], the combination (23) guarantees that the joints are stopped if they attempt to decrease any of the collision distances *dj* to a value of zero. Since some potential collisions are taken care of by the joint limits, the number of points considered in (23) is small, and therefore the computation cost can be relieved.

#### *4.3. Joint Velocity Constraints*

In a task space trajectory tracking problem, joint constraints at the velocity level are as important as joint limits and self-collision avoidance. These constraints include joint velocity boundary constraints and joint velocity limits. Satisfying joint velocity boundary constraints is a requirement for the end-effector to stop at the beginning and end of the task, i.e., the initial and final joint velocities must be zero. Joint velocity limits must be considered because the motion planning algorithm might generate joint velocity commands that are out of bounds in order to follow the task space velocity profile. These unfeasible velocities cannot be achieved by the real joints. Therefore, the trajectory tracking will fail if the joint velocity limits are not respected.

#### 4.3.1. Joint Velocity Boundary Constraints

To satisfy joint velocity boundary constraints, the initial and final joint velocities must be zero. *up* in (12) is directly associated with the end-effector's task; therefore, it implicitly satisfies the boundary constraints. However, *uh* in (12) does not necessarily satisfy them, because the manipulability maximization task is dependent on the mobile manipulator configuration at the initial and final poses. In this work, we propose using a time-varying self-motion magnitude to handle these constraints. The objective is to avoid non-zero values of *uh* only at the beginning and end of the trajectory. With this in mind, it is proposed to set the variation factor *β*(*t*) to increase from zero to one at the beginning of the trajectory, maintain a value of one for most of the trajectory, and then decrease from one to zero at the end of the trajectory.

Let *tb* be the blending time for *β*(*t*) to increase from zero to one, and *tf* be the trajectory execution time. To achieve a smooth transition, a 5th order polynomial *β*1(*t*) = *a*<sup>0</sup> + *a*1*t* + *a*2*t* <sup>2</sup> + *a*3*t* <sup>3</sup> + *a*4*t* <sup>4</sup> + *a*5*t* <sup>5</sup> is used when *t* < *tb*. The decrement in *β*(*t*) at the end of the trajectory, when *t* > *tf* − *tb*, is the complement of the polynomial *β*1(*t*) and is defined by *β*2(*t*) = 1 − *β*1(*t* − *tf* + *tb*). By imposing the conditions *β*1(0) = 0, *β*1(*tb*) = 1, *β*˙ <sup>1</sup>(0) = 0, *β*˙ <sup>1</sup>(*tb*) = 0, *β*¨ <sup>1</sup>(0) = 0, *β*¨ <sup>1</sup>(*tb*) = 0, the values of the coefficient of *β*1(*t*) are found.

After finding the polynomial coefficients, the self-motion magnitude variation factor is given by

$$\beta(t) = \begin{cases} a\_3 t^3 + a\_4 t^4 + a\_5 t^5, & \text{if } t < t\_b \\ 1, & \text{if } t\_b \le t \le t\_f - t\_b \\ 1 - (a\_3 (t - t\_f + t\_b)^3 + \\ a\_4 (t - t\_f + t\_b)^4 + & \text{if } t > t\_f - t\_b, \\ a\_5 (t - t\_f + t\_b)^5) \end{cases} \tag{24}$$

\*\*with:\*\* 
$$\begin{aligned} a\_0 &= 0, \quad a\_1 = 0, \quad a\_2 = 0, \\ a\_3 &= \frac{10}{t\_b^3}, \quad a\_4 = -\frac{15}{t\_b^4}, \quad a\_5 = \frac{6}{t\_b^5}, \end{aligned} $$

The blending time *tb* is chosen by the user. Figure 3 shows the shape of *β*(*t*) for *tf* = 15(*s*), *tb* = 2.5(*s*).

**Figure 3.** Example shape of self-motion variation factor *β*.

#### 4.3.2. Joint Velocity Limits

To satisfy joint velocity limits, the maximum magnitude of self-motion is determined such that the velocity limit for each joint is not violated [18], namely:

<sup>|</sup>*uip* (*t*) + *αβuih* (*t*)| ≤ *<sup>u</sup>*<sup>+</sup> *<sup>i</sup>* , (25)

where *u*<sup>+</sup> *<sup>i</sup>* is the *i*th joint velocity limit with *i* ∈ 1, 2, . . . , *δm*. Therefore, to avoid exceeding the joint velocity limits, *α*(*t*) must be selected such that it satisfies (25). The upper and lower limits of *α*(*t*) can be found using this equation. For each joint, it can be shown that the maximum and minimum values of *α*(*t*), denoted by *αimax* (*t*) and *αimin* (*t*), respectively, are given by

$$a\_{i\_{\max}}(t) = \max\left(\frac{u\_i^+ - u\_{i\_p}(t)}{\beta(t)u\_{i\_h}(t)}, \frac{-u\_i^+ - u\_{i\_p}(t)}{\beta(t)u\_{i\_h}(t)}\right),\tag{26}$$

$$\mu\_{i\_{\min}}(t) = \min\left(\frac{u\_i^+ - u\_{i\_p}(t)}{\beta(t)u\_{i\_h}(t)}, \frac{-u\_i^+ - u\_{i\_p}(t)}{\beta(t)u\_{i\_h}(t)}\right). \tag{27}$$

Then, *αmax*(*t*) and *αmin*(*t*) for all the joints are:

$$a\_{\max}(t) = \min\left(a\_{1\_{\max}}(t), a\_{2\_{\max}}(t), \dots, a\_{\delta\_{\text{tw}\_{\max}}}(t)\right),\tag{28}$$

$$\mathfrak{a}\_{\min}(t) = \max\left(\mathfrak{a}\_{1\_{\min}}(t), \mathfrak{a}\_{2\_{\min}}(t), \dots, \mathfrak{a}\_{\delta\_{\min}}(t)\right),\tag{29}$$

where *αmax*(*t*) is the self-motion magnitude limit. In [18], the maximum value of *αmax*(*t*) and minimum value of *αmin*(*t*) are calculated for the whole trajectory, and the upper bound of *αmax*(*t*) or lower bound of *αmin*(*t*) is used as the step size to take advantage of the maximum admissible velocities. In our approach, a suitable initial value *α<sup>s</sup>* is selected

through experimentation and the upper and lower limits of *α* are calculated for each time *t*. The value of *α* at time *t* is then given by

$$\alpha(t) = \begin{cases} \alpha\_{\max}(t), & \text{if } \alpha\_s > \alpha\_{\max}(t), \\ \alpha\_{\min}(t), & \text{if } \alpha\_s < \alpha\_{\min}(t), \\ \alpha\_{s'} & \text{otherwise.} \end{cases} \tag{30}$$

This technique, in contrast with using the maximum/minimum self-motion for the entire trajectory, prevents sudden joint accelerations due to the use of maximum velocities in every step. Note that for some cases *α*(*t*) may be negative if *αmax*(*t*) is negative, which means that the joint velocity limits cannot be satisfied without decreasing the manipulability of the system. The task can be executed as long as the system is still away from any singularity. The reader may have noticed that *β*(*t*) is in the denominator of (26) and (27), and as shown in Section 4.3.1, *β*(*t*) is zero at the beginning and the end of the trajectory. For these two cases, *αmin* = −∞ and *αmax* = ∞, but this does not cause instability in the system because by following (30), the value of *α*(*t*) is set to *αs*.

It is also possible to detect whether a task can be accomplished or not by using the limits of *α*(*t*). If the inequality *αmax*(*t*) < *αmin*(*t*) is true, then a suitable value of *α*(*t*) which avoids the joint velocity limits does not exist, because even the particular solution will violate them. In other words, the given task space trajectory cannot be accomplished without violating at least one of the joint velocity limits. For these situations, the task space trajectory must be replanned with a longer execution time *tf* or with lower end-effector's maximum velocities.

#### **5. Experiments**

Experiments were carried out to verify the efficacy of our scheme to solve the motion planning problem for trajectory tracking at the kinematic level. In this section, the results for the tracking of two trajectories, a Lissajous trajectory (see Section 5.4), and an elliptic trajectory (see Section 5.5), are analyzed. These trajectories were picked to demonstrate the ability of our approach to comply with the different constraints introduced in this manuscript while improving the manipulabilities of the whole system and the robot arm. For the Lissajous trajectory, a comparison of different objective functions for manipulability maximization is made to highlight the advantages of the proposed manipulability measure for mobile manipulators.

The experiments were performed using a 10-DOF NMM developed by the Industrial Technology Research Institute (ITRI), as shown in Figure 4. This NMM is composed of a differential-drive wheeled mobile platform, a prismatic joint mounted on top of the platform, and a Universal Robots UR5 6-DOF robotic arm attached to the prismatic joint. From this point forward, the UR5 arm's joints are denoted as *qai* with *i* = 1, 2, ... , 6 and the prismatic joint as *zpj*. The respective Denavit–Hartenberg (DH) parameters are shown in Table 1. The joint limits and joint velocity limits are shown in Table 2.

**Figure 4.** Nonholonomic mobile manipulator used for the experiments.


**Table 1.** D-H parameters.

**Table 2.** Joints physical constraints.


The Jacobian of the arm *Ja* used for calculation of Ωˆ *<sup>a</sup>* in (15) was constructed only using the UR5 arm without the prismatic joint. If the prismatic joint were included, the arm would be allowed to stretch for most tasks since it would always be able to move the end-effector vertically using the prismatic joint, i.e., the manipulability is not affected when the arm is horizontally stretched. Stretching the arm for most tasks is an undesirable behavior, and therefore, the manipulability maximization of the UR5 arm without the prismatic joint is a suitable selection. The Jacobian *Ja* is calculated with respect to the frame *XaYaZa* shown in Figure 4.

Due to the lack of a reliable positioning system, the odometry of the wheels was used in the experiments to compute the position and orientation of the mobile platform, and the forward kinematics were used to compute the end-effector's pose, which in turn was fed back to the motion planning algorithm. Therefore, the errors presented in the experiments are not the real-world errors, but the errors in the trajectory in which it is assumed that a reliable positioning system exists. Furthermore, as mentioned in the introduction section, the scope of this work is the motion planning of NMM for trajectory tracking at the kinematic level; therefore, the objective of the presented experiments is to validate the proposed algorithm at the kinematic level. Problems inherent to the dynamic behavior of the system, including vibrations of the mechanical structure, friction from the ground, etc., have an impact on the real end-effector tracking error, but they are out of scope of this paper and not considered in the algorithm. For these reasons, the simulation results of the position and orientation errors along the trajectory are also shown in this manuscript to highlight the performance of our motion planning algorithm.

An important remark is that the system vibrations due to the NMM mechanical structure greatly affected the performance of the motion planning algorithm when the system was close to its joint limits or self-collision constraints. This behavior was not observed when performing the simulations. To address this issue, a moving average filter with a window size of five was used to filter the gradients of the criterion functions for joint limits avoidance and self-collision avoidance. This filter diminished the impact of such vibrations on the motion planning algorithm.

#### *5.1. Orientation Error for 6-DOF Tasks*

In all the experiments, 6-DOF tasks were performed. For a 6-DOF task (*m* = 6), where the position and orientation of the end-effector are considered, the expression *rd* − *r* (mentioned in Section 3) has a specific definition that depends on the orientation representation, i.e., *r***˜** = *rd* − *r* does not hold for all orientation representations [1]. In this work, unit quaternions are used to describe the end-effector's orientation because of their efficiency and nonsingular representation for all orientations [1,19–21].

A unit quaternion *Q* = [*w* + *xi* + *yj* + *zk*] is represented in scalar-vector form by *<sup>Q</sup>* <sup>=</sup> {*s*, *<sup>v</sup>*} with *<sup>s</sup>* <sup>∈</sup> <sup>R</sup> and *<sup>v</sup>* <sup>∈</sup> <sup>R</sup>3, where *<sup>s</sup>* and *<sup>v</sup>* are called the scalar and vector elements of *Q*, respectively. The desired and current pose are defined using unit quaternions for orientation as *rd* = - *Pd Qd <sup>T</sup>* and *rc* <sup>=</sup> - *Pc Qc T* , where *Pd* = - *xd*, *yd*, *zd* and *Pc* = - *xc*, *yc*, *zc* are the desired position and current position, respectively, and *Qd* = {*sd*, *vd*} and *Qc* = {*sc*, *vc*} are the desired orientation and current orientation, respectively. The position error *eP* is defined as *eP* = *Pd* − *Pc*. The orientation error can be described in terms of the quaternion Δ*Q* = {Δ*s*, **Δ***v*}, where [1]:

$$\begin{aligned} \Delta s &= s\_{\mathfrak{c}} s\_{d} - \mathfrak{v}\_{d}^{T} \mathfrak{v}\_{\mathfrak{c}}, \\ \Delta \mathfrak{v} &= s\_{\mathfrak{c}} \mathfrak{v}\_{d} - s\_{d} \mathfrak{v}\_{\mathfrak{c}} - \mathfrak{v}\_{d} \times \mathfrak{v}\_{\mathfrak{c}}. \end{aligned} \tag{31}$$

If the desired orientation and current orientation are aligned, i.e., with zero orientation error, then Δ*Q* = {1, 0}. Consequently, it would be sufficient to define the orientation error to be **Δ***v*. It is also important to follow a convention for the sign of the quaternion because *Q* = {*s*, *v*} and −*Q* = {−*s*, −*v*} represent the same orientation. A common convention is to keep the scalar quaternion element positive. Take this into account and the orientation error is defined as follows:

$$\mathfrak{e}\_{\mathcal{O}} = \begin{cases} \Delta \mathfrak{v}\_{\prime} & \text{if } \Delta \mathfrak{s} \ge 0 \\ -\Delta \mathfrak{v}\_{\prime} & \text{if } \Delta \mathfrak{s} < 0. \end{cases} \tag{32}$$

Separating the position and orientation errors, the motion planning control law (6) is rewritten as

$$
\dot{\mathbf{r}} = \dot{\mathbf{r}}\_d + \begin{bmatrix} \mathbf{K}\_P & \mathbf{0} \\ \mathbf{0} & \mathbf{K}\_O \end{bmatrix} \begin{bmatrix} \boldsymbol{\varepsilon}\_P \\ \boldsymbol{\varepsilon}\_O \end{bmatrix} \tag{33}
$$

where *KP* and *KO* are positive definite diagonal 3 × 3 matrices. Note that *eO* in (32) is not an angle difference but its size represents the size of the orientation error that, as shown in [1,22], can achieve the convergence of the orientation error.

#### *5.2. Evaluated Self-Collision Pairs*

The types of self-collision can be significantly reduced by setting the joint limits properly. In addition, collisions among the first three links and the last three links can be taken care of by maximizing the arm manipulability, because the 6-DOF manipulator has poor manipulability if the arm is retracted to the point where the wrist is close to the base of the arm. Therefore, only the self-collision between the arm and the mobile platform needs consideration. Such a type of collision takes place when the elbow of the manipulator collides with the top of the platform, or when the wrist collides with the front of the platform. The distances associated with these potential self-collisions are depicted in Figure 5. Given that the platform is fixed with respect to the arm, setting frame *XpYpZp* (located on the center of the wheels' axle, as shown in Figure 4) as the reference frame allows to use (22), where *pl***<sup>1</sup>** is a point in the arm's elbow or wrist, correspondingly, and *pl***<sup>2</sup>** is a fixed point with respect to *XpYpZp*:

**Figure 5.** NMM's self-collision distances.

As illustrated in Figure 5, to prevent the elbow collision, a collision pair between the *z* component of *pelbow* with respect to frame *XpYpZp* and a point located at 0.5 (m) from the origin of frame *XpYpZp* in the *Zp* direction is selected. The distance for this pair is named *delbow*. To prevent wrist collision, a collision pair between the *x* component of *pwrist* with respect to frame *XpYpZp* and a point located at 0.37 (m) from the origin of frame *XpYpZp* in the *Xp* direction is selected. The distance for this pair is named *dwrist*. The wrist collision is only evaluated if the wrist height *hwrist*, the *z* component of *pwrist* with respect to frame *XpYpZp*, is lower than 0.5 (m), otherwise, its associated weighting collision matrix is assigned to identity. This was done to avoid restricting the movement of the joints that reduce *dwrist* when the wrist is above the top of the mobile platform. All these parameters were chosen based on the physical dimensions of the NMM, and the second point in each of the collision pairs was selected so that when *delbow* and *dwrist* are zero, a gap still exists between the potentially colliding links.

#### *5.3. Common Parameters of the Simulations and Experiments*

For all the experiments, the selected feedback gain matrices in (33) are *KP* = 10*I*3×<sup>3</sup> and *KO* = 20*I*3×3, where *I* is an identity matrix. The initial value of *α*(*t*) in (30) is set to *α<sup>s</sup>* = 3, and the blending time is set to *tb* = 0.2*tf* for the variation factor *β*(*t*) in (24). The parameters for self-collision avoidance in (20) are *<sup>ρ</sup>* = <sup>1</sup> × <sup>10</sup>−3, *<sup>c</sup>*<sup>1</sup> = 50 and *<sup>c</sup>*<sup>2</sup> = 1. The starting configuration of the robot arm is *qa* = - <sup>0</sup> <sup>−</sup>80 110 <sup>−</sup><sup>120</sup> <sup>−</sup>90 0 (◦). Furthermore, a sampling time *ts* = 0.02(*s*) was used for the simulations and experiments.

#### *5.4. Lissajous Trajectory*

The Lissajous trajectory was picked to demonstrate the ability of the proposed approach to track a trajectory for which the nonholonomic constraints greatly affect the movement of the system. The proposed Lissajous trajectory is defined by

$$\mathbf{r}\_d(t) = \begin{bmatrix} \mathbf{P}\_d(t) \\ \mathbf{Q}\_d(t) \end{bmatrix} = \begin{bmatrix} \mathbf{x}\_0 \\ \mathbf{y}\_0 \\ \mathbf{z}\_0 \\ \mathbf{Q}\_0 \end{bmatrix} + \begin{bmatrix} A\cos(s(t) + \pi/2) \\ B\cos(2(s(t) + \pi/2) + \pi/2) \\ C\cos(2s(t)) - C \\ 0 \end{bmatrix}'$$

where [*x*0, *y*0, *z*0] *<sup>T</sup>* is the end-effector's starting position, *s*(*t*) is the trajectory timing variable, and *A*, *B* and *C* define the size of the path. The orientation is set to be the same for the entire

trajectory, i.e., *Q*(*t*) = *Q*0. Notice that this is a 6-DOF trajectory because the orientation is constrained for the entire trajectory.

The starting configurations of the mobile platform and prismatic joint for this trajectory were set to *xp* = −0.1 (m), *yp* = −0.13 (m), *θ<sup>p</sup>* = −90(◦) and *zpj* = 0.2 (m). With this configuration, the end-effector's initial pose is given by *P***<sup>0</sup>** = - <sup>−</sup>0.009 <sup>−</sup>0.6491 0.9884 (m) and *Q*<sup>0</sup> = {0, 0*i* + 1*j* + 0*k*}. A trapezoidal velocity profile [1] was used for the derivative of the timing variable *s*˙(*t*). The parameters for the path size were set to *A* = 1.3 (m), *B* = 1.3 (m) and *C* = 0.27 (m), and an execution time *tf* = 64(*s*) was chosen. Figure 6a,b show snapshots of the NMM's movement along the Lissajous trajectory, in simulations and experiments, respectively.

Figure 7 compares the trajectory tracking results between the simulation and the experiment. In the simulation, the position and orientation errors are kept small during the whole trajectory, as shown in Figure 7b,c. This demonstrates the good tracking performance at the kinematic level of our proposed motion planning algorithm. The observed larger errors in the experiments, as exhibited in Figure 7e,f, are due to the vibrations of the system during the experiments. Furthermore, as mentioned before, the control of the dynamic behavior of the system is beyond the scope of this work. Nevertheless, the position errors in the experiments are kept below 2 × <sup>10</sup>−<sup>3</sup> (m) and the orientation errors below 1.5 × <sup>10</sup><sup>−</sup>3. Likewise, we observe that the obtained trajectories in the simulation (Figure 7a) and the experiment (Figure 7d) are fairly similar.

Figure 8 illustrates the experiment results. Note that the negative joint velocity limit <sup>−</sup>*u*<sup>+</sup> *<sup>i</sup>* is denoted as *u*<sup>−</sup> *<sup>i</sup>* in the pertinent figures. The manipulabilities of both the arm and the complete system are kept high during the execution of the trajectory, and their final values are higher than at the start of the trajectory, as shown in Figure 8b. It is important to remark that there are no potential self-collisions during the execution of this trajectory. Even though *dwrist* is negative at the beginning of the trajectory, as shown in Figure 8c, the wrist is above the top of the mobile platform and therefore the joint movements were not restricted, as explained in Section 5.2.

(**b**)

**Figure 6.** Snapshots of the NMM's motion for the Lissajous trajectory tracking: (**a**) simulations; and (**b**) experiments.

**Figure 7.** Tracking performance comparison between simulations and experiments for the Lissajous trajectory. End-effector's trajectory in simulations (**a**) and experiments (**d**). Position errors in simulations (**b**) and experiments (**e**). Orientation errors in simulations (**c**) and experiments (**f**). Notice that the position and orientation errors, panels (**b**,**c**), respectively, are small in the simulations. In addition, notice that the trajectories in simulations (**a**) and experiments (**b**) are quite similar. The reason for the larger errors obtained in the experiments (panel (**e**,**f**)) are discussed in Section 5.4.

Figure 8d–i demonstrate the fulfillment of the joint limits and joint velocity boundary constraints. All the joints are kept within their corresponding limits. Notice that the movement of *qa*1, Figure 8f, is restricted in the time interval *t* = (29, 32)(*s*) to respect its lower limit. This time interval corresponds to the trajectory section between snapshots number three and four in both Figure 6a,b. For this section of the trajectory, the movement of the end-effector in the XY plane is taken care of by the platform due to the restriction imposed to *qa*1. We observe in Figure 8e that the velocity limits of the prismatic joint are respected. The remaining joints do not reach their respective limits as shown in Figure 8f–h. Furthermore, as seen in Figure 8d,e,i, the velocity profiles for all the joints are smooth and satisfy the boundary constraints, i.e., the initial and final joint velocities are equal to zero.

To demonstrate the advantages of the proposed manipulability measure Ω*MM*, a comparison of the manipulability maximization results using different objective functions in simulations of the Lissajous trajectory tracking is shown in Figure 9. Here, a task is considered as failed when none of the constraints are satisfied, and thus the simulation is stopped. These results were obtained by using the same parameters shown in Section 5.3 with the only change being the objective function.

The Lissajous trajectory tracking fails when the objective function is set to maximize the manipulability of the arm, i.e., *F*(*q*) = Ωˆ *<sup>a</sup>*, as shown in Figure 9a. The manipulabilities of both the arm and the whole system start a fast decay after *t* = 44*s*. This decrement in the manipulabilities is the result of restricting the homogeneous solution to respect the joint velocity limits. Consequently, the arm manipulability is not maximized anymore and the system moves towards singularity, ultimately failing the task. The results of setting the objective function to the manipulability of the whole system, i.e., *F*(*q*) = Ωˆ *<sup>p</sup>*+*a*, are shown in Figure 9b. In this case, the task succeeds, however, the manipulability of the arm

deteriorates, and at the end of the trajectory has a value close to zero. This is the behavior discussed in previous studies [6,10].

**Figure 8.** Experiment results for the Lissajous trajectory tracking: (**a**) mobile platform's trajectory; (**b**) normalized manipulability measures; (**c**) self-collision distances; (**d**) mobile platform's velocity commands; (**e**) prismatic joint's position and velocity; (**f**–**h**) arm's joint positions; and (**i**) arm's joint velocities. Notice that all the joint limits and velocity limits are respected (panels (**d**–**i**)). Furthermore, the manipulabilities of both the arm and the whole system are improved (panel (**b**)). See Section 5.4 for a detailed discussion of this figure.

Figure 9c shows the results of maximizing a linear combination of the manipulabilities of the arm and the whole system, i.e., *F*(*q*) = 0.5Ωˆ *<sup>p</sup>*+*<sup>a</sup>* + 0.5Ωˆ *<sup>a</sup>*. Figure 9d shows the results of maximizing the proposed manipulability measure for mobile manipulators, i.e., *F*(*q*) = Ω*MM* from Equation (15). Both objective functions preserve the manipulability of the arm. However, it is clear that maximizing the proposed measure has better results overall. First, the manipulability of the arm is higher along the trajectory in Figure 9d compared to the arm manipulability obtained with the linear combination in Figure 9c. Secondly, the obtained final manipulabilities have improved with respect to the initial ones in Figure 9d. On the other hand, the linear combination improves the manipulability of the whole system, but not that of the arm with respect to the values at the start of the trajectory, as shown in Figure 9c.

**Figure 9.** Lissajous trajectory's comparison of objective functions for manipulability maximization in simulations: (**a**) maximization of the manipulability of the arm; (**b**) maximization of the manipulability of the whole system; (**c**) maximization of a linear combination of the manipulability of the arm and the whole system; and (**d**) maximization of the proposed mobile manipulator manipulability measure. This figure demonstrates the advantages of the manipulability measure for mobile manipulators (panel (**d**)) presented in Section 4.1. See Section 5.4 for a detailed discussion of this figure.

#### *5.5. Elliptic Trajectory*

The elliptic trajectory was picked to demonstrate the ability of the proposed approach to comply with joint velocity limits and to prevent self-collisions. This trajectory consists of moving the end-effector from an initial pose *r***<sup>0</sup>** = - *P***<sup>0</sup>** *Q*<sup>0</sup> *<sup>T</sup>* to a desired final pose *rd* = - *Pd Qd <sup>T</sup>* using an elliptic path. The trajectory position is defined by

$$P\_d(t) = \begin{bmatrix} A\cos(s(t)) + c\_x\\ B\sin(s(t)) + c\_y\\ m(s(t) - s\_0) + z\_0 \end{bmatrix} /$$

where *A*, *B*, *cx*, *cy* and *s*<sup>0</sup> are calculated using the *XY* coordinates of *P***<sup>0</sup>** and *Pd*, such that the elliptic path is centered at the closest point to the origin, while it covers a 90◦ angle between the start and end points. The trajectory's *Z* coordinate follows a straight line between the points (*z*0,*s*0) and (*zd*,*sd*), where *z*<sup>0</sup> and *zd* are the *Z* coordinates of *P***<sup>0</sup>** and *Pd*, respectively; *s*<sup>0</sup> and *sd* are the starting and ending angles of the elliptic path, respectively; and *m* in the equation above is the slope of this line. A fifth-order polynomial profile [1] was used for the timing variable *s*(*t*). The orientations and rotational velocities along the trajectory were computed using quaternion polynomials [23]. This technique has two main advantages: a smooth velocity profile is obtained, and the rotational velocities and accelerations in the task space are included as boundary constraints.

The starting configuration of the mobile platform and prismatic joint for this trajectory was set to *xp* = −1.3 (m), *yp* = 0.56 (m), *θ<sup>p</sup>* = 0(◦), and *zpj* = 0.24 (m). With this configuration, the end-effector's initial pose is given by *P***<sup>0</sup>** = - −0.781 0.669 1.028 (m) and *<sup>Q</sup>*<sup>0</sup> <sup>=</sup> {0, 0.7071*<sup>i</sup>* <sup>−</sup> 0.7071*<sup>j</sup>* <sup>+</sup> <sup>0</sup>*k*}. The final pose was selected to have position *Pd* <sup>=</sup> - 1.55 <sup>−</sup>1.0 0.26 (m) and orientation *Qd* = {0.2706, 0.6533*i* + 0.6533*j* − 0.2706*k*}, and an execution time *tf* = 20(*s*) was chosen. Figure 10a,b show snapshots of the NMM's movement along the elliptic trajectory, in simulations and experiments, respectively.

Figure 11 compares the trajectory tracking results between the simulation and the experiment. The small position and orientation errors obtained in the simulation (Figure 11b,c) again demonstrate the good tracking performance of the proposed approach. In the experiment, the position errors are kept below 1.5 × <sup>10</sup>−<sup>3</sup> (m) and the orientation errors are kept below 1 × <sup>10</sup>−<sup>3</sup> as depicted in Figure 11e,f. Once again, the vibrations of the system played a role in the larger errors seen in the experiment.

(**b**)

**Figure 10.** Snapshots of the NMM's motion for the elliptic trajectory tracking: (**a**) simulations; and (**b**) experiments.

**Figure 11.** Tracking performance comparison between simulations and experiments for elliptic trajectory. End-effector's trajectory in simulations (**a**) and experiments (**d**). Position errors in simulations (**b**) and experiments (**e**). Orientation errors in simulations (**c**) and experiments (**f**). Notice that the position and orientation errors, panels (**b**,**c**), respectively, are small in the simulations. In addition, notice that the trajectories in simulations (**a**) and experiments (**b**) are quite similar. The reason for the larger errors obtained in the experiments (panel (**e**,**f**)) are discussed in Section 5.5.

Figure 12 illustrates the experiment results. The manipulabilities of both the arm and the complete system are again improved at the end of the trajectory compared to the initial configuration, as shown in Figure 12b. However, we notice how both manipulabilities decreased during the time interval *t* = (8, 15)(*s*). This behavior is due to the value of *α*(*t*) (manipulability maximization step size) being negative for this span of time in order to respect the joint velocity limits, as discussed in Section 4.3.2. Notice that *vp* (Figure 12d) and *z*˙ *pj* (Figure 12e) are kept at their maximum values during this span of time. It would not be possible to track this particular trajectory while respecting the joint velocity limits without stretching the arm; hence, the manipulabilities of the complete system and the arm decreased.

We observe in Figure 12c how the self-collision distance of the elbow gets close to zero. In this trajectory, the elbow needs to get close to the platform in order for the end-effector to track the desired trajectory. The motion planning algorithm gradually stopped the elbow to prevent the collision with the platform, as shown by the slow decay of *delbow*. To achieve this, the movement of the prismatic joint was restricted, as shown in Figure 12e. The wrist also moves towards the front of the platform, as shown by the decrement in *dwrist* in Figure 12c. This potential self-collision is also prevented by limiting the movement of *qa*<sup>2</sup> and *qa*3, as shown in Figure 12f,g.

Figure 12d–i show that the joint limits, joint velocity limits and joint velocity boundary constraints are also satisfied. All the joints were kept within their respective limits, as shown in Figure 12e–h. As depicted in Figure 12d,e, the platform's linear velocity and the prismatic joint's velocity are kept within their limits. Once more, the velocity profiles are smooth and the initial and final velocities for all the joints are zero.

**Figure 12.** Experiment results for the elliptic trajectory tracking: (**a**) mobile platform's trajectory; (**b**) normalized manipulability measures; (**c**) self-collision distance; (**d**) mobile platform's velocity commands; (**e**) prismatic joint's position and velocity; (**f**–**h**) arm's joint positions; (**i**) arm's joint velocities. Notice that all the joint limits and velocity limits are respected (panels (**d**–**i**)). The potential self-collisions described in Section 5.2 are prevented (panel (**c**)). Furthermore, the manipulabilities of both the arm and the whole system are improved (panel (**b**)). See Section 5.5 for a detailed discussion of this figure.

#### **6. Discussion**

A scheme was proposed to solve the motion planning of NMMs considering joint limits, joint velocity limits, joint velocity boundary constraints, and self-collision avoidance while maximizing the manipulabilities of both the robot arm and the whole system.

The proposed solution uses a weighted input velocity vector and a weighted Jacobian to penalize the movement of joints that get close to a position constraint. A proposed quadratic cost function is minimized when solving the motion planning problem for redundant NMMs. This cost function includes a secondary task to be satisfied that is also weighted to comply with the position constraints. The maximization of an introduced manipulability measure for mobile manipulators is used as the secondary task to push the system away from singularities. In the experiments section, it is demonstrated that maximizing this new measure simultaneously improves the manipulability of the whole system and that of the manipulator alone.

This work focuses on the motion planning for trajectory tracking at the kinematic level, which must not only comply with joint position constraints, but must also respect joint velocity constraints and joint velocity boundary constraints. Joint velocity boundary constraints are satisfied by varying the magnitude of the homogeneous solution at the start and end of the trajectory. The manipulability maximization at these points is not necessarily zero; hence, using an adequate variation in the magnitude of self-motion is needed. Joint velocity limits are satisfied by evaluating the maximum allowable self-motion for each joint. Using this information, the step size of the gradient ascent/descent is limited when required, and consequently, the joint velocity limits are not exceeded.

The experiments for 6-DOF trajectories were conducted to verify the efficacy of the proposed scheme. The results demonstrate that the proposed approach can solve the motion planning problem for NMMs to perform trajectory tracking at the kinematic level while considering the constraints required for real implementation including manipulation capability preservation or improvement.

The experiments designed in Section 5 consider an open environment without obstacles, because this is the scope of our manuscript. However, the proposed solution can be extended to prevent collisions with obstacles by including collision pairs between the robot arm and these obstacles, using the same definitions as in Section 4.2.2. This will penalize the movement of the arm's joints that get closer to an obstacle in the environment. Nevertheless, in the case of the platform, stopping it is not an efficient approach. In this case, an additional task can be added to the solution to push the platform away from the obstacles. One way to achieve this is by using a task priority scheme [24] using the nullspace of the motion planning algorithm.

Even though our work focuses on NMMs, our approach can be effortlessly adapted for use with holonomic mobile manipulators. Future work will focus on dynamic modeling and controller design to deal with system vibrations and tire slip such that the tracking errors of the system can be reduced.

**Author Contributions:** Conceptualization, methodology, J.L. and T.H.; software, validation, visualization, writing—original draft preparation, J.L.; supervision, project administration, funding acquisition, resources, writing—review and editing, T.H. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the Ministry of Science and Technology, Taiwan (Project code MOST 107-2923-E-009-004-MY3).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors wish to thank the Mechanical and Mechatronics Systems Research Laboratories from the Industrial Technology Research Institute (ITRI), Taiwan, for the support and resources provided for this work.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **A Robot-Assisted Large-Scale Inspection of Wind Turbine Blades in Manufacturing Using an Autonomous Mobile Manipulator**

**Heiko Engemann 1,2,\*, Patrick Cönen 1, Harshal Dawar 1, Shengzhi Du 2,\* and Stephan Kallweit <sup>1</sup>**


**Abstract:** Wind energy represents the dominant share of renewable energies. The rotor blades of a wind turbine are typically made from composite material, which withstands high forces during rotation. The huge dimensions of the rotor blades complicate the inspection processes in manufacturing. The automation of inspection processes has a great potential to increase the overall productivity and to create a consistent reliable database for each individual rotor blade. The focus of this paper is set on the process of rotor blade inspection automation by utilizing an autonomous mobile manipulator. The main innovations include a novel path planning strategy for zone-based navigation, which enables an intuitive right-hand or left-hand driving behavior in a shared human–robot workspace. In addition, we introduce a new method for surface orthogonal motion planning in connection with large-scale structures. An overall execution strategy controls the navigation and manipulation processes of the long-running inspection task. The implemented concepts are evaluated in simulation and applied in a real-use case including the tip of a rotor blade form.

**Keywords:** mobile manipulation; large-scale inspection; wind turbine production; autonomous navigation; surface-orthogonal path planning; intelligent robot; flexible production

#### **1. Introduction**

Wind energy has gradually taken the dominant share of renewable energies. The worldwide capacity increased from 7600 MW in 1998 to 93 GW in 2020 [1]. The main components of a wind turbine are: a rotor equipped with wing-shaped blades, a nacelle that houses a drive train, and a tower [2]. The rotor blades transform the wind energy into rotary energy and must be lightweight, robust, and long-lasting. Therefore, they are made from composite materials, such as glass or carbon fiber material with a resin-like epoxy. During rotation, extremely high forces affect the blades. Therefore, it is important to avoid imperfections during the manufacturing process.

In the production of glass fiber reinforced structural components, the fiber structure is fixed by enclosing laid semi-finished glass fibers with a resin matrix [3]. Imperfections in the alignment of the structure or during the fiber reinforcement change the structural properties and thus reduce the quality of the composite material. Currently, such imperfections are detected with help of ultrasonic [4,5], thermal [6,7], or radar [8,9] techniques, whereby a differentiation has to be made in pre- and post-resin-injection inspections. If an imperfection is detected after the resin injection, no corrections are possible, and the component is, therefore, a reject. In research, radar imaging has gained a lot of attention for inspection tasks of fiber composite material. In contrast to other methods based on x -rays, thermal imaging, or ultrasonic imaging, radar imaging is non-invasive and provides a high resolution combined with a high penetration depth [10]. Millimeter wave radar scans can

**Citation:** Engemann, H.; Cönen, P.; Dawar, H.; Du, S.; Kallweit, S. A Robot-Assisted Large-Scale Inspection of Wind Turbine Blades in Manufacturing Using an Autonomous Mobile Manipulator. *Appl. Sci.* **2021**, *11*, 9271. https:// doi.org/10.3390/app11199271

Academic Editors: António Paulo Moreira, Pedro Neto and Félix Vidal

Received: 14 September 2021 Accepted: 29 September 2021 Published: 6 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

be used to generate a detailed layer-by-layer visualization of a rotor blade [9]. Therefore, it is possible to locate imperfections in 3D. However, the underlying algorithms for the 3D reconstruction require a surface orthogonal sensor alignment and a low measurement uncertainty of the sensor pose during the scanning process. These requirements exclude a manual execution with handheld devices.

Figure 1 shows a schematic of the manufacturing of a rotor blade, including the two main components: aeroshells and shear webs. The typical diameter of a wind turbine rotor, including the blades, has increased from 54 m (2005) to 158 m (2017) for onshore and from 76 m (2005) to 164 m (2015) for offshore installations [11]. The huge dimensions of the rotor blades complicate the inspection processes, which is why they are mostly performed manually by human workers. The automation of the inspection processes has great potential to increase the overall productivity and to create a consistent reliable database for each individual rotor blade. Such an automation approach must be largescaled and flexible to cover the high variety of rotor blade dimensions.

**Figure 1.** Schematic of the manufacturing of a wind turbine blade [12].

An *autonomous industrial mobile manipulator* (AIMM) [13] combines the flexibility of an industrial robot arm (manipulator) with the mobility of an autonomous mobile robot (AMR). Equipped with various sensors, AIMMs are capable of autonomous navigation, even in dynamic and large-scale production environments. Furthermore, the perceived data can be used to realize a shared human–robot workspace, which is of particular importance for manual labor-dominated production types. One of the first AIMMs was introduced in 1984: MORO [14]. MORO was capable of navigating on the shopfloor and executing pick and place tasks. Based on this pioneering work, many further developments were carried out, focusing on different industrial applications: part feeding [15–17], transportation and assembly [18–20], as well as large-space manufacturing [21–23]. In recent years, inspection robots have become more present in different domains, such as the oil and gas industry [24,25], the power industry [26,27], and civil infrastructure [28]. Therefore, the idea to utilize an AIMM for inspection tasks is obvious. In contrast to recent developments in the area of inspection robots, our approach focuses on the motion planning addressing the surface orthogonal sensor alignment and the special requirements of a shared human–robot workspace in an industrial context.

Therefore, we make use of the AIMM OMNIVIL [29]. OMNIVIL consists of a self-built mobile platform. The mobile platform was designed to address the needs in a dynamic production environment, such as positioning, accuracy, and maneuverability. Therefore, the platform is based on four Mecanum wheels [30], whereby a pivoting axle guarantees continuous ground contact. The collaborative manipulator UR5 is mounted on top of the mobile platform. The sensor concept includes various sensor types to perceive the environment and the internal state of the robot. The workspace monitoring concept and the localization and positioning capabilities of the mobile platform were evaluated in

various experiments ranging from static to highly dynamic scenarios [29]. The collaborative manipulator is equipped with an RGB-D camera (Intel-RealSense 435) and a radar module that works at 80 GHz with a 24 GHz bandwidth. Figure 2 shows the AIMM OMNIVIL.

**Figure 2.** The AIMM OMNIVIL.

In this paper, we present an automated inspection process for wind turbine rotor blade manufacturing executed by the AIMM OMNIVIL. The focus is set on the robot control setup rather than the composite material analysis. Therefore, this work addresses motion planning, environmental perception, and the human–robot interaction. The following strategies are implemented towards a fully autonomous inspection of a rotor blade: (1) a novel path planning strategy for a zone-based navigation concept that enables an intuitive right or left driving behavior of the mobile platform without limiting the planning based navigation; (2) a novel method for surface orthogonal motion planning in connection with large scale structures; (3) therefore, the large-scale structures are divided into feasible subparts based on a workspace analysis, including the reachability and manipulability of the manipulator; (4) an overall execution strategy that controls the corresponding navigation and manipulation processes of the long-running inspection task.

The rest of the paper is structured as follows. Section 2 presents the control system, including the zone-based segmentation of the production environment, the path planning strategy, and the surface orthogonal motion planning. In Section 3, two experiments are carried out to validate the path planning strategy and the overall execution of the rotor blade inspection. Section 4 concludes the paper.

#### **2. Robot Control System**

#### *2.1. Zone-Based Segmentation of the Production Environment*

The general concept of zone-based navigation is inspired by zone management within an industrial production environment. The approach is based on virtual navigation zones [29]. Instead of using infrastructural markers, the zones can be defined in a given map by using 2D polygons. For each zone, a predefined robot behavior can be set with different parameter settings for maximum velocity, maximum acceleration, goal tolerance, warning indicators (visual and acoustic), or even different kinematic models reaching from the differential to holonomic models.

For instance, Figure 3a shows a zone-based segmentation of a 60 × 60 m production environment. The coloring represents the different cost levels of the zones ranging from green (minimum cost value) to red (maximum cost value). In Figure 3b, the zones are colored with an individual color for each zone. Figure 3c shows the corresponding layered costmap [31] configuration in hierarchical order. The ordering of the layers allows modulating the costs by overwriting them only when and where required.

**Figure 3.** Zone-based segmentation of a production environment. (**a**) Zone-based segmentation of a 60 × 60 m environment colored from green (minimum cost value) to red (maximum cost value); (**b**) same zone setup, but with individual zone colors of layer hierarchy; (**c**) hierarchical ordered costmap layers.

The standard configuration of a layered costmap configuration includes a static, an obstacle, and an obstacle inflation layer. This default configuration does not fulfil the requirements of a dynamic production environment, including different manufacturing and transportation zones as well as a shared human–robot workspace. The implemented approach is based on the concept of layered costmaps described in [32], whereby each layer tracks the data to a specific functionality. With the cost value *Ω<sup>i</sup> xy* = {0, 1, . . . , 254, 255} for a cell *c<sup>i</sup> xy* at position *P<sup>c</sup> <sup>j</sup>* <sup>=</sup> *xj*, *yj* , the layer *i* contributes to the Master costmap as follows:

The Static layer contributes to the master costmap by analyzing an a priori created 2D occupancy grid map *G*. The layer provides information about free and occupied spaces in the production environment by Equation (1):

$$
\Omega^1\_{\text{xy}} = \begin{cases} 0, & \text{G}(\mathbf{x}, \mathbf{y}) == \text{Free} \\ & \text{254}, \\ & \text{255}, \quad \text{G}(\mathbf{x}, \mathbf{y}) == \text{Occupied} \\ & \text{G}(\mathbf{x}, \mathbf{y}) == \text{NoInformation} \end{cases} \tag{1}
$$

with *Free*, *Occupied*, and *NoInformation* reflecting the related values of the occupancy grid map implementation.

The Zone layers, Corridor (2), Restricted (3), Station (4), and Prohibition (5), contribute to the zone-based navigation layout of the production environment. The dimensions of the zones are provided in form of a polygon list. The corridor zone should be the preferred zone for the implemented path planner; therefore, the cost value of the corresponding cells is set to a small value of *Ω*<sup>2</sup> *xy* = 10. The robot is allowed to enter restriction zones, but only if necessary. Therefore, the cost value for the corresponding cells is set to *Ω*<sup>3</sup> *xy* = 100. The same condition applies to the station zones. Since a motion of the robot through a restriction zone is from higher preference than a motion through a station zone, the cost value of the corresponding cells is set to *Ω*<sup>4</sup> *xy* = 120. The robot is not allowed to enter a prohibition zone; therefore, the cost value of the corresponding cells is set to *Ω*<sup>5</sup> *xy* = 250.

The Guard Rail layer is based on the inflation layer described in [32]. Originally, the inflation layer was designed to create a buffer zone around lethal obstacles to avoid the robot coming too close to the obstacles. We applied that method to define a buffer zone at the borders of the corridor zones. The robot uses these guard rail zones as a guide for navigating inside the corridor zones. The robot will navigate with a high preference alongside the edge between the guard rail zone and the corridor zone. The developed method is further explained in Section 2.2. The inflation radius is adjustable and defines the width of the guard rail zone. The corresponding cells are set to the cost value *Ω*<sup>6</sup> *xy* = 12.

The Human layer contributes the information of a multilayer and redundant workspace monitoring system (WMS) described in [29]. The WMS uses RGB and thermal images as well as Lidar data. The multilayer sensor setup is improved by the implementation of redundant algorithms for human co-worker detection based on neural networks. The fused confidence intervals between 0 and 1 are provided as a 2D heatmap in form of an occupancy grid map. The corresponding cells in the costmap *M* are called human cells and assigned with the cost value *Ω*<sup>7</sup> *xy* = 200. A threshold of 0.5 is used for the confidence level to neglect low confidence detections. An area with an adjustable radius around each human cell is defined as a human safety zone. The Human layer calculates the Euclidean distance between the current robot position and each human cell to determine if the robot is inside a human safety zone. Following the concept of the navigation zones, the robot will preventively change its motion behavior if it is inside a human safety zone. In addition, the human cells are inflated to create a buffer zone around the human co-workers. The radius of these buffer zones is smaller compared to the human safety zones. The cost value of the corresponding cells is set to *Ω*<sup>7</sup> *xy* = 200. Theoretically, the robot is allowed to plan close to a human, but it is highly cost inefficient. Figure 4 shows an exemplary scenario of an occupancy grid map provided by the WMS and the resulting costmap, including the buffer zones in blue and one visualized human safety zone in red.

**Figure 4.** Human layer. (**a**) Occupancy grid map provided by the WMS; (**b**) costmap *M* with buffer zones around the human cells in blue and one exemplary visualized human safety zone (red circle); (**c**) costmap and occupancy grid map overlayed.

The Obstacle layer tracks the data from the 2D Lidar sensors. The Lidar data is provided in form of an array *S*, including the sensor readings. The sensor readings are converted into the costmap space to determine the corresponding cells. Analog to the Static layer, the obstacle layer contributes to the master layer by following Equation (2).

$$
\Omega\_{\rm xy}^{8} = \left\{ \begin{array}{c} 0, & M\_{\rm Lidar}(\mathbf{x}, \mathbf{y}) == \text{Free} \\ 254, & M\_{\rm Lidar}(\mathbf{x}, \mathbf{y}) == \text{Occupied} \\ 255, & M\_{\rm Lidar}(\mathbf{x}, \mathbf{y}) == \text{NoInformation} \end{array} \right. \tag{2}
$$

where *MLidar* is the sensor readings in the costmap space. The Obstacle Inflation layer adds an adjustable buffer zone around the obstacle. Therefore, the distance between the obstacle and the planned path is increased.

#### *2.2. Cost Adaption Based on Search Expansion Direction*

Common industrial navigation concepts are based on line following strategies [33] using passive [34] or active [35] landmark detection. These methods are approved in industrial environments, but not as flexible as desired. The well-known planning algorithms A\* [36] or Dijkstra [37] provide the most cost-efficient path available from a start position to a goal position, given a costmap. The presented navigation concept segments the production environment in zones of different cost levels. The resulting costmap is provided

to the path planner, resulting in a high preference for the corridor zone, without limiting the robot in the manner of a line follower. However, a line follower behavior is socially desirable for the corridor zone. This is particularly true for a shared human–robot production environment. In [38], the cell costs were slightly reduced for cells on the right side of a corridor. As a result, the pass speed of the human and the robot was significantly increased which indicates an optimized human–robot collaboration. The social behavior of right or left driving is widely used in public and industrial environments. Such a predictable and intuitive behavior of the robot is a key feature for beneficial human–robot cooperation.

Instead of manipulating the costs before the actual planning process, our approach is applied while planning and takes the expansion direction of the planning algorithm into concern. As a result, the desired left or right driving behavior can be applied to any zone and in any direction. The aim is that the global planner prefers a path right next to the previously mentioned guard rail zone. This behavior is comparable to a line follower, without limiting the robot to a particular line motion. Figure 5 shows a horizontal and a vertical example scenario of a planning procedure in a grid-based costmap.

**Figure 5.** Exemplary planning scenario. (**a**) Horizontal; (**b**) vertical.

The start cell for the planning procedure is set to the middle of the corridor zone. During the search for a valid path, a check is performed for each expanded cell of the corridor zone; if the cell is a neighbor of the guard rail zone, the cell is called a cell of interest. Each cell of interest can be either declared as a left-hand or right-hand driving cell. The cell type is determined by taking the expansion direction of the search algorithm into concern. The expansion direction can only be determined if a neighbored cell of interest exists. In Figure 5a, the cells at positions (7,4) and (9,8) are right-hand driving cells. The expansion direction is visualized with a red arrow. The cells at positions (9,4) and (7,8) are left-hand driving cells. The cells at (8,4) and (8,8) do not feature a specific type, since they had no neighbored cell of interest when they were expanded. In Figure 5b, the cells at (5,7) and (11,5) are right-hand driving cells. The cells at (5,5) and (11,7) are left-hand driving cells, and the cells at (5,6) and (11,6) of no type.

During search expansion the path potential is stored in form of a grid-based potential map *Φ*. The potential map *Φ* holds the information for each expanded cell of how much it costs to reach that cell from the start cell. Cells that are not expanded yet, are set to a clearly identifiable default value *ϕ*. Algorithm 1 shows our approach applied to realize a right-hand driving behavior.



In this example, the right-hand driving behavior is applied to the corridor zone. The function *calculateCostsToReachCell* symbolizes the standard calculation of an expanded cell in the potential map, as shown in Equation (3):

$$potential = \Phi(c\_{prev}.x.c\_{prev}.y) + M(c\_{exp}.x.c\_{exp}.y) + \Omega\_{mv\prime} \tag{3}$$

where *cexp* is the expanded cell, *cprev* is the previous cell, and *Ωmv* is the cost value to move from one cell to a neighboring cell. The function is triggered in case the expanded cell is not part of the corridor zone or the expanded cell is not determined as a right-hand driving cell. Therefore, in these cases, the default behaviors of the global planners are not changed.

If the expanded cell is part of a corridor zone and neighbors a guard rail zone, the cell is of interest and further analyzed. Four cases are differentiated:


Depending on the relative position of the expanded cell in relation to the guard rail zone and the already expanded cells stored in the potential *P*, the cell is defined as a right-hand driving cell. In the case of a right-hand driving cell, the calculation of the cell potential is changed from Equation (3). The applied formula is shown in Equation (4):

$$potential = \Phi(x\_i, y\_i) + M\_{\text{Corridor}} \,/\, \text{4} \tag{4}$$

The coordinates *xi*, *yi*, with *i* = {1, 2, 3, 4}, reflect the above mentioned four cases. *MCorridor* is the cost value of a cell inside the corridor zone.

The presented approach can be applied to any zone in any direction, as long as a particular cost gradient is present between the zone and the guard rail zone. Figure 6 shows an example scenario for a large-scale planning scenario in a simulated production environment. The red dot represents the start position and the green dot the goal position. The costmap is divided into navigation zones according to Section 2.1. Figure 6a shows the default behavior of the implemented A\* planner. As desired, the resulting path is mostly located inside the corridor zone. Figure 6b shows the resulting right driving behavior by applying our approach.

<sup>14:</sup> **end if**

<sup>15:</sup> **end function**

#### *2.3. Large-Scale Surface Orthogonal Motion Planning*

The addressed inspection task requires a high positioning accuracy of the radar module (<200 μm) in reference to the workpiece frame for each scan process. Therefore, the large-scale inspection task is executed in asynchronous mobile manipulation mode. The workpiece is divided into smaller subsegments, which can be inspected standalone with the motion capabilities of the manipulator. At each subsegment, the following process steps are performed:


The approached positions of the mobile platform are determined by analyzing the reachability of the manipulator in its workspace and segmenting the workpiece accordingly.

#### 2.3.1. Surface Reconstruction

To plan a path on a surface, the geometrical shape of the surface must be known. Especially for inspection tasks, it cannot be assumed that the actual state of the workpiece still corresponds to its computer-aided design (CAD) data. In our implementation, the surface is reconstructed from a point cloud using the Point Cloud Library [39] (PCL). The point cloud is provided by the RGB-D camera at the end effector of the manipulator. In the first step, the point cloud is down-sampled, and then the surface is reconstructed by applying B-spline fitting [40]. Figure 7 shows the reconstructed surface of an exemplary rotor blade segment.

**Figure 7.** Surface reconstruction. (**a**) Dense point cloud of a rotor blade segment; (**b**) point cloud after down-sampling; (**c**) reconstructed surface (front view); (**d**) reconstructed surface (top view).

#### 2.3.2. Waypoint Generation

The surface orthogonal motion planning is inspired by the approach presented in [41], which addresses an inspection task of a landscape with an unmanned aerial vehicle (UAV). For this purpose, the landscape is rasterized into a grid. The grid resolution depends on the technical parameters of the sensor, the desired image resolution, and the desired overlaps of the individual images. The centers of the grid cells are shifted along the normal of the ground surface. This approach was adapted for the creation of 6D waypoints, representing surface orthogonal 6D poses of the radar sensor. Figure 8 shows the three steps of the implemented waypoint generation method, applied to the reconstructed surface of Figure 7. Figure 8a shows the centers of the grid cells. These are projected onto the reconstructed surface (see Figure 8b). The projected centers are shifted along the surface normal to the desired distance from the surface. The z-axis of the 6D waypoints are aligned with the negative surface normals (see Figure 8c).

**Figure 8.** Grid-based waypoint generation. (**a**) Centers of the grid cells; (**b**) projection of grid onto the reconstructed surface; (**c**) 6D waypoints.

#### 2.3.3. Path Planning

The 6D waypoints can be considered as nodes in a complete graph. A Hamiltonian cycle [42] in this graph visits all waypoints. The search for the shortest Hamiltonian cycle in a complete and weighted graph is called the traveling salesman problem (TSP). This problem is NP-complete, and there is no known efficient algorithm to solve it [43]. One approximation approach is the algorithm according to Christofides [44]. The implemented approach is shown in Algorithm 2.

**Algorithm 2.** Factor 3/2 approximation of the travelling salesman problem


The path length is limited to a maximum of one and a half times the optimal solution. The algorithm first calculates a minimum spanning tree [43] (MST). A minimum perfect match is formed between the nodes of the MST with an odd degree. This is possible because there is always an even number of nodes with an odd degree. In the path planning implementation, the Blossom-V implementation from [45] is used for this purpose. The edges of the matching are added to the MST, such that the degree of all nodes is even. Thus, a Euler cycle can be formed in the graph. By truncation, i.e., deleting multiple visited nodes, the approximate solution of the TSP is formed.

#### 2.3.4. Positioning of the Mobile Platform

The mobile platform must be positioned in such a way that the surface orthogonal planned path of the end effector lies in a workspace region with high reachability. The method used to determine the characteristics of the manipulator's workspace is described in [29]. The size of the chosen suitable area of the manipulator workspace directly affects the segmentation of the workpiece in local subsegments.

The workpiece segmentation consists of the following steps:


**Figure 9.** Process of workpiece segmentation. (**a**) Scanning grid; (**b**) projection onto surface; (**c**) reflection alongside surface normal; (**d**) clustering into local scan areas.

The reflected points represent positions of the manipulator's end-effector. The aim of the workpiece segmentation is to find a set of neighbored projected points, whereby the corresponding reflected points are inside the defined workspace of the manipulator. Therefore, a list of position tuples Λ*PR* is created, whereby each tuple consists of a projected point and the corresponding reflected point. By continuously expanding in the domain of the projected points and checking the reachability of the corresponding reflected points, the tuples of list Λ*PR* are clustered into suitable local scan areas. The implemented method is described in detail in Algorithm 3.

**Algorithm 3.** Segmentation of the workpiece into subsegments


The mobile platform is positioned in such a way that the center of the calculated scan area *li* = (*xI*, *yI*) *<sup>T</sup>* is located at the center of the system's reachable workspace *ri* = (*xR*, *yR*) *<sup>T</sup>*. Therefore, the centers *ci* are translated alongside the x-axes of the workpiece coordinate system *I*. Depending on the approach direction of the mobile platform, the translation is performed in *x*+- or *x*−-direction. The shifted positions *l* ∗ *<sup>i</sup>* are converted into 2D poses *Li* = (*xI*, *yI*, *θI*) *<sup>T</sup>* by adding an orientation accordingly to the translation direction. The 2D poses are transformed into the world coordinate system *W*, resulting in the required 2D goal poses for the navigation task.

#### *2.4. Task Management*

The inspection of a large-scale rotor blade is a long-running task. Therefore, we designed a task management system in form of a nested state machine. The state machine uses the SMACH framework [46]. The *main* state machine includes two nested state machines managing the manipulation and the navigation task. Figure 10 shows the task management system.

**Figure 10.** Task management system. State machines in blue and ROS nodes in orange.

The main state machine requires as input the model of the workpiece (CAD File), the desired distance between the radar sensor and the surface of the workpiece (DistanceSR) and the characteristics of the used manipulator (ManipulatorInfo). The ManipulatorInfo consists of the suitable area of the manipulator workspace, including the corresponding maximum reach of the manipulator.

The workpiece segmentation and the position determination of the platform at the local scan subsegments is executed a priori. The workpiece segmentation provides the local scan areas (ScanAreasInfo). Each ScanAreaInfo consists of the area center (ScanAreaCenter) and the dimensions (ScanAreaDim) in reference to the workpiece coordinate system. The *platform positioning* converts the centers of the local scan areas into 2D navigation goals (PosesGoal) in reference to the world coordinate system.

The navigation and manipulation state machines are triggered sequentially for each pair of navigation goal (PoseGoal) and local scan area (ScanAreaDim). The navigation state machine takes care about the path planning and path execution processes. The move base flex [47] framework is implemented to change between different motion behaviors of the mobile platform depending on the current navigation zone. The manipulation state machine is divided into three process steps. The *point cloud executor* gathers the point cloud data at the local scan pose (PoseScan) and crops it accordingly to the dimensions of the local scan area. The waypoint generator determines the 6D end-effector waypoints (WaypointsEE), orthogonal oriented to the surface of the workpiece. The manipulation executor controls the motion of the manipulator.

#### **3. Experiments and Discussions**

#### *3.1. Production Environment Navigation*

We evaluated the presented zone-based navigation concept in the multi-robot simulator Gazebo [48]. Therefore, we created 10 different production environments of the size

60 × 60 m. Each simulated production environment features an individual navigation zone layout, comparable to the layout shown in Figure 6. In each simulated production environment, we defined 10 start-goal-position tuples, which resulted in a total of 100 tuples. The tuples were divided into five groups:


The five groups are equally distributed across the 10 simulated environments, resulting in two tuples for each group per environment.

For each tuple, we manually annotated the desired path *Pi* with *i* = {1, 2, . . . , 100}. Each path *Pi* fulfills two criteria. The first one is the reduction in the overall costs of the path by preferring the corridor zone. The second one respects the desired right-hand driving behavior alongside the corner between the corridor zone and the guard rail zone.

The experiment compares the performance of the default A\* and Dijkstra implementation of the robot operating system (ROS) [49] with our adapted version, later called A\*zone and Dijkstrazone.

The deviation between a planned path *A* and the corresponding manually annotated path *B Pi* is represented by the average mean square error *aMSE*. The calculation is given in Equation (5):

$$aMSE = \frac{\sum\_{i=0}^{n} d\_i^2}{n},\tag{5}$$

where *di* is the Euclidean distance between each of the *n* positions *ai A* and its closest neighbored position *bi B*. Figure 11 shows the experimentally determined results for each of the 10 simulated production environments averaged over all five start-goal-position tuple groups.

**Figure 11.** Average mean square error over all paths for each simulated environment.

As expected, the *aMSE* is significantly higher for the default implementations of A\* and Dijkstra. This result is caused by the right-hand driving criteria of the manually annotated reference paths. In addition to the *aMSE*, we analyzed the following performance parameters:


#### 3. The path length.

The required process time indicates the additional computation effort caused by the implementation of Algorithm 1. The number of expanded cells reflects the efficiency of reaching the goal cell. The overall path length shows the additional path caused by the desired right driving behavior. Table 1 shows the performance parameters averaged over all 100 start-goal-position tuples.

**Table 1.** Comparison of the evaluated planning approaches. Performance parameters averaged over 100 start-goal-position tuples in 10 simulated environments.


The default implementations of A\* and Dijkstra provide the most cost-efficient path. The cost efficiency is mainly influenced by the cost levels of the different zones. Inside a zone, the length of the path is the factor of influence. Therefore, the average path length of the default implementations is approximately 14% smaller compared to our approach. With an average *aMSE* of 0.11 m for A\*zone and 0.06 m for Dijkstrazone, our approach proved to provide consistent paths following the concept of right-hand driving.

The general difference in processing time between the approaches A\*, A\*zone and Dijkstra, and Dijkstrazone is caused by the calculation of the heuristic. Therefore, the average processing time for A\*zone is 2.5% greater than that for Dijkstrazone, even though the number of expanded cells is approximately 35% smaller.

The additional computation effort needed to determine the right-hand driving cells at each expansion step does not increase the overall processing time. Quite the opposite, the approach A\*zone needs approximately 11% less average processing time. This decrease correlates with the number of expanded cells, which is also approximately 11% smaller than that of A\*. The same applies to the approach Djikstrazone, which results in 9% less expanded cells and a corresponding decrease in the overall processing time. Table 2 shows a comparison of the five start-goal-position tuple groups by evaluating the number of expanded cells normalized to the resulting path length for each tuple.


**Table 2.** The number of expanded cells normalized to the path length.

The goal zone has a high impact on the efficiency of the evaluated path planners. If the goal position lies within the corridor zone (*TSC*, *TCC*), the number of expanded cells is significantly lower compared to a goal position outside the corridor zone (*TRR*, *TSS*, *TCS*), by comparing columns *TSC*, *TCC* with other columns of Table 2. This behavior is caused by the cost levels of the zones and the nature of the path planning algorithms. Since the cost level of the corridor zone is relatively low the planners will expand inside the corridor zone with a high preference, before starting to expand inside a restricted or station zone. The A\* and A\*Zone implementations provide a lower number of expanded cells for each paths group, as highlighted by the bold font in Table 2, which is caused by the additional heuristic.

Our right-hand driving approach results in an additional decrease in the number of expanded cells. This is caused by the effect of our method on the behavior of the path planning algorithms. Figure 12 shows a comparison of a path *p* ∈ *TSC* and the corresponding expansion potentials for each of the evaluated approaches.

**Figure 12.** Path potentials of a path related to a start-goal-position tuple *t* ∈ *TSC***.** Start position in red and goal position in green. (**a**) Dijkstra; (**b**) DijkstraZone; (**c**) A\*; (**d**) A\*Zone.

The relatively low-cost cells at the edge between the corridor zone and the guard rail zone let the path planning algorithms concentrate their expansion towards a particular direction. Since the corridor zone is designed to connect typical station areas in the industrial environment, our method reduces the number of cells that are expanded in the corridor zone. Therefore, our method provides a higher expansion efficiency which results in lower computational effort compared to standard implementations of Dijkstra or A\*.

#### *3.2. Evaluation in Real-World Use-Case*

The presented concept is evaluated at a typical rotor blade form, which features a length of 11 m. Due to space limitations of the experiment environment, only the tip of the form is used throughout the experiment. Figure 13a shows the 3D model of the tip. The tip has the dimensions 2.0 × 0.8 × 0.9 m3 (LxHxW). Figure 13b,c show the real-world form equipped with glass fiber mats and a setup for vacuum generation.

**Figure 13.** Rotor blade forms. (**a**) 3D model of the tip of the rotor blade form; (**b**) rotor blade form used in the real-world experiment (side view); (**c**) rotor blade form used in the real-world experiment (top view).

Figure 14a shows the analyzed workspace of the used Universal Robot UR5 displayed as a cut in the x, y-plane of the manipulator coordinate frame *R*. Since the end-effector will be downward oriented most of the time during inspection and accordingly to [29], the used geometric primitive to analyze the workspace is a downward facing hemisphere. Each analyzed voxel has the edge length of 50 mm. Figure 14b shows the same cut, with a threshold at 50% reachability applied.

**Figure 14.** Workspace analysis of the manipulator for the inspection task at the corresponding height. (**a**) Cut through the x,y-plane; (**b**) Cut through the x,y-plane with reachability threshold of 50%; (**c**) chosen workspace area at height zr = 0.1 m.

The chosen suitable workspace area consists of two layers in z-direction to cover the curved shape of the blade tip form. The area is shown in Figure 14c and of size: 0.45 m in x-direction, 0.7 m in y-direction, and 0.1 m in z-direction of the coordinate system *R*. In addition, the so-called max reach value *maxreach* of the mobile manipulator is calculated. The *maxreach* symbolizes the maximum distance the manipulator is able to reach into the workpiece alongside the robot coordinate system *R*. The calculation is given in Equation (6):

$$
gamma\_{\text{reach}} = D\_{\text{MW}} - (d\_{\text{RPF}} + d\_{\text{PFW}}), \tag{6}
$$

where *DMW* is the distance between the origin of *R* and the reachable voxel, which is located at the maximum distance alongside the x-axis of *R*. The distances *dRPF* and *dPFW* depend on the hardware setup and the minimum safety distance between the platform and the workpiece. The distance *dRPF* describes the distance between the origin of *R* and the front of the mobile platform. The distance *dPFW* describes the safety distance between the front of the mobile platform and the workpiece.

Based on the method presented in Section 2.3.4, the workpiece is segmented in inspectable subsegments taking the maximum reach *maxreach* of the hardware setup into account. Figure 15a shows the colored subsegments. Figure 15b shows the corresponding poses of the mobile platform as black arrows for each subsegment. Figure 15c shows the mobile manipulator executing the scan process at a subsegment of the work piece.

**Figure 15.** Evaluation in a real use-case. (**a**) Work piece segmentation; (**b**) visualization of the navigation zone setup and the platform positioning; (**c**) actual execution of the scanning process.

At each local subsegment, the surface reconstruction (cf. Section 2.3.1) is executed. Therefore, an RGB-D camera provides a point cloud of the work piece. A crop box filter removes all points, which are related to the ground or the robot itself. In the next step, a HSV filter is applied to the remaining points to remove points related to the yellow corner tape of the vacuum setup (cf. Figure 15c). The resulting point are clustered by their Euclidean distance to determine the points belonging to the scanning surface. A down sampled version of the resulting point cloud is used for the surface reconstruction. Figure 16 shows the complete pipeline for a point cloud captured at a local subsegment.

The waypoint generation (cf. Section 2.3.2) is based on the surface reconstruction. Figure 17a shows the approached 3D positions of the end-effector in the robot coordinate frame *R* during the inspection process of one local subsegment. Figure 17b shows the surface-orthogonal orientation of the end-effector at each 3D position.

**Figure 16.** Processing pipeline of the surface reconstruction at a local workpiece subsegment.

**Figure 17.** Execution of inspection process at one local subsegment of the work piece. (**a**) 3D positions of the end-effector; (**b**) orientation of the end-effector at each 3D position.

The 3D positions of the end-effector are shifted alongside the calculated surface normal by the amount of the chosen scanning height. The resulting 3D positions reflect the surface

of the workpiece (see Figure 18a). Figure 18b shows the path of the end-effector during the inspection process.

**Figure 18.** Execution of inspection process at one local subsegment of the work piece. (**a**) 3D positions of the end-effector shifted alongside the surface normal by the amount of the scanning distance; (**b**) path executed by the end-effector during the inspection process.

The pose information of each local subsegment is transformed into the static global frame map. Therefore, the localization capabilities of the mobile manipulator OMNIVIL are used, as described in [29]. Figure 19a shows the approached 3D positions of the end-effector in the map frame. The positions are colored accordingly to their related local subsegment. Figure 19b shows the surface-orthogonal orientation of the end-effector at each 3D position.

**Figure 19.** Execution of inspection process of the complete work piece. (**a**) 3D positions of the end-effector; (**b**) orientation of the end-effector at each 3D position.

Figure 20a shows the shifted end-effector positions, which reflect the concave and convex shape of the scanned surface. The middle part of the form is not covered due to the

limited maximum reach of the used manipulator UR5. Figure 20b shows the path of the end-effector executed at each local subsegment.

**Figure 20.** Execution of inspection process of the complete work piece. (**a**) 3D positions of the end-effector shifted alongside the surface normal by the amount of the scanning distance; (**b**) path executed by the end-effector during the inspection process.

#### **4. Conclusions**

This study presented a method for the automation of the large-scale inspection process of wind turbine blades in manufacturing. The focus was set on the control of the autonomous mobile manipulator. It provided insights into related research fields, including autonomous navigation and surface orthogonal motion planning. The presented methods are applicable to various tasks related to large-scale inspection.

The developed approach realized autonomous navigation including a zone-based segmentation of the production environment. The common approach of a layered costmap was extended to fulfil the needs of a collaborative human–robot industrial environments. In addition, a new method was presented, which manipulates the cost values during the search expansion of a path planner. The method was applied to the state-of-the-art algorithms A\* and Dijkstra and was used to realize a right-hand driving behavior of the mobile manipulator in corridor zones. An experiment in a simulation environment showed the superior efficiency and reliability of the method. The actual inspection process was performed in an asynchronous mode by the mobile manipulator. Therefore, a method was developed to segment the workpiece into smaller subsegments, which can be inspected by the manipulator. The motion planning at the local subsegments used a surface reconstruction based on point cloud data. The resulting waypoints were considered nodes in a complete graph. The problem to find the shortest path was solved by applying algorithms related to the traveling salesman problem. The developed system was evaluated in a real-use case.

Further improvements will focus on the segmentation of the production environment. The segmentation can be performed automatically by taking documentation of the factory layout into a concern or by identifying workstations by the robot itself. Furthermore, the generation of the workpiece model should be performed by the robot itself.

**Author Contributions:** Conceptualization and methodology, H.E.; supervision, S.D. and S.K.; software, H.E., P.C. and H.D.; validation, H.E., P.C. and H.D.; writing—original draft preparation, H.E.; writing—review and editing, H.E. and S.D.; project administration, H.E. and P.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by European Regional Development Fund. Research project FiberRadar (EFRE-0801493).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This project was supported by the Faculty of Engineering and Built Environment, Tshwane University of Technology and the Faculty of Mechanical Engineering and Mechatronics, University of Applied Sciences Aachen.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **CHARMIE: A Collaborative Healthcare and Home Service and Assistant Robot for Elderly Care**

**Tiago Ribeiro 1,2, Fernando Gonçalves 3,4, Inês S. Garcia 1,2,5, Gil Lopes <sup>6</sup> and António F. Ribeiro 1,2,\***

	- id8699@alunos.uminho.pt

**Abstract:** The global population is ageing at an unprecedented rate. With changes in life expectancy across the world, three major issues arise: an increasing proportion of senior citizens; cognitive and physical problems progressively affecting the elderly; and a growing number of single-person households. The available data proves the ever-increasing necessity for efficient elderly care solutions such as healthcare service and assistive robots. Additionally, such robotic solutions provide safe healthcare assistance in public health emergencies such as the SARS-CoV-2 virus (COVID-19). CHARMIE is an anthropomorphic collaborative healthcare and domestic assistant robot capable of performing generic service tasks in non-standardised healthcare and domestic environment settings. The combination of its hardware and software solutions demonstrates map building and self-localisation, safe navigation through dynamic obstacle detection and avoidance, different human-robot interaction systems, speech and hearing, pose/gesture estimation and household object manipulation. Moreover, CHARMIE performs end-to-end chores in nursing homes, domestic houses, and healthcare facilities. Some examples of these chores are to help users transport items, fall detection, tidying up rooms, user following, and set up a table. The robot can perform a wide range of chores, either independently or collaboratively. CHARMIE provides a generic robotic solution such that older people can live longer, more independent, and healthier lives.

**Keywords:** service robot; assistant robot; collaborative robot; healthcare; elderly care; intelligent systems; COVID-19

#### **1. Introduction**

The world's population is consistently growing older, with the over-65 age group overgrowing all other age groups. The predicted worldwide growth in the elderly population is from 702 million in 2019 to over 1.5 billion by 2050 [1]. In 2014, the over-55-years old population outnumbered persons aged 15 to 24 years old, and by 2035 it is expected that the ages 0 to 14 will also be outnumbered [2]. The elder age group is estimated to continuously grow and outnumber all youth and child populations under 24 years old by 2050. It is estimated that the percentage of elderly people (the over-65 age group) in the European Union will grow from 19% to 29% over the next approximately 50 years [3]. Population ageing and public health expenses primarily dedicated to older dependent persons presents significant challenges, with implications not just on the social aspect but also economically [4]. The World Health Organization (WHO, Geneva, Switzerland) even developed a global strategy and action plan on ageing and health that focused, among other strategic objectives improving measurement, monitoring, and research on healthy ageing [5].

**Citation:** Ribeiro, T.; Gonçalves, F.; Garcia, I.S.; Lopes, G.; Ribeiro, A.F. CHARMIE: A Collaborative Healthcare and Home Service and Assistant Robot for Elderly Care. *Appl. Sci.* **2021**, *11*, 7248. https:// doi.org/10.3390/app11167248

Academic Editors: António Paulo Moreira and Emanuele Carpanzano

Received: 7 May 2021 Accepted: 2 August 2021 Published: 6 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

These statistics/data prove the growing need for efficient elderly care solutions regarding therapy, rehabilitation, companions and activity planning, but most importantly, healthcare robotics [6–9] that are capable of aiding in day-to-day tasks, collaboratively or independently. Healthcare generic service and assistive robots can provide practical help to increase the elderly population's life quality, improving cognitive and physical health. These robots can play an essential role concerning healthcare support and independent life, especially when problems related to ageing start to appear. Moreover, service robots provide safe healthcare assistance in public health emergencies such as the SARS-CoV-2 virus (COVID-19) [10–12].

The Collaborative Healthcare/Home Assistant Robot by Minho Industrial Electronics (CHARMIE), represented in Figure 1, is an anthropomorphic healthcare and domestic service and assistive robot capable of performing tasks in non-standardised environmental settings. The social goal of the CHARMIE project is the development of a robot capable of aiding in nursing homes, healthcare facilities and domestic houses, among other settings. The scientific objective of CHARMIE is the development of a multifaceted anthropomorphic robot capable of performing a broad set of tasks based solely on machine learning solutions that allow the robot to learn how to perform and improve tasks via observation and trial-and-error direct interaction with the environment.

**Figure 1.** CHARMIE (Collaborative Healthcare/Home Assistant Robot by Minho Industrial Electronics) different variations. (**a**) Conceptual sketch of the anthropomorphic robot. (**b**) Developed anthropomorphic design. (**c**) Primary prototype assembled.

Some service robots are already being implemented in geriatric care [13]. The development of such robots is encouraged with funding from the European Union with projects such as Hobbit [14], ENRICHME [15], ARNA [16] and Sobi [17]. These four robots go beyond the simpler pet-like social companion robots already possessing sensors and actuators that allow them to perform more complex tasks. Focusing on enhancing older people's well-being, some examples of tasks already performed are user entertainment, object manipulation and transportation, empathic and social human-robot interaction, monitoring persons and home-related chores. MOnarCH [18] presents a multi-robot cognitive system whose operation is already being tested in hospitals. It targets the development of a novel framework to model mixed human-robot societies and its demonstration using a network of heterogeneous robots and sensors in an oncological hospital's pediatric area. It

can handle uncertainties introduced by people and robots, generate natural interactions and engage in educational entertainment activities. However, the previously mentioned robots only solve tasks using pre-programmed methods and cannot generalise to different task variations or significant environmental changes.

Robotics competitions can be used for benchmarking specific functionalities and integrated systems [19,20]. RoboCup@Home [21,22] is no exception and is regarded as a top competition with numerous state-of-the-art contributions in the field of service and assistive robotics. The goal is to foster the development of versatile domestic service robots that operate safely in daily life non-standardised environments. Some of the robots achieving the best results in the latest competitions are Walking Machine [23], RoboFEI [24], CATIE [25], Homer [26] and AMIGO [27]. Recent years have shown robots solving a wide range of different domestic tasks, like watering plants, taking the trash out, storing groceries, serving breakfast, cleaning and setting up a table. These tasks provide broad coverage of multidisciplinary subjects such as Navigation, Mapping, Person Recognition, Person Tracking, Object Recognition, Object Manipulation, Speech Recognition, Gesture Recognition and Cognition. Even though the tasks referred fit the domestic environment standards, these can straightforwardly be adapted to healthcare purposes and environments due to their genericness. Regarding cognition, RoboCup@Home teams are already using machine learning technique primarily based on supervised learning to solve tasks such as face recognition [28], body pose estimation [29,30] and object detection [31,32].

The presented robot, CHARMIE, is an anthropomorphic general service and assistive robot that focuses on healthcare and domestic environments to aid elderly people and healthcare employees. The purpose of CHARMIE is to provide a robotic solution capable of providing healthcare support and more independent life, assisting both the elderly's cognitive and physical health. It can perform a wide range of tasks in almost all nonstandardised indoor environments and some outdoor environments. The primary focus regarding environments are nursing homes, hospitals, healthcare facilities, and domestic houses. These show the highest necessity and opportunity to integrate a robotic system capable of helping elderly people and healthcare workers. The capability of a wide range of tasks allows this robot to help in various domains, such as pick and place tasks, goods transportation, following workers/patients/users, speech communication (hearing and talking), visual analysis of the environment, navigation, object recognition and manipulation. The cooperative and collaborative aspect is demonstrated by the robot's ability to perform tasks parallel to user chores and even aid in tasks that the user cannot perform alone. The generic tasks described can be easily adapted to the robot's purpose. An example of pick and place tasks adapted to the purpose and environment can be: (i) delivering food trays to patients in hospitals; (ii) collecting crutches from residents in nursing homes; (iii) transporting boxes of medicines between different areas in healthcare facilities; and (iv) setting and cleaning up a table or loading a washing machine at a house. The variety of different tasks these generic robots can perform allows the robot to help in various ways, whichever best suits the user's present needs.

#### **2. Materials and Methods**

To perform the wide variety of non-standardised chores, CHARMIE's system and hardware went through different development choices regarding hardware components and related dependencies. The complexity required by some tasks dictated that the robot's system and hardware solutions had to contemplate a significant number of degrees of movement. From the omnidirectional platform to the arms, all components were dimensioned, envisioning the complex movements the robot must make. An example is pushing wheeled trolleys that force the robot to adjust the position of both arms throughout the action and the platform movement that must fit the moving object. Additionally, the anthropomorphic shape of CHARMIE allows users to be more receptive to interact with the robot, as described in [33]. From a practical perspective, it is easier to interact with the real world if the robot has the same shape as a human since the environment is optimised

for human-shape interaction. From a social perspective, creating a small human's physical appearance gives the impression users are interacting with a small, friendly robot, which translates into a sense of friendliness and proximity to its users. Even though, at the moment, CHARMIE is not fully anthropomorphic, some parts are, such the arm. The final objective is to reach the anthropomorphic level presented in Figure 1a,b.

#### *2.1. System and Hardware*

As described in Figure 2, CHARMIE's hardware [34] can be divided into four sections: (i) the motion platform; (ii) the robot arms; (iii) the lifting mechanism and torso; and (iv) the robot head. The goal is to provide solutions that best fit generic service tasks to improve elderly care. Thus, [34] provides a more in-depth description of hardware sections (i) and (ii). One aspect that must also be referred to regarding the robot's design is its anthropomorphic design. One study concerning robot shape indicates that the robot's visual design is directly related to the number of human-robot interactions initiated by humans [35]. The anthropomorphic shape grants the people who interact with the robot a greater sense of comfort or friendliness than differently shaped robots. Different shapes made human users more reluctant to interact with the robot due to fear of the unknown and its movement. Having the shape of a human body allowed the robot to have a higher number of interactions, resulting in more tasks performed by the robot, thus helping more significantly its users, both the elderly and healthcare workers. Another significant advantage of the anthropomorphic shape is that every day-to-day human environment is refined to the human body height-wise, weight-wise, and shape-wise. This aspect facilitates robot interaction with day-to-day environments without requiring an adaptation of the world to best fit the robot's capabilities. With this shape, the concept is precisely the opposite: adjust the developed robot shape to enhance its interaction with human environments, rather than adapting every human environment the robot must interact with.

**Figure 2.** CHARMIE's four hardware sections, namely robot head, robot arms, the lifting mechanism and torso and the motion platform on both the physical robot and the anthropomorphic implementation.

#### 2.1.1. Motion Platform

For service and assistive robots that operate in highly dynamic environments with humans working beside them, it is a significant advantage to have a platform that allows movement in any direction at any point in time. The locomotion system is the only

part of CHARMIE that is not intended to be anthropomorphic (bipedal) for stability and simplicity reasons. The motion platform developed uses four omnidirectional wheels with individual suspension systems. This design was conceived with the robot's predicted interaction environments in mind, mainly large/medium indoor environments such as hospitals, healthcare facilities, nursing homes and houses. The wheels are 203 mm double aluminium omnidirectional wheels with roller bearings. When motion platforms use three omnidirectional wheels, and the center of mass is considerably high, the platform may be at risk of falling under certain circumstances. If a linear momentum is applied in the 120◦ gap between wheels the robot may lose balance and end up falling. With the addition of the fourth wheel and consequential reduction of the degree gap between wheels the robot significantly improved in safety and stability. However, with this addition, it is possible for a wheel to occasionally lose contact with the floor due to slight surface irregularities or slopes. Locomotion wise, this translates into unpredictable and incorrect movement. Thus, a compact MacPherson [36], Figure 3a, suspension system was developed to overcome floor irregularities, small bumps, and slope variations while improving its control smoothness. The motion platform is a regular octagonal shape with a 54 cm diameter designed so CHARMIE fits through every standard door frame size. It is purposely heavy (~20 kg without batteries) to guarantee a low centre of mass for safety reasons, ensuring the robot does not fall if an external force pushes it. Additionally, this motion platform can transport a load of approximately 65 kg, as shown in Figure 3b. In [34], a more in-depth analysis of the motion platform and suspension system is presented.

**Figure 3.** (**a**) MacPherson suspension system designed for CHARMIE to improve stability and guarantee all four wheels are in contact with the floor surface. Image from [34]. (**b**) CHARMIE's motion platform load transportation experimental test. The platform was tested, moving for approximately 5 min while maintaining the movement performance and stability. The load in the picture shows an average adult human weight of approximately 65 kg.

#### 2.1.2. Lifting Mechanism and Torso

A significant number of the tasks that generic service and assistive robots must perform involve object manipulation. The motion platform allows the robot to move throughout the environment on the *x*-axis and *y*-axis. To interact at different heights with objects such as cabinets with shelves at different heights or with users, from children to adults, it is of extreme value to implement a system that allows the robot to have a *z*-axis DOF (Degree of Freedom). By implementing a lifting system, the workspace of the redundant manipulators attached to the torso increases. Moreover, the robot can move up and down in its *z*-axis, allowing it to interact with objects at different heights, from picking/placing objects from/to the floor to interacting with objects on tables, counters and shelves, among others. In the elderly care context, the robot needs to collect things from the floor. This is primarily due to older people having significant difficulties in lowering themselves to pick up something from the floor, which sometimes leads to falls or injuries that can be avoided with the proposed robot solution.

The initial lifting system solution shown in Figure 4a,b consists of a ball screw spindle mechanism. The torso that is in the threaded shaft has two linear bearings connected to aiding beams that go from the motion platform to the head. By activating the motor, the torso moves linearly up and down, providing the lifting mechanism. In this system, only the torso moves, and the only parts assembled to the torso are the redundant manipulators.

**Figure 4.** The different lifting mechanisms implemented in CHARMIE. The top three images show the lifting mechanisms at a higher elevation, and the bottom three images show the lifting mechanism at lower heights. (**a**) Initial ball screw spindle lifting mechanism on top position. (**b**) Initial ball screw spindle lifting mechanism on the bottom position. (**c**) Height changing ball screw spindle lifting mechanism on top position. (**d**) Height changing ball screw spindle lifting mechanism on the bottom position. (**e**) Anthropomorphic lifting mechanism on top position. (**f**) Anthropomorphic lifting mechanism on the bottom position.

Moreover, two new different systems were developed to overcome the problems the initial elevation system had. The first, shown in Figure 4c,d, resembles the lifting mechanism presented by AMIGO [37,38]. Similarly to the initial lifting system, a ball screw spindle mechanism lifts the torso. However, in this implementation, an aluminium tube is connected to the threaded shaft. This system allows the torso to be assembled in the aluminium tube, which with three slider rails can move the torso up and down in a linear movement. The two main differences from the AMIGO elevation system to the initial lifting system design are: (i) instead of moving just the torso and consequently the redundant manipulators in the *z*-axis, it also moves the head; and (ii) the variation of the robot's height. As previously described, this system can adjust the torso height to best fit its goals. CHARMIE can pick objects from the floor in its lower position, and in its higher position, it

can place its arms height-wise, similar to an average size adult human. Since the head is now also present on the torso in this solution, all sensors and the multimodal user interface can alter their height. This system can now move the head height-wise, positioning the camera height to better analyse the environment using the computer vision system. In the initial implementation system, the only degree of freedom was a rotation on the neck so the robot could look up and down. Often, this DOF proved insufficient to analyse the environment since, at times, the robot needed to place its head at the height of what it had to analyse for better results. Computer vision wise, object and user detection are simplified by allowing the robot to position itself at the best angle. This system grants this movement that improved all computer vision-related tasks significantly. The multimodal user interface previously connected to the head can now also move up and down to best fit the user's height. The other advantage of this system is the ability to change the robot's height. Psychologically, it is highly advantageous to alter the robot's height according to the person with which it interacts. Similarly to a human body, when the robot moves down to pick an object from the floor, it moves all of its body with that purpose. The initial solution could only move the arms, not following the anthropomorphic ideology. The field tests developed demonstrated that people were more interactive with this version of the robot due to it being more anthropomorphic. The robot altered its height to be slightly smaller than the user it was interacting with. Creating a sense of inferiority to the human reduces the users' reluctance to intervene with the robot. This way, CHARMIE can operate most features in a domestic environment while at the same time having a friendly appearance

The second lifting mechanism under testing, shown in Figure 4e,f is based on the human's legs. For this elevation system, the following requirements were taken into consideration: (i) structural integrity to support the robot's weight; (ii) anthropomorphic look; (iii) self-locking actuation to reduce energy consumption; and (iv) allowing the robot to squat, increasing its workspace and making it able to interact with objects on the floor. This lifting mechanism has the same advantages as the first lifting mechanism, all of the upper body can move up and down, and the robot's height can change, with the addition of resembling the human body even more. Although the mechanical system's complexity increases substantially in this iteration, a design based on the same mechanical principles is being made, which results in a more straightforward and more reliable solution with a significant increase in robustness. Thus, this mechanism was developed to encompass only 1 DOF that controls the ankles, the knees and the hips of the robot, allowing a complete squatting movement. In [34], a more in-depth analysis of this anthropomorphic lifting mechanism is presented.

#### 2.1.3. Robotic Arm

The arm's design objective was to develop an affordable, lightweight component with a human-like design. The initial design demonstrates an autonomous robotic manipulator with four degrees of freedom, named the Robotic Arm for Collaboration with Humans in Industrial Environment (RACHIE) [39]. This robot's primary goal was to perform a simplified service pick and place task with human interaction to sort cans according to their colour and shape. By simplifying some of the tasks that generic service robots must perform, it allowed the development of a robotic arm that fulfilled all of the objectives previously stated. However, this manipulator did not provide the degrees of freedom necessary for some generic service robot tasks' complexity. The desired solution, to also comply with the anthropomorphic objective, should present similarities to a shoulder, elbow and hand redundant manipulator [40].

The solution implemented in CHARMIE is based on the InMoov arm [41], initially designed by Gaël Langevin. InMoov consists of the first Open Source 3D printed life-size robot. The main differences between the original InMoov arm and CHARMIE's arm reside on the bicep and shoulder. The whole arm is printable on a 12 × 12 × 12 cm 3D, and its PLA parts weigh around 1.414 kg, with the actuators weighing around 0.766 kg. The whole weight of the arm is approximately 2.2 kg. From shoulder to the hand, the arm measures 75 cm and can lift a maximum load of around 400 g.

This arm can be divided into three main parts: (i) hand and forearm; (ii) bicep; and (iii) shoulder. Most generic service robots with robotic manipulators tend to use grippers to simplify this task. However, the anthropomorphic hand presents more DOFs that allow the robot to best use the hand according to the object, providing greater dexterity capability [42,43]. The hand is composed of five fingers, similar to a human hand, to provide different ways to pick up various objects. Each finger has a different number of joints, as can be seen in Figure 5a. The thumb has two DOF, the index and middle finger have three DOF, and the ring and little finger have four DOF. Each finger has one actuator (servo motor) that moves all DOF of the respective finger. The movement is based on a pulley mechanism, similar to tendons, where the motors are located in the robot's forearm, as shown in Figure 5b with fishing line tendons leading to the fingers. So, regarding kinematic architecture, this is an underactuated hand since, in total, it has seventeen DOF, but only five actuators. The wrist has one servo motor responsible for rotating the hand. The bicep has two servo motors, one responsible for lifting and lowering the forearm, and the second responsible for rotating the bicep and the forearm. The shoulder has two individual DOF for moving the whole arm parallel to the robot body and lifting the whole arm perpendicular to the body. In total, as shown in Figure 5c this redundant manipulator has 22 DOF with 10 actuators, which define the movement of the arm. The Denavit-Hartenberg parameters for the arm are described in the Table 1.

**Figure 5.** (**a**) An assembly of a finger, with three degrees of freedom but only one actuator, using fishing line as tendons. From top to bottom, the finger starts closed and opens until it is fully stretched. (**b**) The interior of the forearm has five servos actuators that individually control each finger. (**c**) The complete assembly of CHARMIE's arm.

**Table 1.** Denavit-Hartenberg from CHARMIE's robot arm.


The four parameters are described as: θ*<sup>i</sup>* the rotation around the *Zi*−<sup>1</sup> axis by the angle between the links, *di* the translation along the *Zi*−<sup>1</sup> axis of the distance between the links, *li* the translation along the *Xi* axis (rotated *Xi*−<sup>1</sup> axis) of the length of the link, and *αi*, the rotation about the *Xi* axis of the twist angle.

For elderly care specifically, the arm and hand allow the robot to perform essential tasks. Picking up objects from the floor, low tables, counters and shelves and transporting them to the desired place ease many tasks that otherwise would either not be done or be done with many risks associated. With the addition of the anthropomorphic hand, the robot can easily interact with everyday objects specifically designed for the human hand with different types of grips and grasps. In Figure 4 solutions with both two arms and one arm are presented. The real-world robot and real-world tasks only uses one arm, as can be seen in Figure 5c, whereas the simulated robot already uses the two arms.

#### 2.1.4. Robotic Head

The robot head holds the RGB-D camera, the multimodal user interface and the microphone. From a human interaction perspective, the head is the part of the robot the users look at when communicating with the robot, so it must be appealing and functional. Since the robots' head height could not be adjusted in the initial lifting mechanism system, a DOF was introduced to simulate a neck, so the robot could rotate its head, similar to a yes nod movement. This movement allows the robot to adjust its RGB-D camera angle to see objects at different heights. In the medium position, parallel to the motion platform, it can see objects on tables, shelves and people with an above-average size, as can be seen in Figure 6a. In the minimum position, at −60◦ from the horizontal field of view, it can see objects on the floor touching the front of the robots motion platform, as can be seen in Figure 6b. Lastly, at +30◦ degrees, it can see objects and people higher than the robot in a higher position, as can be seen in Figure 6c. For now, if the robot needs to see further to the sides, it rotates its motion platform, so it is always facing what it is trying to analyse. With the addition of the two new lifting mechanisms, the same system is used just for the RGB-D cameras. This camera is responsible for all visual related tasks such as: (i) user recognition; (ii) pose detection and tracking; (iii) gesture recognition; (iv) obstacle detection in navigation; (v) mapping; and (vi) object learning and recognition. Both the microphone and the multimodal user interface were part of the initial lifting system head. Being able to adapt both systems' height to the user's height allowed a cleaner interaction overall. The microphone is placed at the same height as the users head, allowing better voice recognition in environments with a significant noise level.

**Figure 6.** CHARMIE's field of view using the neck DOF. (**a**) Parallel to the motion platform (0◦). (**b**) Looking down to see objects near the motion platform or on the ground (−60◦). (**c**) Looking up to see items on top shelves (+30◦).

Moreover, the multimodal user interface also best fits the users' height to be more comfortable to interact with. Thus, from the initial lifting mechanism onwards, these two systems are now part of the lifting mechanism, since it is unnecessary to rotate them similarly to the RGB-D camera.

#### *2.2. Components*

To accomplish generic service and assistive tasks to aid in day-to-day life, for both the elderly and healthcare workers, CHARMIE must safely navigate in healthcare-related and domestic environments, perceive and track its human users, recognise gestures or poses and detect and manipulate different everyday objects. To achieve this, CHARMIE has a set of sensors and actuators that best fit both its environment and its tasks. With the sensorial data, the robot must perform some low-level cognitive functions that, when combined, allow the robot to perform more complex chores, both independently and collaboratively. The low-level functions can be classified into four groups of tasks: (i) map building and self-localisation; (ii) navigation; (iii) human-robot interaction; and (iv) object detection and manipulation. The training of all systems that require neural networks is made off-line and the main processor is a MSI Cubi 2 mini-PC with Core-i5 processor and 4 GB RAM.

#### 2.2.1. Sensor System

To move safely according to the operating environment and with the purpose to perform the necessary tasks, CHARMIE must have an adequate perception of the following low-level functions:


For map-building and self-localisation, CHARMIE uses two sensors: (i) 2D LiDAR (HOKUYO URG-04LX-UG01) mounted on the motion platform; and (ii) an RGB-D camera (Microsoft Kinect) located on the robot head. The laser range finder provided a 2D map of the environment near the floor, whereas the RGB-D camera provided a 3D map. The combination of both technologies allowed the robot to take advantage of the positive sides of both technologies. The 2D map could detect small objects on the floor or at the motion platform height, while the 3D map illustrates the complete environment map.

For safe navigation in various indoor environments such as hospitals, nursing homes and domestic houses, the robot uses the same sensors as the mapping and self-localisation tasks. Again, the combination of the 2D laser range finder and the RGB-D camera lets the robot detect different types of obstacles and react appropriately. The 2D detects all small objects on the floor that are harder to detect using the RGB-D camera, and the 3D information can detect all other objects at every height. In the motion platform, the motors have encoders embedded to close the control loop. Besides, the motion platform has current and voltage sensors for every motor. These sensors allow CHARMIE to know whether a wheel is not touching the floor, if a motor is stuck and if the robot is pushing against an object.

For human-robot interaction, CHARMIE uses the RGB-D to detect its users, their pose and some specific recognisable gestures. Recent works developed on this robot already started using a different RGB-D camera (Intel® RealSense™ Depth Camera D455). One of the goals of CHARMIE is to communicate with its human users, both healthcare workers and older people, the same way humans communicate with each other by talking and hearing. Thus, the primary way to communicate with the robot is to speak some set of instructions that are interpreted and converted into tasks. This communication skill allows users who have not been familiarised with technology to easily interact and take the most advantage possible of CHARMIE. The robot has an MV5 Digital Condenser Microphone that allows the robot to hear 360◦ and a JBL GO Speakers. To overcome some difficulties that this communication system may present in particular situations, the robot has a multimodal user interface in its body so users can select the necessary tasks.

For object detection, the robot uses its RGB-D camera. The objects that need to be detected are usually on the floor, on counters and tables, or hand delivered by a user to the robot. Thus the focus lies primarily on objects that are either between 50 cm to 120 cm or laying on the floor. To grasp objects that are hand-delivered or on counters the robot adjusts its lifting mechanism to best accommodate its redundant manipulators to the object's position. The same happens to the objects on the floor. That is why all lifting mechanisms can place themselves so that the robot arms can pick objects from the floor.

The robot's sensory system setup [44] is presented in Figure 7, where each sensor location is described on both implementations of CHARMIE's body. In the head, at a maximum height when the lifting mechanism is at the top position, the RGB-D camera is at 1.50 m which can rotate, tilting the head up and down. The initial RGD-D camera used is the Microsoft Kinect for all the tasks. However, with the recent acquisition of the Intel® RealSense™ Depth Camera D455 camera for CHARMIE, some human pose estimation tasks have already been developed using the new camera. The microphone, the speaker and the multimodal user interface lay on the torso with a maximum height of 1.30 m. The laser range finder is on top of the motion platform at approximately 25 cm of height. All the sensors related to the motors (voltage, current and encoders) are attached to the motors.

**Figure 7.** CHARMIE's sensorial description with all sensors' location used on both the physical robot and the anthropomorphic implementation.

#### 2.2.2. Map Building and Self-Localisation

To perform mapping of the environment, it is necessary to detect the points of interest known as keypoints. Additionally, to calculate travelled trajectory, it is necessary to quantify the movement occurred in-between frames. Rather than using all the pixels from an image to detect keypoints, the FAST [45] algorithm presents a computationally more efficient strategy than other similar solutions. This algorithm works by creating an adjustable bounding circle on every pixel that might resemble a corner. To be a corner, three nearby pixels inside the bounding circle must have a higher intensity than the fourth with a factor *μ*. Also, inside the bounding circle, there must be a set of collinear points with intensity higher than *μ*. These two conditions must be satisfied to consider this region a corner, and thus a keypoint. Next, to estimate the visual odometry using just the camera, since Microsoft Kinect does not have an IMU (Inertial Measurement Unit), the selected algorithm uses monocular odometry. It detects the keypoints using the FAST algorithm on consecutive frames and associates these between frames using essential matrix estimate through LMeds algorithm.

The method selected to build the 3D map of the environment was using OctoMap [46], based on octrees to group all points. This method loses part of the detail, which translates into higher computational efficiency. For CHARMIE's purpose, it is more valuable to have

a time-efficient algorithm than to have 3D mapping with great detail, since the robot only needs the map to safely navigate to the desired position. Figure 8 shows a point cloud converted into an OctoMap in three different perspectives representing an object on top of a table and a guitar next to the wall. Figure 9 shows an example of a complete indoor environment (office) mapped by CHARMIE.

**Figure 8.** Three different perspectives (**a**–**c**) of the same two objects, a toy on a desk and a leaning guitar on a wall converted into an OctoMap to simplify mapping high detail objects.

**Figure 9.** A complete map of an indoor environment, more precisely, an office. After some movement inside the office, CHARMIE created the OctoMap. In the middle of the image, a desk can be seen, two pillars on the bottom are speakers, and the remaining information is mainly regarding the walls. (**a**,**b**) demonstrate two different angles of the map.

The mobile platform's self-localisation is done using Adaptive Monte Carlo Localization (AMCL) [47]. In this algorithm, the robot's pose is represented as a set of multiple hypotheses concerning a prior known map. AMCL combines the information from the mobile platform's odometry and the 2D LiDAR. It starts by performing global localisation, so it is immune to the initial position and, after knowing its location, only performs local tracking using adaptive particle filters.

#### 2.2.3. Navigation (Obstacle Detection and Avoidance)

To safely navigate a previously mapped environment, CHARMIE must detect dynamic and static obstacles that might not be in the environment map and navigate accordingly to overcome these.

Regarding obstacle detection, the robot uses the same sensors as in the mapping function, the 2D LiDAR and the RGB-D camera. The 2D LiDAR is used for small obstacles at the motion platform height. It starts by checking if there is any obstacle inside a 1.50 m radius and calculates both the obstacle position relative to the robot and its size. The RGB-D camera starts with the same principle regarding the 1.50 m radius, calculates its position relative to the robot and size, creating a virtual obstacle from the floor to the robot's height. In Figure 10, an example of the same image from three different angles is displayed, showing an example of data fusion between the RGB camera and the Depth camera. The obstacles derived from the sensors are combined in a temporary virtual obstacle map that is constantly updated. By applying this method in consecutive frames, the direction of movement of dynamic obstacles is also added to the virtual obstacle map.

**Figure 10.** Three different perspectives (**a**–**c**) of three human users interacting with CHARMIE. The robot fuses the RGB and Depth cameras' information to create a 3D view of the environment.

After parameterising all the static and dynamic obstacles in a nearby radius and defining a target location, CHARMIE uses dynamic non-linear systems [48,49] as a distributed control architecture that generates navigation. Task constraints are component forces that are cast together into the vector field of this dynamical system. For example, the directions *ϕ* = *ψobs* (where obstacles are from the robot's viewpoint) and the directions *ϕ* = *ψtar* (where the target is) are constraints represented by repulsive and attractive forces acting on the heading direction. The attractive force attracts the system to the desired heading direction value, whereas the repulsive forces prevent the system from moving in an undesired direction. As the robot moves, the directions to the target and obstacles in the world variates, and consequently, the attractor and repellers move in the vector field.

As stated above, the virtual obstacle map foresees the direction and size of all obstacles that must influence the robot's trajectories according to the robot's reference. So, a repulsive force is applied to all obstacles:

$$f\_{obs,i}(\phi) = \lambda\_i(\phi - \psi\_i) \exp\left[\frac{-(\phi - \psi\_i)^2}{2\sigma\_i^2}\right], i = 1, 2, \dots (no \ of \text{ obstacles}), \tag{1}$$

where *φ* is the robot direction, *ψ<sup>i</sup>* is the obstacle direction, thus (*φ* − *ψi*) is the obstacle direction relative to the robot. *σ* is the angular magnitude on which a repulsive force acts, defined as:

$$\sigma\_N = \arctan\left[\tan\left(\frac{\Delta\theta}{2}\right) + \frac{R\_{\text{robot}}}{d\_i}\right] \tag{2}$$

where Δ*θ* is the angle the robot occupies, *Rrobot* is the robot radius and *di* is the distance from the robot's centre to the obstacle. *λ<sup>i</sup>* is the maximum repulsion force, defined as:

$$
\lambda\_i = \beta\_1 \* \exp\left[-\frac{d\_i}{\beta\_2}\right] \tag{3}
$$

where *β*<sup>1</sup> controls the maximum repulsion strength, and *β*<sup>2</sup> controls the decay rate with increasing distance. The repulsors contributions from all obstacles are summed, creating the repulsor vector field. To get the target position, it starts by transforming the target coordinates into the target direction and apply an attractive force as:

$$f\_{\rm tar}(\phi) = -\lambda\_{\rm tar} \* \sin(\phi - \psi\_{\rm tar})\tag{4}$$

where *ψtar* is the target direction and *λtar* is the attraction force magnitude. To finish the dynamic field vector system, all contributions are added.

$$\frac{d\phi}{dt} = \sum\_{i=1}^{N} f\_{obs,i}(\phi) + f\_{tar}(\phi) \tag{5}$$

In Figure 11, two different scenarios regarding nearby objects are presented. Figure 11a shows three obstacles in red that can be seen inside the maximum security distance circle. Consequently, the repulsors are created considering their distance to the robot and their size, represented in the second graph. In the first graph, the attractor (pink) and the sum of all repulsors (green) are totalled to create the dynamic field vector system (black) that yields the angle to which the robot should rotate to avoid the obstacles. Similarly, in Figure 11b, all the graphs shown represent the same variables as in Figure 11a.

**Figure 11.** Two different variations (**a**,**b**) of the dynamic field vector system are displayed depending on the various obstacles that the robot faces. The blue and red 2D graph on the right of both images represents the 2D LiDAR data. The bottom graph represents the velocity graph with one attractor to the desired speed. The middle graph represents the influence of the various obstacles the robot faces and consequential repulsors. The top graph represents the angle attractor (pink), the sum of all repulsors (green) and the final dynamic field vector system (black).

The difference between the two images lies in the increase in the obstacle size at the right. This translates into a more aggressive repulsor on the right side, represented in blue in the middle graph. Consequently, both the sum of all repulsors (green) and the dynamic field vector system (black) will also be more aggressive. The third graph of each image represents the speed of the robot, also manipulated by an attractor.

To fit CHARMIE's medium-term objective of using machine learning algorithms to solve all of the proposed tasks, a different implementation of the robot's autonomous movement using reinforcement learning was developed. Deep reinforcement learning is primed to revolutionise the field of artificial intelligence. It represents a step towards building autonomous systems with a higher-level understanding than any other learning technique. The application of deep reinforcement learning to robotics allows robots to learn control policies directly from real-world sensorial information through trial-and-error interactions. The tasks performed by service robots such as CHARMIE can be characterised by long-term planning, high-dimensional continuous action-space, and in most cases, incomplete information. Even though this is a problem for some reinforcement learning algorithms, novel solutions such as those presented in [50–53] already solve a wide range of simulated tasks similarly characterised. The algorithm selected to implement into CHARMIE's motion platform is based on Q-Learning [54,55]. The reinforcement learning setup consists of an agent (CHARMIE motion platform) interacting with the environment (simulated indoor environment) in discrete timesteps. At each timestep, the agent receives an observation, takes an action and receives a scalar reward. The agent uses its sensory information, the 2D LiDAR, to navigate towards the target position avoiding the obstacles that come up in its way, similarly to [56]. A mapless motion planner can be trained end-to-end with no manually designed features nor prior demonstrations through this reinforcement learning method. This trained planner can be directly applied to never before seen environments. In Figure 12, it is demonstrated the evolution of how the reinforcement learning algorithm, without any previous knowledge about the environment, solved three iterative complexity mazes. The first more straightforward maze is learnt just by trial-anderror, and the two following mazes use the knowledge gathered from the previous maze.

**Figure 12.** Trajectory evolution throughout the learning process of a differential robot autonomously solving a maze using Q-Learning. Image from [54].

#### 2.2.4. Human-Robot Interaction (User and Gesture Detection)

One of the biggest concerns regarding the CHARMIE project is to provide communication tools to ease communication with users, both older people and healthcare workers. To initialise a communication process, the robot must first recognise its users and their pose/gestures. The functionalities regarding user detection demonstrate solutions using the RGB-D camera, both with and without the depth image. The initial 2D visual user detection algorithm focuses on detecting faces, then crop and align the detected faces and recognise which previously trained, or new users are attempting to communicate. For this purpose, a Multi-task Cascaded Convolutional Network (MTCNN) [57] is used. It consists of three convolutional networks with different architectures with increasing complexity attached to each other, where the output of the previous net is the input of the next. The first network is known as P-Net (Proposal Network), which generates different proposals of where the faces must be, used as inputs to the following network. Next, the R-Net (Refinement Network) analyses the proposals and filters false positives. The final network, O-Net (Output Network), creates the final output, the detected face's image and its facial landmarks.

To classify the faces, the detected faces output by the MTCNN are introduced as input in an Inception-ResNet v1 neural network [58], transforming it into an image representation vector in space, also known as embedding. This network combines the Inception and ResNet architectures. The Inception architecture uses multiple inception modules, which have different convolutional-layers with different kernel sizes operating in parallel. These filter the same level layer in the architecture, concatenating in the next level, finding various features with fewer convolutional-layers. This process makes the network less deep without losing information. The Resnet network introduces residual connections. In traditional neural networks, a layer feeds data to the next one, but with this algorithm, it sends direct information to a deeper layer. These blocks improve two areas. The training time, given that skip-connections can jump layers without training and the lost information when making gradient descent since more deep networks have a hard time being accurate without overfitting or working more straightforward tasks.

The last step is implemented through an SVC-C (Support Vector Classification), with training and a test dataset. This supervised learning algorithm separates the different sample classes, known as embeddings, using a hyperplane where the margins are optimised through support vectors. Figure 13a shows an application of the MTCNN algorithm, detecting different famous people's faces. It was trained to detect 64 different personalities with the "Labeled Faces in the Wild dataset" [59]. Only using information from the 2D camera it could successfully identify all of the trained people on the test set with certainty percentages of over 60% in the worst cases. Additionally, Figure 13b shows the testing dataset's mapping with PCA (Principal Component Analysis), where the same label/output images are projected close to each other while distanced from different ones. For better visualisation purposes, the photos also have a colour differentiation filter. Even though it can detect 64 different personalities, to maintain the same levels of accuracy, CHARMIE is set to detect 20 different people.

**Figure 13.** (**a**) The MTCNN algorithm's output using "Labeled Faces in the Wild dataset" [59] of nine different personalities. (**b**) The PCA graph from Tensorboard from the MTCNN test dataset.

When updating to 3D user recognition technology, some works that use the same camera (Microsoft Kinect) [60] show algorithms that can overcome different face poses, expressions, illumination and disguises. Other solutions, such as FaceNet [61], demonstrate a deep convolutional network trained to directly optimise the embedding itself. The depth solution CHARMIE employs to recognise a small number of users uses various deep convolutional neural networks, similar to the one with just the RGB that integrates both the RGB and the depth image. Figure 14a demonstrates a classification example of two users using the depth information in addition to the RGB. The lack of different trained users can justify the significantly higher values of certainty compared to the 2D solution. However, some problems, such lightning and the pictures of users, can be successfully filtered with the depth camera's addition. An example of the lightning condition being filtered is shown in Figure 14a right image. Figure 14b shows the two images input to the neural networks, the RGB and the depth matrix.

**Figure 14.** (**a**) The output of the 3D user recognition system. The image on the right demonstrates the neural networks filtering bad lighting. (**b**) Two examples of RGB and depth images used as input to train the 3D user recognition system.

After recognising the users, the robot is establishing an interaction with, CHARMIE analyses their poses. Essential information can be interpreted from the pose: (i) the human's position, standing, sitting and laying, among others; (ii) an estimate of their intention to perform an action; (iii) whether they are pointing at something; and (iv) an analysis of their motion when doing collaborative tasks. Skeleton tracking processes depth image data to determine multiple skeleton joints' positions on a human body.

The cameras distinguish a human from the background and identify the position of several features or joints, such as the head, knees, elbows and hands. Once identified, the software connects the joints into a humanoid skeleton and tracks their position in real-time, providing the X, Y and Z coordinates for each of the skeleton points. The addition of depth cameras [62] allows the skeleton tracking system to remove uncertainties between overlapping or occluded objects or limbs, making the method more robust to different lighting conditions than a 2D camera-based algorithm [63]. The algorithm used is based on the Skeleton Tracking SDK by cubemos. It provides fast and highly accurate 2D and 3D human pose estimation that allows tracking of 18 joints simultaneously (two ankles, two knees, two hips, two wrists, two elbows, two shoulders, one between the chest and the neck, one nose, two eyes and two ears). Due to the artificial intelligence algorithms, it can track up to five people in real-time. Figure 15 shows the skeleton-fitting pose estimation overlaid on the users. It can detect high-speed movements and estimate a joint location, even when hidden from the camera. Figure 15c shows that the robot with pose only estimation can detect when a person raises their arm in the air to ask for assistance.

**Figure 15.** Three different situations (**a**–**c**) where CHARMIE estimates the humans' pose to further analyse whether the users require its help, their position, their movement and their intention to act.

The joints tracked by the algorithm also allow the agents to communicate or transmit information using different gestures. Some technologies like optical flow provide solutions to track movement between images. This demonstrates the movement from people or animals, as long as it is in a different direction from the camera's movement and allows 2D verification of the movement direction. FlowNet2 [64] presents an end-to-end solution based on convolutional neural networks to estimate optical flow. It uses various methods that allow estimating movements at both quick and slow speeds. All of the methods have different purposes, and their combination, despite providing good results, is very computationally expensive. So, LiteFlowNet [65] introduces a different end-to-end convolutional neural network architecture to estimate optical flow. This network, used by CHARMIE for gesture recognition, uses an optimised neural network structure whose goal is to have results with the precision of FlowNet, but with a lower computational expense. In Figure 16, the human agent performs a gesture, from (a) to the (c), where it shows the palm of the hand to the robot, then closes the fist and brings it to its chest. The robot can detect different sets of movements that can be configured and associated with a specific task the user intends the robot to do.

**Figure 16.** Optical flow image from a known gesture. From (**a**–**c**), the user shows the palm, then closes it and brings it to its chest. The different colours represent both different moving directions and speed.

Apart from the visual human-machine interfaces, the robot has two more communication systems. One is through speaking and listening, similar to a human-human conversation. CHARMIE uses CMU Sphinx tools from Carnegie Mellon University for speech recognition and Emic 2 Text-to-Speech Module to perform text to speech conversion. All conversations with the robot must be made using the English language. The most significant advantage of this method is that the users do not need any prior knowledge regarding the robot to successfully communicate with it. The robot recognises sequences of keywords from the human. Even though a robot may not understand the entire conversation from a user, it understands keywords and confirms if the information received is correct by asking the user if what is understood is the correct answer. Some examples of keywords are names of its users, names of different rooms, names of objects, and actions. The robot's response may vary according to its perception, location, priorities, whether it is performing a task or moving somewhere. Every sentence a user says to the robot must start with the word "CHARMIE", so the robot knows whether the conversation is towards it or not. In the video, in Appendix A, it is possible to see CHARMIE introducing himself and some conversations with users.

The last human-machine interface the robot has is a multimodal user interface. In environments where there is significant noise level, the user presents difficulties speaking, or the user cannot make a predefined gesture, this system can be used. A tablet with a menu lets users select all features that the previous human-robot interactions presented at the robot's torso, with the addition of being available in languages other than English.

#### 2.2.5. Object Detection and Subsequent Manipulation

For learning and recognising objects, both healthcare-related and household items, CHARMIE uses the supervised learning algorithm named YOLO (You Only Look Once) [66]. YOLO is a state-of-the-art, real-time object detection system known for its high-speed and accuracy. The YOLOv3 [67] algorithm starts by separating the image into a grid. Each grid cell predicts several boundary boxes around objects that score highly with predefined classes. Each boundary box has a respective confidence score of how accurate it assumes the prediction must be and detects only one object per bounding box. The boundary boxes are generated by clustering the ground truth boxes' dimensions from the original dataset to find the most common shapes and sizes. Unlike other models, YOLO looks at the entire image when testing, so its prediction reflects the image's global context. It makes predictions with a single network evaluation, unlike systems such as R-CNN which requires thousands evaluation systems for a single image. This makes it extremely fast; more than a thousand times faster than R-CNN and a hundred times faster than Fast R-CNN. However, YOLO is not ideal to use with models where large datasets may be hard to obtain. Even with its high speed, YOLO is not fast enough to run on embedded devices such as a Raspberry Pi. To help make YOLO even faster, the algorithm creators defined a YOLO architecture variation called Tiny-YOLO. This architecture is approximately 442% faster than YOLO and can achieve 244 FPS on a single GPU. Since Tiny-YOLO is a more compact version, this also means that it is less accurate. The architecture that is used in YOLOv3 is called DarkNet53 [67]. With its 53 layers of convolutions and no max-pooling, its main job is to perform feature extraction. A BatchNormalization and a leaky RELU follow each convolution operation. Darknet53 architecture is proved to be an extremely efficient network regarding object classification. CHARMIE uses Tiny-YOLOv3 architecture to detect different healthcare-related and household objects. Figure 17 shows some of the things that the robot has already learnt to detect, like bottles, cans and bags of chips. To introduce new objects into CHARMIE's database, CHARMIE records a video of the item, collecting all the frames. This information is later used to retrain Tiny-YOLOv3.

y

**Figure 17.** Two different outputs (**a**,**b**) from the YOLO detection algorithm that detect and locate various pre-trained household objects simultaneously in real-time.

To grasp the detected objects, CHARMIE uses its redundant manipulator, the robot arm. Some objects with unusual shapes and whose shape changes, such as a bag of chips, have different programmed collection algorithms. Nevertheless, for most of the items, the picking-up system is similar. After detecting the desired object, CHARMIE calculates the inverse kinematics to place the robot arm and the lifting mechanism in the best position to collect the object. Then the hand-closing is performed according to the object's physical properties. Figure 18 demonstrates an example of picking a can. As stated, the robot moves its platform and lifting mechanism to best fit the item's position, as can be seen in Figure 18a. The arm is moved right next to the object that wants to be manipulated, as can be seen in Figure 18b, and moving the hand right next to it, as can be seen in Figure 18c. In this case, it starts by using the thumb as a back wall, as can be seen in Figure 18d, and then it starts closing the fingers one by one from the index finger to the little finger, as can be seen in Figure 18e. This movement allows the robot to pick up the can, similar to how a human picks it up, as can be seen in Figure 18f.

**Figure 18.** A step-by-step demonstration of CHARMIE's process of picking the desired item to furtherly be used in subsequent tasks.

#### **3. Results**

CHARMIE performs a wide range of tasks as a service and assistant robot combining the four low-level functions previously stated. When developing algorithms to perform new tasks, the central focus lies mostly on how helpful these are for elderly or healthcare workers. By aiding both, CHARMIE can provide a higher quality of life for older people. Some central problems the robot tries to tackle regard tasks: (i) where older people have a lack of mobility (picking objects from the floor); (ii) where they have a lack of strength (carrying heavy objects); (iii) that are safety-related (if a person falls and cannot get up); and (iv) that happen on a day-to-day basis and that may require a lot of energy and include injury risks. As stated, CHARMIE can perform a wide variety of tasks. Of those, five chores that encompass different scenarios and different interactions are thoroughly explained.

#### *3.1. Help Me Carry This Bag*

One of the most severe difficulties for older people is carrying heavy objects. A common practice among the elderly is to shop for groceries with higher regularity, which translates to less weight but more trips to the stores. For now, CHARMIE is only intended for indoor use, with some exceptions. Thus, groceries-carrying tasks are idealised to receive the bags at the environment entrance and be further transported to the desired location. Figure 19 shown all the steps CHARMIE underwent to complete this task. Additionally, this task can be seen in the video in Appendix A.

**Figure 19.** A step-by-step demonstration of CHARMIE's "Help me carry this bag" task. From receiving a task from the first user, collecting the bag, navigating to the desired location and hand-delivering it to the second user.

The user starts by initialising the dialogue requesting help from the robot to collect and transport the grocery bags, as can be seen in Figure 19a. It must indicate where the robot must transport the loads, and optionally if these must be given to another known user. After receiving this information, CHARMIE expresses that it is ready to receive the bag for transportation, and asks the user to place it in its hand, as can be seen in Figure 19b. The robot detects when the user places the load in its hand through human pose estimation and confirms it via the force the arm actuators' must provide, as can be seen in Figure 19c. Furthermore, the robot starts moving to the delivery location while performing safe navigation with static and dynamic obstacle avoidance, as can be seen in Figure 19d. If the robot does not have to deliver the grocery bags to another user, it analyses the table surface and tries to find an empty spot to drop the bags. In the situation of not having an open space for the bags, the robot would wait for a user to clarify where the bags should be placed. If the robot's task is to hand in the bag to a human user, it searches for the user when arriving at the desired location. If no user is there to receive it, the robot tries to place it on the table. Otherwise, the robot moves in the user's direction and asks if he/she is available to receive the bag, as can be seen in Figure 19e. If the answer is no, the robot patiently waits until the user is available. When the user confirms its availability, the robot extends its arm in the human's direction and asks the user to collect the bag, as can be seen in Figure 19f. Afterwards, the robot returns to the initial user to verify whether there are more grocery bags to transport.

#### *3.2. Can You Find This Item and Bring It to Me*

With ageing, older people tend to have more difficulties getting up from a sitting position. Additionally, with mental health deterioration, there is a tendency for memory to start failing and forgetting an object's placement. One of the main chores CHARMIE performs is to collect things and return them to the user. Some examples of objects might be medicine boxes, cans, bottles, cellphones and remote controllers. A variation of this task is when a user does not know the exact location of the object it asks the robot to retrieve. In the example provided for this task, a scenario is presented where the user is laying in bed, not feeling very well and needing to drink some water.

The user starts by calling CHARMIE, and upon arrival the robot sees that the person is laying in bed and stays alert to the task it must do. Through dialogue, the user indicates that he/she is not feeling very well, and cannot get up, but needs to drink some water. In this situation, the user can say where the drink is or say the location and the robot must find it. CHARMIE starts moving towards the kitchen through safe navigation. When arriving at the kitchen, if the user stated where the drink is, the robot goes directly to that location. If the user did not define the area, the robot must look around the kitchen searching, starting with open spaces like the counters, tables and open shelves. After detecting the specified drink, the robot picks up the object with its redundant manipulators and, if possible, closes the opened kitchen cabinets. In Figure 18a scenario is displayed where CHARMIE has various drinks and objects placed on a table, and it must analyse and collect the required drink. The robot returns to the user and asks if it is available to receive the drink. When the user is available, the robot extends the arm in the human's direction, and waits for the user to collect it through pose estimation. After finishing, the robot asks whether the user is feeling better and if it needs anything else.

#### *3.3. Check on the Patients*

One of the major causes of serious injuries in the elderly has to do with falls. The reflexes to protect themselves start to degrade, and older people lose balance and movement capabilities, which may cause serious falls. In situations where the person lives alone, this is extremely dangerous, since the person is only analysed for potential injuries when someone goes to their house. One of CHARMIE's most safety-related tasks consists of patrolling indoor areas and, through pose estimation, understanding if a person is in danger. The robot can patrol nursing homes or hospitals at night to check if the patients are laying on

their beds, lowering the detection time for an older person who fell. CHARMIE can also check if a person is not on the bed or even sitting down needing some assistance. In cases where older people live without any health professional, the robot can quickly send an alert message to the emergency contacts.

In the nursing home patrolling task, CHARMIE uses the known environment mapping where the patient rooms are defined. By patrolling the bedrooms, the robot starts by very calmly opening or pushing the door and slowly moving inside the bedroom to not scare or wake up the patients. As displayed in Figure 20, the robot may encounter various scenarios when estimating the human's pose to evaluate patient safety.

(**a**) (**b**)

**Figure 20.** The different possible scenarios CHARMIE may encounter when patrolling nursing homes and the respective pose estimation output. (**a**) shows an elderly person laying in bed over the covers. (**b**) shows an elderly person laying under the covers, covered from the waist down. (**c**) shows an elderly person laying under the covers, covered from the neck down with one arm outside the covers. (**d**) shows an elderly person sitting in bed. (**e**,**f**) show an elderly person who fell from the bed and is laying on the floor.

The first possible scenario is the detection of a patient laying in bed. However, this scenario may present some different variations. The first, displayed in Figure 20a, shows the pose detection of a patient laying on the bed without any covers. The second, shown in Figure 20b, demonstrates a patient laying in bed but covered from the waist down. The third, Figure 20c, shows a patient laying in bed but covered from the neck down with an arm also showing. In the first two cases, the pose estimation algorithm can detect properly that the user is safely laying in bed, but the third example does not always detect successfully. Thus, when the robot cannot estimate the pose, it analyses the bed height variation to differentiate between the patient laying in bed and the bed being empty. If the patient is not laying in bed and no pose estimation is made inside that room, two different situations may occur, the patient is either missing or sleeping deep inside the covers. If the patient is laying on the bed, the robot calmly leaves the room and slightly closes the door. By analysing the bed height variation, should it detect there is no patient in the room, it sends an emergency warning to the healthcare workers that a patient might be missing. Another scenario results when a patient is sitting on the bed, Figure 20d, and requires non-emergency help. In this case, CHARMIE also summons a healthcare professional but in a different way to the first case. The last scenario displays the worst case: an elderly person has fallen out of bed and cannot get up, as can be seen in Figure 20e,f. If CHARMIE detects this scenario, it immediately sends an emergency message to all of the healthcare professionals. This recognition process is repeated throughout all of the rooms in the nursing home. For an older person who lives by themselves, this recognition system can be temporarily programmed.

#### *3.4. Store the Groceries" or "Clean up the Room*

CHARMIE can also execute some tasks regarding cleaning and tidying up. Even though these tasks do not require carrying heavy loads, they involve a different physical restriction that may end in injury. When tidying a room, older people may have to place themselves in positions, such as picking objects from the floor or stretching to reach the top or bottom shelves that may lead to accidents. With the goal of helping the elderly, and at the same time freeing healthcare workers to focus on more critical patient-related chores, CHARMIE can analyse these environments that need to be cleaned. By perceiving which objects are out of place, the robot can collect and place those in areas previously indicated. To describe this task, two different variations are presented.

To store some groceries laying on the kitchen table, the correct places for every object need to have been previously indicated to the robot. After the bags have been unloaded, the robot analyses the various things it must store. If an object is not in the robot's database, CHARMIE asks the user the correct spot to place the item. When the robot is storing this product, it captures images from different angles to later add to its training data. The remaining known products are packed in the correct place using CHARMIE's redundant manipulators. If the robot comes across a full shelf when attempting to place an item, the robot returns the item to the kitchen table and informs the user.

When cleaning a specific room, the robot starts by analysing the environment, extracting all the information regarding all objects on the floor or tables. With its depth image, the robot can differentiate objects from flat surfaces and save their location. Similarly to the storing groceries example, the robot has to already know the object's place. Contrary to the groceries examples, the places where the items go are varied height-wise, which forces the robot to adapt its height to successfully manipulate the out of place objects.

#### *3.5. Follow Me" and "Lay the Table*

Another beneficial task of a service robot is to follow a user who needs help performing a task. At times, CHARMIE might be in a different compartment than where the user needs it to complete the next job. Therefore, to ease the whole process, the user might request CHARMIE to follow him/her to the new room. This chore is particularly useful in large indoor environments where following a user is significantly more efficient than looking for the worker. In the following process, the robot can avoid static and dynamic obstacles or even calculate a new trajectory when it realises it cannot pass.

In Figure 21, extracted from the video in Appendix A, all of the steps regarding the "Follow Me" task can be seen. The user starts by asking the robot to follow him/her to navigate to the compartment where help is needed, as can be seen in Figure 21a. The robot uses both the 2D LiDAR and the RGB-D camera to lock the user it is following, as can be seen in Figure 21b. During the process, the robot can navigate narrow spaces and even avoid collisions with humans that pass between the followed user and CHARMIE, as can be seen in Figure 21c,d. Almost at the end of the route, a new obstacle comes up, the robot detects that the followed user has an insuperable barrier similar to a wall (represented by the black cardboard demonstrated in the video in Appendix A that the robot cannot surpass, as can be seen in Figure 21e. Thus the robot, using the environment's map, calculates an alternative route to get back to its user, as can be seen in Figure 21f. In the return path, CHARMIE finds novel obstacles that it can successfully overcome.

(**a**) (**b**)

**Figure 21.** A step-by-step demonstration of CHARMIE's "Follow Me" task. From receiving the task to navigating behind the human user while avoiding static and dynamic obstacles that cross the path between the robot and the user. When the robot cannot continue following the user, it recalculates a new trajectory.

As an example of a collaborative duty, the robot can lay the table with a human user. The methodology used is similar to the storing/cleaning tasks but with the collaboration twist. When this task is selected, CHARMIE knows that it needs five items to lay on the table: a plate, a fork, a spoon, a knife and a cup. The robot analyses the objects already laid on the table by a worker or a human user. Furthermore, it complements the user's work by signalling which items it will distribute while analysing whether the human user forgot one item on a previously laid set.

#### **4. Conclusions**

A description of hardware and software solutions for healthcare and domestic collaborative service and assistant robot, CHARMIE, is presented in this article. Additionally, results from the development of the initial prototype and the first set of user trials in a controlled laboratory setting focusing on developing an assistive care robot for older adults are described. The focus of the chores presented displays several different scenarios where CHARMIE can directly impact the quality of life of older people, mainly regarding physical safety, but also concerning social interactions and mental health. The majority of the robot's tasks designed for geriatric care involve fall detection and prevention, such as patrolling through nursing or domestic homes and picking up objects from the floor. In addition to the tasks demonstrated, CHARMIE can also work as a social company robot, asking questions throughout the day, allowing the user to play some games in its multimodal user interface and reminding the users of their schedules. Even though most tasks mainly aid the elderly, these chores can be adapted to be more oriented to healthcare workers or people with reduced mobility.

From a different perspective, due to the difficulties brought by the COVID-19 pandemic and its associated lockdowns, robots such as CHARMIE provide a safer solution to overcome this disease's challenges. Robots provide a superlative solution, since the virus cannot replicate itself in a robot as it does in a human and drastically reduces person-toperson contacts. Elderly care facilities and all other healthcare-related environments are particularly at risk of heavy breakouts since these encompass a very vulnerable population. Without a robotic solution, residents without the virus inside such facilities face a higher chance of contamination, which may happen through asymptomatic healthcare professionals. In addition to all of the social and mental health tasks social robots can perform, service and assistant robots such as CHARMIE can provide more direct help not only to patients, but also to healthcare professionals in a wide range of healthcare facilities. The longevity of the virus has dictated that all healthcare workers undergo very long working shifts, with low sleeping schedules, and in many situations being away from their families. This can lead to healthcare workers reaching a state of high fatigue and burnout in overburdened health systems, which can be eased if some of the tasks are performed by healthcare robots such as CHARMIE. Some examples of tasks are: transporting goods, providing patients information, patrolling the facilities, and sending an alert when any unexpected patient behaviour is detected. Additionally, due to the COVID-19 pandemic, the desired tests meant to be performed on real world environments, in this case in two nursing homes, two domestic homes and one hospital, initially scheduled for 2020 had to be indefinitely postponed as mandated by the national public health committee.

As short-term and middle-term objectives, novel concepts and tasks are projected to be developed and implemented in CHARMIE. The robot can successfully detect falls and prevent some scenarios where older people might fall by picking up items from the floor. However, one of the most significant difficulties regarding movement and mobility of older people happens when trying to get up from a sitting position, which is more aggravated in single-person households. This scenario happens several times during the day: getting out of bed, after a meal, when getting up from the sofa where the elderly do not have enough strength or apply too much force and lose balance, which results in dangerous falls. The goal is to provide active help to older people when trying to stand up. Another goal is to increase the robot's working area in buildings with more than one floor. Since

these healthcare facilities are commonly prepared for people with reduced mobility or wheelchairs, it is uncommon to have rooms that can only be accessed through stairs. Since the motion platform cannot overcome stairs, the goal is to create a system that allows the robot to successfully move between floors using elevators. The map would consist of all the floors in the building, and the robot must move to the elevator to switch floors. However, this task encompasses many steps that are still being worked on, such as calling the elevator, pressing the correct buttons and entering and leaving without colliding with other users. In order to benchmark all of these different tasks, it is intended to apply for RoboCup@Home participation. The CHARMIE project had already done so in 2017, as the video in Appendix A is the qualification video, part of the necessary qualification material. The video demonstrates CHARMIE performing some of the tasks previously described.

The desirable long-term goals for the CHARMIE project can be divided into two categories. From a technological perspective, it is intended to create the necessary hardware and software solutions to transport broader wheeled objects such as hospital beds and wheelchairs that may have patients. It is extremely challenging to move these wheeled platforms with human patients on top. The robot must adapt its omnidirectional way of motion to the wheeled object, detect obstacles further away and analyse the patient pose. All this must be performed while considering all the strong safety measures that come with patient transportation. Additionally, as previously stated, it is intended to use machine learning algorithms for a wide range of tasks allowing the robot to learn how to perform and improve chores via observation and trial-and-error direct interaction with the environment. However, these household and healthcare tasks usually consist of long-term planning, high-dimensional continuous action-space, simulation to real-world transition problem and, in most cases, incomplete information. Such issues require highly complex reinforcement learning algorithms to be solved. This methodology enables adaptative learning, using reinforcement learning based service and assistive elderly care chores like those previously described. From a global perspective, it is planned to start thoroughly testing CHARMIE in real healthcare and domestic environments. All functions, such as map building, self-localisation, obstacle detection and avoidance, human-robot interaction, object detection and manipulation, will be executed in a hospital, two nursing homes, and two domestic houses. This will allow a qualitative and quantitative evaluation from the developers and users of how the robot can perform in such environments. The final results will comprise a fitting framework for a socially assistive robot for end-to-end chores whose final goal is to enhance all its users' quality of life.

**Author Contributions:** Conceptualization, I.S.G., G.L. and A.F.R.; Methodology, T.R. and F.G.; Software, T.R., G.L. and A.F.R.; Validation, I.S.G., G.L. and A.F.R.; Formal Analysis, T.R. and F.G.; Investigation, T.R., F.G. and I.S.G.; Resources, T.R., F.G. and I.S.G.; Data Curation, F.G. and I.S.G.; Writing—Original Draft Preparation, T.R. and I.S.G.; Writing—Review and Editing, F.G., G.L. and A.F.R.; Visualization, I.S.G.; Supervision, G.L. and A.F.R.; Project Administration, G.L. and A.F.R.; Funding Acquisition, T.R., F.G., G.L. and A.F.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been supported by FCT—Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. The author T.R. received funding through a doctoral scholarship from the Portuguese Foundation for Science and Technology (Fundação para a Ciência e a Tecnologia) [grant number SFRH/BD/06944/2020], with funds from the Portuguese Ministry of Science, Technology and Higher Education and the European Social Fund through the Programa Operacional do Capital Humano (POCH). The author F.G. received funding through a doctoral scholarship from the Portuguese Foundation for Science and Technology (Fundação para a Ciência e a Tecnologia) [grant number SFRH/BD/145993/2019], with funds from the Portuguese Ministry of Science, Technology and Higher Education and the European Social Fund through the Programa Operacional do Capital Humano (POCH).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors of this work would like to thank the members of the Laboratory of Automation and Robotics from the University of Minho as well as all former CHARMIE project members.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Video Link**

Link to the MinhoTeam qualification video for RoboCup@Home 2017 in Nagoya, Japan. It displays some of the tasks described in this article and all of CHARMIE's functionalities: https://youtu.be/n0ZariVsIj0 (accessed on the 2 April 2021).

#### **References**


## *Article* **An Algorithm for Painting Large Objects Based on a Nine-Axis UR5 Robotic Manipulator**

**Jun Wang 1,\*, Mingquan Yang 1, Fei Liang 1, Kangrui Feng 1, Kai Zhang <sup>1</sup> and Quan Wang <sup>2</sup>**


**Featured Application: The proposed algorithm for painting large objects based on a nine-axis UR5 robotic manipulator can be applicable in many automobile repair shops where paint jobs can be performed. With the help of a nine-axis UR5 robotic manipulator with the proposed algorithm, vehicles can be automatically painted with the least amount of human manual labor. Simultaneously, the quality and efficiency of the paint jobs can be drastically improved, since the UR5 robot maintains its consistency, accuracy, and proficiency while conducting paint jobs.**

**Abstract:** An algorithm for automatically planning trajectories designed for painting large objects is proposed in this paper to eliminate the difficulty of painting large objects and ensure their surface quality. The algorithm was divided into three phases, comprising the target point acquisition phase, the trajectory planning phase, and the UR5 robot inverse solution acquisition phase. In the target point acquisition phase, the standard triangle language (STL) file, algorithm of principal component analyses (PCA), and k-dimensional tree (k-d tree) were employed to obtain the point cloud model of the car roof to be painted. Simultaneously, the point cloud data were compressed as per the requirements of the painting process. In the trajectory planning phase, combined with the maximum operating space of the UR5 robot, the painting trajectory of the target points was converted into multiple traveling salesman problem (TSP) models, and each TSP model was created with a genetic algorithm (GA). In the last phase, in conformity with the singularities of the UR5 robot's motion space, the painting trajectory was divided into a recommended area trajectory and a non-recommended area trajectory and created by the analytical method and sequential quadratic programming (SQP). Finally, the proposed algorithm for painting large objects was deployed in a simulation experiment. Simulation results showed that the accuracy of the algorithm could meet the requirements of painting technology, and it has promising engineering practicability.

**Keywords:** genetic algorithm; principal component analyses; standard triangle language; traveling salesman problem; trajectory planning

#### **1. Introduction**

In recent years, UR5 robots have been widely popularized in industrial production fields such as painting, assembly, and micromanipulation [1]. In the above-mentioned fields, the painting process is the integral manufacturing procedure for automatically coating large objects, and it is one of the essential technologies for improving the surface quality of painted objects, which can then offer better performances under different working conditions [2–4]. Painting feasibility and trajectory planning are critical in the painting field.

To paint large objects, some engineers tried to mount UR5 robots on a mobile platform to enlarge the workspace of the UR5 robots [5,6], while other engineers tried to widen

**Citation:** Wang, J.; Yang, M.; Liang, F.; Feng, K.; Zhang, K.; Wang, Q. An Algorithm for Painting Large Objects Based on a Nine-Axis UR5 Robotic Manipulator. *Appl. Sci.* **2022**, *12*, 7219. https://doi.org/10.3390/ app12147219

Academic Editors: Pedro Neto, António Paulo Moreira and Félix Vidal

Received: 7 June 2022 Accepted: 14 July 2022 Published: 18 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the workspace by changing the structure of the UR5 robots [7]. However, the method of changing the structure of the UR5 robots entails enormous cost, and the UR5 robot with changed structure offers low adaptability. Thus, this paper proposes a UR5 robot which is mounted on an *zyz* three-axis motion platform to achieve the painting of large objects.

A well-designed trajectory can improve painting efficiency. Bureerat, S. et al. employed the MRPEIL-DE algorithm to optimize the trajectory of a six degree-of-freedom (DOF) UR5 robot [8]. Yin, S. et al. utilized mechanical learning methods to devise an energy-saving trajectory for industrial UR5 robots [9]. Serralheiro, W. et al. proposed employing a nonbased trajectory planning method for time-energy optimization of a completely wheeled mobile UR5 robot [10]. Kazim, I.J. et al. compared improving the artificial potential field (APF) by the traditional particle swarm optimization (PSO) algorithm and the serendipitybased PSO (SBPSO) algorithm to control the path of a universal robot UR5 with collision avoidance [11]. Kazim, I.J. et al. utilized differential evolution (DE) optimization with the MATLAB toolbox, which has an applied robot operating system (ROS), to quantify the contour tracking performance of a collaborative universal manipulator robot (UR5) [12]. Al-Shanoon, A. et al. proposed a novel reliable framework for deep ConvNet combined with visual servoing using a single RGB camera [13]. Balanji, H.M. et al. proposed a novel calibration framework based on a single camera and computer vision techniques using ArUco markers [14]. Vivas, A. et al. designed the implementation of the control of a real UR5 robot from Matlab/Simulink using ROS [15]. Araki, R. et al. proposed a 6D pose estimation method for an object from a single RGB image for a robotic grasping task [16].

Nonetheless, most of the above-mentioned algorithms neither consider the singularities of the kinematics of the UR5 robot's joints nor meet the requirements of the painting process. Therefore, this paper proposes a painting algorithm which can generate a painting trajectory satisfying the painting of large objects. The painting algorithm employs the standard triangle language (STL) file, algorithm of principal component analyses (PCA), and k-dimensional tree (k-d tree) to create a digital model of the object to be painted. The digital model is converted into multiple traveling salesman problem (TSP) models, and each TSP model is created with a genetic algorithm (GA). The painting trajectory offers high precision and efficiency.

#### **2. Construction of a Point Cloud Model of the Target Object**

STL files are popular in computer graphics application systems, and their file formats are simple and widely harnessed in machine vision and 3D reconstruction [17,18].

#### *2.1. STL File Parsing*

A STL file consists of multiple triangles and can be divided into the American Standard Code for Information Interchange (ASCII) format and binary format as per the storage format. The STL file obtained in this paper was suitable for ASCII. The analysis of a STL file in the ASCII format is shown in Figure 1.


**Figure 1.** Parsing of a STL file in ASCII format.

In Figure 1, *ηi*, *ηj*, and *η<sup>k</sup>* represent the *x*, *y*, and *z* components of the triangle patch normal vector, respectively; *vi*x, *vi*y, and *vi*<sup>z</sup> represent the *x*, *y*, and *z* coordinates of the *i*-th triangular patch, respectively.

#### *2.2. Filling Triangles*

The STL file only contains the coordinates of the vertices of the triangles, and the coordinates of all points in the model cannot be obtained. To obtain the full 3-dimensional information of the model, the triangles must be filled. This paper utilized the depth-first search (DFS) to fill the triangles. The triangle is represented by *S*. The center of the triangle is represented by o. Lengths of the 3 sides of the triangle is represented by *l*1, *l*2, and *l*3. The triangle *S* is later separated into 3 smaller triangles *S*1, *S*2, and *S*3. The algorithm model and the main flowchart of the filling process are shown in Figures 2 and 3.

**Figure 2.** The model of the algorithm that fills triangles inside an STL file.

**Figure 3.** The filling process of the DFS algorithm.

The stopping condition of DFS recursion is that the minimum height corresponding to the three sides of the triangle is less than the specified minimum height. This requirement is shown in Equation (1), where the minimum triangle height *h*min is given.

$$S \gg \max(l\_1, l\_2, l\_3) \cdot h\_{\min}/2 \tag{1}$$

The area of the triangle in Equation (1) is difficult to directly calculate. Therefore, Heron's formula, which is shown in Equation (2), is introduced, and the area is calculated from the lengths of the sides. The recursive stop condition is shown in Equation (3).

$$\begin{array}{l} S = \sqrt{\varepsilon (\varepsilon - l\_1)(\varepsilon - l\_2)(\varepsilon - l\_3)}\\ \varepsilon = (l\_1 + l\_2 + l\_3)/2 \end{array} \tag{2}$$

$$S \ge 2 \cdot \max(l\_1, l\_2, l\_3) \cdot h\_{\min} \tag{3}$$

The initial vertices are randomly selected as (10, 10, 0), (520, 100, 0), and (260, 410, 0), and *h*min equals 0.05. The filling results are shown in Figure 4a.

**Figure 4.** (**a**) The triangle filling results; (**b**) the improved triangle filling results.

However, there are a large number of unfilled lines in Figure 4a, and the filling effect is not satisfactory. After the filling process had been analyzed, it was found that the point filling method was harnessed in the filling, which could result in the points on the straight lines from the center of the triangle to the vertices of the triangle not being filled. If the number of filling points needs to be increased, the coordinates of the recorded point during the filling process can be changed into the coordinates of each point above the line connecting the recorded point and the vertex of the triangle with the same initial value condition and iteration termination condition. The final filling result is shown in Figure 4b. Compared to Figure 4a, Figure 4b is better filled, but the time taken to fill Figure 4b was much longer than to fill Figure 4a. If the accuracy is not high or the time is short, point filling is recommended. For high-precision conditions, linear filling is recommended.

#### *2.3. Determining the Normal Vector of a Target Point on the Painting Trajectory*

In order to ensure that the painting area is uniform during the painting process, it is necessary to ensure that the axis of the end of the spray gun coincides with the normal vector of the target point on the painting trajectory. Thus, the normal vectors of the target model must be obtained. As is shown in Figure 3, the target points on the painting trajectory can be divided into two types, which include the trajectory points obtained through the filling algorithm and the vertices of the original triangle.

The points obtained by filling are still coplanar with the original triangles. The normal vectors of the points obtained by filling can be determined with the normal vector of the plane. The normal vectors of the triangles can be directly extracted from the STL file. However, for the normal vectors of the vertices of the original triangle, there is a common vertex of multiple triangles. Ergo, the normal phase of the plane cannot be employed

instead of the normal phase of the points. In this paper, local plane fitting and principle component analysis (PCA) estimation methods were harnessed to create the normal vectors of the target points [19–21]. In order to acquire the target point field conveniently, the k-d tree algorithm was harnessed to store the coordinates of the target points [22]. The step-by-step algorithm is shown in Figures 5 and 6.

1. A local point that is less than *r* from the target point is taken.

2. A sphere is created by employing the local points with a target point as the origin of the 3 dimensional Cartesian System and the coordinates of a local point with the target point being the origin are *x*', *y*' and *z*'.

3. The tangent plane is fit with the local points. On the tangent plane: <sup>222</sup> 1 *xyz nnn* ++= .

4. The minimum sum of the squares of the distances between the normalized local points and the tangent plane is taken as the objective function.

5. Feature matrix *S* is constructed as per the objective

function: ª ºª º « »« » <sup>=</sup> ¬ ¼¬ ¼ " " " " " " 7 Q Q Q Q Q Q [[ [ [[ [ V \[ [ \[ [ ][ [ ][ [ 6. The eigenvector corresponding to the minimum eigenvalue of the *S* matrix is taken as the normal vector of the points.

**Figure 5.** The PCA method to estimate the target point normal vector.


**Figure 6.** The k-d tree stores cloud data.

In the actual processing procedure, the normal vector of the target model should take its external normal phase, and the external normal vector of the model can be determined

by Equation (4), where ¯ **p** is the center of all target points, and ||*x*||<sup>2</sup> is the second norm of the vector **x**.

$$\mathbf{n} = \begin{cases} \mathbf{n} \| \| (\mathbf{p} + \mathbf{n}) - \overline{\mathbf{p}} \| \| \_2 \ge \| (\mathbf{p} - \mathbf{n}) - \overline{\mathbf{p}} \| \_2 \\\ -\mathbf{n} \| \| (\mathbf{p} + \mathbf{n}) - \overline{\mathbf{p}} \| \_2 < \| (\mathbf{p} - \mathbf{n}) - \overline{\mathbf{p}} \| \_2 \end{cases} \tag{4}$$

#### *2.4. Determining the Pose of a Target Point on the Trajectory*

In the actual painting process, the pose matrix is often deployed to describe the rotation of the target point in space. During the inverse kinematics calculation of the UR5 robot, the pose of the target point needs to be described by the matrix, and Rodrigues' rotation formula can determine the pose of the target point by the axis direction of the target point.

The diagram of Rodrigues' rotation formula is shown in Figure 7, where *X*1, *Y*1, and *Z*<sup>1</sup> represent the three axes, with *O* being the origin of the 3-dimensional Cartesian system after rotating.

**Figure 7.** The schematic diagram of Rodrigues' rotation formula.

The rotation matrix *R* is shown in Equation (5),

$$\mathbf{E}\cos\theta + (1-\cos\theta)r^T \times r + \sin\theta \times \begin{bmatrix} 0 & -r\_z & r\_y \\ r\_z & 0 & -r\_x \\ -r\_y & r\_x & 0 \end{bmatrix} \tag{5}$$

where **E** represents the third-order identity matrix, and *θ* represents the angle created by vectors **n** and **n0**. In Equation (5), **r** is the vector product of vectors **n** and **n0**; **n0** = [0 0 1]T, and **n** is the normal vector of the point.

#### *2.5. Compression of the Data of Target Points*

The design of the painting process parameters ensures that the end of the spray gun will produce a circle with a radius of *r* at a specified distance. During the painting process, it is necessary to ensure that the trajectory coverage during painting is 0.4. The painting model is shown in Figure 8, and the coverage of Figure 8, denoted by *p*, is shown in Equation (6).

$$p = \frac{4r^2 \cdot \arccos(d/2r) - d\sqrt{r^2 - d^2/4}}{2\pi r^2} \tag{6}$$

**Figure 8.** The painting model.

When the painting trajectory is executed, not all target points are to be painted. Only the path of key points should be painted. The distance between key points should meet the requirements of painting uniformity. In Figure 8 and Equation (6) *d* is employed to reduce the time and space complexity of the trajectory planning. The compression formula for the data points is shown in Equation (7).

$$\begin{aligned} k &= floor \left( d / \left( \sqrt{3} \right) \right) \\ P\_{\mathbf{x}} &= round \left( P\_{\mathbf{x}} / k \right) \cdot k \\ P\_{\mathbf{y}} &= round \left( P\_{\mathbf{y}} / k \right) \cdot k \\ p\_{\mathbf{z}} &= round \left( p\_{\mathbf{z}} / k \right) \cdot k \end{aligned} \tag{7}$$

where *P*x, *P*y, and *P*z represent the *x*-*y*-*z* coordinates of the target points in the point cloud, *floor* represents the floor function, which takes as input a real number *q*, and gives out as output the greatest integer less than or equal to *q*, and *round* represents the round function, which rounds off a numeric value to its nearest integer.

#### **3. Nine-Axis UR5 Robot Forward Kinematics Model**

The space that the UR5 robot covers is limited, and large-size objects may not be painted at once. When an area that needs to be painted is outside the working space of the UR5 robot, the D-H parameters of the UR5 robot must be changed, or auxiliary equipment must be added to help with the painting. The former is costly. Changing joints is not conducive to the popularization of UR5 robots. The proposed 6-DOF UR5 robot is installed on a 3-axis motion platform that moves horizontally, laterally, and vertically to assist painting. The 3D drawing of the UR5 robot is shown in Figure 9. The 9-axis UR5 robotic manipulator includes the 3-axis motion platform that has 3 prismatic pairs and the UR5 robot that has 6 revolute pairs. The 6-DOF UR5 robot selected in this paper is a UR5 robot, which is shown in Figure 10.

**Figure 9.** 3D drawings of the UR5 robot.

**Figure 10.** The UR5 robot.

The pose of the actual UR5 robot end **T** is shown in Equation (8).

$$\begin{aligned} \mathbf{T} &= \mathbf{Trans}(t\_x, t\_y, t\_z) \mathbf{Rot}(\mathbf{x}, \pi) \mathbf{T}\_{\text{robot}} \mathbf{T}\_{\text{tool}} \mathbf{T}\_{\text{dis}} \\ &\quad \mathbf{Trans}(t\_{x'}, t\_{y'}, t\_z) = \begin{bmatrix} 1 \ 0 \ 0 \ t\_x \\ 0 \ 1 \ 0 \ t\_y \\ 0 \ 0 \ 1 \ t\_z \\ 0 \ 0 \ 0 \ 1 \end{bmatrix} \\ \mathbf{Rot}(\mathbf{x}, \pi) &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \pi & \sin \pi & 0 \\ 0 & -\sin \pi & \cos \pi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{aligned} \tag{8}$$

in which *tx*, *ty*, and *tz* represent the movement of the motion platform along the *x*, *y*, and *z* axes, respectively, and **Trans** represents the movement operator. **Rot** represents the rotation matrix of the *x*-axis. **T**tool, which is shown in Figure 11, represents the pose of the endeffector relative to its installation center. **T**dis, which is also shown in Figure 11, represents the end fixture of the UR5 robot's position. **T**tool is determined by the structure of the end-effector, and **T**dis is determined by the parameters of the end fixture of the UR5 robot's position. The UR5 robot system has 9 axes which are described in Figure 11. Axes 1 to 6 are the 6 joints of the UR5 robot. Axes 7 to 9 are the three-dimensional Cartesian coordinate system of the motion platform.

**Figure 11. T**tool, **T**dis, and the 9 axes of the UR5 robot system.

Figure 12a is the structural diagram of the 6-DOF UR5 robot composed of 6 revolute pairs, represented by 1, 2, 3, 4, 5, and 6 and 0 represents the end fixture of the UR5 robot. In order to analyze the pose change of the UR5 robot's end fixture coordinates relative to the end-effector, the D-H model is harnessed to establish the kinematics forward solution model of the UR5 robot.

**Figure 12.** (**a**) The schematic structure of the UR5 robot (**b**) detailed explanation of *θi*, *di*, *ai*, and *αi*.

The UR5 robot's pose change matrix in the D-H model is shown in Equation (9).

$$\begin{aligned} \dot{\theta}\_i^{-1} \mathbf{T} &= \mathbf{Rot}(z, \theta\_i) \mathbf{Trans}(0, 0, d\_i) \mathbf{Trans}(a\_i, 0, 0) \mathbf{Rot}(x, u\_i) \\ \mathbf{Rot}(z, \theta\_i) &= \begin{bmatrix} \cos \theta\_i & \sin \theta\_i & 0 & 0 \\ -\sin \theta\_i & \cos \theta\_i & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{aligned} \tag{9}$$

In Equation (9), *θ<sup>i</sup>* is the angle for which *xi*−<sup>1</sup> spins around *zi* to become *xi*; *di* is the distance between *xi*−<sup>1</sup> and *xi* along *zi*; *ai* is the distance between *zi* and *zi*+1 along *xi*, and *α<sup>i</sup>* is the angle for which *zi* spins around *xi* to become *zi*+1. **Rot**(*z*, *θi*) represents the rotation matrix of the *z*-axis; *θi*, *di*, *αi*, and *ai* are shown in Figure 12b. *<sup>i</sup>*−<sup>1</sup> *<sup>i</sup>* **T** is shown in Equation (10), where s represents the sine function and c represents the cosine function.

$$\mathbf{r}\_{i}^{i-1}\mathbf{T} = \begin{bmatrix} \mathbf{c}\theta\_{i} & -\mathbf{s}\theta\_{i}\mathbf{c}\alpha\_{i} & \mathbf{s}\theta\_{i}\mathbf{s}\alpha & a\_{i}\mathbf{c}\theta\_{i} \\ \mathbf{s}\theta\_{i} & \mathbf{c}\theta\_{i}\mathbf{c}\alpha\_{i} & -\mathbf{c}\theta\_{i}\mathbf{s}\alpha & a\_{i}\mathbf{s}\theta\_{i} \\ 0 & \mathbf{s}\alpha\_{i} & \mathbf{c}\alpha\_{i} & d\_{i} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{10}$$

**T**robot, which represents the pose of the UR5 robot, is obtained via Equation (11). The UR5 robot's pose should satisfy Equation (12), and the D-H parameters of the UR5 robot are shown in Table 1.

$$\mathbf{T}\_{\text{robot}} = \prod\_{i=1}^{6} \mathbf{i}^{-1} \mathbf{T} \tag{11}$$

$$\mathbf{Trans}(\mathbf{x}, y, z)^{-1} \cdot \mathbf{Rot}(\mathbf{x}, \pi)^{-1} \cdot \mathbf{T} \cdot \mathbf{T}\_{total} \,^{-1} = \prod\_{i=1}^{6} \,^{i-1} \mathbf{T} \tag{12}$$

**Table 1.** The D-H parameters of the UR5 robot.


#### **4. The Inverse Solution Model of Kinematics of the 9-Axis UR5 Robot**

Nevertheless, since the system that controls the UR5 robot and the system that controls the motion platform are separated, communication between the systems may cause problems, such as delays and errors. In order to improve painting accuracy, the motion platform cannot be moved frequently. During the painting process, each time the motion platform is moved, the UR5 robot paints an area. After the painting is completed, the motion platform is moved again. The recommended operating space of the UR5 robot is shown in Figure 13.

The recommended working area in Figure 13 represents the feasible area recommended by the UR5 robot. The area between the maximum working area, which is represented by Max. working area in Figure 13, and the recommended feasible area represents the non-recommended feasible area that the UR5 robot may not reach. The non-recommended area represents the singularity of the UR5 robot. The non-recommended area takes the largest cylinder contained in the green area in Figure 13 as the largest area for each painting procedure.

However, the recommended area in Figure 13 is not a complete ball, and the cylindrical area with a diameter of 151 mm is not recommended. If this area is removed, the volume of the largest cylinder obtained is 0.25 times that without removal. Therefore, the platform will be moved slightly. In order to move the platform as little as possible, the green area is assumed to be a complete ball when the maximum cylinder is calculated.

In order to prevent the UR5 robot from colliding with the platform, it is necessary to keep the end of the UR5 robot below its installation center during the movement of the UR5 robot. The area under the side wall of the machine is taken as the effective area. The volume of the cylinder included in Figure 13 is shown in Equation (13).

$$V = \pi (r \cos \theta)^2 \cdot (163 - r \sin \theta) \tag{13}$$

where *r*, *θ*, and *V* represent the maximal length of the UR5 robot, the radian of the intersection of the cylinder, and the maximum machining space of the UR5 robot, respectively. When *θ* equals −0.54 rad, the volume of the cylinder is the largest. At this time, the cylinder has a radius of 727 mm and a height of 602 mm. Due to the existence of the nonrecommended area, the UR5 robot inverse kinematics model is divided into a recommended area inverse kinematics model and a non-recommended area inverse kinematics model.

#### *4.1. Inverse Solution Model for Recommended Regions*

The three joint axes of the UR5 robot, joint axes 2, 3 and 4, which are shown in Figure 14, are parallel, and their spatial structure satisfies the Pieper criterion. The closedform solution method can be harnessed to solve the angular sequence of the UR5 robot joint under the target pose constraints.

**Figure 14.** The axes of the UR5 robot.

Equation (14) determines the pose of the end-effector of the UR5 robot.

$$\prod\_{i=1}^{6} \overline{i}^{i-1} \mathbf{T} = (\mathbf{Trans}(t\_{x}, t\_{y}, t\_{z}) \mathbf{Rot}(\mathbf{x}, \pi))^{-1} \cdot \mathbf{T} \cdot (\mathbf{T}\_{\text{tod}} \cdot \mathbf{T}\_{\text{dis}})^{-1} = \begin{bmatrix} n\_{x} & o\_{x} & a\_{x} & \mathbf{x}' \\ n\_{y} & o\_{y} & a\_{y} & \mathbf{y}' \\ n\_{z} & o\_{z} & a\_{z} & z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{14}$$

The main idea of the closed-form solution is creating the constraint equation established by matrix changes. The constraint matrix of the UR5 robot, which is shown in Equation (15), is established in accordance with Equation (11). The third row of the left and right sides of Equation (15) is expanded to establish Equation (16). Equation (16) can be regarded as the equation set of *θ*1, *θ*5, and *θ*6.

The procedure for finding the values of *θ*2, *θ*3, and *θ*<sup>4</sup> is similar.

$$\mathbf{L} = (\prescript{0}{1}{\mathbf{T}}^{-1})\mathbf{T}\_{\text{robot}} = \prod\_{i=2}^{6} i^{-1} \mathbf{T} = \mathbf{R} \tag{15}$$

$$
\begin{bmatrix}
\mathbf{c}\theta\_6\mathbf{s}\theta\_5 \\
\mathbf{s}\theta\_6\mathbf{s}\theta\_5 \\
\mathbf{c}\theta\_5 \\
\mathbf{d}\_2 + d\_3 + d\_4 + d\_6\mathbf{c}\theta\_5 + a\_5\mathbf{s}\theta\_5 + a\_6\mathbf{c}\theta\_6\mathbf{s}\theta\_5
\end{bmatrix}^T = \begin{bmatrix}
n\_y\mathbf{c}\theta\_1 + n\_x\mathbf{s}\theta\_1 \\
o\_y\mathbf{c}\theta\_1 + o\_x\mathbf{s}\theta\_1 \\
a\_y\mathbf{c}\theta\_1 + a\_x\mathbf{s}\theta\_1 \\
y\mathbf{c}\theta\_1 + x\mathbf{s}\theta\_1
\end{bmatrix}^T
\tag{16}
$$

By employing Equations (15) and (17), the result can be obtained in Equation (18),

$$\mathbf{L} = ({}^{0}\_{1}\mathbf{T}^{-1}){}^{0}\_{6}\mathbf{T}({}^{4}\_{5}\mathbf{T}^{-1})({}^{5}\_{6}\mathbf{T}^{-1}) = \prod\_{i=2}^{4} i^{-1}\mathbf{T} = \mathbf{R} \tag{17}$$

$$\mathbf{R} = \begin{bmatrix} \mathbf{c}\theta\_{234} & 0 & -\mathbf{s}\theta\_{234} & a\_3 \cdot \mathbf{c}\theta\_{23} + a\_3 \cdot \mathbf{c}\theta\_2 + a\_4 \cdot \mathbf{c}\theta\_{234} \\ -\mathbf{s}\theta\_{234} & 0 & \mathbf{c}\theta\_{234} & -a\_3 \cdot \mathbf{s}\theta\_{23} - a\_3 \cdot \mathbf{c}\theta\_2 - a\_4 \cdot \mathbf{c}\theta\_{234} \\ 0 & -1 & 0 & d\_2 + d\_3 + d\_4 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{18}$$

where *θi*1*i*2···*in* is defined in Equation (19).

$$
\theta\_{i\_1 i\_2 \dots i\_n} = \sum\_{j=1}^n \theta\_{ij} \tag{19}
$$

In Equation (18), *θ*234, *θ*23, and *θ*<sup>2</sup> can be acquired by the fourth column of **R** in Equation (17); *θ*2, *θ*3, and *θ*<sup>4</sup> can be obtained afterwards. In Equations (16) and (18), s represents the sine function and c represents the cosine function. Nonetheless, the solution involves a large number of inverse trigonometric functions, and the range of the angles within which the UR5 robot's joints move is −2π to 2π, which results in multiple solutions. Therefore, the solution with the smallest Euler distance from the initial joint angle sequence is selected.

#### *4.2. The Inverse Solution Model for Non-Recommended Regions*

In a non-recommended area, the UR5 robot may not be able to achieve certain poses. It is necessary to move the motion platform to match the position of the UR5 robot. Then, the inverse solution model of the manipulator can be changed to obtain the minimum joint rotation radian and the number of movements of the manipulator's motion platform under posture and pose constraints. This problem is a nonlinear optimization problem with constraints. Sequential quadratic programming (SQP) is an iterative method for constrained nonlinear optimization. When the input value is close to the real solution, the algorithm has a second-order convergence speed and can quickly solve the target solution [23,24].

#### 4.2.1. The Objective Function and Constraints

In the process of solving the inverse kinematics, in order to avoid singularities, the platform is allowed to move slightly. The number of movements of the platform and the total arc of the UR5 robot joint give the objective function, which is shown in Equation (20).

$$f = \sqrt{\sum\_{i=1}^{i=6} \left(\theta\_i^r - \theta\_i\right)^2 + K\_d \left(\left(\mathbf{x}^r - t\_x\right)^2 + \left(y^r - t\_y\right)^2 + \left(z^r - t\_z\right)^2\right)}\tag{20}$$

where *θ<sup>i</sup> <sup>r</sup>* and *θ<sup>i</sup>* are the actual joint curvature and the starting point arc of the UR5 robot. The target point *Kd* represents the number of movements of the platform; *xr* , *y<sup>r</sup>* , and *z<sup>r</sup>* represent the actual moving distance.

The error *es* is allowed in the actual working situation. The constraint condition should be that the Euler distance between **T** and the target pose **T**' is less than *es*, which is shown in Equation (21).

$$\sqrt{\sum\_{j=1}^{j=3} \left( {}^{0}\_{6}\mathbf{T}\_{j4} - {}^{0}\_{6}\mathbf{T}\_{j4}^{'} \right)^{2}} - \epsilon \mathbf{s} \le 0 \tag{21}$$

During the painting process, an error *es'* between the end axis of the spray gun and the target point axis is allowed. The pose constraints are shown in Equation (22).

$$\left| \arccos \left( \frac{\mathbf{n} \cdot \mathbf{n}\_0}{|\mathbf{n}| \cdot |\mathbf{n}\_0|} \right) \right| < \text{cs}' \tag{22}$$

where **n** is the normal direction of **T** to the *z* axis, and **n**<sup>0</sup> is the normal direction of **T'** to the *z* axis.

4.2.2. The Solution Process of the SQP Algorithm

The SQP algorithm decomposes the problem into the quadratic programming (QP) sub-problem, obtaining the descending direction *d* by solving the current angle sequence *θ<sup>k</sup>* in the QP problem and updating *θk*+1 via *d* until the Karush–Kuhn–Tucker (KKT) condition is satisfied or the maximum number of iterations is reached. The specific solving process of the algorithm is described below.

(1) The Lagrange function *L*(*θk, λ*) of the current angular sequence *θ<sup>k</sup>* is constructed, and the expression of *L*(*θk, λ*) is shown in Equation (23).

$$L(\theta\_{k'}\lambda) = f(\theta\_k) + \lambda g(\theta\_k) \tag{23}$$

where *λ* is the Lagrangian multiplier, and λ ≥ 0.

(2) The descending direction *d* of the current angular sequence *θ<sup>k</sup>* is obtained. Equation (23) is converted into the solution to *d*, which is shown in Equation (24).

$$\begin{array}{ll}\min\_{d} \, \nabla f(\theta\_{k})^{T}d + \frac{d^{T}H\_{k}d}{2} \\ \text{s.t.} \, \ g(\theta\_{k}) + \nabla g(\theta\_{k})^{T}d \le 0 \end{array} \tag{24}$$

In Equation (24), *Hk* represents the Hassel matrix of the Lagrangian function *L*(*θk, λ*).

(3) The angle sequence in which *θk+1* equals *θ<sup>k</sup>* plus *d* is updated, and whether the result of the solution satisfies the KKT condition or reaches the maximum number of iterations is determined. If the KKT condition is satisfied, the iteration is stopped. Step 2 is repeated if the KKT condition is not satisfied. The KKT condition is shown in Equation (25) [25].

$$\begin{array}{l} \nabla f(\theta\_k) + \lambda g(\theta\_k) = 0 \\ \lambda \ge 0 \\ \operatorname{g}(\theta\_k) = 0 \\ \lambda g(\theta\_k) = 0 \end{array} \tag{25}$$

In the KKT condition, the third condition means that the result of the solution satisfies the constraint. If *g*(*θk*) equals 0, the KKT condition becomes *f*(*θk*), which equals 0. If *g*(*θk*) is smaller than 0 and *λ* equals 0, the KKT condition also becomes *f*(*θk*), which equals 0.

#### **5. The Painting Trajectory**

During the painting process, not only should the trajectory of the motion platform be planned, but the trajectory of the UR5 robot's movement should also be planned. During the painting process, the posture of the UR5 robot end is shown in Equation (26).

$$\mathbf{Trans}(x, y, z) \cdot \mathbf{Rot}(x, p) \cdot \mathbf{T}\_{\text{robot}} = \mathbf{T} \cdot \mathbf{T}\_{\text{dis}} \,^{-1} \cdot \mathbf{T}\_{\text{tool}} \,^{-1} \tag{26}$$

Since **Rot**(*x*, π) only affects the directions of the *x*-axis and *z*-axis of the 3-dimensional Cartesian system of the UR5 robot system, it does not change the range of the UR5 robot's operating space. Equation (26) can be transformed into Equation (27).

$$\mathbf{T}\mathbf{r}\mathbf{n}\mathbf{s}(x,y,z) \cdot \mathbf{T}\_{\text{robot}}{}^{'} = \mathbf{T} \cdot \mathbf{T}\_{\text{dis}}{}^{-1} \cdot \mathbf{T}\_{\text{tcol}}{}^{-1} \tag{27}$$

where **T***'*UR5 robot equals **Rot**(*x,*π)·**T**UR5 robot.

#### *5.1. The Trajectory of the Motion Platform*

When the size of the object to be painted exceeds the maximum painting area of the UR5 robot, the auxiliary movement of the motion platform is required to complete the painting. During the painting process, the motion platform is moved to bring the UR5 robot close to the target area, and the UR5 robot paints the area. After painting, the motion platform is moved again, and the UR5 robot paints another area. Then, the trajectory planning of the motion platform is optimized by taking the least number of movements of the motion platform as the objective function.

Because the UR5 robot's permitted painting space is a cylinder and the height of the painted object is generally less than the maximum height of the painting space, the model can be projected onto the *xy* plane. The model can be transformed to cover all target points with the fewest circles.

#### 5.1.1. The Minimum Envelope Rectangle of the Target Points

Since the number of target points is large, in order to simplify the calculation, a rectangle enveloping the target points—instead of a set of target points—is harnessed. The solution model, which is shown in Figure 15, is created by employing the minimum rectangle method of the main direction target points. The dotted section in the figure represents the target points; the blue line represents the main direction of the target points, and the red rectangle represents the minimum envelope rectangle of the target points.

**Figure 15.** The minimum envelope rectangle of the target points.

The method for obtaining the principal direction of the plane is shown in Equation (28).

$$\theta = \frac{\operatorname{atan2}(2M\_{11}, M\_{20} - M\_{02})}{2} \tag{28}$$

in which *θ* is the minimum angle between the major axis direction and the direction of the positive side of the *x*-axis, and *Mpq* is the *p+q* order center moment of the projection surface.

*Mpq* is shown in Equation (29).

$$M\_{p\eta} = \sum\_{\mathbf{x}} \sum\_{\mathbf{y}} (\mathbf{x} - \overline{\mathbf{x}})^p (\mathbf{y} - \overline{\mathbf{y}})^q \tag{29}$$

where *x* and *y* represent the center coordinates of the original target point.

#### 5.1.2. The Minimum Number of Movements of the Motion Platform

When a circle fills a rectangle, the effective filling area of each circle is generally rectangular. Then, the model can be simplified again by filling the target rectangle with the smallest number of rectangles inside the circle. The filling model is generally shown in Figure 16. The smaller rectangle in the figure represents the target rectangle that needs to be filled. The larger rectangle represents the largest filled area, and the circular areas outside the largest rectangle represent the areas not recommended for use by the UR5 robot. Each circle represents the maximum processing range of the system each time. *P*<sup>x</sup> and *P*<sup>y</sup> represent the vertical distance and horizontal distance between the target rectangle and the largest filled area; *l* and *h* represent the length and width of the effective rectangle inside each circle.

**Figure 16.** The schematic diagram of the filling model.

The size of the rectangle inside the circle indicates the effective area size of the circle. The length and width of the maximum inscribed rectangle in the circle are both <sup>√</sup>2*r*. The length and width of the inscribed rectangle of the circle are generally close to <sup>√</sup>2*<sup>r</sup>* when the target rectangle is selected. The constraints of the filled model are shown in Equation (30).

$$\begin{aligned} n &= \left[ l\_{\text{max}} / \sqrt{2}r \right] + 1\\ \sum\_{i=1}^{n} l\_{\text{tr}} &\ge l\_{\text{max}}\\ \sqrt{4r^2 - l\_i^2} &\ge h\_{\text{max}} \end{aligned} \tag{30}$$

where [*l*max/ <sup>√</sup>2*r*] represents the largest integer that does not exceed *<sup>l</sup>* √max 2*r* .

Since the number of circles in Figure 16 is an integer, there are many kinds of *P*x, *P*y, *h*1, and *h*<sup>2</sup> satisfying Equation (29). However, the UR5 robot's non-recommended area may cause the movement of the motion platform. The minimum number of target points in the non-recommended area during processing is the optimization goal, and the optimal *P*x, *P*y, *h*1, and *h*<sup>2</sup> are obtained by employing the exhaustive method.

#### 5.1.3. The Movement Sequence of the Motion Platform

Reasonably planning the movement sequence of the motion platform can reduce the movement of the motion platform. During processing, the initial point of the motion platform is at zero. The movement sequence planning model of the motion platform can be transformed into a TSP model starting from zero and returning to zero after accessing all target circle centers. Because the number of points is low, the problem lies in creating a small TSP model. The problem can be directly solved by employing the example method. Since each axis of the motion platform is controlled by a separate motor, the Halton distance between two points is taken as the weight when creating the TSP model.

#### *5.2. The Kinematic Trajectory of the UR5 Robot*

According to Figure 8, some painting areas overlap, and, therefore, the greedy principle is harnessed to include as many target points as possible for each time of painting. The painting process requires uniform motion during the painting process, and, therefore, the shorter the total painting path is and the shorter the painting time is, the higher the painting efficiency is.

When each area is painted, the painted model can be transformed into a TSP model that finds the shortest path which can access all of the target points [26]. When different areas are painted, in order to prevent the UR5 robot from colliding with the processed objects, the UR5 robot's joints must be moved after returning to the origin of the 3-dimensional Cartesian system. Then, the UR5 robot kinematics trajectory planning model can be transformed into the solution of multiple TSP models.

However, due to the large number of points to be painted, this model is a large TSP model. Genetic algorithms (GA) are harnessed to create each TSP model [27,28].

#### 5.2.1. The TSP Model Based on the Genetic Algorithm

A genetic algorithm-based computational model that simulates the evolutionary process of natural selection and genetic mechanisms of Darwin's theory of evolution is proposed. The genetic algorithm does not need continuous function limitation and has a better global optimal solution. The core method of GA is to evaluate the fitness of all individuals in each generation of the population and select some individuals for genetic selection, crossover, and mutation to form the next generation population. The main solution steps of the genetic algorithm are shown in Figure 17.

**Figure 17.** The solution steps of the genetic algorithm.

5.2.2. Operations Related to the Genetic Algorithm

(1) Coding method: all target points are encoded into one chromosome *GiGj* ... *GwGk*, and *Gi* in the chromosome represents the *i*-th number of the target point. In this genetic algorithm, each individual has one and only one chromosome.

(2) Weight: the Euler distance between two points is taken as the weight between the two points.

(3) Chromosome distance and individual fitness: the fitness of the individual *f* equals *<sup>N</sup>*/ *<sup>n</sup>* ∑ *i*=2 *di*(*i*+1), where *N* represents the gain coefficient. The value of *N* only affects the value of fitness and does not affect the final TSP solution result.

(4) Selection operation: the fitness of the population is calculated; the best individual is screened, and the coding method of the individual is retained. For the remaining individuals, the individual distribution function is calculated in accordance with their individual fitness. The maximum and minimum normalization algorithms are employed to normalize the distribution function to 0-1. The selection operation process is shown in Figure 18.

1: An empty population vector of *n* is created.

2:The population records the best optimal individual

3: A random number of 0-1 is generated.


7: The population is output.

**Figure 18.** The selection operation process.

(5) Crossover operation: in order to ensure the positive optimization of the genetic algorithm, crossover operations are performed only on non-optimal individuals in the population. The crossover operator uses the Order Crossover operator. The crossover algorithm is shown in Figure 19, where *Pc* represents the crossover probability.


**Figure 19.** The crossover algorithm.

(6) Mutation operation: in order to prevent the GA from falling into a local optimal solution, an adaptive mutation rate is harnessed. The equation for calculating the mutation probability is shown in Equation (31),

$$P\_m = \begin{cases} \frac{K\_1(f\_{\text{max}} - f)}{f\_{\text{max}} - f\_{\text{avg}}}, f \ge f\_{\text{avg}}\\ k2, \ f < f\_{\text{avg}} \end{cases} \tag{31}$$

where *f* max represents the maximum fitness of the group; *f* is the fitness of the individuals to be crossed. *K*<sup>1</sup> and *K*<sup>2</sup> are the adaptive parameters. The mutation operation is shown in Figure 20.

1: One individual in the population in order is selected. The individual's chromosome is *G*1*G*2*…Gn* 2: A random *P* number within 1 is generated. 3: If *P* > *Pc*, this individual is output and the cross operation of the individual is finished. 4: Cross positions *start* and *end* are randomly generated. 5: The genes are reversed from start to end in an individual and the chromosome of this individual will become *G*1…*Gstart* ƺ<sup>1</sup> *Gend* …*Gstart Gend* + 1…*Gn* 6: This individual is output.

**Figure 20.** The mutation operation.

#### **6. Simulation Experiments**

The spray gun of model WA-101 was adopted as the painting equipment, and the offset distance of the spray gun from the UR5 robot end axis was **T**tool = **Trans** (5.5, 94.5, 165.5).

The painting process requirements are described in this paragraph. The distance between the fixture and the target point was **T**dis, which equals **Trans** (0, 0, 113), and the end of the spray gun generated a circle with a radius of 50 mm at this distance. The maximum position error of the target point was 5 mm, allowing a 5◦ error between the gun's end axis and the target's normal vector.

#### *6.1. The Pose Acquisition of the Target Objects*

The analysis of the STL file of a car is shown in Figure 21, and the point cloud model of the car is shown in Figure 22. The shapes and sizes of Figures 21 and 22 are the same as the actual three-dimensional model. The target extraction algorithm proposed in this paper has promising practicality.

**Figure 21.** The STL analysis model of a car.

**Figure 22.** The point cloud model of the car.

The part of the car roof where *z* coordinates are larger than 1000 mm was taken for the target points for painting. The normal vector at the end of the target point was created by employing the PCA algorithm, as is shown in Figure 23 (because the performance of the computer graphics card was not satisfactory, and only the normal phase of some points is shown in the figure). As can be seen from Figure 23, the normal vectors of the target points conform to the shape of the roof, and the algorithm can better establish the normal vectors of the target points. After the variable *r* in Equation (6) had been given a value of 50, the model of the compression points of the car roof, which is shown in Figure 24, was achieved as per Equation (7). After Figures 23 and 24 are compared, it is clear that the size and shape of the target points remain unchanged after compression.

*6.2. Simulation of the Establishment of the Motion Platform Trajectory*

After Equation (27) is given values **T**tool and **T**dis, the distribution of points at the ends of the UR5 robot joints is shown in Figure 25. By utilizing the smallest rectangle model in Figure 15, the smallest rectangle containing the target points was obtained, which is shown in Figure 26.

**Figure 25.** Distribution of points at the ends of the UR5 robot's joints.

**Figure 26.** The minimum envelope rectangle.

The rectangle in Figure 26 has a width of 2635 mm and a length of 1959 mm. Data in Figure 26 were extracted and employed in Figure 16 and Equation (30) to obtain the preliminary partition model of the target points, which is shown in Figure 27.

**Figure 27.** The preliminary partition model of the target points.

The model in Figure 27 was created by the exhaustive method. The result is shown in Figure 28, which describes the minimum number of target points in the non-recommended area when *h*<sup>1</sup> and *h*<sup>2</sup> are determined. The blue areas in the figure represent an invalid combination that does not satisfy Equation (30). In this optimal combination, *h*<sup>1</sup> equals 980 mm; *h*<sup>2</sup> equals 980 mm; *P*<sup>x</sup> equals 7 mm; and *P*<sup>y</sup> equals 3 mm.

**Figure 28.** Statistical results of the objective function.

The model in Figure 27 was given values *P*x, *P*y, *h*1, and *h*<sup>2</sup> to obtain the partition model of the target points which are shown in Figure 29. The red circles in Figure 29 are the maximum processing area and the non-recommended processing area of the UR5 robot. *C*<sup>1</sup> represents the coordinates of the i-th circle, and the coordinates of *Ci* are shown in Table 2.

**Figure 29.** The partition model of the target points.

**Table 2.** The center point *Ci*.


All of the possibilities in Figure 29 were iterated, and the processing order with the smallest total movement of the motion platform was taken. The final painting order of the UR5 robot was *C*<sup>1</sup> *C*<sup>2</sup> *C*<sup>3</sup> *C*<sup>6</sup> *C*<sup>5</sup> *C*4. The painting intervals of the UR5 robot are shown in Figure 30. The dots in different colors in Figure 30 represent different painting intervals.

**Figure 30.** The painted zones of target points.

#### *6.3. The Establishment of the Motion Platform Trajectory*

The target points with the same color in Figure 30 were introduced into and utilized in GAs, and the population number was set to 5000. The maximum number of iterations is 25,000, and the cross probability was 0.9. *K*<sup>1</sup> equals 0.15, and *K*<sup>2</sup> equals 0.2 in the selection probability. The GA solution process is shown in Figure 31, and the final painting trajectory is shown in Figure 32.

**Figure 31.** The iterative process of the genetic algorithm.

**Figure 32.** The final painting trajectory.

#### *6.4. The Inverse Kinematics Simulation of the UR5 Robot*

When the UR5 robot found its inverse kinematics, it generated multiple solutions. When the UR5 robot's inverse kinematics were being solved, the angle values of the target points on the previous painting trajectory were harnessed as the initial angle values, and the target points on the final painting trajectory in each colored area with the smallest radian changes from the initial angle values were selected. The solution was the inverse kinematics of the target points. The position errors and axis errors of the statistical UR5 robot simulation are shown in Figures 33 and 34, respectively.

**Figure 33.** The UR5 robot's position errors.

**Figure 34.** The UR5 robot's axis errors.

As per Figures 33 and 34, the errors are within the technical requirements of painting large objects, and the algorithm has satisfactory engineering practicability. Since the main solution in this paper was an analytical solution, when the joint radian of the UR5 robotic manipulator's motion platform in the recommended area needed to be solved, an iterative method was harnessed to solve Equations (14) to (18).

#### **7. Discussion and Conclusions**

This paper proposes an algorithm that employs a nine-axis robotic manipulator to automatically paint large objects. With the proposed algorithm, automatic processing of complex objects is achieved. The algorithm is divided into three aspects that consist of extraction of the target model, the establishment of the inverse kinematics model of the manipulator, and the planning of the target trajectory.

In the section of the proposed algorithm for extracting the target model, as per the STL file of the target trajectory, the triangles of the target trajectory are extracted. A full 3D data point cloud model is obtained with a triangle-based filling algorithm. Subsequently, the PCA algorithm is harnessed to identify the normal vectors of the target points inside the point cloud model. With the normal vectors, Rodrigues' rotation formula is harnessed to extract the pose of each point of the painting trajectory. Finally, the number of target points is compressed to reduce the time and space complexity of the algorithm so that the painting process requirements can be satisfied.

In the section of the proposed algorithm where the inverse kinematics model of the manipulator is established, in conformity with the processing range of the robot, the inverse solution algorithm is divided into the inverse solution of the recommended region and the inverse solution of the non-recommended region. The corresponding inverse kinematics model is established by employing the closed-form solution method and SQP, respectively.

During the planning of the painting trajectory, in line with the point cloud model of the target points, the minimum envelope rectangles of the target points are found. The target points are divided into different painting areas in accordance with the minimum rectangles. For each painting area, a TSP trajectory planning model based on GA is harnessed to plan the robot's painting trajectory with which the inverse kinematics model of the UR5 robot is created.

Not only does the trajectory created by the proposed algorithm consider the singularities of the kinematic joints of the UR5 robot joints, but it also helps the UR5 robot to paint large objects with precision and efficiency. Multiple reliable simulations of the painting process on the car roof surface were conducted, and the results show that the algorithm can meet the technical requirements of painting and that the algorithm has promising practicability.

The proposed algorithm has several benefits. The trajectory created by the proposed algorithm allows the motion platform of the UR5 robotic manipulator to have the least amount of movement while the UR5 robot paints a large object. Therefore, the proficiency of the painting process can be significantly improved. The standard triangle language (STL) file, algorithm of principal component analyses (PCA), and k-dimensional tree (k-d tree) were employed to obtain the point cloud model of the car roof to be painted. The point cloud model was later converted into multiple traveling salesman problem (TSP) models, each of which was created with genetic algorithms (GAs). In this way, the UR5 robot could identify the object to be painted and generate a trajectory to paint the large object more efficiently.

However, limitations still exist in the current research. The closed-form solution method and SQP were employed to establish the inverse kinematics model of the UR5 robot. SQP is not intelligent enough and still requires a myriad of training solutions. If a more intelligent neural network model had been utilized, the preparation time for the entire painting process would have been reduced. In this research, only the painting trajectory was created, but the dynamics and velocity-planning of the UR5 robot were not involved. Therefore, future studies need to focus on a more intelligent neural network model that helps to establish the inverse kinematics model of the UR5 robot as well as the dynamics and velocity-planning of the UR5 robot.

**Author Contributions:** Conceptualization, F.L.; Data curation, J.W.; Formal analysis, M.Y. and K.Z.; Funding acquisition, J.W. and Q.W.; Investigation, M.Y. and F.L.; Methodology, M.Y., F.L. and K.Z.; Project administration, Q.W.; Resources, K.Z.; Software, F.L.; Supervision, J.W.; Visualization, K.Z.; Writing—original draft, K.F.; Writing—review & editing, M.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The study did not report any data.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **High-Precision SLAM Based on the Tight Coupling of Dual Lidar Inertial Odometry for Multi-Scene Applications**

**Kui Xiao 1, Wentao Yu 1,\*, Weirong Liu 2, Feng Qu <sup>1</sup> and Zhenyan Ma <sup>1</sup>**


**Featured Application: This paper fuses different sensors to form a general high-precision SLAM framework for multi-scene applications. The algorithm framework in this paper can be extended to the fields of autonomous driving, robot navigation, and 3D reconstruction.**

**Abstract:** Simultaneous Localization and Mapping (SLAM) is an essential feature in many applications of mobile vehicles. To solve the problem of poor positioning accuracy, single use of mapping scene, and unclear structural characteristics in indoor and outdoor SLAM, a new framework of tight coupling of dual lidar inertial odometry is proposed in this paper. Firstly, through external calibration and an adaptive timestamp synchronization algorithm, the horizontal and vertical lidar data are fused, which compensates for the narrow vertical field of view (FOV) of the lidar and makes the characteristics of vertical direction more complete in the mapping process. Secondly, the dual lidar data is tightly coupled with an Inertial Measurement Unit (IMU) to eliminate the motion distortion of the dual lidar odometry. Then, the value of the lidar odometry after correcting distortion and the pre-integrated value of IMU are used as constraints to establish a non-linear least-squares objective function. Joint optimization is then performed to obtain the best value of the IMU state values, which will be used to predict the state of IMU at the next time step. Finally, experimental results are presented to verify the effectiveness of the proposed method.

**Keywords:** simultaneous localization and mapping; dual lidar inertial odometry; IMU; time synchronization; tight coupling

#### **1. Introduction**

Simultaneous localization and mapping (SLAM) require building a map of an unknown environment by a mobile vehicle and simultaneously localizing the vehicle in such a map [1–3]. SLAM is essential for vehicles to fulfill many tasks, including vehicle rescue [4] and exploration [5]. The perception of the unknown external environment by various onboard sensors provides vital information for SLAM. Thus, integrating different sensors to develop a practical SLAM framework that can be applied in multiple scenes is essential.

Generally, both vision-based and lidar-based SLAM [6–10] are used. Although the vision-based SLAM can obtain high-precision positioning [11–13], the vision sensors are vulnerable to light change in the environment and cannot work in dark or untextured scenes. In comparison, lidar is not affected by light and can usually measure the angle and distance of obstacles with higher accuracy. Therefore, this paper focuses on the design of lidar-based SLAM to adapt the multi-scene applications.

In recent years, many different lidar-based SLAM schemes have been proposed. Among them, the Lidar Odometry and Mapping in Real-time (LOAM) method, where a single-line lidar and a motor are used to form a multiline lidar to realize low-drift and

**Citation:** Xiao, K.; Yu, W.; Liu, W.; Qu, F.; Ma, Z. High-Precision SLAM Based on the Tight Coupling of Dual Lidar Inertial Odometry for Multi-Scene Applications. *Appl. Sci.* **2022**, *12*, 939. https://doi.org/ 10.3390/app12030939

Academic Editors: António Paulo Moreira, Pedro Neto and Félix Vidal

Received: 30 November 2021 Accepted: 13 January 2022 Published: 18 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

low-calculation real-time positioning and mapping, has been studied extensively [14]. In the LOAM method, SLAM is divided into two parts: lidar odometry and lidar mapping. In the lidar odometry part, to reduce the computation, the plane smoothness of the lidar point cloud, which is utilized to distinguish the edge points, is calculated according to the curvature and then invalid point clouds are discarded to improve the feature of point cloud. After the point cloud information is obtained through feature extraction, a scan-to-scan method [15] is proposed to realize feature matching between frames, including edge points and planar points matching. To eliminate the motion distortion of lidar, linear interpolation is utilized. After the distortion is eliminated, the pose transformation of the lidar point cloud data of two adjacent frames is acquired to obtain the lidar odometry. Then, after accumulating a certain number of point cloud data, lidar mapping is performed through a map-to-map matching method [16]. Although LOAM can create a high-precision point cloud map, it cannot remove moving people or objects during feature extraction. Furthermore, in an environment with fewer feature points, it is easy for the lidar odometry to fail, resulting in positioning and mapping with worse accuracy.

To solve the above problems, a Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain (LeGO-LOAM) [17] is proposed. Its core comprises four modules: point cloud segmentation and denoising, feature extraction, lidar odometry, and lidar mapping. Firstly, the point cloud segmentation technology [18] solves the defects of moving people or objects in LOAM mapping and filters out noise. Then, feature extraction is applied to obtain planar and edge features from the segmented point cloud. The lidar odometry module uses the features to find out the optimal pose transform between two consecutive scans with the help of a two-step Levenberg–Marquardt (LM) optimization method [19]. The extracted features are further processed by a scan-to-map [20] matching method to obtain a global point cloud map in lidar mapping. LeGO-LOAM optimizes LOAM to a certain extent, but in large scenes (e.g., long corridors) or environments with few feature points, the low frequency of lidar and less characteristic information can lead to significant errors in positioning and mapping.

Some recent research has used low-cost IMU [21–23] to assist lidar for SLAM. The simplest way to integrate the IMU with lidar is loose coupling [24], in which IMU is regarded as an independent module to assist the lidar. The authors of [25] loosely couple IMU and optional GPS measurement with lidar through the Extended Kalman Filter (EKF) to improve computational efficiency and accuracy. However, the IMU error will continue to accumulate in the case of long distances, so the localization error is still increasing in such cases. To solve this problem, IMU and lidar are generally coupled via a tight coupling method that usually offers improved accuracy. A tightly coupled lidar inertial odometry and mapping framework (LIO-Mapping) is introduced in the literature [26]. It comprises the state optimization for the odometry and the refinement with rotational constraints. Results showed that this method outperformed the state-of-the-art lidar-only and loosely coupled methods. Since LIO-Mapping is designed to process all sensor measurements, real-time performance is not achieved. The authors of [27] proposed a tightly coupled Lidar Inertial Odometry via Smoothing and Mapping (LIO-SAM) based on LeGO-LOAM. LIO-SAM performs highly accurate, real-time mobile vehicle trajectory estimation and map building, suitable for multi-sensor fusion and global optimization. Although the tight coupling of lidar and IMU can improve the localization accuracy, lidar sensors have certain drawbacks. Conventional lidar can only be used to scan the environment in a narrow range of vertical angles and the point cloud feature information obtained is limited. For example, in an indoor corridor or staircase environment, lidar can receive the point cloud information of the sidewall, but only a tiny part of the point cloud information is obtained from the floor and ceiling. In such cases, the matched lidar features can easily cause ill-constrained pose estimation and incomplete mapping.

Aiming to broaden the FOV of lidar, lidars can be actuated in a number of ways. The periodic nodding and continuous rotation can be adopted and the resulting configurations can potentially enlarge the FOV. While the implementation of this method depends on a complex mechanical structure, the localization accuracy is relatively low [28,29]. More recently, a new design for a 3D sensor system was introduced [30], involving construction from a 2D range scanner coupled with a passive linkage mechanism, such as a spring. However, the system requires reasonably continual excitation to induce motion of the sensor head. Therefore, the system is not considered appropriate for electric ground vehicles operating with infrequent accelerations on smooth terrain. In order to solve the above problems, some researchers have started investigating the use of multiple 3D lidars for better coverage of the environment. In [31], a high-precision lidar odometry system was proposed to achieve robust and real-time operation under challenging perceptual conditions. In this system, the two lidars are mounted at 180 degrees to each other to make up for the self-similar areas with low lidar observability. The authors of [32] proposed a system to achieve robust and simultaneous extrinsic calibration, odometry, and mapping for multiple lidars, while [33] proposed a scheme to combine multiple lidars with complementary FOV for feature-based lidar-inertia odometry and mapping. While the above methods can work well in single-scene applications, their adaptability to different environments was not considered, especially the environments that need to perceive height information.

This paper proposes a high-precision SLAM framework based on tight coupling of dual lidar inertial to overcome the above problems. In this framework, the horizontal lidar and the vertical lidar are integrated to broaden the FOV. Then the dual lidar and IMU are tightly coupled to improve the localization accuracy. In the outdoor environment, this paper introduces GPS measurements to further improve positioning accuracy. The main contributions of this paper include:


#### **2. Prerequisites**

#### *2.1. IMU State Prediction and Pre-Integration*

Usually, the frequency of IMU is much higher than lidar. IMU can obtain the acceleration and angular velocity at each time step, which can be used to predict the next state of IMU through integration.

In general, the IMU states at time *t* can be modeled as:

$$X\_{\mathbf{f}}^{W} = [p\_{\mathbf{f}'} \mathbf{v}\_{\mathbf{f}}, \mathbf{R}\_{\mathbf{f}}, b\_{\mathbf{f}}] \tag{1}$$

where *X<sup>W</sup> <sup>t</sup>* represents the state in the world frame *W* at time *t*; *pt*, *vt*, and *Rt* represent the position, velocity, and rotation matrix, respectively, at time *t*; and *a*ˆ*<sup>t</sup>* and *ω*ˆ *<sup>t</sup>* are the acceleration and angular velocity of the raw IMU measurements (*a*ˆ*<sup>t</sup>* and *ω*ˆ *<sup>t</sup>* are affected by a slowly varying bias, *bt*).

The next state, *X<sup>W</sup> <sup>t</sup>*+Δ*t*, is predicted through the integration of IMU, where Δ*t* is the interval between two consecutive IMU measurements. The state prediction value of the IMU is then used to infer the motion of the vehicle. The velocity, position, and rotation of the vehicle at time *t* + Δ*t* can be computed by Equation (2) [27]:

$$\begin{aligned} \mathbf{v}\_{t+\Delta t} &= \mathbf{v}\_t + \mathbf{g}\Delta t + \mathbf{R}\_t(\hat{\mathbf{a}}\_t - \mathbf{b}\_t^a - \mathbf{n}\_t^a)\Delta t \\ \mathbf{p}\_{t+\Delta t} &= \mathbf{p}\_t + \mathbf{v}\_t\Delta t + \frac{1}{2}\mathbf{g}\Delta t^2 + \frac{1}{2}\mathbf{R}\_t(\hat{\mathbf{a}}\_t - \mathbf{b}\_t^a - \mathbf{n}\_t^a)\Delta t^2 \\ \mathbf{q}\_{t+\Delta t} &= \mathbf{q}\_t \otimes \begin{bmatrix} \frac{1}{2}\Delta t(\hat{\boldsymbol{\omega}}\_{\Delta t} - \mathbf{b}\_{\Delta t}^\omega - \mathbf{n}\_{\Delta t}^\omega) \\ 1 \end{bmatrix} \end{aligned} \tag{2}$$

where *g* is gravitational acceleration; *b<sup>a</sup> <sup>t</sup>* is the varying bias of the acceleration *a*ˆ*t*; and *n<sup>a</sup> <sup>t</sup>* is the white noise of the acceleration *a*ˆ*t*. The quaternion *qt* under Hamilton notation (which corresponds to *Rt*) is used; <sup>⊗</sup> is used for the multiplication of two quaternions; *<sup>b</sup><sup>ω</sup> <sup>t</sup>* is the varying bias of the angular velocity *ω*ˆ *<sup>t</sup>*; and *n<sup>ω</sup> <sup>t</sup>* is the white noise of the angular velocity *ω*ˆ *<sup>t</sup>*.

The motion state of IMU between the *i*-th and the *j*-th time steps can be represented by the IMU pre-integrated measurement value Δ*pij*, Δ*vij*, Δ*qij*, which can be computed by:

$$\begin{aligned} \Delta \mathbf{v}\_{ij} &= \mathcal{R}\_i^T (\mathbf{v}\_j - \mathbf{v}\_i - \mathbf{g}\Delta t\_{ij}) \\ \Delta p\_{ij} &= \mathcal{R}\_i^T \left(\mathbf{p}\_j - \mathbf{p}\_i - \mathbf{v}\_i \Delta t\_{ij} - \frac{1}{2} \mathbf{g}\Delta t\_{ij}^2\right) \\ \Delta q\_{ij} &= q\_i^{-1} \otimes q\_j = \prod\_{k=i}^{j-1} \left[ \begin{array}{c} \frac{1}{2} \Delta t\_{ij} \left(\hat{\boldsymbol{\omega}} \mathbf{z}\_k - \mathbf{b}\_k^{\boldsymbol{\omega}} - \mathbf{n}\_k^{\boldsymbol{\omega}}\right) \\ 1 \end{array} \right] \end{aligned} \tag{3}$$

where Δ*pij*, Δ*vij*, Δ*qij* is the position, velocity, and rotation matrix of the IMU, respectively, which has the covariance *CIi Ij* in the error-state model. Due to space limitations, we invite readers to refer to the descriptions in [26,34] to understand the detailed derivation of IMU pre-integration.

#### *2.2. Segmentation and Feature Extraction*

Before extracting features, to filter out the noise, the point cloud is segmented to reduce noise interference. *Dt* = {*d*1, *d*2,..., *dn*} is the point cloud information acquired at time *t*; *dτ* is a point in *Dt*; *Dt* is projected onto a range image; the point *dτ* represents a pixel of the image; and the pixel value *rτ* represents the Euclidean distance from the point *dτ* to the lidar. An image-based segmentation method [17] is applied to the range image to group points into many clusters. Points of the same category have the same label (the ground is a particular category). Features are then extracted from ground points and segmented points, with the corresponding edge points and plane points obtained by calculating the curvature *E* of each point. To evenly extract features from all directions, the range image is divided horizontally into several equal sub-images. Then, the points are sorted in each row of the sub-image based on their curvature values, *E*. The curvature *E* is computed by Equation (4).

$$E = \frac{1}{|S| \cdot ||r\_{\pi}||} \Big| \sum\_{\lambda \in S, \lambda \neq \pi} (r\_{\lambda} - r\_{\pi}) \Big| \tag{4}$$

where *S* is the set of continuous points *dτ* from the same row of the range image; *λ* is a point in the set *S*; *Eth* is a threshold which is set to distinguish different types of features; the points are called with *E* > *Eth* edge features; and the points are called with *E* < *Eth* planar features.

#### **3. Multi-Sensor Fusion**

#### *3.1. Hardware System Description*

The configuration of our system is shown in Figure 1, with the left model obtained through Unigraphics NX and the designed equipment in our paper shown on the right. The designed equipment is a scanning arm composed of various sensors that can be installed on a backpack or a vehicle. The sensor suite used in the scanning arm includes two 10 HZ Velodyne VLP-16 lidars, a 200 HZ YIS510A IMU, and a 5 HZ HY-269 GPS. The panoramic camera prepares for the follow-up research. The lidar has a horizontal FOV of 360◦ with 0.4◦ resolution, a vertical FOV 15◦ with 2◦ resolution, and an accuracy of ±3 cm. The Roll/Pitch of IMU accuracy is 0.25◦, HY-269 GPS is a small-volume positioning and directional receiver, and the positioning accuracy of GPS is 1.2 m.

τ λ

**Figure 1.** A CAD (Computer Aided Design) model of the scanning arm is shown on the left, with a photograph of the scanning arm shown on the right.

#### *3.2. System Overview*

In this paper, horizontal and vertical lidars are fused to make up for the shortcomings of lidar's narrow FOV. When indoors, accurate positioning information is obtained through the tight coupling of dual lidar inertial odometry. When outdoors, GPS measurements are added, providing a relatively accurate initial position for positioning, thereby further improving outdoor positioning accuracy.

Figure 2 provides a brief overview of our proposed framework. Firstly, after obtaining the IMU data, the IMU state prediction and pre-integration are updated, as detailed in Section 2.1. IMU pre-integration is used to construct a factor graph constraint, which will be used for joint optimization. Secondly, the dual lidar point clouds are obtained through external calibration and an adaptive timestamp synchronization algorithm, as detailed in Section 3.3. Since lidar has continuous measurement, the lidar measurement is obtained in continuous motion and it is almost certain that motion distortion will occur. When dual lidar point cloud is received, deskewing is applied on the dual lidar raw point cloud to obtain deskewed dual lidar point cloud [14]. To be ground-optimized and reduce the amount of calculation, point cloud segmentation is applied to filter out noise (e.g., moving people or objects) and the planar points and edge points are distinguished by calculating the curvature of the lidar point, as detailed in Section 2.2. After feature extraction, the dual lidar point cloud is registered to construct the lidar odometer factor. Then, the sliding window factor graph optimization method of local finite frame is used, with the IMU pre-integration factor and lidar odometer factor jointly optimized to realize the mutual parameter optimization of the dual lidar and IMU. Joint optimization is taken to obtain a maximum a posteriori (MAP) estimation of the IMU states, as detailed in Section 3.4, avoiding the drift from IMU propagation. To further improve outdoor positioning accuracy, GPS is introduced in this framework. When GPS is loosely coupled with IMU through an Unscented Kalman Filter (UKF), GPS measurements are used to further improve the positioning accuracy of IMU, with the optimized IMU state then used for the state estimation of IMU at the next time step. GPS measurements are added, providing a relatively accurate initial position for localization, thereby further improving outdoor localization accuracy [25]. However, the

drift of lidar inertial odometry grows very slowly; in practice, GPS measurements are used when the estimated position covariance is larger than the received GPS position covariance. Finally, the output of the figure is the global map and localization.

#### *3.3. Fusion of Horizontal Lidar and Vertical Lidar*

To accurately fuse the horizontal and vertical lidar data, their individual coordinate systems must be transformed into a unified coordinate system, which is termed "frame coordinate system". The horizontal lidar coordinate system is used as the frame coordinate system; the vertical lidar coordinate system is registered in this frame coordinate system through external parameter calibration, with the external parameters obtained by the CAD model of the scanning arm. This paper adopts the adaptive time synchronization algorithm to ensure that the timestamps of the horizontal and vertical lidars can be simultaneously output. The fusion process of dual lidar is divided into two parts: (1) external parameter calibration of the horizontal and vertical lidars and (2) the adaptive time synchronization algorithm. Both parts are described as follows.

#### 3.3.1. External Parameter Calibration of Horizontal Lidar and Vertical Lidar

Assuming that *L*<sup>1</sup> is the horizontal lidar coordinate system and *L*<sup>2</sup> is the vertical lidar coordinate system, the horizontal lidar coordinate system is used as the frame coordinate system and the vertical lidar coordinate system *L*<sup>2</sup> is registered in this frame coordinate system *<sup>L</sup>*<sup>1</sup> through the rotation matrix R*L*<sup>1</sup> *<sup>L</sup>*<sup>2</sup> and translation matrix *HL*<sup>1</sup> *L*2 , which can be computed by Equation (5):

$$L\_1 = \mathbb{R}\_{L\_2}^{L\_1} \times L\_2 + H\_{L\_2}^{L\_1} \tag{5}$$

where R*L*<sup>1</sup> *<sup>L</sup>*<sup>2</sup> is the rotation matrix and *HL*<sup>1</sup> *<sup>L</sup>*<sup>2</sup> is the translation matrix which can be obtained by external parameters. From the hardware structure of the scanning arm, the vertical lidar coordinate system first rotates around the *y*-axis and then translates to the horizontal lidar coordinate system, where

$$
\mathbb{R}\_{L\_2}^{L\_1} = \begin{bmatrix}
\cos\theta & 0 & \sin\theta \\
0 & 1 & 0 \\
\end{bmatrix} \tag{6}
$$

where the angle of the rotation matrix *<sup>θ</sup>* = −77.94◦ and the translation matrix *HL*<sup>1</sup> *<sup>L</sup>*<sup>2</sup> = (*x*, 0, *z*), with *x* = −0.177 m and *z* = −0.213 m. θ θ

θ

θ

θ

θ

 θ  θ

 θ

The point cloud of the lidar after external parameter calibration in the frame coordinate system is shown in Figure 3, with the point clouds of the vertical and horizontal lidars not affecting each other. θ

**Figure 3.** Dual lidar point clouds' information after external parameter calibration.

3.3.2. The Adaptive Time Synchronization Algorithm

In this paper, the lidar time synchronization is achieved through hardware time synchronization and GNSS is used as the master clock. However, there is a time offset between the two lidars. As the time offset is constant, an adaptive timestamp synchronization algorithm is used to solve it. The adaptive time synchronization algorithm is used to match its timestamp and realize the simultaneous localization and mapping of the lidars. The timestamps of the dual lidar is shown in Figure 4, where lidar\_201/scan represents the timestamp of the horizontal lidar and lidar\_202/scan represents the timestamp of the vertical lidar.


**Figure 4.** Timestamps of dual lidar.

The output of the adaptive time synchronization algorithm only depends on the timestamp, not on the arrival time of dual lidar point cloud messages. It means that the adaptive time synchronization algorithm can be safely used on dual lidar point cloud messages that have suffered arbitrary processing delays. As shown in Figure 5, time goes from left to right, with the first row representing the horizontal lidar timestamp and the second row representing the vertical lidar timestamp. Each dot represents the lidar point cloud with the timestamp. The blue dot represents the pivot of the lidar point clouds, with the broken line linking the dual lidar point clouds in a set. Suppose the horizontal lidar point clouds' queue is *PLP* and the vertical lidar point clouds' queue is *PVP*, with lidar point clouds inserted in a topic-specific queue (*PLP* or *PVP*) as they arrive. Once each topic-specific queue contains at least one message, the latest message is found at the head of the queues, known as the pivot. Assume that the queue with pivots is *PLP*, which matches *PVP*, being the smallest difference between the timestamps, *T*. The two queues are combined into a set; finally, this set is synchronous output.

**Figure 5.** Timestamp matching of dual lidar.

#### *3.4. Tight Coupling of Dual Lidar and IMU*

Although the fusion of dual lidar makes up for the shortcomings of lidar's narrow FOV, the localization accuracy is improved. However, lidars mounted on moving vehicles suffer from motion distortion, which directly affects the localization accuracy. IMU can accurately measure the three-axis acceleration and three-axis angular velocity of moving vehicles, and provide a priori information for lidar odometry. However, in the case of long or short distances, the IMU error will continue to accumulate, which directly affects positioning accuracy. In this paper, the tight coupling of dual lidar and IMU is proposed to solve the above problems. The process is divided into two parts: (1) external parameter calibration of horizontal lidar and IMU and time synchronization, and (2) joint optimization, which are described as follows:

#### 3.4.1. External Parameter Calibration of Horizontal Lidar and IMU and Time Synchronization

Similar to Section 3.3, the external parameters of the lidar and IMU are also obtained based on the CAD model. To accurately fuse data from the horizontal lidar, the vertical lidar, and the IMU, the IMU coordinate system *I* is also registered in this frame coordinate system *<sup>L</sup>*<sup>1</sup> through the rotation matrix R*L*<sup>1</sup> *<sup>I</sup>* and the translation matrix *HL*<sup>1</sup> *<sup>I</sup>* . It can be seen from the structure of the scanning arm in Figure 1 that the IMU is located directly under the horizontal lidar. Therefore, the IMU coordinate system can be registered in the horizontal lidar coordinate system only by translation, where

$$
\mathbb{R}\_I^{L\_1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{7}
$$

where the translation matrix *HL*<sup>1</sup> *<sup>I</sup>* = [−0.62.66, 0, −0.125].

In this paper, YIS510A IMU is used, which supports trigger correction of its internal clock by PPS signal. Lidar-IMU time synchronization is achieved through hardware time synchronization, with GNSS used as the master clock. GNSS can output PPS signals to unify the clock source of lidar-IMU and achieve time synchronization.

#### 3.4.2. Joint Optimization

In this paper, a fixed-lag smoother and marginalization are introduced to obtain the optimal state. The fixed-lag smoother maintains a certain amount of IMU state in the sliding window, and the sliding window can effectively control the amount of calculation. When the new state enters the sliding window, the past state is marginalized. The state variable to be estimated for the whole window is *X* = [*X<sup>W</sup> <sup>η</sup>* , ... , *X<sup>W</sup> <sup>κ</sup>* ,Z*L*<sup>1</sup> *<sup>I</sup>* ], where *<sup>X</sup><sup>W</sup> <sup>η</sup>* is the state of IMU at the starting point *η* of the sliding window; *X<sup>W</sup> <sup>κ</sup>* is the state of IMU at the end of the sliding window; and Z*L*<sup>1</sup> *<sup>I</sup>* = [R*L*<sup>1</sup> *<sup>I</sup>* , *<sup>H</sup>L*<sup>1</sup> *<sup>I</sup>* ] is the external parameter between lidar and IMU. Then, the following cost function with a Mahalanobis norm is minimized to obtain the MAP estimation of the states *X*,

$$\mathbf{X} = \min\_{\mathbf{X}} \frac{1}{2} \left\{ \left\| \boldsymbol{\mu}\_{\theta}(\mathbf{X}) \right\|^{2} + \sum\_{\mathbf{X} \in \mathbf{Q}\_{\mathcal{J}}} \left\| \boldsymbol{\mu}\_{\theta}(\mathbf{X}, \mathbf{X}) \right\|\_{\mathcal{L}\_{\theta}^{\mathsf{T}}}^{2} + \sum\_{\boldsymbol{\xi} \in \{\mathfrak{p}, \ldots, \kappa - 1\}} \left\| \boldsymbol{\mu}\_{\boldsymbol{\eta}}(\mathbf{Z}\_{\xi + 1}^{\mathsf{E}}, \mathbf{X}) \right\|\_{\mathcal{L}\_{\xi + 1}^{\mathsf{I}\_{\mathsf{F}}}}^{2} \right\} \tag{8}$$

where *uo*(*X*) is the prior information from marginalization [26]. *uφ*(*χ*, *X*) is the residual of the relative lidar constraints that can be represented as point-to-plane distance [26], where *χ* ∈ *Q<sup>β</sup>* is the residual for each relative lidar measurement with the previous correspondences. *β* ∈ {*η* + 1, . . . , *κ*}, *η* + 1 and *κ* are the timestamps of the lidar sweep next to the starting one and the current lidar sweep in the window, with the covariance matrix *C<sup>χ</sup> <sup>L</sup><sup>β</sup>* determined by the lidar accuracy [35]. *<sup>u</sup>γ*(*Z<sup>ς</sup> <sup>ς</sup>*+1, *X*) is the residual of the IMU constraints, where

$$\boldsymbol{\mu}\_{\boldsymbol{\gamma}}(\mathbf{Z}\_{\xi+1}^{\xi},\mathbf{X}) = \begin{bmatrix} \mathsf{R}\_{\xi}^{T} \left( \boldsymbol{p}\_{\xi+1} - \boldsymbol{p}\_{\xi} - \boldsymbol{\upsilon}\_{\xi} \Delta \mathbf{t} - \frac{1}{2} \mathsf{g} \Delta \mathbf{t}\_{\xi,\xi+1}^{2} \right) - \Delta \boldsymbol{p}\_{\xi,\xi+1} \\\ \mathsf{R}\_{\xi}^{T} \left( \boldsymbol{\upsilon}\_{\xi+1} - \boldsymbol{\upsilon}\_{\xi} - \mathsf{g} \Delta \mathsf{t}\_{\xi,\xi+1} \right) - \Delta \boldsymbol{\upsilon}\_{\xi,\xi+1} \\\ 2 \left[ \Delta \mathsf{q}\_{\xi,\xi+1} \prescript{-1}{\boldsymbol{\big}} \otimes \mathsf{q}\_{\xi}^{-1} \otimes \mathsf{q}\_{\xi+1} \right]\_{\mathsf{x}\boldsymbol{y}\boldsymbol{z}} \\\ \mathsf{b}\_{\xi+1}^{\mathsf{a}} - \mathsf{b}\_{\xi}^{\mathsf{a}} \\\ \mathsf{b}\_{\xi+1}^{\omega} - \mathsf{b}\_{\xi}^{\omega} \end{bmatrix} \tag{9}$$

*<sup>u</sup>γ*(*Z<sup>ς</sup> <sup>ς</sup>*+1, *X*) can be obtained by IMU state prediction (Equation (2)) and IMU pre-integration (Equation (3)), and " Δ*qς*,*ς*+<sup>1</sup> <sup>−</sup><sup>1</sup> ⊗ *<sup>q</sup>*−<sup>1</sup> *<sup>ς</sup>* ⊗ *qς*+<sup>1</sup> # *xyz* stands for the vector part of a quaternion. With the continuous-time linearized propagation of the error states and the IMU noise parameters, the covariances *CI<sup>ς</sup> <sup>I</sup>ς*+<sup>1</sup> of the pre-integration measurements and biases can be estimated. The cost function, in the form of a non-linear least-square, can be solved by the Gauss–Newton algorithm. Ceres Solver [36] is used for solving this nonlinear problem.

It can be seen from Section 3.2 that the new states *X* are obtained by the joint optimization, which is used as the next state of the IMU, avoiding the drift from IMU propagation. The state of the IMU is applied to deskewing, thereby eliminating the motion distortion of the dual lidar.

#### **4. Experiment**

#### *4.1. Data Acquisition Equipment*

To verify the performance of the proposed high-precision SLAM framework based on the tight coupling of Dual Lidar Inertial Odometry (HSDLIO) for multi-scene applications, this paper describes a series of experiments to qualitatively and quantitatively analyze our proposed framework. The device runs on an industrial computer with a processor of i7-9700 through the port and network port, and the system configured of the industrial computer is ubuntu16.04. The scanning arm is installed on the backpack, with the industrial computer and battery installed in the backpack. The small display screen of the backpack shows the localization and mapping in real-time, with the backpack equipment shown in Figure 6. Data of three representative environments are collected, including the featureless corridor, stairs with height, and complex outdoor environments. To illustrate the effectiveness of the HSDLIO algorithm, LeGO-LOAM and LIO-SAM algorithms are compared with it.

**Figure 6.** Backpack collection equipment.

#### *4.2. Indoor Experiment 1*

Indoor experiment 1 was carried out in a corridor with few feature points. Since there was no GPS signal indoors, SLAM was mainly carried out through the tight coupling of dual lidar odometry and IMU.

To highlight the performance of the HSDLIO algorithm, LeGO-LOAM and LIO-SAM algorithms are compared with it. Figure 7a shows the environment map of the corridor, while Figure 7b–d shows the overall mapping of LeGO-LOAM, LIO-SAM, and HSDLIO algorithms in the corridor scene, with the white points representing backpack motion trajectories. Compared with the HSDLIO algorithm in Figure 7d, the overall mapping of LeGO-LOAM (Figure 7b) has a significant deviation and the structure is unclear. Figure 8a,b shows that the mapping of LeGO-LOAM is incomplete because the lidar has a considerable drift in a scene with fewer feature points. The mapping results of LIO-SAM (Figure 7c) is close to the real environment, with its structure complete. The main reason is that LIO-SAM tightly couples the horizontal lidar and IMU to make up for the lidar drift. However, Figure 9a,b indicates the mapping result of HSDLIO is more abundant than LIO-SAM in the top surface and ground information. The reason is that HSDLIO fuses horizontal and vertical lidars so that its angle of FOV in both horizontal and vertical directions is 360◦. The dual lidars and IMU are tightly coupled to make up for the lidar drift.

**Figure 7.** (**a**) Corridor scene for experiment 1. (**b**) Mapping results of LeGO-LOAM in the corridor scene, with the mapping of LeGO-LOAM having a significant deviation and the structure incomplete. (**c**) Mapping results of LIO-SAM in the corridor scene, with LIO-SAM having a complete mapping structure but lacking information on the top and ground. (**d**) Mapping results of HSDLIO in the corridor scene, with HSDLIO having a complete mapping structure.

**Figure 9.** Comparison of mapping details at the top of the corridor, where (**a**) the mapping of LIO-SAM lacks top information and (**b**) the mapping of HSDLIO shows finer structural details of the environment.

In the long corridor scene, compared with LeGO-LOAM and LIO-SAM, the structure of HSDLIO mapping is more complete and the point cloud information on the ground and top surface is richer. To verify the positioning accuracy of the HSDLIO, the trajectories of LeGO-LOAM, LIO-SAM, and HSDLIO are compared in Figure 10. It can be observed in Figure 10 that the LeGO-LOAM trajectory has drifted. The main reason is that in the LeGO-LOAM algorithm, the lidar odometry error continues to increase in a long corridor environment with fewer feature points. Both LIO-SAM and HSDLIO use the tight coupling of IMU and lidar for positioning, which improves positioning accuracy. HSDLIO further improves the accuracy of positioning through the tight coupling of dual lidars and IMU.

To further improve the positioning accuracy of HSDLIO, the relative translational error (RTE)—the distance from the start point to the end point—is introduced. In all experiments, the data collection started and ended at the same point; when the positioning accuracy of

the device is very high, the start and end point will coincide. The RTE, when the backpack returns to the start point, is shown in Table 1. The RTE can be computed by Equation (10):

$$\text{RTE} = \sqrt{\mathbf{x}\_o^2 + \mathbf{y}\_o^2 + \mathbf{z}\_o^2} \tag{10}$$

where *xo* = *xs* − *xe*, *yo* = *ys* − *ye*, *zo* = *zs* − *ze*, *xs*, *ys*, and *zs* are the coordinates of the starting point and *xe*, *ye*, and *ze* are the coordinates of the end point. The result of LeGO-LOAM is not shown because its trajectory has severely shifted. Table 1 shows that the error of HSDLIO in the *x* and *y* direction is similar to LIO-SAM, but the accuracy in the *z* direction is higher than LIO-SAM and the RTE of HSDLIO is smaller than LIO-SAM.

**Figure 10.** Comparison of LeGO-LOAM, LIO-SAM, and HSDLIO trajectories. The red line is the trajectory of HSDLIO, the blue line is the trajectory of LIO-SAM, and the black line is the trajectory of LeGO-LOAM, with the latter having drifted.



#### *4.3. Indoor Experiment 2*

Indoor experiment 2 aimed to show the effectiveness of HSDLIO in the indoor environment with a certain height information. In this experiment, the backpack was carried up and down four flights of stairs. Figure 11a shows the stairs scene, while Figure 11b–d shows the overall mapping of LeGO-LOAM, LIO-SAM, and HSDLIO algorithms. The white points are backpack localization trajectories. LeGO-LOAM (Figure 11b) cannot obtain the structure of the stairs. Because the vertical FOV of the horizontal lidar is relatively narrow, the mapping result of LIO-SAM (Figure 11c) does not show the height information. With the help of the vertical lidar and the tight coupling of the IMU, the mapping of HSDLIO (Figure 11d) is the most complete and accurate, which is closer to the real stairs' scene.

Figure 12 shows the trajectories obtained by LeGO-LOAM, LIO-SAM, and HSDLIO. The red, black, and green lines represent the trajectories of HSDLIO, LeGO-LOAM, and LIO-SAM, respectively, with the trajectory of HSDLIO approximating the real trajectory. LeGO-LOAM has no IMU correction, which causes the drift of the trajectory. Although the trajectory of LIO-SAM does not drift, due to the narrow vertical FOV of the horizontal lidar, a significant error occurs in the *z* direction information. Comparing the height

information of LIO-SAM and HSDLIO in the *z* direction in Figure 13, it can be seen that when going up and down four floors of stairs, the localization results of LIO-SAM provide no height information and also indicate a great drift, while HSDLIO provides accurate height information. Therefore, the tight coupling of dual lidars and IMU makes the advantages of HSDLIO clearly apparent. Table 2 shows that the error of HSDLIO in the *x* and *y* direction is smaller than LIO-SAM, but the accuracy in the *z* direction is much higher than LIO-SAM and the RTE of HSDLIO is much smaller than LIO-SAM.

**Figure 11.** (**a**) Stairs scene for experiment 2. (**b**) Mapping results of LeGO-LOAM in the stairs scene, with LeGO-LOAM's mapping result having failed. (**c**) Mapping results of LIO-SAM in the stairs scene, with LIO-SAM's mapping results lacking height information. (**d**) Mapping results of HSDLIO in the stairs scene, with the mapping of HSDLIO the most complete and accurate.

**Figure 12.** Comparison of LeGO-LOAM, LIO-SAM, and HSDLIO trajectories. The red line is the trajectory of HSDLIO, the blue line is the trajectory of LIO-SAM, and the black line is the trajectory of LeGO-LOAM. While LeGO-LOAM's trajectory drifted and LIO-SAM's trajectory produced cumulative errors in the *z* direction, the trajectory of HSDLIO approximated the real trajectory.

#### *4.4. Outdoor Experiment*

The outdoor experiment was carried out on a city street. This scene has rich feature points, so the drift of lidar odometry grows very slowly. The localization accuracy of LeGO-LOAM, LIO-SAM, and HSDLIO have very little difference, but the mapping of HSDLIO can provide finer structural details.

**Figure 13.** Comparison of the height information of LIO-SAM and HSDLIO in the *z* direction. The trajectory of HSDLIO provides height information from going up and down four floors of stairs, while LIO-SAM failed to position in the *z* direction.

**Table 2.** Relative translational error when the backpack returns to the starting point (meters).


The trajectories obtained by LeGO-LOAM, LIO-SAM, and HSDLIO are shown in Figure 14. Because there is more feature information outdoors, the drift of lidar odometry grows very slowly. LeGO-LOAM can obtain relatively accurate localization only through horizontal lidar. In LIO-SAM and HSDLIO, GPS measurement is introduced by loosely coupling with IMU through UKF, which can provide relatively accurate initial positioning. The localization accuracy of HSDLIO and LIO-SAM is higher than LeGO-LOAM. The available data in Table 3 further supports the performance of the HSDLIO algorithm.

**Figure 14.** Comparison of LeGO-LOAM, LIO-SAM, and HSDLIO trajectories. The red dashed line is the trajectory of HSDLIO, the blue dashed line is the trajectory of LIO-SAM, and the black dashed line is the trajectory of LeGO-LOAM.


**Table 3.** Relative translational error when the backpack returns to the starting point (meters).

The 3D city street scene is shown in Figure 15a. In Figure 15a–d, the marks in the white circle represent trees and the marks in the yellow circle represent the building structure. Figure 15b–d shows the overall mapping of LeGO-LOAM, LIO-SAM, and HSDLIO algorithms in the city street scene, with the white points being their localization trajectories. Figure 15b,c shows that the overall effect of the mapping lacks building height information, while the structure of buildings and trees is blurred. The mapping details are analyzed at the marker, compared with the mapping results of LeGO-LOAM and LIO-SAM (Figure 15b,c), while the HSDLIO mapping result is shown in Figure 15d. It can be seen in Figure 15d that the structure of the trees and buildings is complete and has finer details.

**Figure 15.** (**a**) City street scene, with the marks in the white circle representing trees and the marks in the yellow circle representing the building structure. (**b**) The mapping results of LeGO-LOAM in the city street scene provide a lack of building height information and the structure of buildings and trees is blurred. (**c**) The mapping results of LIO-SAM in the city street scene provide a lack of building height information, though the mapping of trees is relatively clear. (**d**) The mapping results of HSDLIO in the city street scene show that the point cloud features of the trees and the building structure are finer and more complete, including finer structural details.

To further demonstrate the advantages of HSDLIO mapping, the mapping details of the building are shown in Figure 16, with the mapping results of LeGO-LOAM and LIO-SAM lacking the structural details and height information of the building. However, the mapping result of HSDLIO show finer structural details and more height information of the building.

(**b**)

**Figure 16.** Comparison of mapping details in the city street scene showing that (**a**) the mapping of LeGO-LOAM lacks structural details of the building, (**b**) the mapping of LIO-SAM lacks the height of the building structure, and (**c**) the mapping of HSDLIO provides finer structural details of the building.

#### **5. Conclusions**

This paper proposes a high-precision SLAM framework for multi-scene applications. In this framework, dual lidars are fused to make up for the shortcomings of lidars' narrow FOV and hence improve the completeness of mapping. Meanwhile, dual lidars and IMU are tightly coupled to improve the localization accuracy. Extensive experiments were carried out and the results showed that compared with the commonly used LeGO-LOAM and LIO-SAM methods, our proposed method can produce more precise localization and more accurate mapping results with more details.

**Author Contributions:** Conceptualization, K.X.; methodology, K.X. and W.Y.; software, K.X., W.Y., W.L. and F.Q.; validation, W.Y. and Z.M.; formal analysis, K.X. and W.L.; investigation, W.L. and K.X.; resources, K.X. and W.Y.; data curation, K.X. and W.Y.; writing—original draft, K.X.; writing—review and editing, K.X. and W.Y.; visualization, K.X.; supervision, K.X. and Z.M.; project administration, K.X. and W.Y.; funding acquisition, K.X. and W.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation (grant nos. 61602529 and 61672539). This work was also supported by Hunan Key Laboratory of Intelligent Logistics Technology (2019TP1015) and Scientific Research Project of Hunan Education Department (No. 17C1650).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Method of Enhancing Rapidly-Exploring Random Tree Robot Path Planning Using Midpoint Interpolation**

**Jin-Gu Kang 1, Yong-Sik Choi <sup>2</sup> and Jin-Woo Jung 1,\***


**\*** Correspondence: jwjung@dongguk.edu; Tel.: +82-2-2260-3812

**Abstract:** It is difficult to guarantee optimality using the sampling-based rapidly-exploring random tree (RRT) method. To solve the problem, this paper proposes the post triangular processing of the midpoint interpolation method to minimize the planning time and shorten the path length of the sampling-based algorithm. The proposed method makes a path that is closer to the optimal path and somewhat solves the sharp path problem through the interpolation process. Experiments were conducted to verify the performance of the proposed method. Applying the method proposed in this paper to the RRT algorithm increases the efficiency of optimization by minimizing the planning time.

**Keywords:** robot path planning; RRT; midpoint interpolation; triangular rewiring; path smoothness

#### **1. Introduction**

Recent path planning research for the robot has encompassed a wide range of topics [1,2]. Path planning is an important capability for autonomous mobile robots. A robot must be able to identify a path from its current position to its destination in order to move successfully. A mobile robot must be able to discover an optimal or sub-optimal collision-free path in the environment from the starting position to the destination [3].

Path planning is the formulation of a route for a mobile robot to proceed from a starting point to a destination point in Euclidean space as efficiently as possible while avoiding both static and dynamic obstacles and maintaining optimality, clearance, and completeness [4]. An optimal path is one with the ideal path length, a clear path is one without obstacles for the mobile robot to collide with, and a complete path is one in which the robot can move from the start point to the destination point without colliding with obstacles.

Furthermore, it is indeed possible for the robot to be able to optimize its path by determining the quickest and safest path to its destination point in order to save time and energy. However, an algorithm that generates the optimal path increases the computation, and an algorithm that quickly generates a path does not guarantee the optimal path [5].

It is difficult to ensure optimality with the sampling-based rapidly-exploring random tree (RRT) algorithm [6]. As shown in Figure 1a, the RRT algorithm is a path planning algorithm that involves repeatedly adding a randomly sampled position as a child node in a tree with the starting point as the root node until the destination point is reached. The tree extends out in the shape of a stochastic fractal, as shown in Figure 1b, and has an algorithm used to locate the destination point.

The RRT algorithm and other sampling-based algorithms [7,8] offer the benefit of planning a path in a shorter time with less computing than traditional path planning algorithms like the visibility graph- [9], cell decomposition- [10], and potential field-based algorithms [11]. On the other hand, it does not ensure optimality and has the drawback of having probabilistically assured completeness. The latter is also known as probabilistic completeness [12], which implies that completeness is assured when the number of random samples is infinite but not always when the number of random samples is limited. The

**Citation:** Kang, J.-G.; Choi, Y.-S.; Jung, J.-W. A Method of Enhancing Rapidly-Exploring Random Tree Robot Path Planning Using Midpoint Interpolation. *Appl. Sci.* **2021**, *11*, 8483. https://doi.org/10.3390/app11188483

Academic Editor: António Paulo Moreira

Received: 8 July 2021 Accepted: 9 September 2021 Published: 13 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

goal of this research is to enhance the RRT algorithm, which ensures completeness and performs better than the related algorithms.

**Figure 1.** Overview of the rapidly-exploring random tree (RRT) algorithm: (**a**) planned path R from starting point *qstart* through waypoints *q*1, *q*2, *q*<sup>6</sup> to destination *qgoal* (*qi* is a point on the path); (**b**) process of finding destination point by stochastic fractal shape from the root node (S) as a starting point.

To solve these problems, the main idea of the proposed post triangular processing of midpoint interpolation method is effective in path planning algorithms that do not guarantee optimality, such as the RRT algorithm that has a locally piecewise linear shape and can be used as a post-processing method after a path has been planned using one of these algorithms. It may also be used for different route planning methods since it is a post-processing technique that has no impact on calculation time.

The sampling-based path planning algorithm's primary strength is the fast planning speed based on the small amount of computation compared to the traditional path planning algorithms [7].

Performance verification using simulation in various environments and mathematical modeling were used to validate the performance of the proposed method in this paper. The case in which the proposed algorithm is applied to the sampling-based path planning method and the case in which it is not applied are compared through simulation. Here, the planning time and path length of the first complete path to reach a destination point from a starting point are evaluated.

This paper is organized as follows: Section 2 reviews some related works. Section 3 introduces the proposed post triangular processing of the midpoint interpolation method. Various experimental environments are constructed for path planning in Section 4 to examine the effectiveness and improvements of the proposed method. Finally, the conclusions are presented in Section 5.

#### **2. Related Works**

In this section, we introduce the previous works about the RRT algorithm in Section 2.1 and the Triangular Rewiring Method for the RRT Algorithm in Section 2.2, respectively.

#### *2.1. RRT*

This section shows the pseudocode of the RRT algorithm used in the experiment of this paper that was designed based on [6] in which the RRT algorithm was proposed. In 1998, LaValle proposed the RRT algorithm, which is a representative sampling-based path planning algorithm [6]. It is designed to have many degrees of freedom and is useful for planning a path under non-holonomic constraints.

As shown in Figure 2, when a random sample is generated in the configuration space, the node nearest to the position of the random sample is identified among the nodes constituting the tree with the starting point as the root node. A new node is generated at the random sample position and inserted into the tree if the random sample position is nearer than the step length. The process of tree extension is repeated until the destination point is reached. The RRT algorithm implemented for the proposed method and comparison experiment is Algorithm 1.

**Algorithm 1** Pseudocode of RRT Algorithm

**Figure 2.** Process of the RRT algorithm: When creating the new node *qnew* at a position separated by step length λ in direction of random sample position (*qrand*) based on *qnear* node (position) closest to random sample position (*qrand*) in tree *T* with starting point *qstart* as the root node.

In order to overcome the limitations on optimality and convergence time [6], RRT-Connect can find a connected path more quickly by setting the start point and the destination point as the root of a separate tree, and further expanding the two trees alternately [7]. Rapidly-exploring Random Tree Star (RRT\*) [13] was developed to overcome the limitation that the path generated from RRT does not guarantee convergence to the optimal path. Informed-RRT\* that can find a connected path quickly by enhancing the sampling probability inside the elliptical region with the start point and the destination point as the respective focal points [14]. The RRT\*-Connect algorithm combines the advantages of RRT-Connect and RRT\* [15]. RRT\*-Smart [16], Quick-RRT\*[17], and the proposed algorithm in [8] can show closer optimality by finding and connecting linearly connectable ancestor nodes to random samples through triangular inequality in the process of adding random samples.

#### *2.2. Triangular Rewiring Method for the RRT Algorithm*

This section shows the principle and pseudocode of the Triangular Rewiring Method for the RRT algorithm.

The triangular rewiring method is used to rewire the component based on the triangular inequality concept [8]. The triangular inequality-based RRT algorithm is a rewiring of the RRT method that is based on the concept of triangular inequality between nodes in path planning; thus, it is closer to the optimum than the RRT.

The triangular rewiring method not only can find a better initial solution but also can converge to a better solution rapidly.

The pseudocode for the triangular rewiring method is shown in Algorithm 2. This iterative process continues until no *qancestor* exists (when no parent node exists for the previous *qancestor*, i.e., when *qancestor* is *qstart*) or an obstacle exists between *qchild* and *qancestor*. The method for triangular rewiring is as follows: the node with the position qparent and the edge connecting the *qchild* and *qancestor* nodes are deleted. After the edge is deleted, the qancestor node is updated with the *qparent* node, and the parent node of the *qancestor* is updated with the *qancestor* node. The existing *qparent* node is then deleted. Then, the Trapped method is used to check if it collides with an obstacle between the *qchild* node and the updated *qparent* node. Then, in tree T, the last created *qparent* is inserted as the parent node of *qchild*.

**Algorithm 2** Pseudocode of Triangular Rewiring Method for RRT Algorithm


#### **3. Proposed Post Triangular Processing of the Midpoint Interpolation Method**

The proposed post triangular processing of the midpoint interpolation method can be applied to path planning algorithms that do not guarantee optimality, such as the RRT algorithm, and rewiring and midpoint interpolation based on the triangular inequality principle. The assumptions of the proposed method are as follows:


The basic principle is that the node serving as a waypoint in the planned path checks whether an obstacle collides with its own grandparent node, and if it is free from obstacle collision, the grandparent node rewires to the parent node. If it is not free from obstacle collision, as shown in Figure 3, the locally piecewise linear path created between the node, its parent node, and its grandparent node is made a more optimal path through the interpolation process. In this process, a new node is interpolated into the path and deviates from the piecewise linear path, so it has the advantage of being able to somewhat solve the sharp path problem (the problem facing a mobile robot that has kinematic constraints because the slope is not smooth).

**Figure 3.** Summary of post triangular processing of the midpoint interpolation method: (**a**) line segment *γ* with node *q*<sup>0</sup> and its grandparent node *q*<sup>2</sup> in tree *R* is not free from obstacle collision; (**b**) Parent node *q*<sup>1</sup> of *q*<sup>0</sup> is deleted, and *qa* between *q0* and *q*1, and *qb* between *q*<sup>1</sup> and *q*<sup>2</sup> are inserted as waypoints of a new path between *q*<sup>0</sup> and *q*2.

This post triangular processing of the midpoint interpolation method was designed based on the polygon approximation algorithm [18,19], so, as shown in Figure 4, the path is calculated through a constant value called *ε* (the threshold of minimum clearance) (*ε* > 0). It determines how closely the obstacle is approximated or, in other words, how close to the optimal path it is. Here, in Figure 4, *d* follows Equation (1):

$$d\_{\mathbb{R}}(q\_{\mathbb{i}}) = \begin{cases} \begin{pmatrix} (d\_{n-1}(q\_{\mathbb{i}}))/2, & n > 0\\ \left(2\sqrt{s(s-a)(s-\beta)(s-\gamma)}\right)/\gamma, & n = 0 \end{pmatrix} (s = (a+\beta+\gamma)/2) & \text{(1)} \end{cases}$$

**Figure 4.** Interpolation function of post triangular processing of the midpoint interpolation method: (**a**) height *d*<sup>0</sup> of the triangle formed by waypoints *q*0, *q*1, and *q*<sup>2</sup> of the path is greater than *ε* (interpolation continuation); (**b**) height *d*<sup>1</sup> of the triangle formed by midpoints *mF*(*q*0,1) and *q*<sup>1</sup> of *q*<sup>0</sup> and *q*<sup>1</sup> and midpoint *mF*(*q*0,1) is the midpoint of *q*<sup>0</sup> and *q*1, also *mF*(*q*1,1) is the midpoint of *q*<sup>1</sup> and *q*2.

Here, *ξ*() is a function that receives the node as a variable and returns the parent node of that node. The *n*-squared (*n* ≥ 0) of the *ξ*() function can be expressed as *n*

*ξn*(*qi*) := ' () \* (*<sup>ξ</sup>* ◦ *<sup>ξ</sup>* ◦ ... ◦ *<sup>ξ</sup>*)(*qi*), and if *<sup>n</sup>* is 0, *<sup>ξ</sup>*0(*qi*) := *qi* holds.

For the waypoint *qi*, the value of *d* decreases by 1/2 as interpolation proceeds (*n*). The initial value *d*<sup>0</sup> is the height of the triangle formed by the three line segments *α*, *β*, and *γ* (*γ < α + β*), and the value of *dn* becomes (*dn*−1)/2. Based on *qi*, let *α* be the line segment from itself to the parent node, *β* be the line segment from the parent node to the grandparent node, and *γ* be the line segment from itself to the grandparent node. This *d* value serves as a measure to confirm the clearance of the obstacle compared to *ε*. This is because the smaller the *d*, the closer the path is to the obstacle.

Equation (1) converges to 0 when n becomes infinitely large in *dn*, so epsilon receives only a value greater than 0 (positive number) from the user. *ε* is always greater than *dn* when *n* is infinitely large. i.e., *d* is always smaller than epsilon at some point when *n* becomes infinitely large. This can be expressed as Equations (2) and (3) as follows:

$$\lim\_{n \to \infty} d\_n(q\_i) = \lim\_{n \to \infty} \frac{d\_{n-1}(q\_i)}{2} = \lim\_{n \to \infty} \frac{d\_0(q\_i)}{2^n} = \lim\_{n \to \infty} \frac{\mathbb{C}}{2^n} = 0,\tag{2}$$

$$\varepsilon \colon \lim\_{n \to \infty} d\_n(q\_i) = 0, \ \varepsilon > 0 \to \varepsilon > \lim\_{n \to \infty} d\_n(q\_i). \tag{3}$$

However, since optimality and clearance are opposite attributes, the closer *ε* is to 0 as shown in Figure 5, the more similar the path or path length is to the visibility graph, but it is not a smooth path [20]. The farther away it is from 0 (within a significant value where the path is modified), the farther it is from the optimum, but a smooth (kinetic) path is made, so *ε* should be set to a suitable value according to the environment.

**Figure 5.** Variations according to *ε* values in post triangular processing of midpoint interpolation method (*ε* value of (**a**) > *ε* value of (**b**) > *ε* value of (**c**)): (**a**) When *ε* value is set relatively large; (**b**) When *ε* value is set relatively medium (smooth path); (**c**) When *ε* value is set relatively small (closer to the optimal path).

The following Algorithm 3 shows the pseudocode of the proposed post triangular processing of the midpoint interpolation method. It is mainly composed of the post triangular function (Algorithm 4) and interpolation function (Algorithm 5).

The input values of the post triangular processing of the midpoint interpolation method consist of the path *R* planned through a path planning algorithm, such as the RRT algorithm, the obstacle area information *C*, and the threshold value ε of the minimum clearance.

The *fmodify* is a variable that determines whether the input path *R* has been modified by this method, and if the path is modified even once, the entire process is repeated. If the path modification does not occur in the process of repeating, the algorithm is terminated. *t* refers to the index of the currently focused waypoint of *R*. That is, if *t* is 0, it is the starting point, which is the first point of *R*.


In *R*, when the first focusing point is *qchild*, the next point of that point *qchild* is *qparent*, and the next point of that point *qparent* is *qancestor*. It is determined whether the distance between *qchild* and *qancestor* is free from obstacle collision (*isTrapped*() function). If it is free from collision, it is called *postTriangular*(); otherwise, it is called *interpolation*(). *postTriangular*() connects *qchild* and *qancestor* as in the triangular rewiring method [8], and the *qparent* between them is deleted from the path. *interpolation*() finds (interpolates) the points between *qchild* and *qparent* and between *qparent* and *qancestor* that are free from obstacle collision when connected and rewires *qchild*, *qancestor*, and those two points are found. If *R* and *t* are updated due to *postTriangular*() or *interpolation*(), *qchild* (the *t*-th waypoint of *R*), *qparent*, and *qancestor* are updated accordingly. If *qparent* is the last point in *R*, *fmodify* is checked. Otherwise, the above process is repeated for the updated *qchild* and *qancestor*.

Here, path modification by *postTriangular*() deletes the waypoints and makes a more optimal path but has the effect of sharpening the path shape, and path modification by *interpolation*() creates a new waypoint between the waypoints. Adding/inserting has the effect of making a more optimal path while also smoothing the path shape. Of course, due to the effect of making a more optimal path, path modification by *postTriangular*() is more efficient than that by *interpolation*().

The input values of *postTriangular*() of the post triangular processing of the midpoint interpolation method consist of the path *R*, the focusing point index *t*, and the path modification *fmodify* from the post triangular processing of midpoint interpolation method.

Rewiring is performed on the *t*-th waypoint *qchild* of *R*, the next point *qparent*, and the next point *qancestor* of that point again. First, the path between *qchild* and *qparent* and the path between *qparent* and *qancestor* are deleted. Then the path is inserted between *qchild* and *qancestor*. Finally, *fmodify* returns "true" because the path has been modified. Here, Path <*A*, *B*> means a partial path from Waypoint A to Waypoint B in the complete path.


The goal of the proposed post triangular processing of midpoint interpolation method is to find an interpolation point (*mF*(*q*0), *mF*(*q*1)) free from obstacle collisions between waypoints (*q*0*~q*1, *q*1*~q*2) while descending in the direction of the midpoint (*q*1), as shown in Figure 6 in the interpolation process.

**Figure 6.** Details of post triangular processing of the midpoint interpolation method: (**a**) midpoint *mF*(*q*0,1) of waypoint *q*0, *q*<sup>1</sup> of path and midpoint *mF(q*1,1) of *q*1, *q*<sup>2</sup> are not free from obstacle collision; (**b**) midpoint *mF(q*0,2) of interpolation point *mF(q*0,1), *q*<sup>1</sup> and midpoint *mF(q*1,2) between *q1*, interpolation point *mF(q*1,1) is free from obstacle collision.

However, the interpolation point *mF* follows Equation (4):

$$m\_F(q\_{i\prime}k) = \begin{cases} \left(\frac{m\_F(q\_i, k-1) \cdot x + \zeta(q\_i) \cdot x}{2}, \frac{m\_F(q\_i, k-1) \cdot y + \zeta(q\_i) \cdot y}{2}\right), & k > 0\\ \qquad q\_{i\prime}, & k = 0 \end{cases} \tag{4}$$

When the *k*-th interpolation point of the interpolation point *qi* is *mF*(*qi*,*k*), the 0-th interpolation point becomes itself *qi*, and the first interpolation point is the midpoint of *qi* and the point *ξ*(*qi*) next to *qi*, and the second interpolation point becomes the midpoint of *mF* (*qi*,1) and *ξ*(*qi*). That is, *mF*(*qi*,*k*) (*k* > 0) becomes the midpoint between *mF*(*qi*,*k* − 1) and *ξ*(*qi*). The following Algorithm 5 shows the pseudocode of *interpolation*() of the proposed post triangular processing of the midpoint interpolation method.


The input values of *interpolation*() of the post triangular processing of midpoint interpolation method consist of the path *R*, the obstacle area information *C*, the focusing point index *t*, and the path modification *fmodify* from the post triangular processing of the midpoint interpolation method.

From the *t*-th waypoint *qchild* of *R*, the next point *qparent*, and the next point *qancestor* of that point, the height *d* of the triangle is obtained, *ma* is the midpoint of *qchild* and *qparent*, and *mb* is the midpoint of *qparent* and *qancestor*. If the path between *ma* and *mb* is free from obstacle collision (*isTrapped*()), the path between *qchild* and *qparent* is deleted, and the path between *qchild* and *ma*, the path between *ma* and *mb*, and the path between *mb* and *qancestor* are inserted. Moreover, since the path is modified, *fmodify* returns "true," and the method terminates. If the line segment between *ma* and *mb* is not free from obstacles, the value of *d* decreases by 1/2, *ma* is updated to the midpoint of *ma* and *qparent*, and *mb* is updated to the midpoint of *mb* and *qparent*. It is then determined whether *ma* and *mb* are free from obstacles again. This repeated process proceeds until a case is found in which *ma* and *mb* are free from obstacles or *d* becomes smaller than *ε*. If *d* becomes smaller than *ε*, the value of *t* is increased by 1 and the method is terminated.

Figure 7 shows the overall flowchart of the proposed post triangular processing of the midpoint interpolation method. Here, *ξ<sup>t</sup>* (*qgoal*) denotes the *t*-th next waypoint from the starting point *qgoal* of the path *R*, and *ξt*+*n*(*qgoal*) is the *n*-th next waypoint in the *ξ<sup>t</sup>* (*qgoal*). That is, there are *n* waypoints between *ξ<sup>t</sup>* (*qgoal*) and *ξt*+*n*(*qgoal*). In the flowchart shown in Figure 7, the stop condition follows the sequence:


**Figure 7.** Flowchart of post triangular processing of midpoint interpolation method.

The stopping criterion of the proposed algorithm is based on *ε*. As shown in Figures 4 and 7, when the value of *dk*(*ξ<sup>t</sup>* (*qgoal*)) becomes smaller than the *ε*, the algorithm stops the loop. As Equations (1) and (2) and Figure 6 show, the value *dk*(*ξ<sup>t</sup>* (*qgoal*)) decreases deterministically.

#### **4. Experimental Results**

The path between the RRT in various environment maps through simulation and the RRT algorithm to which the proposed post triangular processing of the midpoint interpolation method is applied were used to validate the performance of the method proposed in this paper, and the path planning results were compared.

The performance measures compared were the average values after repeating the trial 100 times (sampling position was changed for each trial) of the path length (px) and the planning time (ms) of the first complete path (the first complete path to reach a destination point from a starting point).

Various environment maps have been examined and used to validate the performance of the proposed path planning algorithms in related works. Since the efficiency of the performance measures expected during the experiment varies somewhat based on the composition of obstacles (e.g., number, location, shape), it is important to choose which environment map to utilize carefully.

The four environment maps used in the experiment are shown in Figure 8. The four environment maps were created by partially referring to the experimental environment proposed by Han in 2017 [21]. All environment maps were 600 × 600 px in size, with a 30 px step length. The start points (S) are represented by green circles, while the destination points (G) are represented by purple circles. Obstacles are black polygons with yellow contours (blue in the experimental results).

**Figure 8.** Environment maps for experiments: (**a**) Map 1; (**b**) Map 2; (**c**) Map 3; (**d**) Map 4.

Map 1 of Figure 8a appears to be an efficient environment for validating optimality and completeness but a weak environment for sampling-based path planning algorithms like the RRT algorithm. Many samplings are required since the probability of finding a solution is low. Map 2 of Figure 8b appears to be an efficient environment for validating the optimality and completeness of the path planning algorithm. Map 3 of Figure 8c is an environment that is efficient for validating the optimality and the planning time of the path planning algorithm, as it consists of obstacles (50 vertices) that approximate circles. Map 4 of Figure 8d is an environment that is efficient for validating the optimality and the planning time of the path planning algorithm but a weak environment for sampling-based path planning algorithms such as the RRT algorithm.

The number of samples and planning time required increase drastically when the path to the destination point is narrow or there are few entrances since the sampling-based path planning algorithm depends on probabilistic completeness.

The specification of the computer used in the simulation is shown in Table 1. The simulator used for the simulation [8] was developed based on C# WPF (Microsoft Visual Studio Community 2019 Version 16.1.6 Microsoft .NET Framework Version 4.8.03752), and only a single thread was used for calculations except for the visual part. Generally, depending on the specification of the computer, the result of performance measurements, such as planning time, may vary during the simulation.

**Table 1.** Computer performance for simulation.


We validate the experimental results (path length, planning time) in the four environment maps in which the post triangular processing of the midpoint interpolation method (*ε*: 50, 30, 10 px) is applied to the RRT algorithm and its path planning results. Since *ε* requires a higher amount of computation as it gets smaller, it was set to a value close to the 30 px step length of the experimental environment.

The experimental results for each map consist of a figure and table. The figure is a path planning result for each algorithm shown in the case of a single trial (the figure for each algorithm is not the result of repeated trials). The table shows an average value of the results of planning time and path length from the path planning repeated trials. There may be a significant difference between the performance observed visually and the numerical results in the table for one of the repeated trials. The form of the planned path for each algorithm is shown in the figure for visual reference. Furthermore, the proposed post triangular processing of the midpoint interpolation method is used to see whether there is a region where the piecewise linear path is smoothed.

The path planning results are shown in Figure 9 for Map 1 among the specified environment maps for each algorithm. In terms of appearance, the one to which the post triangular processing of the midpoint interpolation method (*ε*: 10 px) was applied seems to have a smoother slope compared to those of the other algorithms, and the path length with the method (*ε*: 10 px) seems to be the shortest.

**Figure 9.** Experimental results of Map 1: (**a**) RRT; (**b**) proposed method (*ε*: 50 px) applied; (**c**) proposed method (*ε*: 30 px) applied; (**d**) proposed method (*ε*: 10 px) applied.

The path planning results (average values after repeating the trial 100 times) are shown in Table 2 for Map 1 among the specified environment maps for each algorithm. When applying the post triangular processing of the midpoint interpolation method (*ε*: 10 px), the path length becomes 62% (1224/1994(%)) compared to the RRT, which is the shortest of all the algorithms, and the planning times are all similar.

**Table 2.** Experimental results of Map 1 (numbers in parentheses (averages of repeating trial 100 times) are relative ratios to RRT results (values less than 1 are counted as 1)).


The path planning results are shown in Figure 10 for Map 2 among the specified environment maps for each algorithm. In terms of appearance, the one to which the post triangular processing of midpoint interpolation method (*ε*: 10 px) was applied seems to have a smoother slope compared to those of the other algorithms, and the path length with the method (*ε*: 10 px) seems to be the shortest.

**Figure 10.** Experimental results of Map 2: (**a**) RRT; (**b**) Proposed method (*ε*: 50 px) applied; (**c**) Proposed method (*ε*: 30 px) applied; (**d**) Proposed method (*ε*: 10 px) applied.

The path planning results (average values after repeating the trial 100 times) are shown in Table 3 for Map 2 among the specified environment maps for each algorithm. By applying the post triangular processing of midpoint interpolation method (*ε*: 10 px), the path length becomes 74% (730/986(%)) compared to the RRT, which is the shortest of all the algorithms, and the planning time is similar to that of the RRT algorithm when the method (*ε*: 50 px) is applied. However, the absolute time difference is 1 ms, which seems to be large when the method is applied because it is an environment that requires less planning time.


**Table 3.** Experimental results of Map 2 (numbers in parentheses (averages of repeating trial 100 times) are relative ratios to RRT results (values less than 1 are counted as 1)).

The path planning results are shown in Figure 11 for Map 3 among the specified environment maps for each algorithm. In terms of appearance, the one to which the post triangular processing of midpoint interpolation method (*ε*: 10 px) was applied seems to have a smoother slope compared to those of the other algorithms, and the path length with the method (*ε*: 10 px) seems to be the shortest.

**Figure 11.** Experimental results of Map 3: (**a**) RRT; (**b**) Proposed method (*ε*: 50 px) applied; (**c**) Proposed method (*ε*: 30 px) applied; (**d**) Proposed method (*ε*: 10 px) applied.

The path planning results (average values after repeating the trial 100 times) are shown in Table 4 for Map 3 among the specified environment maps for each algorithm. By applying the post triangular processing of midpoint interpolation method (*ε*: 10 px), the path length becomes 82% (505/613(%)) compared to the RRT, which is the shortest of all the algorithms, and the planning time is the most similar to that of the method (*ε*: 10 px), as it takes 16% more time than the RRT algorithm. However, the absolute time difference is 2 ms, and since it is an environment that requires less planning time, the difference appears to be large when the method is applied.

**Table 4.** Experimental results of Map 3 (numbers in parentheses (averages of repeating 100 times) are relative ratios to the RRT results (values less than 1 are counted as 1)).


The path planning results are shown in Figure 12 for Map 4 among the specified environment maps for each algorithm. In terms of appearance, the one to which the post triangular processing of the midpoint interpolation method (*ε*: 30 px) was applied seems to have a smoother slope compared to those of the other algorithms, and the path length with the method (*ε*: 10 px) seems to be the shortest.

**Figure 12.** Experimental results of Map 4: (**a**) RRT; (**b**) Proposed method (*ε*: 50 px) applied; (**c**) Proposed method (*ε*: 30 px) applied; (**d**) Proposed method (*ε*: 10 px) applied.

The path planning results (average values after repeating the trial 100 times) are shown in Table 5 for Map 4 among the specified environment maps for each algorithm. By applying the post triangular processing of the midpoint interpolation method (*ε*: 10 px), the path length becomes 77% (1190/1536(%)) compared to the RRT, which is the shortest of all the algorithms, and all the planning times are similar.

**Table 5.** Experimental results of Map 4 (numbers in parentheses to (averages of repeating 100 times) are relative ratios to RRT results (values less than 1 are counted as 1)).


Consequently, the post triangular processing of the midpoint interpolation method (ε: 10 px) performed well in the path length aspect for all maps, demonstrating that the proposed method is efficient in terms of optimality.

#### **5. Conclusions**

In this research, the post triangular processing of the midpoint interpolation method minimized the planning time and shortened the path length of the sampling-based algorithm.

The proposed post triangular processing of the midpoint interpolation method was closer to the optimal path and somewhat solved the sharp path problem through the interpolation process. Furthermore, all path planning algorithms that plan a locally piecewise linear path could apply the proposed algorithm. This strength has significance in that it can be applied not only to the algorithm presented in this paper but also to various path planning algorithms. We validated the performance of the proposed method after it was applied to the RRT algorithm and its path planning results through simulation. It was verified that the path length was shortened by 18–38% (average 26%) depending on the threshold *ε* when applied to the RRT algorithm in the four different environment maps. As a result, the RRT algorithm applying the proposed post triangular processing of the midpoint interpolation method showed a more optimal path.

The proposed post triangular processing of the midpoint interpolation method is based on a general global planning case in which a robot first discovers a global path from its start point to its destination point before beginning to navigate. In dynamic environments, not only local planners but also global planners must deal with kinodynamic problems in real-time. Furthermore, the method proposed in this paper is more efficient in terms of the optimality of robot path planning, but optimality is not guaranteed. Future work should determine how to make a path that is closer to the optimal path.

**Author Contributions:** Idea and conceptualization: J.-G.K. and J.-W.J.; Methodology: J.-G.K. and J.-W.J.; Software: J.-G.K. and J.-W.J.; Experiment: J.-G.K. and J.-W.J.; Validation: J.-G.K., Y.-S.C. and J.-W.J.; Investigation: J.-G.K. and J.-W.J.; Resources: J.-G.K. and J.-W.J.; Writing: J.-G.K., Y.-S.C. and J.-W.J.; Visualization: J.-G.K. and J.-W.J.; Project administration: J.-W.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1074974) and by the Ministry of Trade, Industry, and Energy (MOTIE) and the Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (Project No. P0016096). It was also supported by the Technology development Program(S3041234) funded by the Ministry of SMEs and Startups (MSS, Korea) and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2020-0-01789) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Mobile Robot with Omnidirectional Tracks—Design and Experimental Research**

**Mateusz Fiede ´n \* and Jacek Bałchanowski**

Faculty of Mechanical Engineering, Wroclaw University of Science and Technology, Wybrzeze, Wyspia ´ ˙ nskiego 27, 50-370 Wroclaw, Poland; jacek.balchanowski@pwr.edu.pl

**\*** Correspondence: mateusz.fieden@pwr.edu.pl

**Abstract:** This article deals with the design and testing of mobile robots equipped with drive systems based on omnidirectional tracks. These are new mobile systems that combine the advantages of a typical track drive with the advantages of systems equipped with omnidirectional Mecanum wheels. The omnidirectional tracks allow the robot to move in any direction without having to change the orientation of its body. The mobile robot market (automated construction machinery, mobile handle robots, mobile platforms, etc.) constantly calls for improvements in the manoeuvrability of vehicles. Omnidirectional drive technology can meet such requirements. The main aim of the work is to create a mobile robot that is capable of omnidirectional movement over different terrains, and also to conduct an experimental study of the robot's operation. The paper presents the construction and principles of operation of a small robot equipped with omnidirectional tracks. The robot's construction and control system, and also a prototype made with FDM technology, are described. The trajectory parameters of the robot's operation along the main and transverse axes were measured on a test stand equipped with a vision-based measurement system. The results of the experimental research became the basis for the development and experimental verification of a static method of correcting deviations in movement trajectory.

**Keywords:** omnivehicle; omnitracks; Mecanum rollers; FDM

#### **1. Introduction**

In 2018, the world mobile robot market was worth USD 19 billion [1]. On the basis of the latest forecast, it is estimated that by 2023 its value will have nearly tripled. These robots have already moved from closed research centres and innovation fairs to the reality of everyday life. Autonomous transport vehicles used for supplies in storage facilities, autonomous mowers in house gardens, smart vacuum cleaners in homes, teleoperated robots, or bomb disposal robots are not only becoming a common element in company equipment, but also a commodity purchased by the average consumer [2,3]. Terrestrial mobile robots can be categorised using a number of criteria, and their constructed systems differ in terms of locomotion types. They can be wheeled, stepping, tracked, or ball balancing robots [4].

Wheeled robots are the most recognized, investigated and frequently used group of mobile robots. They achieve the highest speeds on flat surfaces. Their downside, however, is their poor ability to operate in difficult terrain, with some of them being vitiated by a significant turning radius [5]. Currently, research on improving their mobility in difficult terrain is being conducted. One new solution that has been proposed are hybrid wheeled– stepping robots [6–8].

At the opposite end of the robot spectrum are stepping robots, i.e., devices that are characterised by the best adaptation to overcome obstacles in terrain. Important disadvantages of their most common versions is low movement speed and complex problems related to the stepping control methodology (e.g., problems with maintaining balance

**Citation:** Fiede ´n, M.; Bałchanowski, J. A Mobile Robot with Omnidirectional Tracks—Design and Experimental Research. *Appl. Sci.* **2021**, *11*, 11778. https://doi.org/10.3390/ app112411778

Academic Editor: António Paulo Moreira

Received: 18 November 2021 Accepted: 7 December 2021 Published: 11 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

during dynamic stepping in unknown terrain) [9]. In recent years, research on improving the movement speed of stepping robots has been conducted [10,11].

Tracked robots can be classified as a compromise between wheeled and stepping robots. Tracks allow for the mobility of the robot in difficult terrain, while also helping them to maintain a satisfactory movement speed. This type of locomotion has been recognised, investigated and used to a significantly higher degree than stepping systems [12].

Ball-balancing robots, also called ballbots, are robots in which the wheelset system is a ball [13]. This locomotion type, apart from ensuring holonomic control, is a successful solution in robot locomotion that is used to demonstrate technology, and in toys or presentation robots. It is visually attractive. The ballbot group also encompasses robots with a spherical outer casing, or a casing resembling a spherical shape [14,15]. Due to such a construction, these robots can not only achieve a high speed, but can also overcome water obstacles in difficult terrain.

Mobile robotics is now at the stage of dynamic development, where it quickly adapts to novel technological solutions while maintaining some previously used ones. Various simple solutions are successfully used in modern applications. Paper [16] addresses the issues of the hybridization of animals and machines, as well as the use of robots for research on animals. Many of their functions, such as the movement of a mechanical fish, are performed by simple mechanical systems. Paper [17] describes the construction of a mobile robot, in which the stepping movements of 4 legs are produced by a single drive.

An example use of an existing and well-investigated solution can be seen in Mecanum wheel systems. They were patented in 1972 by Swedish engineer Bengt Erland Ilon [18]. The wheels, due to top free rotation rollers oriented at 45◦ on the wheel circumference, as well as appropriate steering, allow vehicles to move in any direction on a flat surface [19] (Figure 1). Another alternative is the use of transversal rollers oriented at 0◦—although such a solution means that the implementation of an optimal wheel with a minimum gap may require an expensive manufacturing process [20]."

**Figure 1.** Mecanum wheel [21] (**a**), and the URANUS mobile robot with Mecanum wheels (**b**) [22].

A four-wheeled robot with Mecanum wheels (Figure 1b), which is driven by four independent motors, can move in any direction on a plane without the necessity to rotate [19,23,24]. A robot equipped with Mecanum wheels is holonomic, so the number of controlled degrees of freedom is equal to the number of degrees of freedom that are held. This is of particular significance in the case of robots that operate in very limited spaces. Fork-lifts and transport robots with Mecanum wheels, which are therefore able to move in any direction, can be used in storage facilities and production halls in which there is a lot of equipment and a large number of storage racks in a small space.

Apart from the whole array of advantages, Mecanum wheels also have considerable disadvantages: robots and vehicles equipped with such wheels need flat, even and clean surfaces. However, there have been many investigations aimed at improving the locomotion of such systems in difficult terrain [25]. The vibration of rotating rollers, which occurs during motion, is another drawback of Mecanum wheels. The reason for this vibration is

the cyclical shifting of the load between the subsequent rollers of a wheel [26]. In most applications, the vibration will not have a significant impact on the robot's operation, however, in some specific uses it can have a significant influence on it [27]. Another potential issue is the distribution of load on the floor surface. Due to their structure, Mecanum wheels have a small wheel-surface contact area. To improve the force distribution and to prevent defects of the surface on which the robot moves, additional, surplus wheels are used in numerous applications for the purpose of reducing the impact of load on the surface. In some applications, non-driven castor wheel types are used, with additional driven Mecanum wheels being used in some others [28].

In recent years, research has been conducted on the possibility of combining Mecanum wheels with a classical drive track. Mobile robots with such a track system have all the advantages of vehicles with Mecanum wheels, but do not take on their disadvantages. They can move on uneven, dirt-covered terrain and can carry heavier loads due to the multiple number of rollers in the tracks. Systems with such a drive can be used as mobile storage robots, automated construction machines (excavators, drilling rigs) [29], inspection or rescue robots, transport platforms and bomb disposal robots. The use of multidirectional tracks in the construction of mobile robots significantly increases their maneuverability.

In 1999, in Isod's work [30], the issue of multidirectional tracked robots was discussed. Such attempts to transfer the properties of Mecanum wheels to tracked wheels were made in 2015 and are described by Zhang [31].

In 2017, a German company, IVA Johann GmbH, in cooperation with scientists from the Bremer Institut für Produktion und Logistik, presented a commercial prototype of a mobile robot equipped with multidirectional tracks [29]. In 2018, the results of simulation research conducted by Zhang [32] on a platform equipped with multidirectional tracks were published. In 2020, Fang published research results involving the construction and tests of a fork-lift equipped with multidirectional tracks, which was designed to transport loads of a maximum of 2 tons [33]. During the research, which was both experimental and numerical, it was shown that a robot with multidirectional tracks preserves its holonomic characteristic.

The paper published by Huang discussed mobility in difficult terrain [34]. During movement along the robot's main axis, multidirectional tracks preserve the robot's ability to overcome terrain obstacles, such as embankments, stairs, thresholds or curbs. In the case of transverse movement to the robot's main axis, the ability to overcome thresholds and steps was significantly reduced. Due to the ability to overcome obstacles while travelling along the main axis, the new robot will replace systems with Mecanum wheels in warehouse facilities and production halls that may have floors with uneven surfaces, thresholds and kerbs.

Simulation research results on rectilinear motion in the transverse direction of a robot with multidirectional tracks are presented in [33]. It was demonstrated that the robot's motion trajectory becomes curved during such motion, which was also confirmed by the experimental research discussed in [32]. Deviations from the rectilinear path depend on the robot's mass, and such deviations increase with a decrease in system mass. An article written by Huang [34] also discusses research on the transverse motion of a robot on a smooth and slippery surface and includes the problems resulting therefrom.

Primarily robots with longitudinally symmetric non-overlapping tracks were investigated in the publications referred to above, and only in one case were robots with symmetric and partly overlapping tracks scrutinized. However, the publications do not discuss issues related to the applications of non-symmetric tracks, nor do they discuss other track systems. Research on the influence of load on driving properties has been conducted for robots with a mass of a maximum of 5 tons. The issue of vibration during the locomotion of robots with multidirectional tracks was not discussed. No attempt to compensate for deviations of direction during locomotion was made.

In this paper is developed a new mobile robot with a drive system that uses multidirectional tracks. The research encompasses the ability of a prototype to perform basic manoeuvres, i.e., motion along the main and transverse robot axes. A static method of improving motion parameters by correcting control parameters was developed and verified. The tests were conducted on a prototype made with fused deposition modelling (FDM) 3D printing technology on a test bench equipped with a vision-based measurement system.

#### **2. Materials and Methods**

#### *2.1. Construction of a Mobile Robot with Multidirectional Tracks*

A classical vehicle with a track drive system (excavator, bulldozer, tank, armoured personnel carrier, bomb disposal robot, etc.) is made of a body and two continuous track systems located symmetrically to the body (Figure 2a). A vehicle with a drive system with multidirectional tracks is equipped with four continuous track systems, which can usually be positioned in various ways with regards to the vehicle's body. The typical schemes of continuous track positions encompass tracks that are completely overlapping (Figure 2b), tracks that are partially overlapping (Figure 2c), as well as systems with non-overlapping tracks (Figure 2d).

**Figure 2.** Schemes of track positions with two and four continuous tracks: (**a**) symmetrical system, (**b**) symmetrical system with completely overlapping tracks, (**c**) system with partially overlapping tracks, (**d**) system with non-overlapping tracks.

It was assumed that the mobile robot to be constructed would have the structure of a symmetrical system with completely overlapping tracks, i.e., it will be made of a body and four parallel track drive systems of equal length with completely overlapping tracks (Figure 2b).

During contact between the continuous track and the surface, tractive force is generated. This force is necessary for a vehicle to move. A large surface-track contact area and an appropriately adjusted track preload result in a lower unit pressure of the track when compared to the wheel. A standard, single continuous track system is made of a track located between the drive wheels, idler, road wheels and return rollers (Figure 3a). In some track vehicles (e.g., excavators), there are solutions in which the drive wheels and idlers also take the load-carrying capacity (Figure 3b). This type of powertrain was also selected for the designed mobile robot. However, due to the large radius of the drive wheels and idlers in relation to their spacing, no extra road wheels were used (Figure 3c).

Tracks in the continuous track system are composed of a closed sequence of several dozen links that are connected to one another with rotating pairs. In the case of multidirectional tracks, each link is equipped with an additional rotating roller positioned at angle γ (usually 30◦ < γ < 60◦) to the link axis. The view of a single link with length *a,* width *b*, and height *h,* and also a cylindrical roller with radius *rr,,* is presented in Figure 4.

The designed mobile robot is equipped with four independent continuous track systems with multidirectional tracks that have only drive wheels and idlers. It is a fourdegrees-of-freedom system. The drive is transferred from the rotating drives to the drive wheels using belts. In the middle part of the robot's body, at point *R,* the local x'y' coordinate system is located. The kinematic scheme of the robot is shown in Figure 5.

**Figure 4.** View of a single multidirectional track link: 1—link, 2—roller.

It was assumed that the designed robot would be used indoors (residential buildings, offices, warehouse facilities, production halls, etc.). Therefore, its dimensions must be adapted to overcome the typical obstacles seen in such places: doors, thresholds, staircases, etc. The robot should move at a maximum speed of *v*<sup>R</sup> = 0.2 m/s, which is the speed considered acceptable for a robot moving in a 'human environment' [35]. This means that the maximum track speeds 1, 2, 3, 4:

$$
v\_R = v\_1 = v\_2 = v\_3 = v\_4\tag{1}$$

The resulting basic dimensions are described in Table 1.

**Figure 5.** Kinematic scheme of the robot with a continuous track system that has multidirectional tracks.



Figure 6 shows an overview of the developed solid model of the robot and the detailed views of the solid model of the track system.

**Figure 6.** General view (**a**) and bottom view (**b**) of the solid model of the robot with the continuous track system with multidirectional tracks.

Figure 7 shows the views of a single continuous multidirectional track with its own drive system. The continuous track system is composed of a track, an idler and a drive wheel. In each track segment there is one cylindrical rotating roller. The drives are located in the central part of the robot. The active moment is shifted to the drive wheel using transmission with a toothed belt.

**Figure 7.** View of the continuous track system with multidirectional tracks: (**a**) side view and (**b**) front view (1—idler, 2—link with a rolling roller, 3—track, 4—motor pulley, 5—drive wheel, 6—toothed belt).

The track drive systems, track links, rollers and the body of the robot were made using incremental FDM technology. The material used is PET-G, which is a polymer characterised by low shrinkage and high mechanical resistance. The free rollers rotate on steel axes. There are plastic slip rings between the rollers and the mount, which is connected to the track plate. The rollers consist of a printed core, onto which the tyres are pulled. The tyres of the rollers, also made using FDM technology, are made of thermoplastic polyurethane with a Shore scale hardness of 40D. The view of the finished prototype of the mobile robot is presented in Figure 8.

**Figure 8.** View of the finished robot prototype equipped with a continuous track system with multidirectional tracks.

The motor is connected to the drive wheel by a reduction pulley transmission system with a reduction ratio of *ik* = *r*s/*r*ks = 0.25. For the assumed maximum robot motion speed *v*<sup>r</sup> (Table 1), the maximum angular velocity of the motor *ω<sup>s</sup>* was determined from the following equation:

$$
\omega\_s = \frac{v\_r}{r\_{ks}i\_k} = 10.67 \, rad/s \tag{2}
$$

The rated active moment of motor *M*<sup>s</sup> was selected using the simplified method of the experimental measurements of the robot's motion resistance in the longitudinal and transverse direction using a tensometric force sensor. The highest motion resistance was measured experimentally during motion in the direction perpendicular to the main axis, and is expressed as the maximum force that excites motion in a set direction, which was *FT* = 13 N. It was assumed that the motor moment *M*<sup>s</sup> should be greater than:

$$M\_s > \frac{r\_k F\_T}{4} \tag{3}$$

On the basis of the calculations made to design the prototype construction, POLOLU-2274 motors were selected with a torque of *Ms* = 0.57 Nm, and a nominal speed of *ns* = 21.99 rad/s. Each motor is equipped with a magnetic encoder that provides 48 counts per revolution, and therefore one complete revolution of the wheel will provide 2249 pulses, with a drive wheel accuracy of Δφ<sup>s</sup> = 0.0007 rad.

#### *2.2. Control System of the Mobile Robot with Multidirectional Tracks*

For the purpose of exciting the set robot motion, it is necessary to have an appropriately designed system and control algorithms. The mobile robot has four drives that independently excite the motion of each track (Figures 5 and 8). Thus, a control system is required to control the robot. The control system should have four control regulators of speed *ωsi* of motor *i* (*i* = 1, 2, 3, 4), which start the drive wheels of the continuous tracks. In the robot prototype, the speed control of the used DC motors will have the form of a pulse width modulation (PWM) signal. The general scheme of the developed robot's control system is presented in Figure 9.

**Figure 9.** General block diagram of the mobile robot's control system.

Motor *i*, which excites the motion of the continuous track system *i,* is individually controlled using a PID*<sup>i</sup>* regulator operating in the feedback loop with a frequency of 50 Hz. The value of *vR(t)* is the set speed of the entire robot, the value of ωTi(t) is the value of the set speed of track motor *i* (*i = 1, 2, 3, 4*), *Ei(t)* is the control error, *Ui(t)* is the steering signal, *P*i*(t*) is the PWM value responsible for motor control DC, ωMi(*t*) is the actual motor rotational speed, Θ*i*(*t*) is the motor rotation angle measured with an encoder, while ω*i*(t) is the motor angular speed. The scheme of the speed regulator of continuous track system *i* is presented in Figure 10.

A RoboTrack control program was developed to control the movement of the robot. The program is run on an external PC.

**Figure 10.** Scheme of the speed regulator controlling continuous track system *i* (*i* = 1, 2, 3, 4).

The programme is responsible for planning the robot's motion trajectory and for sending the set speed values ωTi(t) for particular motors *i* to the robot's control system based on speed values set by the user. RoboTrack also ensures the on-line registration of ride parameters. Communication between the external PC and the mobile robot is performed with a Bluetooth communication link

The excitation of the robot's motion along the set trajectory requires a suitable control of speeds *v1, v2, v3, v4* of tracks *1, 2, 3, 4*. This paper discusses robot motion along two trajectories: a straight line μ<sup>w</sup> at constant speed *vRx* along the longitudinal axis, and a straight line μ<sup>p</sup> at constant speed *vRy* along the robot's transverse axis (Figure 11). In the case of motion along the longitudinal axis at speed *vRx*, the tracks should move at the same speeds (Figure 11):

$$
v\_1 = v\_2 = v\_3 = v\_4 = v\_{Rx} \tag{4}$$

while during motion along the transverse axis at speed *vrx*, the tracks' speed takes the following values:

$$
v\_1 = v\_3 = v\_{Ry} \qquad v\_2 = v\_4 = -v\_{Ry}.\tag{5}$$

#### *2.3. External Measurement Test Bench*

The mechanical prototype proposed in this paper is not able to develop self-localization [36]. The localization of the mobile robot is determined with an external measurement test bench, the general scheme and view of which are presented in Figure 12. The vision-based measurement system was made of a camera located perpendicularly to the measured area. The ELP-USBFHD04H-MFV camera with a resolution of 1920 × 1080 px was located at the height *h* = 1.95 m with regard to the main coordinate system *xy* (Figure 12).

**Figure 12.** Robot motion test bench: (**a**) general scheme, (**b**) test bench view.

The measuring accuracy of the bench was checked using a printed calibration sheet with ArUco markers (Figure 13). The position reading error δ*l* of the marker was less than 0.008 m and the orientation angle error δ*α* was no more than 0.5◦.

**Figure 13.** Calibration sheet with ArUco markers.

The vision-based system records and analyses images with a frequency of 25 Hz (frames/s). In order to allow for the detection of the robot's position in a camera image, the vision marker ArUco was placed on its top surface (Figure 14). The ArUco marker is a print on a flat tile with white and black rectangles. Owing to the precisely defined distribution of these areas, the marker can transfer its ID number, and it is therefore possible to recognize its presence in a ready image, as well as to determine its linear and angular position. For known dimensions, it is also possible to conduct a programmable estimation of its location in three-dimensional space. The middle of the marker is located in point *R* on the robot's body (Figure 14). During the measurement, the whole marker must be located in the camera vision area—this efficiently reduces the observation area to the area of the following dimensions: *a* = 0.75 m, *b* = 1.30 m.

**Figure 14.** ArUco marker view on the robot's body.

The robot is controlled by RoboTrack software (Figure 15), which allows robot motion to be controlled and track and motor motion to be registered. The software was integrated with a VisionM module that allows the covered distance to be measured and recorded. This module is based on OpenCV libraries that are extended with OpenCV Contrib [37] modules. Its task is the online analysis of individual image frames recorded by a camera placed on the test stand. The module provides data on the angular orientation of the tracked robot, its current location and the distance covered, which are recorded in real time.

**Figure 15.** Data flow at the measurement test bench.

#### *2.4. Static Correction Method of the Robot's Locomotion Direction*

During the motion of the mobile robot with the continuous track system with multidirectional tracks travelling at a constant speed, the phenomenon of curved trajectories may take place [24,27]—the middle *R* of the robot's body deviates from the trajectory, and the orientation angle α<sup>R</sup> of the body changes. The deviations from the straight-line trajectory depend on the robot's mass, and they increase with the system's mass [27]. During movement along the transverse axis, the problems with maintaining straight line movement become even more significant when the robot moves on a slippery surface. This disadvantageous phenomenon can be prevented by the introduction of suitable corrections to the set speeds of particular continuous tracks.

For the purpose of maintaining the robot's motion along the set straight line trajectory, the fixed orientation angle α<sup>R</sup> of the robot's body should be ensured by introducing suitable corrections of track speeds.

A difference in the orientation α<sup>R</sup> of the body, i.e., between the real and set robot position, results from the robot's body rotation at constant angular speed ωr. The rotation can be reduced by forcing the body to rotate in the opposite direction by making appropriate changes in track set speeds.

A simplified model of the change in the robot's body orientation and motion direction (Figure 16) was proposed, which entails the introduction of virtual tracks located between tracks 1 and 2,—marked as *L12,* with 3 and 4 being marked as *R34*. The speed *vk* of the virtual tracks excites the corrective rotation of the body, which eliminates the change of orientation, can be determined from Equations (6) and (7). These equations take into consideration the change in the angle Δα<sup>P</sup> of the orientation at time Δt, structural dimension *c* describing the distance between the pairs of continuous tracks, and an empirical adjustment factor *cf* resulting from the surface type and the adhesion of the rotating rollers:

$$
\omega\_{\mathbb{R}} = \frac{\Delta \propto\_{\mathbb{R}}}{\Delta t} \tag{6}
$$

$$
v\_{k} = -\omega\_{\mathcal{R}} c\_f \frac{1}{2} c \tag{7}$$

**Figure 16.** Illustration of the change in angular orientation α<sup>R</sup> of the robot's body under the influence of angular speed ωr, and also the description of this change using a method of eliminating the change in orientation by introducing corrective speeds *v*<sup>k</sup> in virtual tracks *R* and *L*.

In the case of the robot's motion along the longitudinal axis, the correction of track speed involves its suitable increasing or decreasing by value *vk* according to the following formula (Figure 17a):

$$
v\_1 = v\_2 = v\_{Rx} + v\_k \qquad v\_3 = v\_4 = v\_{Rx} - v\_k \tag{8}$$

**Figure 17.** Illustration of track speed correction for the purpose of maintaining constant robot orientation during motion: (**a**) along the longitudinal axis along trajectory μw, (**b**) along the transverse axis along trajectory μp.

Speed correction *vk* for motion along the robot's transverse axis consists of differentiating the speed on each pair of tracks according to formula (Figure 17b):

$$
\upsilon\_1 = -\upsilon\_4 = \upsilon\_{Ry} - \upsilon\_k \qquad \upsilon\_2 = -\upsilon\_3 = \upsilon\_{Ry} + \upsilon\_k \tag{9}
$$

#### *2.5. Experiment Plan*

A number of experimental tests of the mobile robot were carried out on the test stand in order to verify the operation of the proposed drive systems and to determine the basic kinematic properties of the movement.

The tests encompassed a number of rides on straight line trajectory μ<sup>z</sup> located along the longitudinal and transverse axes of the robot. Various measurements of the location, the linear and angular speeds of the robot's body, the rotation angle and angular speeds of the drive wheels, and also the speed of the tracks were conducted during the rides. The research was conducted on a specially designed test bench equipped with a vision-based location measurement system. Figure 18 presents the scheme of the experiment carried out on the measurement test bench.

Ride tests were conducted at two stages. In the first part of the experiment, the set values of the linear speeds of all the tracks did not take into account the correction parameters. In the second part of the experiment, the speeds of particular drives were differentiated by the correction value, which was calculated on the basis of the proposed model of ride trajectory correction.

**Figure 18.** General scheme of the conducted tests of the robot's ride tests on a set straight line trajectory along the longitudinal axis (**a**) and along the transverse axis (**b**) of the robot (*l*1, *l*2—lengths of set trajectory, μT—set trajectory, μR—real trajectory, *x*R(t), *y*R(t)—coordinates of point *R* on the robot's body, α*r*(t)—body coordination angle).

#### **3. Results**

The results of the measurements obtained during the investigated robot rides along the longitudinal and transverse robot axes are presented and discussed below. The tests and measurements were carried out on the designed measurement test bench (Figure 12). The length of the tested trajectory was limited by the area of the test bench. The trajectory waveforms μ<sup>R</sup> (*x*R, *y*R), the dependence of change Δα on time, and the change of the coordinates *x*R, *y*<sup>R</sup> over time were plotted on the basis of the stationary measuring system, while the speeds of the individual tracks were determined on the basis of the angles indicated by the encoders in relation to time.

#### *3.1. Robot Ride Tests*

The tests of the robot's rides along the longitudinal axis were conducted in accordance with the scheme shown in Figure 18a. The robot moved along the planned straight-line trajectory μ<sup>t</sup> by length L1 = 1.2 m at the constant speed *v*Rx = 0.12 m/s.

The scheme presented in Figure 18b shows the plan of the research on the robot's rides along the transverse axis. The robot moved along the planned straight-line trajectory μ<sup>t</sup> by length L2 = 1.2 m at constant speed *v*Ry = 0.12 m/s.

During the research, the waveforms of real ride trajectories μ<sup>R</sup> (*x*R, *y*R), the changes of body orientation angle Δα, coordinates *x*R, *y*<sup>R</sup> of point *R* on the robot's body, and the real linear speeds *v1*, *v2*, *v3*, *v4* of the tracks calculated on the basis of encoder indications were measured and registered. Figures 19–22 show the measurement results for rides along the longitudinal axis, while Figures 23–26 depict the measurement results of the robot's rides along the transverse axis.

**Figure 19.** Trajectory μ<sup>R</sup> (*x*R, *y*R) of point *R* movement on the robot's body during the ride along the longitudinal axis.

**Figure 20.** Waveforms of changes in angle Δα of the robot's body orientation at time *t* during the ride along the longitudinal axis.

**Figure 21.** Plots of coordinates *x*R, *y*<sup>R</sup> of point *R on the robot's body* during the ride along the longitudinal axis.

**Figure 22.** Waveforms of real linear speeds *v1*, *v2*, *v3*, *v4* of tracks *1, 2, 3, 4* of the robot during the ride along the longitudinal axis.

**Figure 23.** Trajectory μ<sup>R</sup> (*x*R, *y*R) of point *R* movement on the robot's body during the ride along the transverse axis.

**Figure 24.** Waveforms of changes in angle Δα of the robot's body orientation to time *t* during the ride along the transverse axis.

**Figure 25.** Plots of coordinates *x*R, *y*<sup>R</sup> of point *R on the robot's body* during the ride along the transverse axis.

**Figure 26.** Waveforms of real linear speeds *v1*, *v2*, *v3*, *v4* of tracks *1, 2, 3, 4* of the robot during the ride along the transverse axis.

The experiment showed that during the rides the real, linear speeds of tracks *v*<sup>i</sup> adopted the values of the set speeds, vRx and vRy. In the case of longitudinal motion, the largest speed errors occurred at the start-up of the robot's motion. After the stabilisation of speed, the errors took significantly smaller values (Figure 22). In the case of transverse motion, the values of the track speed errors occurring at the start-up were comparable to the deviations observed during the whole ride (Figure 26). It should, therefore, be noted that no such error jump between the set and real speed in transverse motion was observed. Despite the speed jump at start-up, the maximum deviation between the real and set speed Δ*vi* was Δ*vi* =0.02 m/s for the longitudinal motion, and Δ*vi* =0.03 m/s for the transverse motion (Figures 22 and 26).

In both longitudinal and transverse motion, deviations from the planned trajectory were observed in the form of a curved ride and a change in the orientation of the robot's body (Figures 20 and 24). The location error of point R was determined to be ΔyR **=** yRT – yR. In both the longitudinal and transverse motion, the set trajectory is a straight line overlapping with the X axis of the measurement test bench coordinate system. Due to this, for both rides the theoretical location of point R in relation to the Y axis is yRT = 0. Location deviation ΔyR = yR.

In the case of the longitudinal motion, the final location error of point R was ΔyR= 0.037 m (Figure 21), while the orientation change was Δα =5◦ (Figure 20) at time *t* = 10 s.

During the transverse motion, the robot did not cover the whole planned distance (L2 = 1.2 m), and the measurement was interrupted after it had covered the distance of 0.7 m (Figure 25) because the robot overstepped the measurement area boundary in the direction of the y axis. In the case of the transverse motion, the final orientation change was Δα = 35◦ (Figure 24), and the location error of point R was ΔyR= 0.36 m (Figure 25) at time *t* = 10.8 s.

The analysis of the waveforms of changes in the orientation angle Δα of the robot's body at time *t* indicated that they had a linear nature. This was confirmed by the conducted statistical analyses. The value of the Pearson linear correlation starts to be higher at 0.97 onwards for both the longitudinal and transverse motions.

#### *3.2. Tests of Robot Rides with Consideration for the Static Correction Method of Ride Direction*

The phenomenon of the change in the angle Δα of body orientation had a linear nature, and therefore the application of the static correction method set out in Section 2.5 was justified. On the basis of the planned changes of orientation angle Δα during a ride (Figures 20 and 24), the values of speed *vkw*, corrections for the longitudinal motion, and *vkw* for the transverse motion, were calculated using Formulas (6) and (7).

In the calculations, the adaptation factor cf was taken into account. For the given robot's dimensions, the value of the c parameter was 0.16 m. With consideration for the type of substrate (track link rollers made of TPU filament with a hardness of 40D moving on ceramic tiles), the value of the factor was determined, empirically, to be cf = 4.4.

The corrective speed values, obtained using Formula (7), were *v*kw = 0.04 m/s for the longitudinal motion and *v*kp = 0.35 m/s for the transverse motion. The corrective speed value *v*kp = 0.35 m/s were restricted to 0.2 m/s due to the limitations of the drive and control systems of the used equipment.

For the obtained values of the corrective speed, two more tests of the robot's rides along the longitudinal and transverse axes were conducted. Figures 27–30 present the results of the measurements for the rides along the longitudinal axis, while Figures 31–34 show the results for the measurements of the robot's rides along the transverse axis.

**Figure 27.** Trajectory μ<sup>R</sup> (*x*R, *y*R) of point *R* movement on the robot's body during the ride along the longitudinal axis with ride direction correction.

**Figure 28.** Waveforms of changes in angle Δα of the robot's body orientation at time *t* during the ride along the longitudinal axis with ride direction correction.

**Figure 29.** Plots of coordinates *x*R, *y*<sup>R</sup> of point *R on the robot's body* during the ride along the longitudinal axis with ride direction correction.

**Figure 30.** Waveforms of real linear speeds *v1*, *v2*, *v3*, *v4* of tracks *1, 2, 3, 4* of the robot during the ride along the longitudinal axis with ride direction correction.

**Figure 31.** Trajectory μ<sup>R</sup> (*x*R, *y*R) of point *R* movement on the robot's body during the ride along the transverse axis with ride direction correction.

**Figure 32.** Waveforms of changes in angle Δα of the robot's body orientation to time *t* during the ride along the transverse axis with ride direction correction.

**Figure 33.** Plots of coordinates *x*R, *y*<sup>R</sup> of point *R on the robot's body* during the ride along the transverse axis with correction.

**Figure 34.** Waveforms of real linear speeds *v1*, *v2*, *v3*, *v4* of tracks *1, 2, 3, 4* of the robot during the ride along the transverse axis with correction.

The measurements showed that the obtained linear speeds of the tracks *v*<sup>i</sup> assume values close to the set speeds *v*Rx and *v*Ry, which were corrected by values *vkw* and *v*kp. The differences between real value *v*<sup>i</sup> and set value *v*Rx for the longitudinal motion were Δ*vi* < 0.02 m/s (Figure 30), disregarding the temporary error jump at the start-up. For the transverse motion, the same value was Δ*v*<sup>i</sup> < 0.04 m/s (Figure 34). In this case, analogically as in the transverse motion without correction, no temporary error jump of speed at startup was recorded. Therefore, the worsening ability to maintain the set speed *v*<sup>i</sup> of the tracks in the transverse motion was observed, which was probably caused by the occurrence of low friction between the surface and the track.

Both in the case of the transverse and longitudinal motion, there was a significant flattening of the trajectory curve (Figures 27 and 31). In the presented case, the final change of orientation angle Δα of the body during longitudinal movement after the introduction of corrective speeds *vkw* was 1.4◦ (Figure 28), while the final point R location error was ΔyR= 0.01 m (Figure 29). A fourfold improvement of the drive parameter Δα was obtained (Table 2).

**Table 2.** Comparison of the conducted measurement results based on the distance of 0.7 m for the transverse motion, and 1.2 m for the longitudinal motion.


In the case of transverse motion without speed correction, the robot did not cover the whole planned distance (Figure 25). Moreover, the measurement was stopped after covering 0.7 m at time *t* = 9 s (Figure 33), because the robot crossed the boundary of the measuring field in the direction of the Y axis. The same part of the trajectory was taken into account when analysing lateral movement while both considering and not considering, the correction of the driving (Figure 33).

The change of body orientation Δα after *t* = 9 s, i.e., at the time equal to the longitudinal drive without speed correction was 11◦ (Figure 32), while the location error was ΔyR = 0.07 m (Figure 33).

In the transverse motion, the observed improvement trajectory curve was smaller than in the case for longitudinal motion. This resulted from, among other things, a 50% decrease in the required corrective speed *vkp*, which took place for equipment-related reasons. The so-reduced correction is not perfectly efficient and therefore it did not allow for the complete improvement of the robot's trajectory. The obtained improvement (over a threefold reduction of error Δα (Table 2)), however, proved the correctness of the used method in the transverse motion of the robot.

The obtained measurement results of drive parameters were statistically analysed and are presented in Table 2.

#### **4. Discussion**

The paper presents a solution for a robot equipped with completely overlapping multidirectional tracks with a symmetrical roller position. A light robot prototype was designed and constructed using additive manufacturing technology to be later used in experimental research on a specially constructed test measurement bench. The robot's parameters were examined along the longitudinal and transverse axes.

The experimental research without ride direction correction confirmed the occurrence of the effect of motion trajectory deviation during transverse motion. Similar effects of the trajectory curve were observed during the simulation tests of the robot with nonoverlapping tracks [27] and with partly overlapping tracks [24].

The data quoted in the references indicate a 3.8% deviation between the theoretical speed and the one obtained in the dynamic simulation during transverse motion, and less than a 1% deviation in the main motion. In these references, no quantitative data was presented regarding the change in the angular orientation of the robot.

The data obtained during the conducted experiment showed a deviation increment of 0.016 rad/s in the longitudinal motion, and 0.133 rad/s in the transverse motion.

Such significant values of the trajectory curve probably result from the mechanical imperfections of the prototype, which were probably due to the adopted drive system's structure without idling rolls and the used manufacturing technology. The fact that the track links and rollers were made of TPU filament, were of 40D hardness in the Shor scale, and had a low friction coefficient resulted in low adhesion of the robot to the surface. It was shown that this effect could be corrected using a suitable control system. The proposed static method of ride direction correction performed this role. The application of the continuous track speed correction allowed the angular deviation during longitudinal motion to be reduced to 0.007 rad/s, and to 0.019 rad/s during transverse motion. Despite the used correction by a value smaller than would result from the transverse motion calculations, a significant improvement in trajectory parameters was observed. This confirms the correctness of the used method and creates opportunities for further research and experiments.

The downside of the used correction method is the fact that it is a static method, which means that it requires earlier experimental tests of a robot on a particular type of surface, with the results needing to be used as the basis for the determination of efficient correction parameters. The application of a dynamic method of ride direction correction would improve the performance even more, and, in effect, eliminate the consequences of deviations in trajectory during robot motion. In further research stages, the prototype will be equipped with additional body orientation angle measurement sensors, which will allow the use of dynamic motion direction correction.

#### **5. Conclusions**

The key considerations of this paper were the design and testing of a mobile robot with an omnidirectional track. The research confirmed the ability of the robot to drive omnidirectionally while maintaining a fixed body orientation. The use of multidirectional tracks in the construction of mobile robots significantly increases their manoeuvrability.

The obtained results clearly show that vehicles equipped with continuous track systems with multidirectional tracks can still be improved. The application of the approach, involving the compensation of mechanical imperfections with the use of additional sensors and appropriate control, will allow for an even more precise maintenance of the set motion direction. The mechanical design of the track system also had a great influence on the accuracy of the robot's movement. Introducing additional road wheels and idlers or changing the material and geometry of the rollers in the track links will improve the contact between the track and the ground.

The development of systems equipped with continuous track systems could be used, inter alia, in the construction of equipment operating on narrow construction sites with numerous obstacles. Tracked machines, which are able to move in the transverse direction, would facilitate manoeuvring and contribute to reducing the number of accidents. Another potential area of application is the transport of small, yet heavy loads, in very limited spaces—in this case a uniform load distribution on the whole track could be of key significance for the usability of such vehicles.

**Author Contributions:** Conceptualization, M.F. and J.B.; methodology, M.F. and J.B.; software, M.F.; validation, M.F. and J.B.; formal analysis, M.F. and J.B.; investigation, M.F. and J.B.; resources, M.F.; data curation, M.F. and J.B.; writing—original draft preparation, M.F.; writing—review and editing, M.F. and J.B.; visualization, M.F.; supervision, J.B.; project administration, M.F. and J.B.; funding acquisition, J.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by Grants-in-Aid at Wroclaw University of Science and Technology, Faculty of Mechanical Engineering, Poland, (No. 8211104160).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Contact via email.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Manipulation Planning for Large Objects through Pivoting, Tumbling, and Regrasping**

**Ang Zhang \*, Keisuke Koyama, Weiwei Wan and Kensuke Harada**

Graduate School of Engineering Science, Osaka University, 1 Machikaneyamacho, Osaka 560-8531, Japan; koyama@sys.es.osaka-u.ac.jp (K.K.); wan@sys.es.osaka-u.ac.jp (W.W.); harada@sys.es.osaka-u.ac.jp (K.H.) **\*** Correspondence: zhang@hlab.sys.es.osaka-u.ac.jp

**Abstract:** Robotic manipulation of a bulky object is challenging due to the limited kinematics and payload of the manipulator. In this study, a robot realizes the manipulation of general-shaped bulky objects utilizing the contact with the environment. We propose a hierarchical manipulation planner that effectively combined three manipulation styles, namely, pivoting, tumbling, and regrasping. In our proposed method, we first generate a set of superimposed planar segments on the object surface to obtain an object pose in stable contact with the table, and a set of points on the object surface for the end-effectors (EEFs) of a dual-arm manipulator to stably grasp the object. Object manipulation can be realized by solving a graph, considering the kinematic constraints of pivoting and tumbling. For pivoting, we consider two supporting styles: stable support (SP) and unstable support (USP). Our proposed method manipulates large and heavy objects by selectively using the two different support styles of pivoting and tumbling according to the conditions on the table area. In addition, it can effectively avoid the limitation arising due to the arm kinematics by regrasping the object. We experimentally demonstrate that a dual-arm manipulator can move an object from the initial to goal position within a limited area on the table, avoiding obstacles placed on the table.

**Keywords:** non-prehensile manipulation; manipulation planning; pivoting; robotics

#### **1. Introduction**

Humans often manipulate large and heavy objects utilizing the contact of the object with environment. Although robotic manipulation of general-shaped large and heavy objects is challenging, we aim to realize a robot manipulating such objects by effectively utilizing the contact of the object with a table (Figure 1). Among the manipulation styles using contact with a table, pivoting refers to the style of inclining and rotating an object on its vertex. Pivoting enables a robot to manipulate a bulky object with a relatively small manipulating force because the robot does not need to lift the object [1]. Moreover, the pivoting gait, which is a manipulation style, enables a robot to move an object by pivoting it multiple times by changing its rotational vertex. In addition, to start the pivoting gait from the designated initial pose of the object, a robot may once tumble the object by rotating it on its edge.

Pivoting gait has been explored for moving a simple box-shaped object [2,3]; however, this study aims to realize the robotic manipulation of a general-shaped object by combining pivoting, tumbling, and regrasping (Figure 2). Let us consider the example shown in Figure 3, where a robot moves a blue-colored object to the designated location on a table, with avoiding an obstacle (red colored). In this case, the robot first tumbles the object to an upright posture and then passes through the narrow passage using the pivoting gait. For realizing such combined manipulation, we need to determine the object face required to contact with table and the grasping pose of the object. In addition, a robot may change the grasping pose multiple times, and such changes of the grasping pose are referred as regrasping.

**Citation:** Zhang, A.; Koyama, K.; Wan W.; Harada, K. Manipulation Planning for Large Objects through Pivoting, Tumbling, and Regrasping. *Appl. Sci.* **2021**, *11*, 9103. https:// doi.org/10.3390/app11199103

Academic Editor: Alessandro Gasparetto

Received: 3 September 2021 Accepted: 27 September 2021 Published: 30 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Figure 1.** A dual-arm robot manipulates a piano by pivoting on the object's vertex (marked by a yellow dot).

**Figure 2.** Manipulation of a piano by (**a**) pivoting, (**b**) tumbling, and (**c**) regrasping.

**Figure 3.** Motion sequence for moving a piano forward and avoiding collision with an obstacle (red colored). Both the initial and goal object poses are in SP, whereas the intermediate object pose is in USP. The yellow trajectory indicates the motion of the object's CoM.

Our proposed planner comprises offline and online phases. In the offline phase, we first preprocess the mesh model of a general-shaped object clustered into superimposed segments [4] in order to determine the object's contact surface with table and the contact surfaces with the end-effectors (EEFs). In the online phase, we propose a hierarchical motion planning method to determine the motion of both the object and the robot. Hierarchical planning includes task level planning and motion level planning. In the task level, we sample the object configurations on the table considering the kinematic constraints of pivoting and tumbling. At this level, the planner constructs a graph, where each node in the graph represents an object pose and a grasp configuration, and each edge in the

graph indicates a primitive motion to transform the object poses or grasp configurations. The purpose of primitive motion is to realize either pivoting, tumbling, or regrasping. On the other hand, in the motion level, the robotic manipulation motions are generated. Here, the pivoting and tumbling motions are generated by predicting the object's future dynamics using model predictive control (MPC), considering the kinematic constraints for maintaining contact with the environment.

Furthermore, this study expands the feasible object poses used in motion planning to both stable placements (SP) and unstable placements (USP). Here, SP indicates that the vertical projection of the object's center of mass (CoM) is included in the object's supporting area. The robot can easily regrasp the object in SP. On the other hand, in USP, a robot can manipulate an object within a limited support area, which offers more choice for avoiding collision in a narrow space. Despite this advantage, it is difficult to regrasp the object in USP because USP not only requires contact with the environment but also contact with the EEFs. If contact between the object and the EEFs is lost during USP, the object will fall. To solve this problem, the two EEFs sequentially change the contact point position, i.e., the left EEF moves to the desired position while the right EEF maintains contact with the object and vice versa.

The contributions of this work are as follows:


This reminder of this paper is organized as follows. Section 2 reviews the related work. Section 3 provides an overview of the proposed method. The required preprocessing of the object model to manipulate a general-shaped object is presented in Section 4. Section 5 introduces the manipulation planning design. It describes the sampling of the object configurations and the design of primitive motions, such as pivoting, regrasping, tumbling, which are combined into a graph. The performed simulation and experiments are detailed in Section 6. Finally, Section 7 summarizes the results and discusses the future work.

#### **2. Related Works**

#### *2.1. Non-Prehensile Manipulation*

The non-prehensile manipulation allows a robot to manipulate an object without firmly grasping it by taking advantage of contact with environment [5]. Examples of non-prehensile manipulation include pushing [6], tumbling [7], scooping [8], tilting [9], throwing [10], catching [11], batting [12], sliding [13] and pivoting [14].

Pivoting is efficient for manipulating heavy objects because the weight of the object is mostly supported by the table. Aiyama et al. [15] first proposed the manipulation of a heavy object by pivoting. Doshi et al. [16] and Raessa et al. [17] proposed motion planners to reorient an object by pivoting. Yoshida et al. [18,19] proposed a motion planner planning an object path during the pivoting gait. Shi et al. [20] proposed a tumbling motion planner. In addition, there have been some research on motion planning combining the pivoting gait with other manipulation styles. Fakhari et al. [21] proposed the manipulation planner combining the pivoting gait with tumbling. Murooka et al. [22,23] combined pivoting, pushing and tumbling. In all the above mentioned works, the pivoting gait was planned for a simple box shaped objects. On the other hand, this is a first trial on planning the motion of a general shaped object by combining pivoting, tumbling and regrasping. In addition, our planner considers both SP and USP to efficiently manipulate an object.

#### *2.2. Regrasp Planning*

There are several studies on the manipulation planner with regrasping an object [24–29]. Berenson et al. [30], and Bouyamane et al. [31] presented regrasp planners with a change in contact state between the object and the environment. Harada et al. [32] proposed a regrasp planning for dual-arm robots. Hayashi et al. [33] implemented manipulation planning to

wrap-up fabric by incorporating two robot arms with regrasping. Wan et al. [34] proposed manipulation planning, which determines a sequence of dual-arm robot motion, to reorient an object with regrasping.

#### *2.3. Grasp Planning*

The grasp planning searches for a grasping pose of an object satisfying the force/form closure [35–38] or grasp stability condition [39–41]. In many regrasp planner, multiple grasping poses are calculated by using a grasp planner [4,42,43], before executing the motion planner. Then, a robot regrasps an object by switching among multiple grasping poses. To provide steady grasps, many elements must be considered, such as contact zones, object surface curvatures, mechanics of robot hands, and so on [44–48]. For two-fingered robot gripper, Jones et al. [49] and Wolter et al. [50] proposed grasp planners. Surface segmentation was used to determine the grasping points for a parallel-jaw gripper [42,51]. This study uses grasp planners for a parallel-jaw gripper [4] to determine the contact points on a general-shaped object manipulated by a dual-arm manipulator with ball-shaped EEFs. Cooperating two manipulators with the EEFs enables a robot to grasp a large object which may be too large for a gripper to grasp.

#### **3. Steps of the Proposed Motion Planner**

In this study, we assume that the 3D shape of the object is given. In addition, we assume that a dual-arm robot with ball-shaped EEFs manipulates the object at two contact points. To manipulate the object to the target pose, a hierarchical manipulation planner is built at the task and motion levels, which outputs the planned motions. The proposed motion planner is designed according to the following steps:


#### **4. Object Model Analysis**

To manipulate a general-shaped object, we preprocess its 3D-shape model and obtain the necessary information for manipulation planning. We aim to determine the following:


Superimposed segments are implemented when preprocessing the 3D-shape model of the object [4]. These segments often generate numerous grasp configurations. However, as numerous grasp configurations increase the calculation load of the manipulation planner, we reduce the number of grasp configurations by checking the collisions with the EEFs and evaluating the grasp stability, as described in the next subsection.

#### *4.1. 3D Surface Model Processing for Grasp Planning*

By calculating the convex hull of the model, its vertices, which are the potential rotational vertices used for pivoting, can be obtained (see Figure 4a). The rotational vertices are the supporting feet of objects during pivoting gait. In addition, the edges, which are the potential rotational edges used for tumbling, can be obtained from the convex hull.

**Figure 4.** Object model analysis. (**a**) Vertices of the object (red points). (**b**) Contact points sampled on the object surface. The facets segmented by the superimposed segments are denoted by different colors. The contact points on the green surface are removed due to collision between the object and the EEF.

Superimposed segments are implemented to process a mesh model for contact and grasp planning. It analyzes the object model by peeling it into facets where the facets are allowed to be overlapped by each other. In addition to superimposed segments, the rayshooting method [52] and simple segments [42] can also be used for analyzing mesh models. The ray-shooting method samples a contact point on the mesh surface and finds the candidate counter contact point by shooting a ray from the sampled point along the reversed normal direction. The simple segments method is similar to the superimposed segments while the difference lies in if the overlapping of facets is considered. The comparison among the three methods has been researched in [4] which showed that the superimposed segments can find more grasps while the time cost of it is faster than that of the ray-shooting method and is not significantly worse than that of the simple segments.

In superimposed segments, given a triangulated CAD model [53], a seed triangle is initialized and is compared with the surrounding triangles by evaluating the differences between surface normals of triangles. If the difference is smaller than a threshold, the neighboring triangle is clustered into the same facet. Otherwise, a new facet based on the neighboring triangle is built. After dealing with the initial facet, the algorithm selects a new seed and repeats the clustering until all the triangles are scanned. As a result, a set of facets are generated by the superimposed segments, where each face can be a candidate contact surface with the table in SP. When placing an object on the table, we define SP if the vertical projection of the object's CoM on the table is included in the support area. During SP, the robot is allowed to regrasp the object without influencing the stability of the object.

In addition, a set of contact points with the EEFs are computed by sampling the surface of the object model. To evenly distribute the contact points on the surface, sampling is initially performed over the entire surface. The sampled points are then repeatedly distributed as contact points on the superimposed facets. Further, some of the undesired sampling points are removed based on the following: (1) they are too close to the boundary of the facet, (2) they are close to each other, and (3) there is undesired collision between the object and the EEF. Examples of the sampled contact points are shown in Figure 4b.

#### *4.2. Grasp Planning*

In this study, a dual-arm robot with ball-shaped EEFs grasps the object at two contact points on nearly parallel facets with opposite contact normal vectors. By inspecting the candidate contact points on these facets, the planner computes the potential contact pairs. Among several grasping point candidates, we further analyze and select a feasible one for a given object pose. We check whether each grasp point candidate leads to collision among the links of the robot or between a link and the environment. In addition, we check whether the inverse kinematics (IK) is solvable and whether the grasp stability can be satisfied by checking the wrench cone generated by the contacts with the EEFs and the environment.

Figure 5 shows examples of the selected contact points. Given an object configuration, the surface normals of the object are compared with those of the table. The contact pairs on surfaces with normals r1 and r2 are removed because collision occurs between the EEF and table when the EEF contacts the bottom of the object. On the other hand, the surface normals of the object are also compared with the EEF's rotational vector (e1 and e2). The contact pairs on surfaces with normals r3 and r4 are removed because they are too far to be reached by the EEF. As a result, only the contact pairs on the facets with surface normals g1 and g2 (green) remain.

**Figure 5.** Selection of grasps with respect to the surface normals of the object and the rotational vector of the EEF (e1 and e2). The contact pairs on surfaces with normal vectors r1 and r2 are removed because collision occurs between the table and EEF if the EEF contacts the bottom of the piano. The contact pairs on surfaces with normal vectors r3 and r4 are removed because they are too far to be reached by the EEFs. The contact points on the surface with normal vectors g1 and g2 remain.

#### *4.3. Grasp Stability*

When pivoting, the object contacts two EEFs of the robot (see Figure 6). In addition, the object contacts the environment, such as the table. Considering these contacts, we can evaluate the grasp stability, which is the ability of an object to balance the external wrenches *We* by the force vector *f* applied at each contact point:

$$Gf = -W\_{t\_f} \tag{1}$$

where *G* denotes the grasp map that maps the contact forces to the total object wrench. If an object contacts the environment with a vertex, we define *f* = [ *f <sup>T</sup>* <sup>0</sup> *<sup>f</sup> <sup>T</sup>* <sup>1</sup> *<sup>f</sup> <sup>T</sup>* 2 ] *<sup>T</sup>*, where *f*0, *f*1, and *f*<sup>2</sup> are the force vectors applied by the environment and the two EEFs, respectively. On the other hand, if an object contacts the environment with two vertices during USP, one EEF will be sufficient to balance the external wrench. In this case, *f*<sup>0</sup> and *f*<sup>1</sup> are the force vectors applied by the environment, and *f*<sup>2</sup> is the force vector exerted by the EEF. This property allows us to regrasp the object in USP. Contact force *fi* = [ *foi*, *fti*, *fni*] *<sup>T</sup>*, (*i* = 0, 1, 2) must lie within friction cone *FCi*:

$$FC\_i = \left\{ f\_i : \sqrt{f\_{oi}^2 + f\_{ti}^2} \le \mu\_i f\_{ri} \right\}\_{\prime} \tag{2}$$

where *μ<sup>i</sup>* is the friction coefficient at the *i*-th contact point. In this study, we approximate the friction cone using a six-sided polyhedral cone. By calculating the Minkowski sum [54] of the force at each contact point, we obtain the grasp wrench space (GWS, *Ws*) [55].

$$\mathcal{W}\_{\mathbf{s}} = \left\{ w\_{\mathbf{s}} : w\_{\mathbf{s}} = \sum\_{\mathbf{i}=1}^{3} Gf\_{\mathbf{i}} + w\_{\mathbf{g'}} \, f\_{\mathbf{i}} \in F\mathbb{C}\_{\mathbf{i}} \right\},\tag{3}$$

where *wg* indicates the wrench generated by the gravitational force. The stability *s* can be evaluated by calculating the minimum distance between the origin in the wrench space and the convex hull of the set *Ws*. In this work, if the origin is inside of the GWS, the grasp is considered stable. In the physical world, the *s* indicates the ability to resist external wrenches *We* and it can be calculated by:

$$s = \min\_{d \in \text{convexchull}(\mathcal{W}\_s)} ||d||. \tag{4}$$

Based on the grasp stability, we select the stable grasp configurations.

**Figure 6.** Object, EEFs, and environment.

#### **5. Hierarchical Manipulation Planning**

We plan the motions to manipulate an object on which information is available. We propose hierarchical planning, which includes task and motion level planning. The manipulation planner constructs a graph. In task level planning, we design the discrete object poses on a table, which along the grasp poses are saved in the nodes. In motion level planning, we plan the motions for moving the object and changing the grasp configurations. Primitive motions, such as pivoting, tumbling, and regrasping, are represented by the edges of the graph. Further, given the initial and goal configuration of the object, we search the graph and find a motion sequence to manipulate the object to the goal position.

Different from previous planners [2,3,19], we consider tumbling and regrasping in addition to the pivoting gait. Moreover, we consider the SP and USP of the object. Object motion in USP allows object movement within a limited table area. However, the object may easily fall if the robot regrasps it in USP. To avoid this problem, we design the regrasping motion in USP, such that the robot sequentially changes the contact position of each EEF.

#### *5.1. Task Level Planning*

In task level planning, we sample the object poses on a table, and save information on the object configurations and the related grasp configurations in the nodes. The object poses are not sampled randomly on the table because pivoting includes a kinematic constraint that the object must be rotated around a fixed point on the table, which is also the object's rotational vertex. In the graph, we sample new object poses with respect to the pivoting and tumbling motions (see Figure 7). In pivoting motion, we discretize the object pose when rotating about the vertical line including the object vertex. The rotation is discretized in *θ<sup>t</sup>* intervals. We repeat such sampling for all the vertices in the current base surface. By sequentially changing the rotational vertices, the object can be moved through pivoting gait along a certain direction (see Figure 8).

**Figure 7.** Sampling new object configurations through pivoting and tumbling at one object vertex.

**Figure 8.** Examples of moving an object to certain direction through pivoting gait. The red dashed arrows indicate rotating directions. *θ<sup>i</sup>* indicates the amount of yaw rotation in the *i*-th sequence.

To sample new object configurations by tumbling, the object is rotated around an object edge until the object moves from one SP to another (see Figures 2b and 7). Tumbling enables a change in the support surface, which provides more choice in the path planning of the object, in case an obstacle is present (see Figure 3).

In this study, most of the object poses are designed in SP. However, within a limited table area, such as the boundary of the table, it is difficult to place the object in SP. For such an area, we sample the object pose in USP where an EEF of the robot assists in holding the object. To retain an object in USP, we check the grasp stability, discussed in Section 4.3, to select the contact points of the EEFs. Here, it is to be noted that sampling the object poses in USP corresponds to pivoting the object assuming the double support (DS) gait mode, as discussed in [3]. On the other hand, sampling the object poses in SP corresponds to pivoting the object assuming the quadruple support (QS) gait mode. Figure 9 shows the difference between the DS and QS gait modes. Refer [3] for the details on the DS and QS gait modes.

(**a**) Quadruple support (QS) mode. (**b**) Double support (DS) mode.

**Figure 9.** Changes in the object's rotational vertices (**a**) in the QS mode when four vertices are in contact with the table and (**b**) in the DS mode when two vertices are in contact with the table.

Figure 10a shows the sampled object poses on the table. Each node indicates an object pose; the red ones denote SP, whereas the blue ones denote USP. The red and blue edges indicate pivoting and tumbling motion, respectively. For better manipulation performance in the real physical world, we also evaluate the change in the angle between the EEF and the contact surface normal during pivoting motion.

**Figure 10.** The graphs generated at the task level. The object poses are first sampled and shown in the left graph, then the right graph is built by combining grasp configurations. (**a**) Graph of the sampled object poses. The red nodes indicate the object poses in SP, whereas the blue nodes indicate USP. The red and blue edges indicate pivoting and tumbling motion, respectively. (**b**) Grasps combined with the graph shown in (**a**). Each node includes information on both the object pose and grasp configuration. Yellow edges are added, which indicate the regrasping to change the grasp poses.

Combining the corresponding grasp configurations with the object configuration, we can generate the graph shown in Figure 10b. In this graph, each node includes information on the grasp configurations and the object pose related to Figure 10a.

The detailed explanations of the graph are shown in Figure 11, where the blue rectangular part in Figure 10b is amplified. A node contains information on both an object and a grasp configuration. Nodes insides a light blue circle (which is drawn only for explanations in Figure 11 and is not included in the graph) indicate they share the same object pose but different grasp configurations. Three primitive motions are represented by edges where yellow, red, and blue edges imply regrasping, pivoting, and tumbling, respectively. The configurations saved in nodes can be transferred by the edges connecting them. For example, in Figure 11b, the grasp configurations of two nodes are changed by a yellow edge which implies regrasping. The design of SP and USP is also reflected in the graph where blue nodes indicate USP, see Figure 11b and red nodes indicate SP, see Figure 11c,d. What is more, there are fewer nodes within the right blue circle in (a) because there are fewer grasps as the contact surface in the current object pose is small, see Figure 11d.

**Figure 11.** (**a**) Portion of the graph in the rectangle depicted in Figure 10b. The nodes within the light blue circle share the same object pose and each node indicates the grasp configuration for the object pose. Nodes are connected by edges which correspond to primitive motions. (**b**) An object pose and a corresponding grasp pose. (**c**) Two grasp poses can be changed by the yellow edge connecting them. (**d**) The object pose in (**d**) can be changed to (**b**) or (**c**) by tumbling.

#### *5.2. Motion Level Planning*

After sampling the object poses on the table and generating the corresponding grasps, the motions to move the object and change the grasp poses are generated in motion level planning. The primitive motions include the following:(i) pivoting motion, which is a transfer motion to change the object configuration without changing the grasps, (ii) tumbling motion, which includes both transfer and transit motions to rotate the object around one edge and change its base surface, and (iii) regrasping of an object in SP involving transit motion, which changes the grasp configurations without affecting the object configuration, and regrasping of an object in USP, which involves both transit and transfer motions.

#### 5.2.1. Pivoting

Pivoting motion is designed by raising the object up on a vertex, rotating it around the vertex, and placing it down. A property of pivoting motion is that two object poses must share the same rotational vertex. Figure 8 displays 2D examples in which pivoting gait is used to move an object in the direction indicated by arrows.

In Figure 6, the contact point between the object and table is denoted by *p*0. Here, we assume point contact with friction at each contact point. Due to contact with table at a fixed vertex (*p*0), the object motion is constrained as follows:

$$SD\_{B0}\begin{bmatrix}\dot{p}\_B\\\omega\_B\end{bmatrix} = S\begin{bmatrix}\dot{p}\_0\\\omega\_0\end{bmatrix} = o. \tag{5}$$

Contact between the object and *i*-th EEF (*i* = 1, 2) is constrained as follows:

$$SD\_{Bi} \begin{bmatrix} \dot{p}\_B\\ \omega\_B \end{bmatrix} = SD\_{Hi} \begin{bmatrix} \dot{p}\_{Hi} \\ \omega\_{Hi} \end{bmatrix} = \dot{p}\_{i\prime} \tag{6}$$

where *S* is a selection matrix that selects the linear velocity and *DBi* transforms the linear and angular velocity from Σ*<sup>B</sup>* to Σ*W*. Details on the *S* and *D* matrices can be found in our previous work [3]. The object is accelerated by the force (*f*1, *f*2) applied by the two EEFs. The dynamic of the object's rotational motion can be obtained as

$$r\_1 \times f\_1 + r\_2 \times f\_2 + r\_{\text{com}} \times m\_o g = \mathcal{Z}\_o \dot{\omega}\_B + \omega\_B \times \mathcal{Z}\_o \omega\_B,\tag{7}$$

where *ri* = *pi* − *p*0, (*i* = 1, 2), and *rcom* = *pcom* − *p*<sup>0</sup> where *pcom* denotes the position vector of the object's CoM. *mo* and I*<sup>o</sup>* indicate the mass and moment of inertia of the object, respectively. *ω<sup>B</sup>* and *ω*˙ *<sup>B</sup>* indicate the angular velocity and angular acceleration of the object in frame Σ*B*, respectively.

In the motion level, the robot motions to move the object between the object poses designed in the nodes are generated by MPC with respect to the kinematics (5), (6), and the dynamic (7). Figure 12b depicts an example of the generated motions for pivoting a piano. In MPC, we define the state vector as *xk* = [Ψ*<sup>B</sup> ωB*] T *<sup>k</sup>* where Ψ*<sup>B</sup>* indicates the Euler angles of the object in Σ*<sup>B</sup>* and select the input vector as *uk* = [ *f*<sup>1</sup> *f*2] *T <sup>k</sup>* . During the object's rotation on the vertex *p*0, the prediction of the state is obtained by,

$$\mathbf{x}\_{k+1} = A\mathbf{x}\_k + B\boldsymbol{u}\_k + D,\tag{8}$$

where *A*, *B*, and *D* are coefficient matrices defined as

$$A = \begin{bmatrix} I\_3 & \mathcal{W} \\ O\_3 & I\_3 \end{bmatrix}, \quad B = \begin{bmatrix} \mathcal{Z}\_o^{-1} \mathcal{W} T^2 / 2[r\_1 \times] & \mathcal{Z}\_o^{-1} \mathcal{W} T^2 / 2[r\_2 \times] \\ \mathcal{Z}\_o^{-1} T [r\_2 \times] & \mathcal{Z}\_o^{-1} T [r\_2 \times] \end{bmatrix}, \tag{9}$$

$$D = \begin{bmatrix} \mathcal{Z}\_o^{-1} \mathcal{W} T^2 / 2 [r\_{com} \times] mg\\ \mathcal{Z}\_o^{-1} T [r\_{com} \times] mg \end{bmatrix} \tag{10}$$

where the matrix *W* transforms the angular velocity to the velocity of the Euler angles.

The future states and free variables appear in the prediction horizon (*Np*) are defined as *Xk* = [*xk*, *xk*+1, ··· , *xk*<sup>+</sup>*np*−1] *<sup>T</sup>* and *Uk* = [*uk*, *uk*+1, ··· , *uk*<sup>+</sup>*np*−1] *<sup>T</sup>*, respectively. The MPC tracks a reference by solving the following optimization problem,

$$J\_{\rm mpc} = \frac{\alpha}{2} \left\| X\_{k+1} - X^{\rm ref} \right\|^2 + \frac{\beta}{2} \left\| \mathcal{U}\_k \right\|^2,\tag{11}$$

where *α* and *β* are the weights. *Xref* is the reference trajectory of the object and it is provided by the object configuration saved in nodes of the graph. The first and the second terms in (11) evaluate the state error and the amount of the manipulating force of the EEF, respectively.

**Figure 12.** Example of pivoting motion to move a piano. (**a**) Two poses of the piano. (**b**) Transformation of the object poses shown in (**a**) through pivoting motion. The object is tilted, rotated around one vertex, and then placed down.

#### 5.2.2. Tumbling and Regrasping

Changing the base surface of the object improves the obstacle avoidance performance. The object's base surface is the support surface between the object and the table. The proposed method has better collision-avoidance capability since pivoting gait is performed with the combination of tumbling and regrasping. By tumbling, we can control the area of support polygon and it contributes to the case when a manipulated object passes through a narrow passage. In addition, by regrasping, we may avoid the collision between the robotic arms and the environment. An example of the object motion is shown in Figure 13, where the gaits 1 cannot avoid the collision with the red obstacle since the object is not tumbled. If the object is tumbled to the right as shown in the right of Figure 13, the supporting polygon can be changed. Such a change allows the object to be moved without colliding the obstacle by performing the gaits 2. Tumbling motion contributes to change the type of gaits adaptively according to the environment.

**Figure 13.** A change of the object's supporting base improves the ability of collision avoidance. The object motion following the gaits 1 cannot avoid a collision with the red obstacle. If the object is tumbled, the supporting base will be changed. Such a change allows the object to be moved without colliding the obstacle by performing the gaits 2. To pass a narrow passage, gaits 2 is more area-efficient than gaits 1.

For changing the object's base surface, we design tumbling motion, which is a special case of pivoting when the rotation passes through the object's edge. Rotating an object up to SP is simple, where one EEF rotates the object around one edge (the line connecting the two supporting vertices) of the object until it moves to SP. The kinematics and dynamics of tumbling motion can be derived through minor modifications in (5)–(11). However, rotating an object up to USP is more complex. This is because to maintain an object in USP, the EEF must contact and hold the object, which leads to difficulty in changing the grasp poses. To solve this problem, we incorporate multiple arms to achieve regrasping by sequentially using different arms to contact the object. Figure 14 depicts a tumbling motion where the object is rotated to USP in the boundary of the table. If the right EEF does not contact the object, it will fall from the table. The tumbling motion is designed as follows: First, the left EEF contacts and rotates the piano (see motion l1). Meanwhile, the right EEF moves to the desired position, which is also the right EEF's grasp pose for contacting the object in USP (see motion r1). The left EEF rotates the object until it contacts the right EEF. Next, the right EEF alone holds the object. The stability is evaluated by calculating the contact wrench of the contact between the right EEF and the object, as well as that between the object and table, as mentioned in Section 4.3. The left EEF then leaves the object and moves to the desired grasp configuration (see motion l2). Compared to regrasping with a gripper, which can only regrasp an object in SP, a multi-arm robot can regrasp an object in both SP and USP.

**Figure 14.** Tumbling motion by incorporating two manipulators.

In addition to regrasping an object in USP, regrasping motion in SP is also designed. When an object is in SP, we can easily plan the motions for regrasping without affecting the object pose. The regrasp motions are denoted by the yellow edges in Figures 10b and 11a.

#### *5.3. Graph Searching*

Given the initial and goal poses of the object, grasps of these object poses can be determined by data provided by object analysis introduced in Section 4. Then, we look for the same object and grasp configurations that are embedded in nodes of the graph. If the new object poses and their grasp configurations can not be found in the graph, new nodes will be created and be connected with the graph. The reachability of new nodes can be guaranteed by the controllability of pivoting motion [18], which proved that a sequence of three pivoting motions can move the object to an arbitrary object configuration with the same base surface in the neighborhood.

Weights are manually assigned to edges in the graph. As the object can be efficiently moved by pivoting and we expect the robot finishes the placement of the object quickly, we set the weight of edges representing pivoting to a small value 0.1. The weight of edges indicating regrasping is set to 0.5 and the weight of edges indicating tumbling is set t 1. A relative large weight is designed to edges related to tumbling because the tumbling is mainly designed for changing the object's supporting base instead of moving the object. Knowing the initial and goal configurations, we search the weighted graph to find a path. A solution path includes sequences of discrete object and grasp configurations. Intermediate configurations are added to ensure a smooth motion during pivoting and tumbling by the MPC introduced in Section 5.2. In a lower-level motion planning, we use rapidly exploring random tree (RRT) to sample the space considering grasp configurations. At this level, if configurations with feasible inverse kinematics and obstacle clearance cannot be found, we go back to the graph, remove certain nodes and edges which caused the problem and re-search the graph for a new path.

#### **6. Simulation and Experiments**

In this study, the target object is a toy piano sized 0.425 × 0.45 × 0.205 m with a weight of 3.1 kg. The piano is placed on a table, which is in front of the robot, and the size of the table surface is 0.9 × 0.6 m. We use a dual-arm Yaskawa Motoman SDA5F robot to manipulate the object. The two EEFs of the robot are the same and ball-shaped. They are produced by a 3D printer where the diameter of the ball is 4 cm. In addition, to increase the friction between the EEFs and the object, anti-slip stickers are added to the tip of EEFs.

#### *6.1. Object Analysis*

A mesh piano model is processed by the superimposed segments offline. The computational costs and the results of the analysis of the mesh model are shown in Table 1. The results are obtained by executing the process on a desktop PC where the CPU is Intel Core i7-9700K @ 3.60 GHZ and a memory in size of 8 GB. The program is performed using Python 3.6.



The number of faces of the object generated by the superimposed segments is 262 which is large. This is because curvatures of the object, for example, the legs of the piano, can be separated to many faces, see Figure 4b. It is a time-consuming process to refine samples and plan contact pairs where bad samples are removed due to the rules mentioned in Section 4.1 and collision checking between the EEFs and the object. The time cost is reasonable because a mesh-to-mesh collision detection is implemented. After the refinement, contact pairs are created based on the remaining samples.

Contact pairs under a given object configuration can be refined by grasp stability and collision checking between the EEFs and the environment by the rules discussed in Sections 4.2 and 4.3. In Figure 10b, the total number of sampled object configurations is 310 and the number of feasible grasp configurations after the refinement is 4207 where the process of the refinement costed 9.362 s. The time cost of the refinement is relatively high but acceptable since we only need to do it once and offline. If the refinement is not implemented, the evaluation of the grasp stability will be performed 33,170 (107 × 310) times and cost over one minute.

#### *6.2. Simulation*

In the simulation, the target is to move the piano to the goal pose, which is far from the robot (see Figure 15). The robot first grasps the piano placed on the table (see Figure 15a). Further, it pivots the piano around one vertex under the front right leg (see Figure 15b). Here, we define the left and right sides of the object and those of the robot according to the view from the robot. In Figure 15c, the piano is placed on the table and the rotational vertex is changed from right to left. Then, through pivoting motion, the piano is placed in the pose shown in Figure 15d. Note that both arms of the robot are close to their kinematic limits and the robot cannot further move the object forward. However, the robot changes its grasp configuration and grasps the rear of the object (see Figure 15e,f). Thereby, the robot can pivot the piano to move it forward and successfully place it in the goal position (see Figure 15g,h). By changing the grasp configuration, the piano is moved forward by as much as 12.5 cm (see Figure 15d,h).

This simulation demonstrates that regrasping motion extends the scope of feasible placements of the object manipulated by the robot.

**Figure 15.** Manipulation of the piano by the robot for moving it forward by pivoting. The support vertices are marked as black dots. The initial configurations are shown in (**a**). The robot manipulates the object by pivoting in (**b**–**d**). When the joint configurations are close to the limits (**d**), regrasping is performed in (**d**–**f**). After regrasping, the robot moves the object to the goal pose, see (**f**–**h**).

#### *6.3. Experiment 1: Pivoting Gait*

In the first experiment, we perform the simulated pivoting gait in a real scenario. The inputs are the initial and target poses of the piano. The system automatically plans the grasp, pivoting motion, and regrasping motion. Figure 16 depicts the experimental process. The robot first grasps the piano (see Figure 16a) and pivots it around the right (from the robot view) rotational vertex (see Figure 16b). Further, the robot changes the rotational vertex to the left (see Figure 16c) and pivots the piano (see Figure 16d,e). In Figure 16e, both the arms are close to the limit of the robot workspace; hence, the robot cannot further move the piano forward. Pivoting gaits generated by the graph MPC [3] faces such limitations and the object cannot be placed in the target location. To solve this problem, regrasping is planned in this work and the robot changes the grasp poses to grasp the rear part of the object (see Figure 16f,g) and successfully pivots the piano to the target pose (see Figure 16h–j. Table 2 compares the performances of two methods where only the proposed planner can generate motions to move the object to the target location.

**Figure 16.** Experiment 1: The robot manipulates the object by pivoting in (**a**–**d**) and regrasping the object in (**e**–**g**). In (**e**), without regrasping, the robot cannot further move the object forward because the robot arms are close to the kinematic limit. However, after regrasping, the robot can move the object forward for two pivoting steps, see (**h**–**j**). The yellow dots indicate the object vertices that contact the table.


**Table 2.** Performances of the graph MPC and the proposed planner.

In the experiment, after moving the piano a few steps, the robot cannot further move the object with the initial grasp poses because of the robot workspace. To continue moving the object, the robot finds a new grasp configuration, regrasps the piano, and manipulates it to the target location. Compared with the pivoting gaits generated by the graph MPC [3], the regrasping motion enables further motion of the object by as much as 12.5 cm along the *x*-axis.

#### *6.4. Experiment 2: Object Orientation*

Experiment 2 tests a combination of regrasping and pivoting during the rotation of the piano around the z-axis in the world frame (see Figure 17). The initial pose of the piano is shown in Figure 17a and the support vertices are marked as yellow dots. The robot grasps the piano, pivots it around one of its support vertices, and places it down in a pose where a rotation of −45 degrees in the z-axis is performed (see Figure 17a–c). Further, the robot changes the grasp poses to different contact surfaces as depicted in Figure 17d,e to avoid self-collision of the robot and maintain the EEF in a reachable pose. After regrasping, the robot rotates the piano by another −45 degrees around the *z*-axis and places it down (see Figure 17f). In Figure 17g,h, a new regrasping motion is performed after which the robot pivots the piano to the target pose (see Figure 17i,j).

Experiment 2 shows that to perform a relatively large change in the object orientation, the robot needs to contact different surfaces of the object; with a combination of regrasping and pivoting, the robot successfully manipulates the object to the target pose.

**Figure 17.** Experiment 2: the robot changes the orientation of the object by pivoting and regrasping. The robot first rotates the object by pivoting in (**a**–**c**). Then the robot regrasps the object at its new contact surfaces, see (**c**–**e**) and pivots the object, see (**e**,**f**). In (**f**–**h**), the robot changes the grasp configurations and finally pivots the object to the goal pose, see (**h**–**j**).

#### *6.5. Experiment 3: Obstacle Avoidance*

In the third experiment, the robot moves the piano to the target pose considering the environment where an obstacle exists and the limited support surface (see Figure 18). Different from the sheering method [2], which cannot find a feasible path to avoid collision, changing the object's base surface by tumbling can realize better collision avoidance be(**a**) (**b**) (**c**) (**d**) (**e**) (**f**) (**g**) (**h**) (**i**) (**j**) (**k**) (**l**)

tween the object and the obstacle. Moreover, the USP design allows the robot to manipulate the object within a narrow support surface without falling.

**Figure 18.** Experiment 3: An obstacle is placed in front of the object, see (**a**,**b**). The robot tumbles the object using the robot's right EEF, whereas the left EEF moves to the desired position and waits for contacting the object (**c**,**d**). After the left EEF holds the object, the right EEF moves to the desired grasp pose (**e**,**f**). Note that the object is close to the boundary of the table and it is in a USP pose where the object will fall if there is no support from the EEF. After regrasping, the robot moves the object by pivoting it in the DS mode (**g**,**h**) and then places it in SP (**i**). The robot moves the object in the QS mode (**j**,**k**) and tumbles the object to change its support surface (**l**,**m**). Finally, the robot regrasps (**m**,**n**) and moves the object to the goal position (**o**,**p**). There is no collision between the object and obstacle during these motions.

(**m**) (**n**) (**o**) (**p**)

A grey pen holder is placed in front of the piano as an obstacle (see Figure 18a). The poses of the obstacle and the piano are input to the system. As the obstacle is very close to the piano, the piano cannot avoid collision by performing the pivoting gait using the sheering method. Therefore, the robot tries to tumble the piano. The right EEF moves to a new grasp configuration and prepares for the tumbling motion, whereas the left EEF moves to the planned grasp configuration preparing to catch and hold the object after rotation (see Figure 18b,c). The robot utilizes the right EEF to rotate the piano (see Figure 18d). Once the piano is in contact with the left EEF, the right EEF leaves the piano and moves to a new

grasp pose (see Figure 18e,f). As a result, after the rotation, regrasping is performed by sequentially incorporating the right and left EEFs to hold the piano, which is an advantage of the dual-arm robot. Note that as the supporting table is narrow and the piano is near the table boundary, sufficient space is not available for the piano to be placed in SP and the left EEF is used to keep the piano in USP. For example, if the left EEF does not hold the piano, the piano will fall from the table (see Figure 18e). The USP design allows the robot to manipulate the object within a limited support space. Subsequently, the robot performs the pivoting gait designed for the USP, which is the DS gait mode (see Figure 18g,h). The robot places the piano in SP when there is sufficient space on the table, and then changes the grasp configurations (see Figure 18i,j). After regrasping, the robot pivots the piano for two steps along the y-direction (see Figure 18k,l) and then tumbles the piano to change the support surface through the right EEF (see Figure 18m). Finally, the robot regrasps the object and pivots it for two steps along the x-direction, and places it at the target location (see Figure 18n–p). During the entire process, there is no collision between the object and the obstacle.

Experiment 3 demonstrates that changing the object's base surface improves the collision avoidance performance in pivoting gait. In addition, the USP design enables the robot to move the object within a limited support space. Moreover, the multiple arms incorporated to sequentially contact the object achieve regrasping in USP.

#### **7. Conclusions**

In this study, we proposed a manipulation plan for moving a general-shaped large object. With superimposed segments, the object model was first analyzed, and the contact between the object and EEFs, as well as that between the object and environment were determined. Utilizing the object's contact with the environment, we realized the manipulation of a bulky object through pivoting, tumbling, and regrasping. In the plan, the object configurations in both SP and USP were considered, enabling the robot to manipulate the object within a limited support area. The proposed approach takes advantage of the kinematic, dynamic, and gait modes of the pivoting gait published in our previous work [3], which is related to motion planning for moving the object in both SP and USP. In this study, this pivoting gait is further improved by combining regrasping and tumbling. The efficiency of the proposed method was demonstrated through experiments in which the robot moved an object to the goal configuration, avoiding the kinematic limits of the robot arms and exhibiting better collision avoidance performing compared to those reported in prior works. The proposed approach can be employed to automate the manipulation of large objects by dual-arm manipulators. In future, we intend to manipulate more complex-shaped objects, using one EEF to perform pivoting, and planning motions in more complex environment for testing the planner's performance. Quantitative analysis based on computational costs, stability, number of gaits, tumbles and regrasps to finish a task will be researched.

**Author Contributions:** The authors' contributions are reported as follows: Original concept, A.Z. and K.H.; Methodology, A.Z., K.K., W.W. and K.H.; Software implementation, A.Z. and W.W.; Technical implementation, A.Z.; Writing—Original Draft Preparation, A.Z.; Writing—Review and Editing, A.Z. and K.H.; Supervision, K.K., W.W. and K.H.; Project Administration, K.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Design and Implementation of Composed Position/Force Controllers for Object Manipulation**

**Sergio Hernandez-Mendez 1,\*, Elvia Ruth Palacios-Hernandez 2, Antonio Marin-Hernandez 1,\*, Ericka Janet Rechy-Ramirez <sup>1</sup> and Hector Vazquez-Leal <sup>3</sup>**


**Abstract:** In the design of a controller for grasping objects through a robotic manipulator, there are two key problems: to find the position of the object to be grasped accurately, and to apply the appropriate force to each finger to handle the object properly without causing undesirable movement of it during its manipulation. A proportional-integral-derivative (PID) controller is widely used to grasp objects in robotics; however, its main shortcomings are its sensitivity to controller gains, sluggish response, and high starting overshooting. This research presents three coupled (position/force) controllers for object manipulation using an assembled robotic manipulator (i.e., a gripper attached to a robotic arm mounted on a mobile robot). Specifically, an angular gripper was employed in this study, which was composed of two independent fingers with a piezoelectric force sensor attached to each fingertip. The main contributions of this study are the designs and implementations of three controllers: a classic PID controller, a type-I controller, and a type-II fuzzy controller. These three controllers were used to find an object to be grasped properly (position) and apply an equivalent force to each finger (force).

**Keywords:** robotics; manipulation; intelligent control

#### **1. Introduction**

Due to uncertainty on localization, mobile robot manipulation faces two key problems when trying to manipulate objects: firstly, computing the correct position at which the mobile robot needs to be for grasping an object, and secondly, calculating the specific position at which the object to be grasped is.

Specifically, object manipulation through a robotic arm involves three actions: (i) moving the robot and positioning it at a place where its arm's configuration space intercepts the object's position (Figure 1a); (ii) planning and execution of the arm's trajectories—that is, to move the robotic arm towards a position where the object is inside the gripper's configuration space, i.e., where the gripper (end effector) may grasp the object [1,2] (Figure 1b); and (iii) tuning the gripping action—i.e., closing the gripper in order to take and manipulate the object properly [3–5] (Figure 1c). The first two actions are called the *pre-grasping* stage, and the last action is named the *grasping* stage.

The gripper and their controller are key components at the grasping stage in order to manipulate objects accurately. Parallel motion grippers and angular motion grippers are the most commonly used in robotics for object manipulation [6]. Parallel grippers usually have only one motor; therefore, their fingers move simultaneously in order to handle objects. Recently, the use of grippers with angular motions has increased. Thanks to their configuration, these grippers extend the gripping range of objects, and different

**Citation:** Hernandez-Mendez, S.; Palacios-Hernandez, E.R.; Marin-Hernandez, A.; Rechy-Ramirez, E.J.; Vazquez-Leal, H. Design and Implementation of Composed Position/Force Controllers for Object Manipulation. *Appl. Sci.* **2021**, *11*, 9827. https:// doi.org/10.3390/app11219827

Academic Editor: António Paulo Moreira

Received: 30 September 2021 Accepted: 16 October 2021 Published: 21 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

configurations may be calculated to handle an object; therefore, these grippers might provide dexterous object manipulation similarly to human fingers. Detailed reviews regarding several types of grippers can be found in [7–9].

**Figure 1.** Actions prior to the object's manipulation: (**a**) moving the robot so the object's position is within the arm's configuration space, (**b**) object detection and position estimation, and (**c**) planning and execution of trajectories to drive the arm close to the object.

Focusing on controllers, the PID controller has been widely used in control systems [10–13] due to its simplicity and robustness. In the last few years, there has been increasing interest in using fuzzy logic in different control systems [14–20]. Moreover, fuzzy logic has been used for implementing several applications. For instance: in a multicriteria decisionmaking process [21], energy consumption for bipedal walking robots [22], and search engine systems [23]. Fuzzy logic uses fuzzy values and rules to cope with uncertainty, just as humans do. These fuzzy values and rules may be implemented based on the user's experience instead of using complex mathematical models.

In the present work, an angular gripper with two independent fingers was used. Prior to the manipulation stage, the actions described in Figure 1 were done. The object's position, initially unknown, was obtained by a perception system and was used to bring the arm close to the object. However, at this stage of the work, no visual surveying has been used. Instead, a tactile feedback control is performed. The object's dimensions were used to estimate the gripper's initial opening. Once the object was between gripper fingers, angular position and (when object do contact) pressure forces were used to control the motion of each finger independently. The pressure forces were obtained by piezoelectric sensors on the fingertips.

The contributions of this work can be summarized as follows: The implementation of three switching force and position controllers (a PID, a fuzzy type-I and a fuzzy type-II) for grasping tasks with a two-fingered gripper (independent fingers). The evaluation and comparison of those controllers, via simulation and a real arm manipulator. A communication protocol to control and switch between controllers over an ROS. Finally, a control process over a two-fingered angular finger gripper that centers the object in order to apply equivalent forces with both fingers.

This paper begins by analyzing some relevant related work (Section 2). Next it presents the details of the system (Section 3). The communication of the gripper with an ROS is described in Section 4. The grasping stage and the design of the proposed controllers are described in Section 5. Experiments and results are presented in Section 6. Finally, conclusions are given in Section 7.

#### **2. Related Work**

Controller design for grasping objects in robotics has been an important research topic in the last few years. In this context, proportional-integral (PI) controllers and proportionalintegral-derivative (PID) controllers have been designed and implemented to be used in the grippers of robotic arms. For instance, a two-finger gripper for the manipulation of deformable objects was implemented [24]. This gripper used a PI parallel force controller

with prediction models in order to regulate the force exerted by the robot on the object; thus, the risk of damaging the objects during the grip was reduced.

In order to manipulate objects, a force sensor and a flex sensor can be placed in a gripper [25]. This gripper should be controlled only by a motor that opens and closes the gripper. An algorithm and a control strategy are used to apply force to the object without damaging it.

In [26], a griper with two parallel fingers with two phalanges each and a fixed base was used to manipulate objects. The gripper was controlled only by a motor that provided different torques, and a touch sensor was placed on the fixed surface to measure the pressure applied to the object. In order to manipulate objects, a force sensor can be placed on each finger of a parallel gripper controlled by a single motor [27].

Other studies have tested the designs of controllers using either simulated grippers or simulated robotic hands. For instance, the authors of [28] simulated and manufactured a prototype of a robotic hand with five fingers. To control the kinematic and dynamic movements of each finger, a PID controller was used. They generated a path for each finger in the configuration space, achieving fast motion towards the target angle with a small error signal and little overshooting. However, as described by the authors, when they tested it in a real prototype, the overshooting signal was larger than in the simulation, and was mainly produced by disturbances in some inputs. Moreover, when the reference value was set to a long period of time, small oscillations occurred, again only in the real prototype. Furthermore, the authors of [29] made a mathematical analysis of the contact forces for a robotic finger of three degrees of freedom. The authors proposed two simulated PID controllers; one was used to achieve position and the other to regulate strength. Both controllers were tuned using fuzzy logic.

Regarding using fuzzy logic in control design, the effects of membership functions (triangular, trapezoidal, and Gaussian) on a fuzzy control scheme for a three-finger system have been analyzed [30]. Moreover, another study [31] used a fuzzy control scheme in order to manipulate objects. This control scheme was composed of two fuzzy controllers. The first controller was employed to ensure a stable grip and the second to prevent slipping. In [32], a three-finger gripper to manipulate strawberries without damaging them was developed. A fuzzy controller was designed to compute the force to be exerted on the strawberry. The inputs for this controller were the readings of a capacitive sensor placed on each finger of the gripper. Due to a single motor on the mechanical gripper, the three fingers might be operated simultaneously. Likewise, Reference [33] proposed a fuzzy controller with a gripper using only a single servo motor in order to grasp unknown objects. The authors showed the effectiveness of the controller with three different types of object hardness: soft, moderate, and hard.

Similarly to PI and PID controllers, simulations have been used to test fuzzy controllers. For example, in [34] simulated a fuzzy controller with tactile information to manipulate objects with a two-finger gripper. This simulation was effective; nevertheless, the controller needs to be implemented in a gripper for real applicability. Another study [35] implemented a fuzzy controller to reduce the impacts of forces during the object manipulation. Moreover, the authors developed a fuzzy controller of conformity, in which a mass-spring-damper system determines a desired level of conformity. A simulated robotic hand was used to test the performance of the fuzzy conformance controller.

As can be seen, previous works have had at least one of the following shortcomings: most of them used parallel grippers, not requiring specifically centering the object but limiting the dexterity possible while grasping, and most of the works were tested only in simulated environments or had high variations while testing with real grippers. Consequently, in this work is presented an approach to control the position and force of each individual motor over a two-independent finger gripper in order to increase dexterity; in other words, a robot can adjust grasping by only adjusting the forces and positions of its fingers. Three controllers have been implemented and tested, a PID controller, a fuzzy

type-I controller, and a fuzzy type-II controller. In the following sections, implementations and tests are described.

#### **3. System Description: The Robotic Manipulator**

The robotic manipulator assembled in this research is shown in Figure 2. As can be noticed, this robotic manipulator is composed mainly of the following components:


**Figure 2.** (**a**) An arm with three degrees of freedom, placed on a mobile robot (iRobot Create). On the back of the robot is a data acquisition board, as well as a 12 V battery to power the system. (**b**) Force sensors placed on each fingertip of the gripper.

#### **4. Communication with an ROS**

In this work, an ROS was used to communicate with all different systems of the robot. Software packages called *nodes* run independently from each other. In order to exchange information, the *nodes* send messages, enclosed in the form of *topics*. Messages are received from publishers and sent to subscribers directly from the master server of the ROS. Therefore, nodes do not directly communicate with each other.

Figure 3 shows a reduced scheme of the nodes and their communication messages on the robotic platform. Two nodes are used to communicate with the hardware: (a) the *serial\_node* receives signals from piezoelectric-force sensors on each finger and encapsulates the signal in a message, which is sent to the ROS; and (b) the *dynamixel\_manager* reads from the ROS the commands for each motor and returns messages about the state of these. A node called *Force\_Position\_controller* is in charge of reading "sensor signals" topics and writing commands to "motor-speed" topics. It is at this node where all control computation is performed.

**Figure 3.** Communication structure of the gripper in the ROS. Nodes at the left are connected directly to hardware, whereas the node in the right (the controller) only communicates with hardware through messages sent to the public memory of ROS (at the center of the figure).

#### **5. Grasping Stage**

As has been stated in Section 1, the grasping stage refers to the action of closing the gripper to take and manipulate the object properly. In order to arrive at this stage, some considerations have to be taken into account.

Generally, mobile robotic manipulators dispose of a perception system that provides the object's information (position, shape, color, etc.) to the arm control system. The perception system can be composed of cameras, laser range finders, depth cameras, or a combination of these sensors. Once the perception system recognizes an object, it sends its information to the position and arm controllers to do the pre-grasping stage. In this work, those stages were realized. In other words, an object's size and position were calculated and the arm was driven to reach the object. However, due to uncertainty in: (a) the relative position of the mobile base, (b) associated with onboard sensors, and (c) due to the control and motion of the arm's joints, the position of end-effector can be slightly different than planned. Therefore, the object's position relative to the gripper can be one of those shown in Figure 4. Namely, the object could be placed in the center, shifted to the right, or shifted to the left between the two fingers. To avoid perception obstruction, as commonly happens with mobile robots, it was decided to not use visual surveying, and instead we used gripper sensor grasping control. The robotic arm used in this work has two-independent fingers with angular motion. Each finger was provided with a piezoelectric force sensor, imposing then, the use of one controller for each finger.

#### *5.1. Problem Statement*

Since the initial position of the object is practically known, the first thing that the gripper must do is to find and center the object. Three cases could be present: (i) the initial position of the object is in the center between the fingers; the (ii) the initial position of the object is near to the left finger; (iii) the initial position of the object is near to the right finger (Figure 4).

Then, before the gripper fingers can apply force to the object, it must be centered. If not, the differences in torques while exerting force with unbalanced fingers position can: damage the object or the motors, or make the object slide. Since, at the beginning, none of the force sensors are in contact, it was decided to divide the control system into two different objectives. The former, solving angular positions of the fingers about the object (angular position), and the latter, taking into account the force sensors when they are active.

To deal with these two objectives, it was decided to design a PID position controller and a PID force controller, commuting between them the issues regarding position errors. Once we designed and applied those controllers, and considering the global task of firm and safely handling, the performances of these controllers were not very satisfying. Thus, to compare results, but above all, to increase performance, it was decided to design two fuzzy controllers (one Mamdani type and another type-II). Their advantage over PID controllers is that it is possible to have a different output for each fuzzy-controller, and to avoid sensors' noise, which affects system performance.

**Control objectives.** Considering that the robotic arm has its gripper near to the object's position, and taking into account the structure of the gripper, the objective of the gripper control system is to achieve firm and delicate object manipulation with the measurement of the angular position and force exerted by each finger. Then, (see Figure 4), it was necessary to design two-feedback laws, position control and force control, such that:

$$\lim\_{t \to \infty} e\_p(t) \le h = \lim\_{t \to \infty} (\text{Ref}\_{\text{pos}} - \text{Pos}\_{\omega(t)}) \le h\_\prime h = \pm 2 \text{Ref}\_{\text{pos}} \tag{1}$$

$$\lim\_{t \to \infty} e\_f(t) = 0 = \lim\_{t \to \infty} (Ref\_{force} - F(t)) = 0\tag{2}$$

**Figure 4.** Initial conditions for the object's position being: (**a**) at the center with respect the fingers, (**b**) near the left finger, and (**c**) near to the right finger.

#### *5.2. Design and Implementation of the Controllers*

To manipulate an object, the forces applied by the fingers generally are sufficient to counteract gravitational and inertial components of the load force acting on the fingertips [37]. Due to the architecture of the two-independent-finger gripper used on the robotic manipulator, the controllers should perform two main tasks: (i) to place and keep the object at the desired position (i.e., position control), and (ii) to regulate the force applied to the object (i.e., a force control). To accomplish these tasks, three controllers are proposed: a hybrid PID controller (position–force), a type-I fuzzy controller (Mamdani), and a type-II fuzzy controller.

These controllers might use the following inputs provided by the robotic manipulator: two readings from angular positions (right *PosRω*(*t*) and left *PosLω*(*t*)) and two readings, one each from the force sensor of each finger (right *FR*(*t*) and left *FL*(*t*)).

On the other hand, the controllers produce a speed command as an output, i.e., two speeds, *MR*(*t*) and *ML*(*t*), which will be applied to the right and left servo motors, respectively.

#### *5.3. Position–Force Hybrid PID Controller*

As mentioned previously, PID controllers are widely used in the industry for implementing control systems. Due to its operational ease and low cost, the first implemented controller is based on a hybrid PID scheme; i.e., it consists of two PID controllers: a position PID and a force PID operating in a switching mode as shown in Figure 5. It is important to remark that either of the gripper fingers uses this control scheme.

**Figure 5.** Control scheme of the hybrid PID controller.

The first controller is responsible for placing and keeping the object at desired position. This controller works as follows: it switches on when the object is not centered, and it switches off when the object is centered. On the other hand, the force controller ensures that the fingers will apply enough force to manipulate an object firmly. This controller works when the position controller is switched off; i.e., the position error is zero.

#### 5.3.1. Position Control

A discrete PID algorithm is used to implement the position PID in the ROS. This algorithm replaces the derivative term using a backward difference method and the integral term using a rectangular integration method. The discrete form of position algorithm for PID is given as:

$$\begin{split} u\_p(t) = u\_p(t-1) + [K\_{p\_p} + \frac{K\_{p\_p}T\_{s\_p}}{T\_{i\_p}} + \frac{K\_{p\_p}T\_{d\_p}}{T\_{s\_p}}]e\_p(t) \\ + [-K\_{p\_p} + \frac{K\_{p\_p}T\_{s\_p}}{T\_{i\_p}} + \frac{2K\_{p\_p}T\_{d\_p}}{T\_{s\_p}}]e\_p(t-1) + \frac{K\_{p\_p}T\_{d\_p}}{T\_{s\_p}}e\_p(t-2) \end{split} \tag{3}$$

where *up*(*t*) is the control input, *Kpp* is the proportional gain, *Tip* is integral time, *Tdp* is derivative time or rate time, and *e*(*t*) stands for the error between the reference and process output. The values of *Kpp* <sup>=</sup> 0.000001, *Kip* <sup>=</sup> *Kpp Tsp Tip* <sup>=</sup> 0.0001, and *Kdp* <sup>=</sup> *Kpp Tdp Tsp* <sup>=</sup> 0.001 were chosen using the Ziegler-Nichol tuning rule. *Tsp* is the sampling period and *ep*(*t*) = *Refpos* − *Posω*(*t*) is the position error.

In this case, the position reference is the center of the gripper. When the position error is sufficiently small *ep*(*t*) ≤ *h* with *h* = ±2%*Refpos*, a change is sent to the force PID controller.

#### 5.3.2. Force Control

The force PID controller has a similar structure to the position PID controller.

$$u\_f(t) = u\_f(t-1) + [K\_{pf} + K\_{if} + K\_{df}]e\_f(t) + [-K\_{pf} + K\_{if} - 2K\_{df}]e\_f(t) \tag{7.1}$$

$$(t-1) + K\_{df}e\_f(t-2)$$

where *Kp f* <sup>=</sup> 0.5, *Ki f* <sup>=</sup> *Kp f Ts f Ti f* <sup>=</sup> 0.005, and *Kd f* <sup>=</sup> *Kp f Td f Ts f* = 0.001 are the proportional, integral, and derivative gains, respectively. *uf*(*t*) is the control input, *Ti f* is integral time, *Td f* is derivative time or rate time, and *Ts f* is the sampling period. *ef*(*t*) = *Ref f orce* − *F*(*t*) is the force error. The *Ref f orce* is computed according to the object's weight and material; thus, damage on the object might be reduced.

#### *5.4. Type-I and Type-II Fuzzy Controllers*

Fuzzy control provides a formal methodology for representing, manipulating, and implementing human heuristic knowledge about how to control a system. There are type-I fuzzy systems and type-II fuzzy systems.

Generally, in a type-I fuzzy controller, crisp input values are translated to fuzzy input values (fuzzification). These fuzzy input values are computed using rules to produce fuzzy output values (inference process). Finally, these fuzzy output values are mapped to crisp output values (defuzzification). It can be noticed that there are two conversions from crisp values to fuzzy values in a fuzzy controller. To achieve these conversions, it is necessary to define fuzzy sets. A type-I fuzzy set is an extension of a classical set (see Figure 6a). In a classical set, an element can (membership value = 1) or cannot (membership value = 0) be a member of a set (see blue lines in Figure 6a). Conversely, an element can be a member of a type-I fuzzy set at different membership values; i.e., the membership value can range from 0 to 1 (see red lines in Figure 6a). There are four basic membership functions: singleton, triangular, trapezoidal, and Gaussian distribution curve.

**Figure 6.** Membership functions of (**a**) type-I FLS and (**b**) interval type-II FLS.

Frequently, these membership functions can be designed using knowledge from experts; nevertheless, other studies have used genetic algorithms [38,39] when the defining parameters is challenging. Consequently, type-I fuzzy sets might deal with more uncertainties than classical sets due to the definitions of membership functions.

Regarding the inference process, rules are implemented in order to map the fuzzy inputs to fuzzy outputs. These rules are expressed as IF–THEN statements. The "IF" part is composed of antecedents, which connect the fuzzy inputs, whereas the "THEN" part is composed of consequents, which connect the fuzzy outputs. Generally, the antecedents are defined using two basic operators: and; or. During the inference process, it is frequently the case that more than two rules are fired according to their strengths; therefore, an aggregation operation is performed to join the rules that were fired. These fired rules will define the fuzzy outputs. Finally, the fuzzy outputs are translated to crisp output values using a defuzzification method.

According to [40], type-I fuzzy controllers might face uncertainties in: (i) their control inputs due to noise affecting the sensors; (ii) their control outputs because of a change in the characteristics of the actuators; (iii) linguistic uncertainties because experts might have different meanings for the linguistic labels. Therefore, they implemented rules using the same antecedents and by changing the consequents. Interval type-II fuzzy controllers might cope with these uncertainties. Fuzzy inputs and fuzzy outputs of an interval type-II fuzzy controller are implemented using interval type-II fuzzy sets. These sets add a second level of membership functions; therefore, these interval type-II fuzzy sets have upper and lower membership functions (see Figure 6b). Additionally, a footprint of uncertainty can be seen in these sets, which is the area between the upper and lower membership functions.

Interval type-II fuzzy controllers perform fuzzification, inference, and defuzzification processes. Furthermore, the interval type-II fuzzy controllers execute the fuzzification and evaluation of rules, as the type-I fuzzy controllers do. The key differences between interval type-II and type-I fuzzy controllers are: (i) interval type-II fuzzy controllers use interval type-II fuzzy sets in the antecedents and consequents of the rules; and (ii) after evaluating the rules, interval type-II fuzzy controllers perform a reduction from type-II fuzzy sets to type-I fuzzy sets using a type-reducer. Once this reduction is carried out, a defuzzifier is applied to obtain the crisp output.

#### 5.4.1. Type-I Fuzzy Controller for Position and Force Control

Figure 7 presents the design of the type-I fuzzy controller. As can be seen, the controller works as follows:


**Figure 7.** Block diagram of the type-I fuzzy controller with the gripper.

Eight fuzzy input sets were designed and implemented using trapezoidal membership functions for transforming the angular positions and force readings into linguistic values. Focusing on force readings, four trapezoidal membership functions were implemented to transform *FR*(*t*) and *FL*(*t*) into the following four possible linguistic values: *Sensor X Of f* (*SXO*), *Sensor X So f t Touch* (*SXTS*), *Sensor X Touch Hard* (*SXTH*), and *Sensor X Squeeze* (*SXSQ*), where *X* is *R* for the right sensor and *L* for the left sensor.

Regarding the angular positions of the motor, four trapezoidal membership functions were implemented in order to transform *PosRω*(*t*) , *PosLω*(*t*) into the following linguistic values: *Very Open X Motor* (*VOXM*), *Open X Motor* (*OXM*), *Center X Motor* (*CXM*), and *Closed X Motor* (*CLXM*), where *X* is *R* for the right motor and *L* for the left motor.

On the other hand, in terms of fuzzy output sets, sixteen singleton membership functions were used for the two outputs of the fuzzy inference, eight for each finger. Moreover, it can be seen from Figure 8 that the linguistic labels are named *R*1, *R*2, *R*3, *R*4, *R*5, *R*6, *R*7, and *R*8 for the right finger and *L*1, *L*2, *L*3, *L*4, *L*5, *L*6, *L*7, and *L*8 for the left finger. These linguistic values are ordered from the lowest speed value to the highest possible speed value of the command motor.

**Figure 8.** Type I membership functions for the inputs *FR*(*t*) and *PosRω*(*t*) and for the output *MR*(*t*).

Table 1 presents the fifty fuzzy rules that were created using th fuzzy sets. The rules have the following structure: IF *FL* AND *FR* AND *PosLω*(*t*) AND *PosRω*(*t*), THEN *ML*(*t*), *MR*(*t*). As can be seen, these rules correspond to the type of Mamdani fuzzifier using minimum implication, i.e., the "AND" operator for connecting the antecedents.

In order to design the set of fuzzy rules, two basic scenarios were considered: the gripper has an object between its fingers, and there is no object between the gripper fingers (see Figure 9). In the first scenario, the gripper might be located at different positions with respect to the center of the object to be grasped. Specifically, there are three basic cases that the gripper might face in terms of the position of the object to be grasped:


**Figure 9.** Cases for the design of the fuzzy rules.


**Table 1.** Fuzzy rules for the type-I fuzzy controller.

On the other hand, in the second scenario, there are two cases:


Additionally, there are other cases in which the object might be located at a position which is between two cases (Table 1: rules 14–50); i.e., the object might be located at a position between the left region (case 1) and the center point (case 2).

It is important to note that during the evaluation of rules, several rules might be fired; therefore, "maximum aggregation" is applied in this case. Finally, to obtain the two speed values, *MR*(*t*), *ML*(*t*), "center of gravity" was used as the defuzzifaction technique.

#### 5.4.2. Type-II Fuzzy Controller for Position and Force Control

Figure 10 shows the interval type-II fuzzy input sets for the four inputs < *FR*(*t*), *FL*(*t*), *PosRω*(*t*), and *PosLω*(*t*) >; and interval type-II fuzzy output sets for the two outputs < *MR*(*t*) and *ML*(*t*) >. Both types of fuzzy sets are for the right finger. It can be seen that upper

membership functions were added to both types of fuzzy sets (fuzzy input sets and fuzzy output sets) defined in the type-I fuzzy controller.

**Figure 10.** Type II (upper and lower) membership functions for the inputs *FR*(*t*) and *PosRω*(*t*), and for the output *MR*(*t*).

The fuzzy rules for the type-II fuzzy controller are the same as those implemented for the type-I controller. These rules correspond to the Mamdani fuzzifier and employ minimum implications. However, the type-II fuzzy controller uses 50 fuzzy rules for the upper membership functions and 50 rules for the lower membership functions. Table 2 shows a summary of these rules. As mentioned earlier, a key difference between type-I and type-II controllers is that type-II uses a type-reducer to transform type-II fuzzy sets to type-I fuzzy sets.

The type-II fuzzy controller employs the Karnik–Mendel (KM) algorithm as the typereducer. The KM algorithm [41,42] computes the lower and upper intervals of the type-II fuzzy outputs for the left and right motors. For instance, the command for the left motor *Y<sup>i</sup>* (output) is composed of two intervals [*y<sup>i</sup>* , ¯ *yi* ], where *y<sup>i</sup>* is the lower left command motor, and ¯ *y<sup>i</sup>* is the upper left command motor. Therefore, the KM algorithm corresponding to the left motor works as follows:

	- (a) *y<sup>i</sup> <sup>l</sup>* is sorted in ascending order, where *<sup>y</sup><sup>i</sup> <sup>l</sup>* is the lower left motor.
	- (b) *yl* is computed as

$$y\_l = \frac{\sum\_{i=1}^{M} f\_l^i y\_l^i}{\sum\_{i=1}^{M} f\_l^i} \tag{4}$$

where *f <sup>i</sup> <sup>l</sup>* <sup>=</sup> ¯ *f <sup>i</sup>*+*f <sup>i</sup>* <sup>2</sup> and ¯ *f i* , *f <sup>i</sup>* are the firing intervals. Let *y <sup>l</sup>* = *yl* .


$$Y\_{final} = (\frac{y\_u + y\_l}{2}).\tag{5}$$

These steps are repeated using the lower and upper intervals for the right motor to compute the crisp value for the right motor.

**Table 2.** Fuzzy rules for interval type-II fuzzy controller. *F<sup>i</sup>* stands for lower membership, *F i* stands for upper membership, *ML*(*t*) is [*y<sup>i</sup> L* , *yi <sup>L</sup>*], and *MR*(*t*) is [*y<sup>i</sup> <sup>R</sup>*, *<sup>y</sup><sup>i</sup> R*].


#### **6. Results and Discussion**

The robotic manipulator performs a set of actions in order to grasp an object firmly. The configuration of sensors placed at the fingertips, i.e., a sensor surrounded by a foamy surface, permits the grasping of different objects with diverse shapes. For instance, Figure 11 shows the process of grasping a solid plastic box. The set of actions can be divided into three stages:


**Figure 11.** Sequence of actions to grasp an object during the experiments: (**a**–**e**) approaching and centering the object; (**f**–**j**) grasping and moving the object; (**k**,**l**) object's release.

Figure 12 presents the results from grasping a solid plastic box with a width of 6 cm. This action involved the three stages explained earlier. It is important to recall that the hybrid PID is composed of a position PID and a force PID; consequently, these two PIDs were switched on and off according to the action being performed. These changes are indicated at the bottom of Figure 12. Data were continuously collected for 18 s at a sampling rate of 100 Hz.

It can be seen that the controllers coped with two key control tasks to grasp an object firmly: positioning and force. Regarding the positioning, the three controllers computed position references for each finger of the gripper at stage 1. It can be noticed from Figure 12 that to locate the object at the middle of the gripper, type-I and type-II fuzzy controllers decreased their initial position reference values for the left finger and increased their initial position reference values for the right finger. The three controllers obtained 1.8 rads as the reference position for the left finger, whereas the three controllers obtained a position reference of 3.4 rads for the right finger. In terms of force, the force regulation reference was 60 mV for both fingers. This reference value avoided damage to the object and excessive forces being exerted by the gripper motors. It can be seen that the type-I controller achieved the reference value for the left finger faster than type-II and PID controllers. Moreover, type-II and PID controllers did not achieve the force reference for the right finger. It is important to remark that from 3.62 s to 6.073 s, the hybrid PID switched from a positioning control task to a force control task, whereas the fuzzy controllers performed both control tasks (positioning and force regulation) simultaneously.

**Figure 12.** Performance of each controller at each stage. In the above chart are the responses in positioning of motor left and motor right, ML and MR, respectively. In the below chart are the responses of sensor left and sensor right, SL and SR, respectively.

Focusing on stage 2, it can be seen that the three controllers achieved the reference values in terms of position for both fingers. However, regarding force, the three controllers achieved values for the left finger close to the reference, but only the type-I fuzzy controller obtained values for the right finger close to the reference. Additionally, the hybrid PID controller generated various oscillations during this stage. This fact might be related to the switching between the PIDs.

Particularly, as is shown in Figure 12, there are some spikes in the PID controller's force graph. These spikes correspond to a commutation between PID controllers from force to position. In other words, while the force controller was handling the object, it moved out of the centered position, making commutation necessary to center the object again. Therefore, force sensors measure the inverse force response in relation to the finger motion; i.e., when a spike in force is downward for the right finger, this spike is upward for the left finger, corresponding to an action to center the object.

Finally, the fuzzy controller achieved the reference values in terms of position for both fingers. Regarding the force, the fuzzy controllers obtained values for the left finger close to the reference. It is important to note that the hybrid PID controller generated oscillations at this stage.

For comparison purposes between the PID, fuzzy type-I and fuzzy type-II controllers, 20 experiments were carried out, obtaining the average values of: rise time *tr* and settling time *ts*. Table 3 shows the values of the rise time, the settling time, the setpoint, and the

type of response. In the same table, it can be observed that all responses were overdamping and the PID had the higher values of *tr* and *ts*, indicating poor performance.


**Table 3.** Performances *ts* and *tr* in a hybrid PID controller, and in type-I and type-II fuzzy controllers.

In order to perform a preliminary comparison of these controllers, the integral absolute error (IAE) and integral time of absolute error (ITAE) were computed. It can be seen from Table 4 that the type-I fuzzy controller obtained the lowest IAE and ITAE values, whereas the hybrid PID controller resulted in the highest IAE and ITAE values.

**Table 4.** Comparison among the hybrid PID controller, and the type-I and type-II fuzzy controllers.


#### **7. Conclusions**

In this paper, a hybrid position-force PID controller, a type-I fuzzy controller, and a type-II fuzzy controller were designed and implemented to grasp an object through a threefold arm with an independent two-finger gripper. From the results, it can be concluded that the hybrid PID controller generated oscillations in stages 2 and 3 of manipulating the object due to switching between the position PID and the force PID. On the other hand, few oscillations were generated over the three stages of object manipulation with the type-II fuzzy controller. Moreover, it was able to calculate values close to the references in terms of position for both fingers; but in terms of force, it was able to calculate the appropriate value only for the left finger. As for the type-I fuzzy controller, it obtained the best performance, because few oscillations were generated in the three stages of object manipulation, and its *tr* and *ts* values are practically the same. Despite the type-II fuzzy controller having a lower *tr* in the position controller, it had a higher *ts* in the force controller and it had more oscillations. The type-I fuzzy controller was able to calculate values close to the reference values in terms of position and force for both fingers.

**Author Contributions:** Conceptualization, S.H.-M., A.M.-H. and E.R.P.-H. Formal analysis, E.R.P.-H., E.J.R.-R. and H.V.-L. Investigation, S.H.-M. Supervision, A.M.-H. and E.R.P.-H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

#### **References**


### *Article* **Haptic Teleoperation of Impact Hammers in Underground Mining**

**Mauricio Correa, Daniel Cárdenas, Diego Carvajal and Javier Ruiz-del-Solar \***

Advanced Mining Technology Center, Department of Electrical Engineering, Universidad de Chile, Santiago 8370451, Chile; mauricio.correa@amtc.cl (M.C.); daniel.cardenas@amtc.cl (D.C.); diego.carvajal@amtc.cl (D.C.)

**\*** Correspondence: jruizd@ing.uchile.cl

**Abstract:** Impact hammers are used to reduce the size of blasted ore in mining operations. In underground mines, tele-operated impact hammers are used to reduce the size of boulders placed on the orepass's grizzlies. An impact hammer consists of a hydraulic arm with 4 degrees of freedom, powered with a hydraulic impact hammer as an end-effector. The tele-operation of impact hammers is difficult due to the latency of communications, the poor visibility of the environment, and the used 2D interfaces. This may result in a collision with the hammer and the infrastructure, idle strokes, and non-optimal operation. To address these issues, this paper proposes the haptic tele-operation of impact hammers. The proposed haptic tele-operation system is based on a 3D model of the environment, which is used to estimate repulsion forces that are transferred to the operator via a haptic device, so that the hammer does not collide with the structures of the mine. The system also allows identifying the oversized boulders deposited on the grizzly and notifying the operator every time the orepass is blocked, as well as providing different 3D views of the environment. A proof of concept is presented using a scaled setup, where it is validated that the use of the proposed system allows for providing a better and more efficient tele-operation experience.

**Keywords:** impact hammers; industrial robotics; haptic teleoperation; underground mining

Carvajal, D.; Ruiz-del-Solar, J. Haptic Teleoperation of Impact Hammers in Underground Mining. *Appl. Sci.* **2022**, *12*, 1428. https://doi.org/10.3390/

Academic Editors: António Paulo Moreira, Pedro Neto and Félix Vidal

**Citation:** Correa, M.; Cárdenas, D.;

Received: 31 December 2021 Accepted: 24 January 2022 Published: 28 January 2022

app12031428

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Underground mines commonly use orepass systems as a safe and economic method to transport blasted ore, i.e., broken rocks, between production levels. In each orepass, a steel grate, called a grizzly, is used to control the size of the material entering the orepass. Material that is too large cannot pass through the grizzly and has to be broken using impact hammers. Impact hammers, also known as rock breakers, rock-breaking manipulators, rock-breaking hammers, or pedestal-mounted breaker booms, consist of an hydraulic arm with 4 degrees of freedom (rotation, lift, tilt, and breaker joints) powered with a hydraulic impact hammer as an end-effector. Figure 1 shows a diagram of the operating environment of an impact hammer. It shows a possible arrangement of sensors to obtain data from the material over the grizzly.

In modern mining facilities, impact hammers are usually tele-operated from a control room located in a safe place, away from the operation area. Under this schema, the operator actuates the electro-valves that control the hydraulic system using joysticks and buttons via a data network. To compensate for this lack of direct vision of the environment, cameras and video transmission equipment are often installed in the operation area. However, the data network and video stream encoding and decoding introduce a delay, or latency, between what the operator sees, and what is actually happening, which adds difficulties to teleoperation [1,2]. Due to the mentioned latency, the poor visibility of the environment, and the used 2D visualization interfaces, hammer tele-operation requires highly skilled operators that are trained to deal with these two phenomena. A typical operating environment can be observed in Figure 2. This figure shows a frontal loader dumping material onto the grizzly, and a tele-operated impact hammer fragmenting rocks.

**Figure 1.** A rock breaking hammer environment with sensors, which includes: A hydraulic rock breaker, a grizzly with material, and sensors pointing to the working area.

**Figure 2.** Real impact hammer in operation, while material is dumped onto the grizzly.

Normally an operator handles more than one impact hammer, and due to the intermittent nature of this operation, sometimes the operator does not notice immediately that an orepass is being obstructed. This often causes the fragmentation process to take longer than it should because it is more difficult to clear a stoked orepass than a few rocks over the ore pass's grizzly, generating a delay in the mine's production chain. In addition, the described, standard tele-operation of impact hammers does not allow the operator to properly perceive the actions of the impact hammer in the environment because its sensors are limited to visual cameras that provide a 2D representation of the environment. This issue may cause damage to the impact hammer, for example, when the tool is activated without being in contact with the rock, which causes idle strokes in the air, or when the hammer collides with the environment (e.g., with the grizzly). In order to address these drawbacks, assistive tele-operation technology can be used for: (i) Enhancing the operator's perception by adding other sensing modalities (e.g., range sensors), (ii) notifying the operator every time that an orepass is blocked to have a shorter reaction time, and (iii) giving haptic feedback to the operator to avoid collisions of the hammer's end-effector

with the surrounding environment. Using these improvements, the operator could make better decisions that result in a more precise operation, extended lifetime of the equipment, and higher throughput of the mineral transport system.

There is very little literature related to the automation or tele-operation of impact hammers. According to [3], previous works concerning the automation or modernization of impact hammers are few. First attempts to automate impact hammers date back to 1998 [4], and the first tele-operated impact hammer was reported in 2000 [5].

Most of the reported work addressing tele-operation of impact hammers uses 2D cameras, and in some cases time-of-flight cameras (TOF) or stereo cameras. According to [3], TOF's low resolution is insufficient for this task, and stereo cameras obtain a single view of the scene. To the best of our knowledge, there are no previous works using 3D LIDAR (Laser Imaging Detection and Ranging) in this application, nor works reporting the use of haptic devices for the tele-operation of impact hammers.

Regarding the autonomous operation of impact hammers, the reported results are very poor. In [3], an average success rate of 34% is obtained in the task of breaking rocks, which does not make it possible for the application of this technology in a real mining environment. For this reason, we believe that a good intermediate step to full-automation, which improves the current technology used for the tele-operation of impact hammers, is haptic tele-operation.

In this context, this paper proposes the haptic tele-operation of impact hammers for improving the efficiency of the fragmentation process. The proposed haptic teleoperation system is based on a 3D model of the environment, which is used to estimate repulsion forces that are transferred to the operator via a haptic control device, so that the hammer does not collide with the structures of the mine. The repulsion forces are a function of the distances between the hammer's end-effector and the environment. The 3D model of the environment is built using point clouds acquired using range sensors, and updated continuously.

The proposed system also allows identifying the oversized boulders deposited on the grizzly and notifying the operator every time the ore pass is blocked. Moreover, thanks to the use of 3D sensors and cameras placed on opposite sides of the hammer (see Figure 1), it is possible to show the operator the environment from different viewpoints and a 3D model of the rocks to be crushed. With this information, the operator can make better decisions.

We do believe that the use of point clouds for improving tele-operation in mining applications will increase in the following years, thanks to the popularization of 3D scanning sensors, and the availability of libraries for processing this data (e.g., [6,7]). Field applications, as the one described here, will use this haptic technology.

In other industries, the use of point clouds (obtained using range sensors) for obtaining haptic feedback in tele-operation applications is not new, although most reported papers describe systems that operate in controlled or semi-controlled environments, which is not the case for underground mining. In ref. [8], a method of haptic feedback in real time, using streaming point clouds from RGB-D cameras, without using contact sensors and without preprocessing the data, is proposed. The concept of virtual world is used, in which the simulated robot can move freely. In this virtual world, the tip of the virtual effector or HIP (haptic interface point) is related to the movement desired by the user. This effector, in turn, is wrapped in a zone of influence called proxy, which is forbidden to interact with the point cloud. Therefore, if the effector tries to enter the point cloud (or be in a position very close to it), a difference between the position of the virtual effector or HIP with the center of the proxy will be produced. This difference in positions will result in a feedback force vector, which will cause the effector's motion to change direction. The work described in ref. [9] is based on the described feedback system, and introduces the concept of delay in tele-operation, where the motion is first performed in virtual space and then executed in the real world. This implementation and testing of the system were performed by teleoperation of a KUKA industrial robot, where the virtual world was created using point clouds obtained with an RGB-D camera. This work confirms that haptic feedbacks

increase the operator's accuracy by reducing the errors in the desired trajectory of the end effector, decreasing the collisions with the environment, and reducing the operation time. In ref. [10], a haptic system based on distances measured using a stereo camera, is evaluated on a robotic arm used to transport an object without colliding with obstacles. In ref. [11], a 3D virtual environment is recreated by a Kinect sensor used to calculate haptic forces based on potential fields. The paper evaluates the path generated when operating a robot arm with 7 degrees of freedom.

This paper is organized as follows. In Section 2, the proposed haptic tele-operation of impact hammers is described. Section 3 presents results, both in simulation and in reality, which are discussed in Section 4. Finally, in Section 5, conclusions of this work are drawn.

#### **2. Haptic Teleoperation of Impact Hammers**

#### *2.1. Overall System Architecture*

The proposed haptic tele-operation system is composed of eight sub-modules (see Figure 3). The *Point Cloud Server* acquires and merges the data from the two LIDARs, and segments it in three different point clouds, each one corresponding to measurements associated to reflections on the infrastructure, hammer, and material. These point clouds, and the acquired images, are sent to the *User Interface* where the different visualization are generated for the user. The *Collision Point Cloud Generation* module receives the point clouds and generates a so-called *collision point cloud*, which represents the points in the space to be avoided during tele-operation. The *Haptic Feedback Calculation* module estimates possible future collisions using the collision point cloud, the current commands on the end-effector (*ueef*), and the state of the hammer's joints (*x*), and it generates haptic feedback information (*f*) for the haptic controller and haptic control interface. The *Haptic Controller* receives this feedback and generates the haptic force to be applied on the *Haptic Device* to alert the user of the limitation of the commands. At the same time, the *Haptic Control Interface* receives the control command on the end effector, which is the result of the operator force on the *Haptic Device* and the haptic force on the joystick, and the haptic feedback to limit the commands in case of any risky situation, for example, when the operator in spite of the haptic feedback decides to perform a risky task. Finally, these commands are converted from the Cartesian space (*ueef*) to the joint space (*ujoints*) using the *Inverse Kinematics* of the machine to handle the articulation actuators directly.

#### *2.2. Sensors*

Sensors are located above the grizzly, on opposite sides, to generate complementary points of view of the working space (see Figure 1), and to provide a better representation of the material over the grizzly when both views are combined. In each side, a 3D LIDAR and camera are used. These sensors provide a streaming of point clouds and images. Point clouds are used to build a 3D model of the environment, and images are used for visualization purposes.

#### *2.3. Point Cloud Server*

This module first merges the point clouds acquired by the different 3D LIDARs into a single cloud. This point cloud is down sampled to generate a voxel grid representation of the space. Then, all the voxel grid points are separated in three different categories (see Figure 4):


3. Material points that correspond to any other point detected, which should correspond to the material over the grizzly.

**Figure 3.** Overall view of the haptic tele-operation framework.

**Figure 4.** Point cloud categorization. Environment points (in gray) and material points (in red). Hammer points in blue, showed over the end-effector model (in yellow).

During the system's initialization, the environment model is generated by successively integrating point clouds from the environment, without considering the ones corresponding to material or the impact hammer. This results in a very dense point cloud that is first filtered for deleting isolated points that could be generated by noise on the sensors. Once the noise has been eliminated, the cloud is voxelized in 3D square voxels, reducing the number of point and maintaining the occupancy information of the environment in the space. Finally, this cloud is saved as the environment model. Figure 5 shows the main modules of the described process. Sensors 1 and 2 correspond to the 3D LIDARs used to capture the model of the environment through point clouds. The *Point Cloud Integration* module corresponds to the process of joining the point clouds of both sensors and performing the task of integrating these successive point clouds over time. The *Noise Filtering* module is responsible for removing noise. The *Voxelization & Occupancy Grid* module is in charge of voxelizing the point cloud in order to eliminate repeated points and to show the occupancy of the space. And finally, the *Save Model in Memory* module stores the resulting point cloud in memory for later use.

**Figure 5.** Processes to obtain the point cloud of the environment.

During operation, the environment model is used for determining the points that belong to the material model and hammer model. In each iteration, each new point cloud is voxelized and contrasted with the point cloud of the environment model. This comparison allows to subtract the points belonging to the environment. Once subtracted from the environment points, the resulting point cloud contain the hammer-points and materialpoints. Hence, to obtain the material-points, a new filtering process should be performed to subtract the hammer-points. This is done by first estimating the spatial region where the hammer is located, which is obtained using the current state of the hammer's joints and the kinematic model of the hammer. This way, all those points within this space will be classified as belonging to the hammer and therefore eliminated. Figure 6 shows the process used to obtain the point cloud of the material. Sensors 1 and 2 correspond to the 3D LIDARs used to capture the working environment. The *Point Cloud Concatenation* module is in charge of joining the point clouds coming from both sensors. The *Voxelization* module is in charge of voxelizing the point cloud resulting from the union of both sensors, thus eliminating repeated points. The *Environment Model Points* corresponds to the point cloud stored in memory, which only contains points corresponding to the grizzly. The *Subtraction of Environment Points* module compares the current point cloud with the point cloud stored in memory, and removes the points that are found in both models. Finally, the *Material Points* module is in charge of publishing the points resulting from the previous module, which correspond to the material on the grizzly.

**Figure 6.** Processes to obtain the Point Cloud of the material.

#### *2.4. Collision Point Cloud Generation*

The generation of a collision point cloud has the goal of using it for avoiding collisions between the impact hammer and surrounding environment.

The collision point cloud always includes the environment point cloud, and the material cloud, depending on if the user wants to avoid the material or not. For instance, when the hammer's end-effector needs to be placed in the rock to be broken, the user chooses not to include the material cloud. This is done by just pressing a button on the haptic device.

The generation of collision points starts with a dilation operation over the environment point cloud, and applies over the material point cloud, which is implemented in each case using as structuring element cube of 3 × 3 × 3 points. The dilation of the input point cloud by the cube of 3 × 3 × 3 (which corresponds to a cube of 27 points), can be understood as the locus of the points covered by the cube of 3 × 3 × 3 point when the center of it, moves within the input point cloud. The dilation operation is defined as follows:

$$A \bigoplus B = \cup\_{b \in B} A\_b \tag{1}$$

where *A* is the initial point cloud, and *B* is the structuring element (cube 3 × 3 × 3).

This means that for each initial point, 26 extra points are created, which will surround the initial point (27 points in total). Figure 7 shows the dilation operation, and the result of this operation on a single voxelized point.

**Figure 7.** (**A**): Single voxelized point; (**B**): 3 × 3 × 3 structuring element with origin in central point; and (**C**): result of dilatation operation.

At the end of the morphological dilation, a new point cloud is obtained, with multiple repeated points because the implemented dilation algorithm does not check if there is already a point in the space where the new points are generated. To eliminate this redundant data and to increase the algorithm's performance, the resulting cloud is voxelized again. Figure 8 shows the process to obtain the collision points.

**Figure 8.** Processes to obtain the collision point cloud.

As mentioned, the collision point cloud is composed of two parts. The first part will come from the processing of the point cloud belonging to the environment model. As this model is static and was previously calculated in a previous module, it is only processed once to generate the cloud of collision points belonging to the environment. The second part will come from the processing of the point cloud belonging to the material. As this point cloud is dynamic (it is expected to change frequently), the process of obtaining the collision point cloud is performed in each execution cycle. In this case, for performance reasons, the dilatation operation is performed once per cycle.

In the case of requiring to generate a thicker collision point cloud, several dilation iterations should be performed in the same cycle. This would result in a larger safety distance between the real object and end effector. Figure 9 shows a comparison between the point cloud of the generated grizzly model and the point cloud resulting from the dilation process.

**Figure 9.** (**a**) The original collision cloud generated with the environment points. (**b**) The resulting collision point cloud after applying the dilation process.

Therefore, the resulting collision cloud will be the concatenation of points from the static processing (performed only once at the beginning of the program) of the environment model, with that of the dynamic processing of the material (performed in each cycle). This point cloud will result in a more homogeneous cloud, filling the empty spaces due to the imperfection of the sensors.

It must be emphasized that at any moment, the user can choose using just the environment model for the generation of the collision point cloud, not including the point clouds associated to the material. This is normally made just before placing the hammer on the rock to be broken. The user makes the selection of this operation mode by just pressing a button in the haptic device.

#### *2.5. Haptic Feedback Calculation*

This module generates the haptic feedback that will restrict the movement of the joystick axes and the movement of the impact hammer. For this purpose, a so-called *virtual collision region* is generated around the end effector, which represents the space of the possible future positions of the end-effector, considering the direction in which it moves. The intersection of the virtual collision region with the collision point cloud will indicate possible collisions of the end-effector with the environment.

For simplicity, in this work two different geometries are used for representing the virtual collision region: A rectangular cuboid and a semi-sphere, which are positioned at the end-effector point that is located on the tip of the chisel. Figure 10 shows these geometries, which are oriented with respect to the direction of movement of the end-effector. In this simulation, the yellow arrow corresponds to the direction of motion of the end-effector.

**Figure 10.** The yellow arrow shows the direction that the end-effector is moving. The yellow cuboid and green half-sphere are oriented according to the direction of motion of the effector.

As mentioned, the intersection of the virtual collision region with the collision point cloud will indicate how close is the effector to collide if the current direction and velocity of movement are maintained. The orientation of the virtual collision region has a direct relation to the direction of displacement of the end-effector, and the region it represents indicates the possible future positions in which the effector will find itself in future time steps.

The length of the cuboid is selected as the maximum distance that the effector will move before stopping (inertia of the system). The height and width of the cuboid are slightly larger than the height and width of the end-effector. These values make it possible to distinguish when the end-effector is in an apparent collision state with the collision point clouds, and therefore perform subsequent actions on the effector to avoid the collision. However, it could happen that a sudden movement in the actuation of the controls would produce an abrupt change on the movement direction of the end-effector, resulting in a collision that cannot be detected by the cuboid. To avoid this situation, the semi-sphere region will act as a safety region, detecting these particular cases, where the effector is not in a collision direction, but very close to the objects.

Thus, depending on the positions of the virtual collision region with respect to the collision point cloud, three different cases can be identified:


**Figure 11.** (**a**) Rectangular cuboid and semi-sphere regions without intersection with a collision point cloud. (**b**) Rectangular cuboid intersects the collision point cloud. (**c**) Semi-sphere region intersects the collision point cloud and the cuboid does not. The yellow rectangle represents the cuboid. The green semi-circle represents the semi-sphere. The white arrow corresponds to the hammer direction.

The output of haptic feedback (*f*) has three values: A Boolean indicating if the cuboid intersects the collision point cloud or not, a Boolean value indicating if the semi-sphere intersects the collision point cloud or not, and a normalized value that indicates a level of safety of executing a command.

The level of safety parameter (*S*) is calculated using the distance from the end-effector to the collision point cloud (*DistEP*), as the percentage of the cuboid region lying outside the intersection with the collision point cloud. To calculate this percentage, a max distance (*DistMAX*) is predefined. This distance is considered as a deceleration distance by the end effector when it moves with the maximal allowable speed and it stops. Thus, *S* is calculated as:

$$S = \min\left\{\frac{Dist\_{EP}}{Dist\_{Max}}, 1\right\}.\tag{2}$$

#### *2.6. Haptic Control Interface*

This module limits the end-effector's actuation orders. Depending on the distances between the virtual collision region and collision point cloud, the maximum allowed speed will change, so that the effector does not collide and is not damaged. According to the cases defined in the former sub section:


$$
u\_{\rm cf\,\,limitel} = \boldsymbol{u}\_{\rm cf} \cdot \mathbf{S}^2.\tag{3}$$

The use of the square of *S* is inspired by the fact that the kinetic energy depends on the square of the speed.

Thus, the magnitude of the commands on the end-effector will decrease its maximum allowable speed as it gets closer to the collision point-cloud, arriving with velocity tending to zero as it touches the collision point cloud.

3. Semi-sphere region intersects the collision point cloud and the cuboid does not. In this case the end effector speed is limited to 50% of the maximum speed. In other words, if a higher effector speed is attempted under this condition, the system will automatically limit the speed. In case the effector movement is attempted at a lower speed, the system will not make any adjustment to the speed.

#### *2.7. Inverse Kinematics*

This module adapts desired velocities on the end-effector to the corresponding joint velocities. To do this task, the Jacobian pseudo inverse of the kinematic model is calculated using the current state of the machine to perform this conversion as:

$$
u\_{joints} = J\_{psundo\ in\ uv} \cdot \mathbf{u}\_{ref} \tag{4}$$

where *ueef* corresponds to a vector with the velocities on some axis of the Cartesian space, {x, y, z, pitch} in this case; *ujoints* corresponds to the speed on each articulation on the machine {q0, q1, q2, q3} related to its respective joint, and *Jpseudo inv* corresponds to the pseudo-inverse of the Jacobian of the kinematic model of the arm, which is shown in Figure 12.

**Figure 12.** Simulated kinematic model of the impact hammer. J1, J2, J3, and J4 correspond to the hammer joints. The *X*-axis is shown in red, the *Y*-axis in green, and the *Z*-axis in blue.

The pseudo inverse method is implemented to reduce the dimension of the Jacobian, which is originally declared in Cartesian space {x, y, z, roll, pitch, and yaw}. The kinematic model of impact hammer is generated by a kinematic tool (*moveit*) and allows for simplifying the number of the states on the Cartesian space to {x, y, z, pith}.

#### *2.8. Haptic Controller*

The haptic control has the task of alerting the user of possible collisions using the haptic device. The form of warning that has been implemented is to restrict the range of movement of haptic device (control axes). In this way, as the effector gets closer to collision, the restriction to the movement of the axes will be greater, exerting a counter force to the user if they want to pass the calculated limits.

Therefore, the restriction of movements to the axes of the haptic device will allow on the one hand to alert the user of possible collisions (and how close the hammer is to collision), and on the other hand, it will be able to slow down the hammer by limiting the possible movement actions.

In the same way as the haptic control interface, the limitation sent to the haptic controller depends on three states of haptic operation:

Rectangular cuboid and semi-sphere regions without intersection with the collision point cloud: The haptic control does not exert any counteracting force to the operator's actuation. In this case, the user will not feel any restriction on the movement of any of the control axes;

4. Rectangular cuboid intersects the collision point cloud: In this case the movement of each axis is limited, depending on the remaining distance to the object to collide. The axel movement limitation is:

$$A\mathbf{x}\mathbf{i}\_{\text{maxprosible pose}} = D\mathbf{i} \mathbf{i}\_{\text{objec}} \cdot \left(A\mathbf{x}\mathbf{i}\_{\text{maxprosce}} - A\mathbf{x}\mathbf{i}\_{\text{initpros}}\right) + A\mathbf{x}\mathbf{i}\_{\text{initpros}}.\tag{5}$$

Thus, as the effector approaches the object to collide (distance in range {0,1}), the control will limit the movement of its axes, opposing the user's action. In case the user operates under the calculated limit, he/she will not feel the movement limitations of the control.

5. Semi-sphere region intersects the collision point cloud and the cuboid does not. When this condition is activated, the movement is limited to 50%, which means that the operator will feel that the control only allows them to move the axes, normally the first half in each axis. When trying to exceed the second half, the haptic control will exert a repulsive force opposite to the operator's direction.

#### *2.9. User Interface*

The user interface is used to provide the operator with multiple perspectives of the grid material. Here the operator can move the point cloud presented to him, and adjust the view to their liking, in order to improve the perception of the objects within the hammer workspace. The visual output of this interface is in the form of a visualization of the point clouds, previously processed by the modules described above. Figure 13 shows the interface implemented in this project. The interface shows the model of the UR5 robot, which can be rotated as desired by the user of the interface, and the cloud of collision points (in yellow). This interface includes visual signals to inform the operator of the current operation conditions.

**Figure 13.** User interface. The yellow shows the visual output of the processing of the point clouds obtained by the sensors.

#### **3. Results**

#### *3.1. Experimental Setup*

As a proof of concept, we built a realistic laboratory setup. The testing environment used for validation consists of a real, scaled grizzly with real rocks. The scales of both elements are in relation of 1:3. For the impact hammer, an UR5 robotic arm was used with a scale of 1:5, just as a real impact hammer. For having a similar kinematic model rather than a real rock breaker, two articulations of the UR5 were fixed, and the other four were tele-operated (see Figure 14). The UR5 robot was mounted, related to the grizzly, in the configuration used in a real mining operation. Figure 15 shows how the UR5 robot was installed for the experimental tests. Next to it, there is a metallic test grizzly to scale, with material (rocks) on it.

**Figure 14.** Articulated model of the adapted UR5. The green joints (J1, J2, J3, and J4) correspond to the articulated joints. The red (J5 and J6) correspond to the fixed joint of the kinematic model.

**Figure 15.** Experimental operation of the rock breaker set-up.

Two mounting-bases, each one containing a camera and a 3D LIDAR, are placed around the operation environment. This allows for the sensors to be located above the operational set-up, pointing to the grizzly and covering it completely (see Figure 16). In this work, Livox model MID-40 3D LIDARS and Arecont model AV5225 PMIR visible spectrum cameras are used.

**Figure 16.** Experimental set-up.

To test the haptic part, 2 Brunner CLS-E controls were used, which allowed up to four joints to be controlled simultaneously (each joystick has 2 degrees of freedom). For convenience, these were installed on a chair used for the tele-operation. Figure 17 shows an operator using the UR5 robot, the Brunner haptic controls (installed on a chair), and the user interface.

**Figure 17.** User operating the UR5 arm, using the haptic controls (installed on a chair), and the user interface.

#### *3.2. System Configurations to Be Tested*

The main purpose of the reported experiments is to validate that the haptic system facilitates the control of the impact hammer to perform the task of positioning the tip of the end-effector above the rock, without colliding. That means that it improves the tele-operation experience.

Several tests are recorded doing this task with different configurations of the haptic system in order to study the impact of this feature. In particular, three different configurations of the system are tested. The configuration combines a different feature of the haptic system, which are:


Finally, these features are combined in different configurations, A to D, which are defined in Table 1.


**Table 1.** Types of system's configurations used in tests.

#### *3.3. Tasks*

Predefined tasks are elaborated to generate a set of tests that can be replicated by different operators. This way, a predefined task contains the following information:


The experiment consists of generating the predefined environment test and giving the operator a target position. The operator has to move the arm using the different configurations of the system. An example of a pre-defined task is shown in Figure 18.

**Figure 18.** A pre-defined experimental test (Task 1): The arm tip is located out of the grill. The avoiding environment (the point cloud in boxes) consists of a single rock over the grill. The green circle corresponds to the target goal.

#### *3.4. Experimental Results*

Each configuration was evaluated using two different tasks, executed by two different non-expert operators. In total, 20 trials were carried out. The obtained results are shown in Table 2.



A task is considered successful if the operator was able move the end effector to the target position with a precision of at least 0.015 [m], without colliding with the grizzly or rocks. If the operator cannot reach the target position, the task is considered a failure.

Table 2 shows the average success rate, representing the percentage of successful tasks. The average reaching time shows how many seconds it takes for the operator to reach the target. The average tip path length represents the distance in meters that the tip of the end-effector has to move to reach the desired target position. It can be observed that the success rate increases as more features are integrated into the system. In Task 1 conf. A, the operators were not able to achieve the target point without colliding with the rocks. The integration of more features allows the operator to complete the task and reduce collisions. Configuration D allows complete completion of the task without collisions because the haptic system does not allow commands that generate collisions to be executed.

In order to illustrate how the system operates, Figure 19 shows the trajectories of the tip for a different configuration when executing task 2. It can be seen that the trajectory becomes simpler as more features of the system are added.

**Figure 19.** Different hammer tip trajectory paths of task 2 (same operator). The path is on a cyan line. (**a**) Conf. A (user controls each joint of the arm). (**b**) Conf. B (user controls the pose of the end-effector). (**c**) Conf. C (user controls the pose of the end-effector and 3D reconstruction of the environment) (**d**) Conf. D (user controls the pose of the end-effector, 3D reconstruction of the environment, and haptic feedback).

#### **4. Discussion**

The reported experiments show that the developed haptic tele-operation system allows for improving the tele-operation experience of the user. The processing of the different point clouds acquired using 3D LIDARs manages to correctly separate the rocks from the grizzly. The haptic controller and control interface manage to provide enough force feedback to the operator to get their attention, so that they can perceive elements close to the end-effector, thus increasing the perception of the environment in the operating area. In addition, the user interface improves the visual perception of the elements, improving the control over the robot, even if the haptic feedback part is not used.

The results depicted in Table 2 allow for comparing the different configurations, and in this way the different features that the proposed haptic tele-operation system includes.

When comparing configurations, A and B, a significant improvement of the performance of the task is observed when the end-effector controller is used (configuration B). The main reason for this, is that controlling an arm by actuating directly the joints is a difficult task that requires much preparation and practice. This way, controlling the end-effector makes the operation more affordable for new users.

When comparing configurations B and C, it can be observed a slight improvement in the case of using a 3D model of the environment (configuration C). The reason for this is that in case B, the operator cannot change the perspective, and therefore, in some cases it is not possible to have a proper visualization to avoid colliding. When the operator uses configuration C, they have the possibility of changing the viewpoint and in this way avoid collisions.

Finally, when comparing configurations C and D, a more notorious performance improvement is detected in the case of using haptic feedback. The reason for that is that the operator does not have to worry about colliding with the environment or rocks. This produces that the operator is able to increase the speed, and can move the arm without worrying about collisions.

By comparing the different configurations using the hammer tip trajectories (see Figure 19), it is possible to visualize the effect of each configuration on the end effector path. In this aspect, configuration A and B generate a larger deviation when approaching the target, due to the poor perspective that make it more comfortable to first raise the end-effector and then approaching the target from a higher position in order to get a better perspective. In contrast, configurations C and D generate a path that approach the endeffector directly to the target, as they are able to manage the perspective, but then finish with different maneuvers of positioning. In particular, configuration D presents a slight rise over the rock when approaching as a result of haptic feedback that allow the operator to perform fine maneuvers more easily, unlike configuration C which tends to perform wider maneuvers at the final stage of the positioning process.

Similar to [9–11], the results show that using a haptic system improves the performance and safety of performing a certain task using a robot manipulator. In addition to analyzing the number of collisions, our study evaluates the level of precision of the task being executed, and this type of consideration is not studied in other works and it is relevant for the context of this type of task in particular, which is positioning the chisel for breaking rock. In addition, our system uses the virtual environment to generate multi-perceptive views to aid in performing a task. This feature is not analyzed in the other studies describing haptic tele-operation based on point-clouds [9–11].

It should also be noted that although the development is intended for industrial environments with a real impact hammer, a small-scale robotic arm was used for the laboratory tests, which implies different behaviors due to the different construction of these elements (electric vs. hydraulic). This may imply that for final implementation, major adjustments may be needed for the control and haptic feedback modules of the system.

#### **5. Conclusions**

This paper addressed the haptic tele-operation of impact hammers. The proposed haptic tele-operation system is based on a 3D model of the environment, which is used to estimate repulsion forces that are transferred to the operator via a haptic device, so that the hammer does not collide with the structures of the mine. The system also allows for identifying the oversized boulders deposited on the grizzly, notifying the operator every time the orepass is blocked, as well as providing them with different 3D views of the environment.

A proof of concept was presented using a scaled setup. The reported experiments show that the use of a 3D model of the environment and the use of haptic feedback improves the tele-operation experience. When using the proposed system, operators were able to increase the success rate of the tele-operation task, and at the same time to reduce the completion time and the length of the path that the end-effector must follow.

As part of our future work, we will validate the tele-operation system using a real impact hammer operating in a real production environment of an underground mine. In that environment, it is expected that several aspects of the operation will emerge which are not reproducible or were not considered in our laboratory environment. The analysis of these new experiments allows for improving and adapting the haptic feedback algorithm to be applied in production systems. In addition, the system will be validated with expert rock breaker operators in order to be able to quantify the real operational impact of its use. Given that a professional operator has a higher level of expertise in controlling the hammer with the current limitations of traditional systems, the performance improvement could be different in relation to a non-expert operator.

Further improvement will focus on the use of haptic feedback to develop assistive tele-operation to guide the operator to follow optimal trajectories. Assistive tele-operation requires defining hammer tip trajectories for successful positioning and providing haptic feedback when the operator uses commands that move the hammer tip away from these trajectories.

**Author Contributions:** Conceptualization, J.R.-d.-S. and M.C.; methodology, J.R.-d.-S., D.C. (Daniel Cárdenas), D.C. (Diego Carvajal) and M.C.; software, D.C. (Daniel Cárdenas) and D.C. (Diego Carvajal); validation, D.C. (Daniel Cárdenas) and D.C. (Diego Carvajal); formal analysis, J.R.-d.-S., D.C. (Daniel Cárdenas), D.C. (Diego Carvajal) and M.C.; investigation, J.R.-d.-S., D.C. (Daniel Cárdenas), D.C. (Diego Carvajal) and M.C.; resources, J.R.-d.-S. and M.C.; data curation, D.C. (Daniel Cárdenas) and D.C. (Diego Carvajal); writing—review and editing, J.R.-d.-S., D.C. (Daniel Cárdenas), D.C. (Diego Carvajal) and M.C.; supervision, J.R.-d.-S.; project administration, M.C.; funding acquisition, J.R.-d.-S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Chilean National Research Agency ANID under project grant Basal AFB180004 and FONDECYT 1201170.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study is contained in the article itself.

**Acknowledgments:** We thank Isao Parra, Mauricio Mascaro, Francisco Leiva, and Patricio Loncomilla for their valuable discussions and support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Multiorder Attentional Spatial Interactive Convolutional Neural Network (MoAS-CNN) for Low-Resolution Haptic Recognition**

**Kailin Wen 1,2, Jie Chu 1,\*, Yu Chen 1,3, Dong Liang 1,3, Chengkai Zhang <sup>2</sup> and Jueping Cai 1,\***


**\*** Correspondence: chujie@xidian.edu.cn (J.C.); jpcai@mail.xidian.edu.cn (J.C.)

**Abstract:** In haptic recognition, pressure information is usually represented as an image, and then used for feature extraction and classification. Deep learning that processes haptic information in end-to-end manner has attracted attention. This study proposes a multiorder attentional spatial interactive convolutional neural network (MoAS-CNN) for haptic recognition. The asymmetric dual-stream all convolutional neural network with integrated channel attention module is applied for automatic first-order feature extraction. Later on, the spatial interactive features based on the overall feature map are computed to improve the second-order description capability. Finally, the multiorder features are summed to improve the feature utilization efficiency. To validate the MoAS-CNN, we construct a haptic acquisition platform based on three-scale pressure arrays and collect haptic letter-shape (A–Z) datasets with complex contours. The recognition accuracies are 95.73% for 16 × 16, 98.37% for 20 × 20 and 98.65% for 32 × 32, which significantly exceeds the traditional firstand second-order CNNs and local SIFT feature.

**Keywords:** haptic recognition; convolutional neural network; channel attention; spatial interactive second-order feature; multiorder feature

#### **1. Introduction**

Haptic interpretation is an important component of perception, providing real-time feedback on changes in the external environment, and has been widely used in practical applications, such as smart machines, wearable devices, and human–computer interaction [1,2]. The essence of haptic recognition is the recharacterization and feature extraction of pressure information to obtain multiple properties of the contact object, which is the basis for subsequent actions such as grasping, manipulating, and moving. The generic method is to consider the haptic information as a two-dimensional image. Therefore, visual data-based approaches are introduced for understanding haptic information. SIFT, SURF, chain code and other descriptors combined with clustering are applied for haptic recognition [3,4]. Pohtongkam et al. applied the BoW technique using SIFT for feature extraction and k-nearest neighbors (KNN) for evaluation to achieve object recognition by a tactile glove [5]. These local feature methods require manual feature design for specific task and are labor-intensive. Recent research has proved that convolutional neural networks (CNNs) are suitable for two-dimensional information and can automatically learn features and recognize objects [6,7]. Therefore, the haptic information can be ported to the CNN for automatic recognition. Gandarias et al. proposed classical transfer CNN models combined with transfer learning to identify large items [8]. Polic et al. designed a CNN encoder structure to reduce the dimensionality of the optical-based tactile sensor image output [9]. Cao et al. trained an end-to-end CNN for tactile recognition using residual orthogonal tiling and pyramid convolution ensemble [10].

Although the existing CNN methods can achieve better recognition performance than traditional artificial features, there are still the following challenges. Firstly, the size and pixels'

**Citation:** Wen, K.; Chu, J.; Chen, Y.; Liang, D.; Zhang, C.; Cai, J. A Multiorder Attentional Spatial Interactive Convolutional Neural Network (MoAS-CNN) for Low-Resolution Haptic Recognition. *Appl. Sci.* **2022**, *12*, 12715. https://doi.org/10.3390/ app122412715

Academic Editor: Rocco Furferi

Received: 30 September 2022 Accepted: 29 November 2022 Published: 12 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

amount of haptic image depend on the density and area of the pressure sensor, and thus they are at least two orders of magnitude lower than visual RGB images [11,12]. Current mainstream CNNs are constructed for high-resolution visual images. The nonlinear fitting capability is enhanced by deepening the network [13]. However, the low-resolution haptic image limits the configured network depth, so the features extracted by shallow CNN are insufficient. Secondly, the inevitable nonideal effects of the sensing element itself reduce the accuracy of the original information mapping. Due to the flexible requirements and complex manufacturing process of the sensing elements, the pseudo-outputs caused by elastic coupling and restricted response range are inevitable, resulting in blurred pressure images [14]. In addition, the fineness of the haptic image relies on the sensitivity and response range of the sensor, which is much lower than that of the visual image. Different objects show similar shapes, so more distinguishable features are needed. The traditional shallow CNNs are not adequate in feature extraction capability and feature utilization efficiency.

To address the low-resolution and blur problem in haptic perception, a multiorder attentional spatial interactive convolutional neural network (MoAS-CNN) is proposed and a pressure information acquisition platform is built (flowchart of the overall framework is shown in Figure 1). The former improves the nonlinear fitting capability of the network by deepening the network and adding channel responses to enhance the representation of first-order high-level features; introducing spatial interactive second-order features to enhance the representation of edges and fusing multiorder features to enhance the efficiency of feature utilization. The latter validates the proposed method by constructing haptic letter shapes with complex edges.

**Figure 1.** Flowchart of the overall framework.

The remainder of the paper is organized as follows. Section 2 discusses the nonideal characteristics about haptic images and the inadequacy of shallow CNN. In Section 3, the MoAS-CNN is proposed and each part is described. The Section 4 is dedicated to validate the proposed method using a three-scale sensor array and comparison experiments with other first-order and second-order CNNs. The discussion and conclusion are in Sections 5 and 6.

#### **2. Problem of Insufficient Shallow CNN for Haptic Images**

The original haptic information mapping of the sensing elements plays a crucial role in recognition. However, the following challenges remain. The density and area limitations as well as elastic coupling cannot be avoided due to the complex process and adhesion requirements of the sensor. In addition, the depth of field of an image depends on the sensitivity and response range of the sensing element. The resulting pixel and mapping quality of haptic images are both lower than those of visual images. The letter shape has complicated edges, which can sufficiently illustrate the issue. As shown in Figure 2a, the haptic images of 26 letter shapes are acquired by a 32 × 32 array sensor. It can be observed that the haptic images are low-resolution accompanied by blurring. Different categories show similar shapes, such as "V" and "Y," "O" and "D." The haptic image has only one channel and is grayscale, so the histogram is used as a quantitative measurement. As shown in Figure 2b, the grayscale distribution of different categories is in statistics, and the histograms of different letter shapes are easily confused. To further quantify the similarity, the Bhattacharyya coefficient is used as the proxy between each of the 26 letter shapes and is calculated as:

$$Cor(I\_{A\prime}, I\_B) = \sum\_{i=1}^{n} \sqrt{I\_B(\mathbf{x}, \mathbf{y}) I\_A(\mathbf{x}, \mathbf{y})} \tag{1}$$

where *I* (*x*, *y*) denotes the gray value of the haptic image at (*x*, *y*). A large positive number indicates a strong correlation between different pressure images and potential confusion. As shown in Figure 2c, different haptic shapes show strong positive interactions. Most of the correlation coefficients are in the range of 0.7–0.85, with certain categories reaching above 0.9, including K and X, G and Q, and O and U.

**Figure 2.** Nonideal effects of haptic images. (**a**)The letter-shape samples obtained by a 32 × 32 sensor array, (**b**) the grayscale distribution histograms of the samples, (**c**) the Bhattacharyya coefficient of the 26 letter shapes.

The current mainstream CNNs are designed for visual images that improve feature fitting by deepening the network. However, these nonideal effects of haptic images lead to a reduction in the differences between categories, so more distinguishable features are desired. Nevertheless, constrained by the sensor density and integrated area, the pixels of a haptic image are usually at 102 (RGB is above 104), making it impossible to improve nonlinear fitting by deepening the network, and only shallow networks can be configured [10,11]. Therefore, shallow CNNs with more powerful fitting ability need to be explored.

#### **3. Haptic Recognition Method**

#### *3.1. MoAS-CNN Framework*

Aiming to improve the feature description and utilization efficiency of the network, a shallow MoAS-CNN is constructed. Figure 3a illustrates the proposed structure, where spatial interactive-based second-order features enhance the nonlinear response of the network and a hybrid strategy of summing up the multiorder features makes the extracted features fully utilized. It consists of three parts: first-order feature extractor, second-order feature generation, and multiorder feature hybridization.

**Figure 3.** The MoAS-CNN framework. (**a**) The overall framework, (**b**) the squeeze–excitation channel attention module, (**c**) the spatial interactive feature generation and multiorder summation.

In particular, for low-resolution pressure inputs, the samples contain limited information and additional dimensionality reduction of the features is not expected. In our case, the pooling layers are removed and only the convolutional layers are retained to reduce the feature dimensionality loss [15,16]. An all-convolutional dual-stream neural network with the channel attention module inserted is constructed as a first-order feature extractor. The channel attention module based on squeeze and excitation operations is added for improving the first-order feature response. For second-order response, a cross-stream spatial interactive feature is generated to improve the feature nonlinear description. High-order features have been proved to be more sensitive to texture and edges [17,18]. Six {3 × 3} convolutional kernels (steam1) and three {5 × 5} convolutional kernels (steam2) are applied. Multiorder features are fused to enhance the utilization of different orders of features by summation. Features of different orders have different emphases, and complementary utilization promotes the overall nonlinearity of the network without wasting information [19]. This is beneficial for applications where the original information mapping is not sufficient or the network depth is limited.

#### *3.2. All-Convolutional Neural Network-Based First-Order Feature Extractor with Channel Attentional Module Inserted*

An asymmetric all-convolutional dual-stream CNN is configured as a feature extractor to automatically extract first-order features. CNN streams of different structures focus on respective priorities, which can fully mine image features. Mathematically, the feature maps at *i*th layer are calculated as:

$$f^i = \varphi(w^i \otimes f^{i-1} + b^i) \tag{2}$$

where *ϕ* denotes the activation function *ReLU*, and *w* and *b* represent the convolutional kernels and bias [20].

To highlight the emphasized parts of the features, the channel attention module including squeeze and excitation operations is inserted after the convolution layer [21]. As shown in Figure 3b, the feature maps are assigned scaling according to the channel importance to improve the feature representation. For a set of features *<sup>f</sup>*∈R*h*×*w*×*<sup>c</sup>* , the

squeeze operation is performed to obtain the global distribution Z{1 × 1 × *c*}, reflecting the features response over the channels. The specific mathematical expression is as follows:

$$\begin{aligned} Z &= \frac{GP(f^c)}{\hbar \times w}, \\ GP(f^c) &= \sum\_{i=1}^{h} \sum\_{j=1}^{w} f^c(i, j) \end{aligned} \tag{3}$$

Here, the global average pooling is chosen to compress the feature into real numbers by spatial dimension. For excitation operation, *Z* is fed into two fully connected layers to further learn factor *S*:

$$S = \mathfrak{q}(fc(Z))\tag{4}$$

where *ϕ* denotes the activation function *ReLU* and *fc* represents the fully connected layer. These attention scalings are assigned to the initial feature map to obtain the rescaled feature map *f* 2*i* :

$$
\overrightarrow{f}^i = S^i \times f^i \tag{5}
$$

The attentional module is capable of suppressing the 2D features with lower response in the channel domain and instead increase the 2D features with higher response. After the squeeze and excitation, the feature maps are visualized in Figure 3b, and the "light and dark" changes of some feature maps can be clearly observed.

#### *3.3. Spatial Interactive Second-Order Feature*

To increase the nonlinear expression of the network, a cross-flow spatial interactive feature is proposed as a second-order response. In contrast to traditional second-order features captured by different channels at the same location, spatial interactive features in this work are generated by convolving single-channel feature maps at different streams with each other. Different stream branches focus on different extraction priorities, so the proposed method concerns more on the intrinsic relationship between the overall features in different streams. For tiny and low-resolution pressure inputs, further exploration of the interactions between different streams is necessary in the presence of network depth limitation.

The cross-flow spatial interactive feature generation is shown in Figure 4a. The extracted first-order features of stream1 and stream2 are represented as *f* stream1 and *f* stream2. To make the interstream interaction more adequate, the original features of stream1 *f* stream1 are reconstructed without loss as *f* 1-reconstruction. Specifically, the *f* stream1 are split by interval sampling and stitched together, transforming the information on width and length to the channel dimension with no information loss (shown in Figure 5). In the case of this work, the dimension of the *<sup>f</sup>* stream1 {20 × <sup>20</sup> × 64} is adjusted to *<sup>f</sup>* 1-reconstruction {5 × <sup>5</sup> × 1024}. The reconstructed feature of stream 1 *f* 1-reconstruction and the features of stream2 *f* stream2 are subjected to an interactive operation, achieving a second-order feature.

The obtained second-order feature is performed in a sum pooling step to finally obtain the cross-flow spatial interactive feature *f* 2-order:

$$f^{\text{2nd-order}} = \sum\_{i=1}^{nc} (f^{1 \text{-reconstruction}} \odot f^{\text{stream2}}) \tag{6}$$

To further improve the cross-flow spatial interactive feature representation, *f* 2-order:is normalized by element-wise signed square root followed by *L*<sup>2</sup> regularization as *f* 2-order-norm [16]:

$$f^{\text{2nd-order-norm}} = \frac{\text{sign}(f^{2\text{-order}}) \times \sqrt{f^{2\text{-order}}}}{\left\| \operatorname{sign}(f^{2\text{-order}}) \times \sqrt{f^{2\text{-order}}} \right\|\_2} \tag{7}$$

where *sign* represent the symbolic function.

**Figure 4.** The generation of cross-flow spatial interactive feature. (**a**) The generation of proposed spatial interactive feature; (**b**) traditional second-order feature generation.

**Figure 5.** Schematic diagram of first-order feature reconstruction.

#### *3.4. Multiorder Feature Hybrid*

Different orders of features have different emphases and can be utilized in a more complementary way. Without adding additional parameters, the multiorder feature hybrid strategy is proposed to enhance the efficiency of feature utilization. The features of different orders are summed to obtain the fused features, as shown in:

$$y\_{fusion} = f^{steam1} + f^{steam2} + \mathbf{g}(f^{steam1} \odot f^{steam1}) + \mathbf{g}(f^{steam2} \odot f^{steam2}) \tag{8}$$

where *yfusion* stands for the fused feature and the *g*(•) is regularization (batch normalization here [22]). Intuitively, as shown in the upper right part of Figure 3c, only segmented linear functions are obtained when no second-order terms are added, which means that the nonlinearity appears only in a few 1D subspaces of the 2D plane R2. However, if the second-order terms are added, nonlinearity exists R<sup>2</sup> . = [0, ∞) 2. The multiorder features allow the efficiency of feature utilization to be enhanced while keeping the network depth constant, facilitating the nonlinear fitting of the network.

#### **4. Results**

#### *4.1. Experiment Setup*

As shown in the upper part of Figure 1, a data-acquisition system containing threescale pressure arrays, a force gauge, a motor driver, and a microcontroller is set up. To demonstrate the effectiveness of the proposed method, we choose letter shapes with complex contours as task targets. The stamp (0.8 cm × 1.0 cm, 1.0 cm × 1.5 cm) with raised letter shape is fixed on the force gauge to press the sensor array. The output matrix is normalized to the range 0–255 to form haptic grayscale images. The reliability of our

prototype allowed us to collect 500 samples for each letter-shape category using three scales of 16 × 16; 20 × 20; and 32 × 32 arrays, in a random pressure range of 0–5 kPa, angle and position. The data samples of each letter shape are shown in Figure 6. To avoid overfitting, data augmentation is applied to expand the sample quantity to 1500 per letter category. All the CNNs are constructed with MatConvNet framework of Matlab 2017 on an Intel Core i5-6500@3.2GHz CPU. The specific parameters of the network are set as shown in Table 1. The weights are initialized by the Xaiver initialization scheme and optimized by Adam algorithm with hyperparameters ε = 10−8, 0.9, 0.999 [23,24]. The three-scale datasets are put into the MoAS-CNN with fivefold crossover to verify the performance of MoAS-CNN. The learning rate is 0.001, batch size 50, and training epoch 120.

**Figure 6.** The dataset of 26 letter-shape samples (**A**–**Z**).



#### *4.2. Performance of MoAS-CNN*

The training and test process in each epoch are recorded. The training losses and test losses are shown in Figure 7a. It can be seen that the losses of all datasets decrease significantly and stabilize after 80 epochs. Accordingly, as shown in Figure 7b, the recognition accuracies achieved are 95.73% for 16 × 16, 98.37% for 20 × 20 and 98.65% for 32 × 32. This can be attributed to the fact that more information is captured through higher density and larger area, and thus the extracted features are more separable.

**Figure 7.** The performance of MoAS-CNN on three-scale datasets. (**a**)The losses in training and testing process, (**b**) the recognition accuracies in training and testing process.

We also record the gradients for all three datasets simultaneously, as shown in Figure 8a–c. The gradient values become larger in the backpropagation, and there is no gradient disappearance. Among them, the gradient value of Conv1 is the largest in the 32 × 32 dataset, making the loss converge quickly. There is no overfitting or underfitting, which indicates that our model and datasets are reasonable.

**Figure 8.** Gradient distribution on 16 × 16, 20 × 20, 32 × 32 datasets during training.

Figure 9 shows the test confusion matrix of the 32 × 32. Obviously, shapes with simple contours are more distinguishable, e.g., I. Conversely, complex contours are more prone to confusion, especially with similar categories, e.g., Q and G (mean Bhattacharyya correlation coefficient of 0.89).

#### *4.3. Contribution of Spatial Interactive Second-Order Feature and Multiorder Hybrid*

To further explore the contribution of the proposed spatial interactive second-order feature and hybrid order strategy, individual streams of MoAS-CNN as well as different structures are compared, and the results are shown in Figure 10. To validate the spatial interactive second-order features, we compared the recognition accuracies for *f* stream1 and *<sup>f</sup>* stream1 <sup>+</sup> *<sup>f</sup>* stream1⊗*<sup>f</sup>* stream1 with *<sup>f</sup>* stream2 and *<sup>f</sup>* stream2 <sup>+</sup> *<sup>f</sup>* stream2⊗*<sup>f</sup>* stream2, respectively. Due to the introduction of the spatial interactive features, the accuracy of stream1 increased by3.36%, 4.11%, and 1.96% on the three scales, and stream2 was 1.98%, 2.54%, and 1.75%. This fully demonstrates the effectiveness of the spatial interactive-based second-order feature.


**Figure 9.** Confusion matrix of MoAS-CNN on 32 × 32 dataset.

To verify the effectiveness of the multiorder fusion features, the multistream first-order model *<sup>f</sup>* stream1 <sup>+</sup> *<sup>f</sup>* stream2 and the multiorder hybrid *<sup>f</sup>* stream1+ *<sup>f</sup>* stream2+ *<sup>f</sup>* stream1⊗*<sup>f</sup>* stream2 are compared. The results showed that the recognition accuracies were promoted by 2.26%, 2.54%, and 1.69%, respectively. Furthermore, it can be seen from the results of stream1 and stream2 that the accuracy is improved by replacing the large convolutional kernels with multiple layers of small convolutional kernels. This can be attributed to the increased depth of the network, where high-level features are more fully exploited.

**Figure 10.** The contributions of cross-flow spatial interactive feature and multiorder hybrid strategies.

#### *4.4. Performance Comparison*

For a more comprehensive and fair comparison, a traditional first-order CNN, a bilinear based second-CNN, and a local feature SIFT method are chosen [16,25]. The bilinear CNN based on the traditional bilinear method with same parameters as the proposed method is constructed and the same for the first-order CNN (detailed in Table 1). The results are shown in Figure 11. It can be seen that the accuracies of MoAS-CNN are significantly higher than those of other methods on all sensor scales, especially for smallerscale pressure arrays. This result illustrates the effectiveness of the proposed MoAS-CNN for low-resolution haptic image. For 16 × 16, the MoAS-CNN obtained 95.73%, while the traditional bilinear CNN was 92.46%, traditional CNN 89.79%, and local feature method 67.24%. Since the inputs of both 16 × 16 and 20 × 20 scales are too small to be applied in SIFT, which can only extract 3–5 key points, the recognition effect is very limited.

**Figure 11.** Performance comparison with other first-order, second-order and traditional local feature methods.

For the samples that the model failed to identify, the true labels and predicted probabilities are shown in Figure 12, marked in orange. These haptic images are collected in a random pressure range of 0.5–3 kPa, angle and position. The inference value of some letter categories with similar shapes are very close, indicating that the differences between features are not obvious. As a comparison, 10 volunteers (7 males and 3 females, age [20,30] are invited to identify the misclassified samples and the results are marked in green. It can be observed that some samples were not recognized by the neural network or humans, as shown in Figure 12a–f, and others were recognized by humans, but not by neural networks, as in Figure 12g–i. No volunteer can completely reclassify all failed samples. The misclassification can be mainly attributed to the following: firstly, the restricted sensor density leads to fine edges not being captured; secondly, the elastic coupling of the flexible sensors causes the unpressed pixels around the pressed pixels to be deformed, producing pseudo-outputs. Thirdly, features are not sufficiently mined through the MoAS-CNN and the features of similarly shape categories are not learned separately.

**Figure 12.** Comparison of recognition results between MoAS-CNN and human for some difficult samples. (**a**) Inference results of sample with label "C", (**b**) inference results of sample with label "G", (**c**) inference results of sample with label "E", (**d**) inference results of sample with label "M", (**e**) inference results of sample with label "K", (**f**) inference results of sample with label "K", (**g**) inference results of sample with label "V", (**h**) inference results of sample with label "N", (**i**) inference results of sample with label "H".

#### **5. Discussion**

Haptic technology provides real-time feedback on external force changes through flexible sensors mounted on or inside mechanical surfaces, providing an aid to scene understanding beyond vision. Therefore, recognition based on haptic information has become important for smart devices and has wide application prospects in human–computer interaction, intelligent machinery, and biomedicine. The intelligence level of human–computer interaction is improved by adding haptic perception. In addition, distribute pressure data feedback and analysis is important in many industrial sensing fields. Generally, the haptic information is converted into a grayscale image and then transferred to image-processing methods for subsequent recognition. However, compared with visual images, haptic images are still challenging because of their small dimensions, low resolution, and blurred edges. The experimental results show that MoAS-CNN can realize accurate haptic perception, and the highest accuracy of haptic letters with complex shapes was 98.65%. The cross-stream spatial interactive feature as a second-order response and multiorder feature fusion can significantly improve the extraction of haptic shapes by the network.

Although the proposed model has proven to be accurate and effective, it does have limitations. Through the analysis of the misclassified samples, some different classes with similar shapes cannot be clearly classified by the network. This may be attributed to two factors. Firstly, the feature extraction of the proposed method for haptic images is insufficient, and some information is lost in the extraction of high-level semantic information, especially for images with lower resolution, such as 16 × 16. Secondly, the characteristics of the haptic task itself, including the low resolution of the sensor array, the small number of pixels, and the blurred edge, lead to inaccurate mapping of the original shape, and the multiple meanings of individual images, resulting in lower-than-average recognition rates for some categories.

More importantly, smallness and low-resolution inputs are also present for practical vision tasks due to factors such as suboptimal raw data and environmental interference. Therefore, we tried to validate our method on another small general-purpose dataset CIFAR-10 [26], and the recognition accuracy was 98.81%. Compared with the current state-of-the-art results, it is lower than the largest-scale visual transformer methods [27,28] and essentially on par with the deeper CNN architecture [29]. This study is based on touch recognition based on a single image after touch completion. In the future, we plan to study recognition methods based on multiple dynamic touches and further improve the recognition performance of touch perception through reasonable optimization algorithms.

#### **6. Conclusions**

A framework called MoAS-CNN is proposed to address the challenge of low-resolution haptic recognition based on a pressure sensor array. The three contributions of our recognition model are to firstly apply a dual-stream CNN integrated with the channel attention module to automatically extract first-order features and increase the response and number of features. Secondly, a spatial interactive second-order feature is introduced to depict the second-order information to improve the feature extraction capability without additional feature downscaling. Thirdly, by exploring the complementarity of features of different orders, a multiorder hybrid strategy is developed to enhance the efficiency of feature extraction. To validate the model, a self-built acquisition platform based on a three-scale pressure array was built and haptic images of letter shapes (A–Z) with complex edges were collected. The results showed that 95.73%, 98.37%, and 98.65% accuracy was achieved at the scales of 16 × 16, 20 × 20, and 32 × 32, respectively. The accuracy of the proposed method has a significant advantage over traditional second- and first-order CNN-based methods, as well as manual feature methods. Furthermore, in addition to haptics, low-resolution inputs are common in practical applications because of the inherent limitations of various types of sensing elements and nonideal factors. Our approach provides a new general framework that can be easily extended to different systems.

**Author Contributions:** Conceptualization, K.W. and J.C. (Jie Chu); methodology, K.W., J.C. (Jie Chu) and J.C. (Jueping Cai); software, Y.C. and D.L.; validation, C.Z.; writing—original draft preparation, K.W. and J.C. (Jie Chu); writing—review and editing, J.C. (Jie Chu) and J.C. (Jueping Cai). All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (62274123), Natural Science Basic Research Plan in Shaanxi Province of China (2021ZDLGY02-01), and Wuhu-Xidian University Industry-University-Research Cooperation Special Fund (XWYCXY-012021003).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data generated or analyzed during this study are included in this paper or are available from the corresponding authors on reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Method for Robot Manipulator Joint Wear Reduction by Finding the Optimal Robot Placement in a Robotic Cell**

**Tomáš Kot 1,\*, Zdenko Bobovský 1, Aleš Vysocký 1, Václav Krys 1, Jakub Šafaˇrík <sup>2</sup> and Roman Ružarovský <sup>3</sup>**


**Abstract:** We describe a method for robotic cell optimization by changing the placement of the robot manipulator within the cell in applications with a fixed end-point trajectory. The goal is to reduce the overall robot joint wear and to prevent uneven joint wear when one or several joints are stressed more than the other joints. Joint wear is approximated by calculating the integral of the mechanical work of each joint during the whole trajectory, which depends on the joint angular velocity and torque. The method relies on using a dynamic simulation for the evaluation of the torques and velocities in robot joints for individual robot positions. Verification of the method was performed using CoppeliaSim and a laboratory robotic cell with the collaborative robot UR3. The results confirmed that, with proper robot base placement, the overall wear of the joints of a robotic arm could be reduced from 22% to 53% depending on the trajectory.

**Keywords:** robot; manipulator; robotized workplace; robotic cell; optimization; wear

#### **1. Introduction**

The design of robotized work cells or lines is a complex multidisciplinary task that is influenced by many external factors and conditions. During the designing process, it is necessary to make many decisions that greatly affect the resulting performance of the workplace and its properties, including the cycle time, velocity of individual parts, dynamic effects, vibrations, lifetime, ground area, and energy consumption.

Crucial for the design of a robotic cell is the selection of an appropriate industrial or collaborative robot and its placement in relation to other subsystems. The length of the designing stage of robotic cells is still shortening, which leads to the copying of existing layouts of workplaces with similar parameters and their adaptation to the actual requirements.

Any type of advanced optimization is almost impossible in this approach. However, this is the phase where it is possible to achieve some interesting savings of energy consumption, ground area, or the lifetime of some systems. The recent progress in complex simulation tools, such as Tecnomatix, Matlab—Robotics System Toolbox, CoppeliaSim (V-Rep), Gazebo, and Webots; methods for the creation of digital twins of the designed robotized workplace; and whole manufacturing lines [1] open up new room for the realization of complex multicriteria optimizations in the early stages of design.

Optimizations of robotized workplaces to achieve the reduction of running costs are currently mostly realized in systems that are already in operation. These typically include modifications of the end-point trajectory according to a chosen criterion, such as the manipulation time [2,3] or the overall productivity and running costs of the robot [4].

The optimization of the kinematic properties of the manipulator movement can also lead to the reduction of torques in individual joints and, thus, also the reduction of the

**Citation:** Kot, T.; Bobovský, Z.; Vysocký, A.; Krys, V.; Šafaˇrík, J.; Ružarovský, R. Method for Robot Manipulator Joint Wear Reduction by Finding the Optimal Robot Placement in a Robotic Cell. *Appl. Sci.* **2021**, *11*, 5398. https://doi.org/10.3390/ app11125398

Academic Editor: António Paulo Moreira

Received: 17 May 2021 Accepted: 7 June 2021 Published: 10 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

overall energy consumption. For example, the authors in [5] used interval analysis to find the global minimum jerk trajectory of a robot manipulator in a joint space using cubic splines, resulting in a smoother and vibration-free robot movement.

A similar approach was proposed in [6], where the objective function included not only the integral of the squared jerk values but also the total execution time of the trajectory. The authors in [7] used the 12-phase sine profile for trajectories of a fast-moving robot, which leads to a more stable and accurate movement without sudden changes of torques, albeit at the cost of a slightly higher overall energy consumption. Trajectory optimization with the single goal of cycle time reduction using the chicken swarm optimization (CSO) method was described in [8].

Another optimization goal is the reduction of the workplace layout size [9] because the usable area of the factory building is extremely valuable. There are also some methods for the multicriteria optimization of robotic work cells or lines for energy consumption even for lines with up to 12 robots, based on the Gurobi simplex method [10]. Another method of multi-robot cell optimization for time and energy reduction using a custom mechatronic model of the robots in Modelica/Dymola was described in [11]. The authors in [12] proposed a methodology for assembly line energy consumption optimization based on the implementation of energy-optimal trajectories, and, in [13], this method was improved by modification of the actuator brake release time for additional energy consumption reduction.

A systematic methodology for the on-site identification and energy-optimal path planning of an industrial robot is presented in [14] with a focus on a specific type of ABB industrial robot. Improved robot programming reduced the energy consumption compared to the built-in controller routines by up to 4%.

Further improvements and torque reduction for a robotized work cell can also be achieved by modification of the position and orientation of the robot base in relation to the optimized trajectory [15]. This can be interesting, especially for applications where it is not possible to modify the end-point trajectory and velocity for technological reasons (the application of adhesives, edging, welding, etc.). In these cases, it is necessary to create a complex simulation model—a digital twin of the workplace [16,17] and perform a multicriteria optimization.

#### *The Concept of Robot Wear*

Gearboxes are often mentioned as the component that is frequently responsible for a failure of rotary machines [18], including industrial robots [19]. The most critical components of a gearbox are the gears and bearings. The deterioration of a gear appears at the teeth [20,21], harmonic drive gears are damaged by fatigue fractures [22], and bearings are typically subjected to failures at the rolling elements or the inner or outer race [21]. The damage accumulates over time and is caused mainly by load (forces and velocity) [23,24], high temperature [25], bad lubrication [26], or manufacturing defects. The performance, accuracy, and lifetime of a robot relies on the good condition of these critical components [27].

The authors in [28] proposed a method for estimating the wear of a robot by monitoring the temperatures in the joints, while [29] suggested a more complex solution, where other parameters are also monitored in order to predict the lifetime of a multi-component system. Other common properties monitored to predict the lifetime of a machine include vibrations and noise [21]. A generic framework for predictive maintenance based on simulation models with degradation curves discovered from real data was proposed in [27]. Our approach, on the contrary, attempts to minimize wear by the use of a quite simple simulation in the design stage of a robotized workplace instead of monitoring an already existing one.

As mentioned above, the gearbox in the robot manipulator joint can be considered the most important source of the wear of the joint. The electric motor in the joint is subject to deterioration as well [30]. As far as the above-mentioned sources of wear are concerned, the most important is the load combined with movement velocity, because a higher velocity creates higher temperatures. This applies to both the gearbox and the motor. The influence of bad lubrication or manufacturing defects are almost impossible to anticipate or even simulate and, thus, are not considered in this work.

A typical robot joint contains structural bearings that provide the mutual rotational movement of the joints. However, obtaining the values of the reacting forces required to calculate the wear of these bearings is not an easy task, even when using a dynamic simulation. This requires an exact simulation model of the robot with a detailed representation of the inner parts and mechanisms, accurate values of the mass properties, acceptably realistic values of the friction coefficients, and a good dynamic simulation engine. Although it is usually possible to obtain 3D models of commercially available industrial and collaborative robots, and sometimes even simulation models prepared for common simulation systems, these models typically are simplified and do not contain the inner mechanisms of the arms.

On the other hand, wear of the gearbox and motor can be expressed as torque that the motor must generate (and that the gearbox must transfer to the joint) over some trajectory of motion. These values depend on physical quantities that are much easier to obtain from a simulation—the overall path of motion (kinematics) and the required driving torque required to achieve the given acceleration (dynamics).

The hypothesis of our research is that the lifetime of a robot in a robotized workplace can be improved in the design stage by designing the workplace (namely the location of the robot) in such a way to ensure lower wear of the robot joints.

#### **2. Materials and Methods**

To determine the optimal placement of a robot manipulator within a robotic cell with the goal of reducing and balancing joint wear, it is necessary to propose a suitable optimization criterion, the whole optimization process, and an experiment to verify the results.

#### *2.1. Optimization Criterion*

In this work, we consider only the wear of the robot drive chain (see the previous section). We will also restrict the following notation to rotational joints (which are much more common than translational joints in robotics) and to six degrees of freedom (the most common number in robotics); however, the principles can be applied in general.

An industrial or collaborative robot in a robotized workplace usually performs a limited set of movements. Typically, there is a given trajectory that the robot end-point should follow during the work cycle, and the robot joints are controlled using inverse kinematics to adhere to this trajectory. The idea of wear being caused by a torque acting over a trajectory corresponds with the concept of mechanical work, which can be expressed as

$$\mathcal{W} = \int\_{\phi\_1}^{\phi\_2} \tau d\phi\_\prime \tag{1}$$

where *τ* is the magnitude of the torque vector, *φ* is the angle of rotation about the vector representing the joint axis, and *φ*<sup>1</sup> and *φ*<sup>2</sup> are the starting and ending angles of rotation, respectively.

The integral (1) is path-dependent and the mechanical work is defined as the change of energy. Thus, if we assume *φ*<sup>1</sup> = *φ*2, which is true for a closed-loop trajectory of the robot end-point (all individual joints have to start and end in the same angle of rotation), the resulting value of *W* would always be equal to zero. It is, thus, more convenient to express mechanical work as the integral of mechanical power over time

$$\mathcal{W} = \int\_{\phi\_1}^{\phi\_2} \tau d\phi = \int\_{t\_1}^{t\_2} \tau \omega dt,\tag{2}$$

where *ω* is the angular velocity of motion and *t*<sup>1</sup> and *t*<sup>2</sup> are the starting and ending time, respectively. However, this integral (2) is still path-dependent, and the calculation would result in *W* = 0. Thus, it is necessary to introduce the absolute value of both the torque and the angular velocity

$$\mathcal{W} = \int\_{t\_1}^{t\_2} |\tau \omega| dt,\tag{3}$$

which allows us to calculate the work over the whole trajectory, while in fact, considering the trajectory divided into segments separated by the change of direction of movement or the change of the sign of *τ*. This modification moves away from the concept of the conservation of energy toward the concept of wear caused by mechanical work. The idea is that the drive chain of a robot joint is worn down even when the torque is negative (the motor is actively braking) or when the angle of rotation is decreasing or moving back.

The simulation is numerical with a definitive value of Δ*t* (simulation step size) instead of *dt*. Therefore, the integral (3) is replaced by a sum

$$\mathcal{W} = \sum\_{i=0}^{n} |\tau\_i \omega\_i| \Delta t\_\prime \tag{4}$$

where *i* = 0, 1, ... , *n* is the simulation step, *τ<sup>i</sup>* is the instant torque, and *ω<sup>i</sup>* is the instant angular velocity in the *i*-th simulation step. The mathematical meaning of the value *W* is shown in an example in Figure 1.

**Figure 1.** Example of the time progression of joint torque *τ* and angular velocity *ω*; the dotted area represents the meaning of the value *W* calculated as the integral of absolute value |*τω*|.

The value of *W* was calculated individually for each joint over the whole robot trajectory, yielding *W*1, *W*2,..., *W*<sup>6</sup> for the most common number of 6 degrees of freedom. We attempted to optimize two factors:


Therefore, it is necessary to consider the values *Wj* together. However, the joints of a robot manipulator are typically not equal from the mechanical point of view, and their capabilities are different—the lower joints are larger and stronger, and the joints near the end-effector are lighter. The same magnitude of *W* could, in reality, mean much higher stress and wear for a smaller joint than for a larger one. We, therefore, introduce a variable called the relative wear factor *wj*, which is calculated by normalizing the individual *Wj* values

$$w\_{\dot{j}} = \frac{\mathcal{W}\_{\dot{j}}}{\pi\_{m\_{\dot{j}}} \omega\_{m\_{\dot{j}}}}, \quad w\_{\dot{j}} \ge 0,\tag{5}$$

where *j* = 1, 2, ... , 6 is the joint number, *τmj* is the maximal permissible torque, and *ωmj* is the maximal angular velocity of the *j*-th joint (both values are specified by the manufacturer of the robot).

To minimize the overall wear of the robot, we chose to use the arithmetic mean of the relative wear factors of all joints and to find the minimal value of this mean,

$$A = \frac{1}{6} \sum\_{j=1}^{6} w\_{j\prime} \tag{6}$$

and balance is achieved by finding the minimal value of the standard deviation

$$
\sigma = \sqrt{\frac{1}{6} \sum\_{j=1}^{6} \left( w\_j - A \right)^2}. \tag{7}
$$

The chosen fitness function (optimization criterion) *f <sup>f</sup>* places the same weight on both these factors; therefore,

$$f\_f = \frac{A}{2} + \frac{\sigma}{2}.\tag{8}$$

#### *2.2. Optimization Process*

The proposed method requires a simulation model capable of evaluating the movement of a robot manipulator through a given end-point trajectory while computing the values of joint angles and torques in individual time steps. This is possible, for example, in the popular robotic simulation system CoppeliaSim (formerly known as V-Rep), which was also chosen for our work.

The scripting capability of CoppeliaSim allows programming a robot's end-point movement through a given trajectory and time, and the built-in physics engine (i.e., Bullet 2.78) is able to check for collisions and evaluate joint torque values. The inverse kinematics (IK) are calculated using the integrated pseudoinverse IK solver. The CoppeliaSim API (Application Programming Interface) framework (*RemoteAPI*) can be used to remotely configure the simulation, which, in our case, includes particularly the process of changing the robot placement relative to the trajectory, as we are trying to determine the optimal placement of the robot. A custom application was written in Visual C++ for this purpose.

Due to the long simulation times in CoppeliaSim, especially in the cases when the IK calculation fails (the robot cannot reach some parts of the given trajectory), all valid positions of the robot relative to the given trajectory were pre-calculated in the C++ application and CoppeliaSim was used only to calculate the fitness function value in those positions. The valid locations were found using a simple kinematic simulation inside the C++ application, which can quickly verify the ability of the robot to fulfill a given task from the specific location, including collision checking.

The search space is represented as a discrete 3-dimensional grid with the spacing in all three dimensions equal to *s* = 0.03 m. The system returns the valid robot positions as a list of points, where each point represents the location of the center of the robot base (see the coordinate system in Figure 2a) in the workspace. Verification of the whole robot trajectory in this simulation system took approximately 1800-times less time than the same task in CoppeliaSim, and invalid robot locations were discarded even faster.

**Figure 2.** The UR3 robot. (**a**) The simulation model in the CoppeliaSim environment with a path representing the end-point trajectory and a coordinate system in the center of the robot base. (**b**) The real robot used in the experiments.

For the reproducibility of the research, this supplementary custom simulation system is not necessary, as its main purpose is simply to shorten the overall simulation time. However, it is necessary to have an external application or a script in CoppeliaSim that successively places the robot in the locations from the above-mentioned search space (grid), executes the CoppeliaSim simulation, and calculates and stores the results.

The reason for choosing a 3-dimensional grid for the potential locations of the robot base instead of a 2-dimensional plane (representing the factory floor) is that the proposed method is intended to be as general as possible, and limiting the search to a 2-D grid could likely miss some interesting solutions. In reality, robots are commonly mounted on a stand, table or a console; therefore, the height of the robot can be chosen. However, the results can be easily limited to, for example, a 2-dimensional plane, if the actual application requires such a limit.

#### *2.3. Experiment Setup*

The proposed method was demonstrated and verified on a Universal Robots UR3 collaborative robot with 6 degrees of freedom (see Figure 2), first in a CoppeliaSim simulation configured according to the previous chapter, and finally also on a real physical robot. The values of the maximal allowed joint torque *τmj* and joint angular velocity *ωmj* (5) for the UR3 robot are listed in Table 1.

Five testing movement paths of the robot end-point were demonstrated, each in two variants with different velocities of the end-point (0.1 and 0.2 m/s), giving a total number of ten trajectories labeled A1, A2, B1, B2, ..., E2; where A–E is the path type, 1 stands for 0.1 m/s, and 2 stands for 0.2 m/s.

**Table 1.** The joint parameters for the UR3 robot; *τmj* is the maximal permissible torque, and *ωmj* is the maximal permissible angular velocity of the *j*-th joint.


The trajectories are a combination of real use-cases and artificially created simple trajectories that contain various elements (linear segments of different lengths, circle arcs, sections with frequent speed, direction changes, etc.), see Figure 3. Various velocities of the end-point cause a difference in the solution of possible robot base locations, because every robot has some velocity limits for individual joints (see Table 1), and they also represent different cases regarding the proposed wear factor calculation.

**Figure 3.** Visualization of the five testing robot end-point paths; the UR3 robot is shown as a scale reference.

The selected trajectories were of a smaller size, and there were no obstacles in the environment (except for the table that the robot was mounted onto), which leads to a larger number of different valid robot locations relative to the trajectory and, thus, also a larger data set of results.

The chosen robot UR3 is a small robot with a reach of only 500 mm and although using larger trajectories combined with several obstacles in the environment would be possible, the valid locations of the robot would be very limited and it would be difficult to make a statistical evaluation. In general, the proposed optimization method does not depend on the size of the trajectory. A longer trajectory, requiring a longer time to traverse, would produce larger values of the discrete integral (4); however, this is true for all robot locations relative to the particular trajectory, and thus the optimization remains valid.

In a real case, the simulation model would have to contain also all collision objects in the workplace, which could limit the valid robot locations considerably—the principle of the method is, however, still applicable. The concept of choosing the robot location is demonstrated on Figure 4 for the trajectory A1. The figure shows the grid of valid robot locations—each blue point represents a possible position of the robot base; the robot is able to reach the whole trajectory from each of these valid locations. For better clarity, the robot is shown in several selected locations.

**Figure 4.** Example of a grid of valid robot locations for a given trajectory (A1). Each blue point represents a possible position of the robot base center point; the robot is displayed in four distinct sample locations. GCS represents the global coordinate system.

#### **3. Results**

For each of the ten trajectories, the values of *w*0, *w*1, ... , *w*<sup>6</sup> (5), and then *f <sup>f</sup>* (8) were calculated based on the CoppeliaSim simulation for each particular possible robot location in the grid. The important values are summarized in Table 2, particularly the lowest fitness function value *f <sup>f</sup> <sup>B</sup>* in the best robot location, which should, thus, be the selected location for the robot, provided it is physically possible. The table also shows the percentage of improvement that the best location offers compared to the worst location,

$$imp\_W^B = 100 \times \left(1 - \frac{f\_f^{\circ^B}}{f\_f^{\circ^W}}\right),\tag{9}$$

which theoretically represents the greatest possible improvement in robot wear reduction. If we consider a random robot location versus the best one, the average improvement can be calculated compared to the average value

$$imp\_A^B = 100 \times \left(1 - \frac{f\_f^{\cdot B}}{f\_f^{\cdot A}}\right). \tag{10}$$

Figure 5 shows the distribution of *f <sup>f</sup>* values in all possible robot locations for each trajectory in the form of a standard box-plot diagram. The height of each shaded rectangle represents the interquartile range (third quartile minus first quartile), which indicates that 50% of all *f <sup>f</sup>* values lie inside the rectangle. The small circles in each column represent the outliers—the topmost one corresponds to the worst location with *f <sup>f</sup> W*.

Arrangements of the robot locations in the 3D space around the corresponding trajectories are displayed in Figure 6—the color coding indicates the *f <sup>f</sup>* values using the gradient red–yellow–green, where green is the best location (*f <sup>f</sup> <sup>B</sup>*), and red is the worst (*f f <sup>W</sup>*). This image also shows the positions of the best ("B"), worst ("W"), and three other robot locations ("1", "2", and "3") that will be used later in the experiments on a real robot. Note that the small cubes rendered in Figure 6 as a visual representation of the possible robot locations are shifted away from the ideal grid positions by a small random offset (less than half the grid spacing). This is to achieve better visual clarity by preventing moiré patterns and reducing the concealment of more distant cubes.


**Table 2.** Results for the 10 testing trajectories showing the number of valid robot locations *n*, the fitness function value in the best location *f <sup>f</sup> <sup>B</sup>*, in the worst location *f <sup>f</sup> <sup>W</sup>*, the average value *f <sup>f</sup> <sup>A</sup>*; and improvement of the best location against the worst *imp<sup>B</sup> <sup>W</sup>* (9) and the average *imp<sup>B</sup> <sup>A</sup>* location (10).

**Figure 5.** Statistical distribution of the *f <sup>f</sup>* values in all possible robot locations for the ten testing trajectories (standard box-plot diagram—each box is bounded by the first and third quartile, the horizontal line represents the median, the × represents the arithmetic mean, and circles represent outliers).

To better demonstrate the meaning of the proposed optimization criterion, Figure 7 shows, for the two trajectories A1 and B1, the individual values of the relative joint wear factors *w*1, *w*2, ... , *w*<sup>6</sup> (5) for the best (*wj <sup>B</sup>*) and worst (*wj <sup>W</sup>*) robot location, together with the corresponding total fitness function value *f <sup>f</sup>* (8). In the first example (trajectory A1), it is clear that, in the worst location, the third joint was extremely stressed and the six *wi W* values differ quite considerably. In the best location, joints 2, 3, and 6 have almost the same *wi <sup>B</sup>* values and, although the relative wear factor of the 6th joint increased, the overall *f <sup>f</sup> B* value was much lower.

A similar situation can be seen in the second image (trajectory B1). Here, the relative wear factor values in the best location are not so well balanced–the overall fitness function value was reduced considerably nonetheless because the average value *A* (Equation (6)) lowered.

The reason for the big improvement in the third joint relative wear factor *w*<sup>3</sup> for both these trajectories can be seen in Figure 8—the green and red solid lines show the immediate power values for the best and worst robot location, respectively. The difference in the magnitude is evident.

**Figure 6.** All valid robot locations in the grid; the color-coding indicates the *f <sup>f</sup>* values using the gradient red–yellow–green, where green is the best location (B) and red is the worst (W). Locations numbered 1, 2, and 3 are the three other locations used in the experiment with a real robot.

**Figure 7.** Detailed comparison of the components forming the fitness function value in the best (B, green) and worst (W, red) robot locations for two selected trajectories; displayed are the relative wear factors of each joint (*wj*) and the total fitness function value (*f <sup>f</sup>* ).

**Figure 8.** *Cont*.

(**b**) trajectory B1

**Figure 8.** Comparison of the angular velocity *ω*3, torque *τ*3, and immediate power *ω*3*τ*<sup>3</sup> of the third joint during the whole trajectories A1 and B1 for the best (*B*) and worst (*W*) robot locations.

#### **4. Verification on a Real Robot**

To experimentally verify the impact of the chosen fitness function on the real wear of the robot joints, it would be necessary to perform simultaneous long-term testing on multiple robots and then analyze the mechanical degradation of important components of the joint construction. This type of experiment was not feasible for us at this moment. Instead, we used experiments on a real robot to verify the accuracy of dynamical simulation in CoppeliaSim.

The real experiments were performed on the same robot as was previously used for the simulations—the Universal Robots UR3 collaborative robot. The robotic arm was mounted on a table (Figure 2), and, instead of changing the robot base location relative to the trajectory, the trajectory was appropriately shifted in relation to the robot. There were no tools nor other equipment mounted to the output interface flange, which corresponds to the simulation model described above.

The robot controller (CB3 controller, firmware version 3.10.0) provides state messages via the RTDE (Real-Time Data Exchange) protocol based on TCP/IP communication. This protocol allows reading the actual state of the robot with a frequency of 125 Hz. The RTDE messages can be configured and include the actual and target values of kinematic parameters, such as the position, velocity, and acceleration, as well as the currents of individual joints, the overall current of the whole robot, and the target torque values which are needed for our experiment.

The experiment was executed for the optimal (best) robot location found by the simulation, for the worst robot location, and for three other locations that were manually selected from the set of possible locations to cover various sections of the whole space. The locations are hereafter referred to as "B" (best), "W" (worst), and "1", "2", and "3" (the other locations). The three numbered locations are sorted from the lowest to the highest *f <sup>f</sup>* values according to the simulation results (a lower *f <sup>f</sup>* value indicates a better robot location). For each trajectory and robot location, the robot was programmed to go through the whole trajectory five times in a row. During this time, the integral (sum) was calculated according to (4), and the final value was then divided by five.

All tested real robot locations are depicted in Figure 6. Numerical values of the *f <sup>f</sup>* values in all five locations for all ten trajectories are listed in Tables 3 and 4. Table 3 shows the results from the simulation, the values here are sorted from lowest to highest ("B", "1", "2", "3", and "W"). Table 4 shows the values from real experiments—as can be seen, the

best locations were identical; the worst locations were also mostly identical, except for trajectory D2.

**Table 3.** Fitness function values in the five robot locations for individual trajectories—results from the simulation. Color gradient red–yellow–green is used for better visual representation of the value (green is the best, red is the worst).


**Table 4.** Fitness function values in the five robot locations for individual trajectories–results from the real robot. Color gradient red–yellow–green is used for better visual representation of the value (green is the best, red is the worst).


Table 5 shows the ratios between the real and simulated values; ideally, these values should all be equal to 1. In general, it can be stated that the *f <sup>f</sup>* values acquired by the real experiments were lower than the values from the simulation, and the average ratio was *r* = 0.745 with the standard deviation equal to *σ<sup>r</sup>* = 0.126. Figure 9 displays a comparison of the simulated and real values in the form of a bar graph, together with the ratio *r*. In this image, it can be observed that, for the same trajectory, the ratio *r* typically lowers (deteriorates) with increasing the *f <sup>f</sup>* values. This will be further discussed in the Discussion section.

251

**Figure 9.** Comparison of the fitness function values acquired from simulation and experiments on the real robot; the black lines represent the ratio between the values.


**Table 5.** Ratios between the real and simulated values of fitness function in the five robot locations for individual trajectories. Color gradient red–white is used for better visual representation of the value (white is the ideal ratio of 1.0, red is the ratio 0.5).

#### **5. Discussion**

The paper describes the grid approach to finding the optimal robot location, where all possible robot locations are evaluated using a simulation in CoppeliaSim, and then the location with the best (lowest) fitness function *f <sup>f</sup>* value is selected. In our case, there was an additional preprocessing step that found all valid robot locations using a simple and fast custom-made simulation system to increase the speed of the process. This step is not required, and therefore the research can be reproduced without this custom simulation system.

The grid approach can alternatively be replaced by using some optimization algorithms, for example, the Particle Swarm Optimization (PSO) [31], which could further reduce the time needed to find the optimal solution. However, PSO would not provide the same type of comprehensive analysis of the whole space around the trajectory, and could also possibly return a local minimum instead of the global one.

The method was demonstrated and tested on ten sample trajectories and the collaborative robot UR3. As can be seen from the results in Table 2, the percentage improvement in the fitness function between the worst and the best possible robot location ranged from 54% to 80.3%. This is mostly a theoretical improvement, as the worst-rated locations would likely be not chosen by the system integrator or workplace project architect for other reasons (they are typically on the edge of the working area of the robot or close to a singular configuration). If we instead compare the best robot location to the average *f <sup>f</sup>* value of all possible locations, the improvement still ranges from 22% to 53.1%; therefore, it is clear that some interesting reduction in the robot joints wear can be achieved by selecting the optimal robot location instead of a "random" one.

This method can be used in practice, provided there is a dynamic model of the selected robot available for some suitable simulation system (for example, CoppeliaSim). It is necessary to properly define the whole simulation model of the workplace, including any potential obstacles, otherwise, the returned optimal robot location could be invalid due to a collision. Nonetheless, it is still important to verify that the optimal location is feasible from the point of view of energy connections and other similar restrictions that cannot be included in the simulation.

The dynamic model should include all properties and phenomena that noticeably affect the torque values in joints of the robot, as the torques are one of the main inputs for the proposed method. It is, thus, necessary to properly define and simulate also the payload in the end-effector, including any technological forces caused by the effector during, for example, cutting, milling, and spraying. The simulation system CoppeliaSim chosen in our demonstration is capable of simulating all these effects.

Experiments on a real robot UR3 verified that the simulation model in CoppeliaSim matched the reality acceptably well. The absolute values of *f <sup>f</sup>* acquired by the simulation and from the real robot differed approximately by the scale factor of 0.745 ± 0.126.

The difference between simulation and reality was caused especially by the simulation model of the UR3 robot, which is likely not absolutely perfect as far as mass properties are concerned, and also by the dynamic simulation engine, which performed some simplifications. More important is that, in the relative evaluation of individual robot locations, the real experiments led to very similar results as the simulation—as can be seen in Tables 3–5. There were some distinct deviations only in one path of robot end-point movement (trajectories D1, D2). It can be, thus, stated that, to improve the robot joints wear, potentially expensive and time-consuming real-world experiments are not necessary; a simulation suffices.

To demonstrate and analyze the source of the difference between the values from the simulation and from the real robot, Figure 10 shows an example of the simulated torque *τsim* <sup>3</sup> and velocity *<sup>ω</sup>sim* <sup>3</sup> values directly compared with the values from the real robot (*τreal* <sup>3</sup> , *ωreal* <sup>3</sup> ), for the third joint of the robot during the B1 trajectory. The absolute value of *<sup>τ</sup>real* <sup>3</sup> was generally lower than the absolute value of *τsim* <sup>3</sup> , which was likely caused by inaccurate mass parameters of the robot links in the simulation model. This is the reason why the *f <sup>f</sup>* values from the real robot were mostly lower than the values from the simulation (see Table 5). The noise in the *ωsim* <sup>3</sup> values was caused by the discrete character of the simulation.

Figure 10b shows that, for the worst robot location, the *τreal* <sup>3</sup> values differed significantly from *τsim* <sup>3</sup> in several peaks that correspond to moments with noticeable acceleration in the movement. This is probably caused by some simplifications in the dynamics engine in the simulation system or by some advanced algorithms in the control system of the real robot. Generally, in worse robot locations (with higher *f <sup>f</sup>* values), the robot joints must perform movements with higher acceleration, which causes this additional source of difference between the simulation and reality.

(**a**) trajectory B1, the best robot location

**Figure 10.** *Cont.*

(**b**) trajectory B1, the worst robot location

**Figure 10.** Comparison of the real and simulated values of angular velocity *ω*<sup>3</sup> and torque *τ*<sup>3</sup> of the third joint during the whole trajectory B1. (**a**) Robot placed in the best location. (**b**) Robot placed in the worst location.

The proposed optimization process of a robotized workplace can be applied in the design phase when it is easy to make modifications to the workplace layout. Another option is to use the optimization for an existing workplace when other types of optimization are not possible (the trajectory is fixed, etc.).

#### **6. Conclusions**

The goal of this paper was to present a methodology for the optimization of a robot manipulator base position relative to the given trajectory of movement of the manipulator end-point. The optimization algorithm was based on the principle of minimization of the chosen fitness function, which comprises the arithmetic mean and standard deviation of the relative wear factors of all robot joints (8). The goal was to balance and minimize the wear of the drive chain in the robot joints, where the wear is approximated as the amount of mechanical work done by the joint, while also taking into account the abilities of the particular joint (the maximal permissible torque and velocity).

Future work will include verification on different types of robots and with various payloads. To verify the real impact on the wear of robot joints and, thus, the reduction of the robot lifetime, a long-term experiment would have to be performed simultaneously on at least two identical robots. One robot would be placed in a location with a good (low) *f <sup>f</sup>* value and the other—reference—robots would perform the same task from locations with higher *f <sup>f</sup>* values. Afterward, the robots would have to be partially disassembled to analyze and compare the mechanical degradation of the joints.

The cases when the workplace is used alternately for several different tasks and, thus, the robot performs more than one movement trajectory could also be addressed. The trajectories could either be combined into one for the simulation, or the method could be applied to each trajectory separately and then the results would be combined, for example by interpolation using weight coefficients given by the frequency of use of the trajectories.

Although this method does not require extremely precise dynamic simulation, this aspect could also be improved by creating a more accurate dynamic model of the robot, for example using some of the dynamic parameter identification method [32]. However, the simulation of friction and other complex phenomena will always be simplified in commonly available simulation systems.

**Author Contributions:** Conceptualization, T.K., Z.B. and V.K.; methodology, T.K., Z.B. and A.V.; software, T.K., A.V. and J.Š.; validation, T.K., A.V. and Z.B.; investigation, T.K., Z.B., V.K. and R.R.; writing—original draft preparation, T.K., Z.B. and V.K.; writing—review and editing, Z.B., A.V., V.K., J.Š. and R.R.; visualization, T.K.; supervision, Z.B.; project administration, Z.B. and V.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This article has been elaborated under support of the project Research Centre of Advanced Mechatronic Systems, reg. No. CZ.02.1.01/0.0/0.0/16\_019/0000867 in the frame of the Operational Program Research, Development and Education.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author. The data are not publicly available due to funding project restrictions.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


## *Article* **Design and Verification of Multi-Agent Systems with the Use of Bigraphs**

**Piotr Cybulski \* and Zbigniew Zieli ´nski**

Faculty of Cybernetics, Military University of Technology, ul. gen. S. Kaliskiego 2, 00-908 Warsaw, Poland; zbigniew.zielinski@wat.edu.pl

**\*** Correspondence: piotr.cybulski@wat.edu.pl

#### **Featured Application: Rapid development of behavior policies for agents in a controlled environment.**

**Abstract:** Widespread access to low-cost, high computing power allows for increased computerization of everyday life. However, high-performance computers alone cannot meet the demands of systems such as the Internet of Things or multi-agent robotic systems. For this reason, modern design methods are needed to develop new and extend existing projects. Because of high interest in this subject, many methodologies for designing the aforementioned systems have been developed. None of them, however, can be considered the default one to which others are compared to. Any useful methodology must provide some tools, versatility, and capability to verify its results. This paper presents an algorithm for verifying the correctness of multi-agent systems modeled as tracking bigraphical reactive systems and checking whether a behavior policy for the agents meets non-functional requirements. Memory complexity of methods used to construct behavior policies is also discussed, and a few ways to reduce it are proposed. Detailed examples of algorithm usage have been presented involving non-functional requirements regarding time and safety of behavior policy execution.

**Keywords:** multi-agent systems; bigraphs; design; verification; modeling; non-functional requirements

#### **1. Introduction**

With the increase of computational power and its availability comes the desire to incorporate it more into our daily life. Current ideas on how to do this include the Internet of Things, multi-agent systems (in which particular cases are swarms of robots), or smart objects and places (e.g., cities, homes, cars). All of them require new ways to design large-scale (i.e., consisting of a significant number of elements) software and physical systems that consider both how individual components interact and how a system as a whole works. There are various unresolved problems related to this. There is no consensus on what elements of the real world should be modeled and which of their capabilities should be taken into account in general. What is worse, among different design methods elements of the real world are used differently. Finally, the results of these methods are often incomparable, or at least, there is no common way to evaluate multi-agent system design methods. Regardless, any method for designing complex systems must offer a specific range of capabilities to be considered useful.

The concept of agent is applied to entities that have autonomy and are placed in a changing environment. Multi-agent systems [1,2] are structures within which agents can be identified. One of the advantages of designs using agents is that they can be represented at different levels of detail, from abstract entities (like mathematical structures) to actual robots. For this reason, among others, the concept of multi-agent system is used in various contexts. This term may be used to characterize a group of machine learning methods [3,4]. It can also be used to highlight attributes of certain models and simulation approaches [5–7].

**Citation:** Cybulski, P.; Zieli ´nski, Z. Design and Verification of Multi-Agent Systems with the Use of Bigraphs. *Appl. Sci.* **2021**, *11*, 8291. https://doi.org/10.3390/app11188291

Academic Editors: Paola Pellegrini, António Paulo Moreira, Pedro Neto and Félix Vidal

Received: 2 August 2021 Accepted: 2 September 2021 Published: 7 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The term also refers to a subgroup of robotics solutions [8–11] that make use of widely understood autonomous robots to perform assigned tasks. In this work, we will focus on multi-agent robotic systems (MARS). The literature [12–16] is replete with examples of various applications of multi-agent robotic systems. There are also methodologies and tools [10,17] to design such systems. There is no consensus on how to design such systems in general and current solutions come from different areas of science. The most common paradigms used to design MARS include software design patterns [16], control theory [12,13], optimization theory or combinations of the above [15]. Some examples are utilizing mathematical logic in MARS design [18], but they are much less common. Due to the lack of agreement on how to design MARS and the fact that results produced by different methodologies are difficult to compare, we will try to evaluate them based on their capabilities. In this paper, we will be interested not so much in how to design MARS but rather how the following questions can be answered about an existing project:


Verifying the correctness of a model is the simplest and most solutions can be verified using the tools they were made with. Verifying whether a designed system accomplishes a given task is much more difficult. The vast majority of methodologies in the literature use simulation for this purpose. Exceptions can be found among models that highly formalize the internals of agents, how they operate, and the course of a task itself. Verification by simulation also gets complicated as the model becomes more abstract. The simplest designs in this regard are those based on methods commonly used in other areas of science (such as differential equations or graph theory) or made using tools integrated with a simulator. Verification of non-functional requirements is a difficult part of the design. Methodologies commonly found in the literature such as RE4Gaia [19], TROPOS [20], DIAMOND [21], or Adelfe [22] take into account non-functional requirements during design process. They usually aim to enable design of multi-agent systems in general (not just multi-agent robotic systems). Successive stages in most of these methodologies are not closely coupled together. By loosely coupled process, we understand a design process where a designer's interpretation of how the system works plays a significant role the whole time. In other words, one cannot treat the results of one stage as an input that the next stage will automatically transform into a form acceptable by yet another stage. When it comes to verification of system requirements, it should be noted that none of the above methodologies offer formal guarantees regarding the system's functionality as the methods dedicated to specific tasks do. An example of a such method can be found in [13] where a formal guarantee is given for robots to move keeping at least a specified distance from each other (an example of a non-functional requirement). In [12] a guarantee of fulfillment of functional requirements is presented where a task is guaranteed to be carried out if certain conditions are satisfied.

Using bigraphs [23] to design multi-agent systems is a relatively new approach to modeling this kind of system. The bigraph theory was published by Robin Milner in 2008 but has already been extended with a notion of overlapping locations [24] and probability [25]. Bigraphs are currently found useful in areas such as system of systems design [26], IoT [27], and wireless network modeling [28]. Currently, there are a few tools that support modeling systems with bigraphs, the most notable of them are Bigraphical Model Checker [29] (discontinued), Bigraph Framework for Java [30], and BigraphER [31]. The first two of them focus on checking the reachability of certain states of a system [29,31]. At the same time, the last one provides means to analyze various aspects of a modeled system (especially useful in this regard is underlying OCaml library *bigraph*). We believe that BigprahER [31] provides the most advanced set of utilities to model systems with bigraphs available at the moment. Multi-agent systems design methodologies [32,33] involving bigraphs are scarce, and most of them do not consider generating behavior policies based on a constructed model. As an exception to this, one may point out BigActor methodology described in [34] that uses bigraphs mixed with the notion of actors [35] or our methodology [36] based on bigraphs with tracking.

In [36] we have proposed a methodology based on bigraphs with tracking [23] that enables design of multi-agent systems. We have chosen tracking bigraphs primarily because they allow for analysis of objects' activities over time without introducing another layer of abstraction (as it was done, for example, in [34]). Our methodology is devoid of some of the drawbacks we mentioned earlier, such as loose coupling between design stages or the designer's interpretation of systems internals on all stages of the design process. Moreover, successive stages of the methodology are module-like which means their implementations can be adjusted to project needs. The methodology's main disadvantages are high computational complexity, limitation of system's agents to entities that can be fully controlled, and the fact that the operation of a designed system is determined before it is started. It also does not offer universal guarantees of task successful completion as presented in [12,13,18]. Putting our work in a broader context, we can place our methodology in a group of bottom-up [37] methods of MAS design with a note that it focuses on global goals rather than individual ones. In fact, agents in our approach do not have preferences that can affect their actions. A distinguishing feature of our proposition is the lack of abstractions outside the bigraphs framework, typically agents' internal mechanics are modeled with BDI (Belief, Desire, and Intention) [32,38] or actors [34].

This work is an extension of the methodology proposed in [36]. This paper aims to demonstrate how to verify the correctness of a design, check the fulfillment of nonfunctional requirements, and visualize behavior policies. We have developed an algorithm to automatically verify the correctness of a model and construct successive simulation states. We also described how to verify whether non-functional requirements are satisfied by a behavior policy for agents in the system. An example implementation [39] of the algorithm has been prepared. We also addressed the memory complexity of operations performed during behavior policy generation. We discussed how it influences the feasibility of projects and suggested a few ways to reduce the memory complexity. Finally, a tool [40] has been implemented that incorporates all of the mentioned memory complexity reduction strategies and a tool [41] to illustrate constructed behavior policies.

#### **2. Methods and Materials**

In this section, we will introduce all terms and definitions that are necessary to understand examples presented in Section 3. Section 2.1 is devoted to basic informal definitions that will be used throughout the rest of this article. Sections 2.2–2.4 aim to quickly acquaint the reader with the methodology described in detail in [36] and for that reason micro-examples are included at the end of each of these subsections. Section 2.5 is dedicated to an algorithm for verification and visualization of behavior policies. Since the algorithm is the key of this article, examples of its usage are presented in Section 3.

#### *2.1. Basic Concepts*

Before formal definitions, we will introduce the following concepts:

• Task—A collection of objects from the real world along with the actions they can perform, the initial state, and the target-desired (final) state(s). An example of a task might be:

"In an area that is a 3 × 3 grid, there are two robots in opposite (diagonally) cells. Each robot can move to vertically and horizontally adjacent cells and connect to a second robot if both are in the same cell. The goal of the task is for both robots to connect with each other."

• Mission—a realization of a task.

	- **–** Implementing a behavioral policy solves a given task;
	- **–** All agents start the mission at the same time;
	- **–** Agents can complete a mission at different points in time;
	- **–** All agent activities must be performed continuously (without time gaps);
		- **–** All agents that participate in a cooperative activity must start performing it at the same moment.

#### *2.2. Bigraphs*

Through this article we will extensively use bigraphs, *concrete bigraphs* to be precise. Concrete bigraphs allow identifying its nodes and edges with *support* (more about that later). In contrast, *abstract bigraphs* lack the mentioned identifiers. In the rest of this article, whenever we refer to a bigraph, we will have a concrete bigraph in mind. A bigraph consists of two graphs: a place graph and a link graph. Place graph is intended to model spatial relations between system elements. A link graph is a hypergraph that can be used to model interlinking between the elements.

Formally a bigraph is defined as:

$$B = (V\_{B\prime}E\_{B\prime}ctrl\_{B\prime} G\_{B\prime}^P G\_B^L) : I \to O^+$$


An example of graphical representation of a bigraph is presented in Figure 1.

**Figure 1.** An example of a bigraph and its constituents. The right part represents a place graph (the upper part of the figure) and a link graph (the lower part of the figure). They share a signature which defines control types (letters in nodes) and arity of each control (number of unique links that can be connected to a node with specified control). Ports and inner names can be attached to either edges or outer names, that is why there are only three edge identifiers in the link graph. On the left there is the bigraph made from the superposition of them both. (**a**) A bigraph. (**b**) A place graph and a link graph.

(**b**)

Reaction rules are used to model dynamics in bigraphical systems. In this paper, we will use (simplified) tracking reaction rules. Reaction rule consists of a pattern (redex) to be found in an input bigraph that shall be replaced with another bigraph (reactum).

Formally, a tracking reaction rule is a quadruple:

$$(B\_{redex} \colon m \to O, B\_{reactor} \colon m' \to O, \eta, \tau)$$

where:


*Bigraphical Reactive System* (BRS) is a tuple (B, R) where B denotes a set of bigraphs with empty inner face and R is a set of reaction rules defined over B. If R consists of rules with tracking then a pair (B, R) makes a *Tracking Bigraphical Reactive System* (TBRS).

Having a TBRS we can generate a Tracking Transition System (TTS). A *Tracking Transition System* is a 7-tuple: LT = (*Agt*, *Red*, *Lab*, *Apl*, *Par*, *Res*, *Tra*) where:

• *Agt*—a set of bigraphs;


As we said at the beginning of this section, we will use a simple example to illustrate how the formal definitions can be used in practice. The system for our example consists of two areas and two agents (we do not care whether they are humans, robots, or other autonomous entities). Areas will be denoted by controls *A* and *B* while agents will be represented with controls *U*. We assume that agents can move from an area of type *A* to an area of type *B* in two ways, which differ in execution speed. Thus Tracking Bigraphical Reactive System of the system above consists of three bigraphs and two reaction rules. The elements of B set are described in Table 1 and the reaction rules are defined in Table 2. The Tracking Transition System of this TBRS is defined in Table 3.

**Table 1.** Elements of the B set for the introductory example.


**Table 2.** Elements of the R set for the introductory example. The *η* function for the first rule and both *τ* functions are identities. The first rule represents an action that allows a single agent to move between areas. The second rule is for an action where two agents move both at once. The second rule is only reasonable if underlying mechanism differs to that of the first rule.


**Table 3.** The Tracking Transition System for the introductory example. Each row defines a single transition in the system.

#### *2.3. State Space*

Having a Tracking Transition System we can transform it into a state space of the modeled system. A state space can be later used to generate a behavior policy for agents (as defined in Section 2.1) in the system.

We assume the following about modeled systems:


A state space *SS* of a system consisting of *no* objects and *ns* states is defined as:

$$\mathcal{SS} = (\mathcal{S}, E, L, I, \mathcal{C}, T, M\_f)$$

where:


Going back to our introductory example, we will now convert the Tracking Transition System from Table 3 into a state space of the system. We will not define all of the formal elements and rather focus on the key ones. The S consists of three elements S = {0, 1, 2} that correspond to bigraphs s0,s1 and s2 respectively. The *L* consists of two elements that correspond to reaction rules in TBRS i.e., *L* = {*r*1,*r*2}. Knowing that there are only two agents in the system (so there are two objects in total) elements of the set *I* will be of the form *i*1, *x*,*i*2, *y*. The elements *i*1, *i*<sup>2</sup> of a tuple correspond to identifiers of objects (in this case *i*1, *i*<sup>2</sup> ∈ {1, 2}) and *x* and *y* elements indicate a moment of time at which each object is at. We will clarify how to utilize the *C* set in the next subsection. As it was mentioned earlier, the action represented by the *r*<sup>1</sup> reaction rule takes 2 units of time while the *r*<sup>2</sup> reaction takes only 1 unit of time. How these values are obtained depends on a project and may be subject to many factors such as resolution of time need to be considered (whether these are minutes, seconds or hours) or variability (or lack thereof) of time needed to execute actions represented by reaction rules. Knowing this, the elements of the *T* set are listed in Table 4. Subsequent elements of this set correspond to transitions in TTS. The permutation being a result of application of a transition function corresponds to permutation of vertices corresponding to objects in *res* function. It is also worth noting that *f*<sup>3</sup> function requires both agents to be at the same time (variable *z*) in order to return something other than 0.

**Table 4.** Mission progress function definitions for the state space presented in Figure 2. The action represented by *r*<sup>1</sup> reaction rule is assumed to take 2 units of time while the action *r*<sup>2</sup> takes only 1 unit of time.


**Figure 2.** The state space generated from Tracking Transition System defined in Table 3. Mission progress functions definitions are defined in Table 4.

#### *2.4. Behavior Policy*

We define a behavior policy as a schedule of actions for each object from the beginning of a mission to its end that meets all the requirements listed in Section 2.1.

Having a state space, we can view a behavior policy as a walk (in graph theory sense) indicating what changes (and who did them) are required in order to reach a desired state.

To construct a proper policy behavior based on a state space, we need to define the following elements. Please note that by series we will understand a finite sum of elements.


$$\begin{array}{cccc} \bullet & \mathsf{M}\_{F}^{t} = \begin{bmatrix} F\_{0,0}(\mathbf{x},t) & \cdots & F\_{0,n\_{s}-1}(\mathbf{x},t) \\ \cdots & \cdots & \cdots \\ F\_{n\_{s},0}(\mathbf{x},t) & \cdots & F\_{n\_{s}-1,n\_{s}-1}(\mathbf{x},t) \end{bmatrix} \text{--a matrix of transitions between states.} \end{array}$$

Furthermore, we define two operations:


$$K\_s^{t+1} = \sum\_{k=0}^{n\_s - 1} K\_k^t \circ F\_{k,s}(x, t)$$

In order to generate all walks consisting of a specified number of steps from an initial state to a final state one must define the initial state, as a M<sup>0</sup> *<sup>K</sup>* matrix and multiply subsequent results by M*<sup>i</sup> <sup>F</sup>* the specified number of times. The result will be a <sup>M</sup>*<sup>x</sup> <sup>K</sup>* matrix, whose summands in the *i*th column will indicate all possible walks with *x* steps that end in the *ith* state of the state space. If the element in the specified column is equal to 0, it means there is no such walk.

Summarizing our introductory example, we will demonstrate how to use the state space from Figure 2 with transition functions definitions listed in Table 4 to determine all sequences of actions that lead to the state denoted as s2. Each sequence is equivalent to behavior policy that, when applied, results in moving both agents to the area of type *B*.

To determine such sequences, we create two matrices, a matrix of transitions M*<sup>t</sup> <sup>F</sup>* and matrix of initial state M<sup>0</sup> *<sup>K</sup>*. Having both of them, we can multiply subsequent <sup>M</sup>*<sup>t</sup> <sup>K</sup>* matrices by corresponding M*<sup>t</sup> <sup>F</sup>* matrices and check whether the third state (recall that numbering starts from 0) is reachable. By reachable, we understand having a value other than 0 in the specified column of the M*<sup>t</sup> <sup>K</sup>* matrix.

Definitions of both matrices are listed below:

$$\mathbb{M}\_F^t = \begin{bmatrix} f\_{mll} & f\_1 + f\_2 & f\_3 \\ f\_{mll} & f\_{mll} & f\_4 \\ f\_{mll} & f\_{mll} & f\_{mll} \end{bmatrix}$$

$$\mathbb{M}\_K^0 = \begin{bmatrix} [\langle (1,0), (2,0) \rangle, \mathbb{Q} ] & 0 & 0 \end{bmatrix}$$

The (1, 0),(2, 0) tuple in the first column of <sup>M</sup><sup>0</sup> *<sup>K</sup>* matrix denotes that we have two objects. The zeros in both tuples indicate that each object starts the mission at the same moment.

Subsequent M*<sup>t</sup> <sup>K</sup>* matrices allow us to determine how a system changes when a specified number of actions occur. For example, M<sup>1</sup> *<sup>K</sup>* gives us information about how the system evolves when one action occurs (analogously M<sup>2</sup> *<sup>K</sup>* for two actions etc.).

In our example M<sup>1</sup> *<sup>K</sup>* and <sup>M</sup><sup>2</sup> *<sup>K</sup>* are of the form:

$$\begin{split} \mathbb{M}^1\_K &= \mathbb{M}^0\_K \cdot \mathbb{M}^0\_F = \begin{bmatrix} [\langle (1,0), (2,0) \rangle, \mathcal{Q}] & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} f\_{\text{null}}(c,0) & f\_1(c,0) + f\_2(c,0) & f\_3(c,0) \\ f\_{\text{null}}(c,0) & f\_{\text{null}}(c,0) & f\_4(c,0) \\ f\_{\text{null}}(c,0) & f\_{\text{null}}(c,0) & f\_{\text{null}}(c,0) \end{bmatrix} \\\\ \mathbb{M}^1\_K &= \begin{bmatrix} 0 & [\langle (2,0), (1,2) \rangle, \{r1\} \} + [\langle (1,0), (2,2) \rangle, \{r1\} \} & [\langle (1,1), (2,1) \rangle, \{r2\} \}] \end{bmatrix} \\\\ \mathbb{M}^1\_K &= \mathbb{M}^1\_K \cdot \mathbb{M}^1\_F = \mathbb{M}^1\_K \cdot \begin{bmatrix} f\_{\text{null}}(c,1) & f\_1(c,1) + f\_2(c,1) & f\_3(c,1) \\ f\_{\text{null}}(c,1) & f\_{\text{null}}(c,1) & f\_4(c,1) \\ f\_{\text{null}}(c,1) & f\_{\text{null}}(c,1) & f\_{\text{null}}(c,1) \end{bmatrix} \\\\ \mathbb{M}^2\_K &= \begin{bmatrix} 0 & 0 & [\langle (1,2), (2,2) \rangle, \{r1, r1\} \} + [\langle (2,2), (1,2) \rangle, \{r1\_1, r1\_2\} \rangle] \end{bmatrix} \end{split}$$

The interpretation of each of the above M*<sup>t</sup> <sup>K</sup>* matrices is as follows. The <sup>M</sup><sup>1</sup> *<sup>K</sup>* matrix indicates that with just one action there are two ways for the system to be in the state where one of the agents move to the area of type *B* and the other one will not take any action (as it is pointed out by the fact that its time is equal to 0). Both ways require specified agent to carry out the action represented by the *r*1 rule. The same matrix also gives us information that with one action there is a possibility to reach s2 state if both agents engage in cooperative execution of *r*2 rule. Finally, the M<sup>2</sup> *<sup>K</sup>* points out two walks in the state space that lead to the s2 state. Both involve performing the action associated with *r*1 rule two times (each time by a different agent).

It is worth pointing out that in a software implementation of the above algorithm labels should denote specific transition functions rather than reaction rules. While for this particular example it was sufficient to indicate what "kind" of changes (i.e., reaction rules) need to occur in the system for automated generation of behavior policies it is necessary to distinguish exactly what transformation (including who participated in a specific transformation) is required.

For more detailed examples we refer to [36].

#### *2.5. Verification and Visualization of Behavior Policies*

Below we will describe the algorithm to verify and illustrate the behavior policy. It consists of 4 phases. At the beginning of the discussion about each phase formal elements not introduced so far will be defined. Subsequent phases will be discussed so that newly introduced definitions will be directly used in the discussed phase. A diagram of relationships between phases is presented in Figure 3, from which it can be seen that the implementation of all the other phases is necessary for the execution of Phase 1. In contrast, Phases 4 and 2 are independent of the others.

**Figure 3.** Diagram of relationships between phases of the algorithm. The direction of an arrow indicates the phase required by the phase from which the arrow emerges.

2.5.1. Phase 4—Applying a Single Transformation to Constructed State and Checking Correctness Beforehand

Phase 4 of the algorithm is responsible for verifying the correctness of the model and for expanding the scenario's state at a particular point in time. Input:


Output:

	- Newly constructed state—bigraph;
	- Mapping of unique identifiers to vertices of the newly constructed state;
	- First new unique identifier.
	- Information about the failed transformation. Whether the given reaction rule could not be applied to the state at the previous point in time or to the currently constructed state (given the mappings of unique identifiers to vertices).

Formal definitions:


The flowchart of the Phase 4 algorithm is shown in Scheme 1. The input arguments of this algorithm and its results are described in Tables 5 and 6 respectively.

**Table 5.** Input data for the Phase 4 algorithm.



**Table 6.** Output data of the Phase 4 algorithm.

**Scheme 1.** Flowchart of the Phase 4 algorithm. The purpose of this phase is to check if a reaction rule extended by a map of unique identifiers to its vertices can be applied to the scenario state for the previous moment in time and the currently constructed one. If it is impossible to perform either of the mentioned operations it means that the model is incorrectly constructed. If both operations are feasible, the currently constructed state is modified based on the given reaction rule and the map of unique identifiers to its vertices.

2.5.2. Phase 3—Constructing Scenario State at a Given Moment of Time

Phase 3 of the algorithm is responsible for constructing the state of the scenario at a given point in time.


Output:

	- A subset of the walk elements (given as input) that have not been used to construct the state at the given point in time;
	- State at the given moment in time;
	- Mapping of unique identifiers to state elements at the given point in time;
	- State-at-time configuration at the set point in time.
	- A currently constructed state with its UIs mapping that could not be transformed (if it is the cause of the Phase 4 error);
	- The state from the previous moment in time with its UIs mapping that could not be transformed (if it is the cause of the Phase 4 error);
	- Reaction rule with UIs mapping to its redex vertices, which was not successfully applied.

Formal definitions:

	- 1. A positional number;
	- 2. A transition function;
	- 3. A map of UIs to redex vertices. The redex is associated with the reaction rule corresponding to the above transition function;
	- 4. A map of UIs to vertices of the output state of the extended walk element;
	- 5. First new UI assigned to a new task element created by applying the reaction rule (useful only if the reaction rule corresponding to the transition function creates new environment elements);
	- 6. A set of object identifiers involved in the walk element along with the duration of that transformation. In other words, it is information about which objects are involved in the transformation represented by the walk element and how long it will take.

$$\begin{aligned} \forall \mathbf{e}\_1 &= (l\_{1\prime} f\_1, m\_{1,r\prime} m\_{1,\text{in}}, n\_1, (A\_1, d\_1)), \mathbf{e}\_2 = (l\_{2\prime} f\_2, m\_{2,r\prime} m\_{2,\text{in}}, n\_2, (A\_2, d\_2)) \in \mathcal{W}\_M\\ &\mathbf{e}\_1 <\_{\mathcal{W}\_M} \mathbf{e}\_2 \leftrightarrow l\_1 < l\_2 \end{aligned}$$


• *Corr*<sup>R</sup> : *Tra* → R—a function assigning reaction rules to transitions from TTS.

The flowchart of Phase 3 of the algorithm is shown in Scheme 2. The input arguments for the algorithm are described in Table 7. The auxiliary variables, some of which are also outcomes of Phase 3, are described in Table 8. The outcome of Phase 3 is described in Table 9.

**Table 7.** Input data for the Phase 3 algorithm.


**Table 8.** Auxiliary variables of Phase 3 algorithm.


**Table 9.** Output data of Phase 3 algorithm.


Noteworthy are the conditions checked in the subsequent steps of Phase 3 of the algorithm. Comments for each of them are given below.


**Scheme 2.** Flowchart of the Phase 3 algorithm. The goal of this phase is to construct the state of a scenario at a given point in time. This phase runs in a loop until there are no available walk elements or when an execution of Phase 4 ends with an error. It takes subsequent elements of the input walk and updates both the current SAT configuration and a scenario state. If the mission objects will not have finished the activity represented by the currently processed walk element before or at the specified moment of time then the SAT configuration and state updates are not performed. The same thing happens if an activity involves objects participating in other activities that would end in a future and that have already been skipped.

2.5.3. Phase 2—Extending a Previously Constructed Walk

Phase 2 of the algorithm is responsible for extending a walk to the form acceptable by Phase 3.

Input:


Output:

• Extended walk.

Formal definitions:

	- As in the case of the set *WM*, we define the order relation by the following rule:

$$\forall \varepsilon\_1 = (n\_1, f\_1), \varepsilon\_2 = (n\_2, f\_2) \in \mathcal{W} \quad \varepsilon\_1 <\_{\mathcal{W}} \varepsilon\_2 \leftrightarrow n\_1 < n\_2$$


The flowchart of the Phase 2 algorithm is shown in Scheme 3. Input arguments are described in Table 10; auxiliary variables and the result of this phase are discussed in Table 11.

**Table 10.** Input data for the Phase 2 algorithm.


**Table 11.** Auxiliary variables of the Phase 2 algorithm.


**Scheme 3.** Flowchart of the Phase 2 algorithm. The goal of Phase 2 is to expand each element of a provided walk to the form acceptable by Phase 3. Each element of the walk is coupled with the duration of its corresponding activity along with the identifiers of the objects (not unique identifiers of task elements) that participate in the activity and two bijections. The first function maps unique identifiers to vertices of the redex of the reaction rule associated with the currently processed walk element. With this function, we know exactly who is participating in the activity. The second function maps unique identifiers to the output state of a processed TTS transition (derived from the walk element). With this function, we know exactly which task element corresponds to which vertex after applying the reaction rule. The second function is used in the next iteration of Phase 2.

2.5.4. Phase 1—Constructing All Scenario States and Checking the Correctness of a Given Walk

Phase 1 of the algorithm is its entry point. It is responsible for verifying a model and constructing scenario states at successive moments in time. Input:


Output:

	- A set of scenario states at consecutive moments in time with corresponding mappings of unique identifiers to the vertices of these states and SAT configurations;
	- The moment of time for which the scenario state could not be generated;
	- The element that could not be transformed (constructed state or state at some point in time);
	- The reaction rule corresponding to the unsuccessful transformation;
	- The UIs mapping of the element that could not be transformed and the redex of the above reaction rule.

Phase 1 input parameters are described in Table 12. The auxiliary variables along with the outcome are discussed in Table 13. The flowchart of the Phase 1 algorithm is shown in the Scheme 4.

**Table 12.** Input data for the Phase 1 algorithm.


**Table 13.** Auxiliary variables for the Phase 1 algorithm.


**Scheme 4.** A flowchart of the Phase 1 algorithm. The goal of Phase 1 is to verify a model and construct the subsequent states of a scenario using a provided walk. In the first step the walk is extended to the form acceptable by Phase 3. Then the model verification and construction of successive scenario states is performed in a loop. The loop is executed until Phase 3 ends with either an error or when there are no more elements of the walk to further construct states of the scenario from.

#### **3. Results**

This section will provide example use cases of the algorithm discussed in the previous section. The first two examples show in detail how the algorithm detects errors in a model and how it constructs successive scenario states. The next examples present how to check the fulfillment of non-functional requirements for systems designed with our methodology. Finally, the problem of memory complexity of convolution operation performed during a construction of walks in a state space is discussed. We also provide a few propositions how to address this issue.

#### *3.1. Model Verification Example*

#### 3.1.1. Introduction

The first example will demonstrate how the algorithm can detect that a system is incorrectly designed.

A task (as defined in Section 2.1) for this example consists of six elements, two actions that can be performed, and one goal. The task elements comprise three areas with two robots and an object to be carried between the areas. The goal of the task is for the robots (denoted by vertices with the control *B*) to move the object (denoted by a vertex with the control *O*) from the area *AT1* to the area *AT3*. The initial state of this system is shown in Figure 4. We will use two reaction rules to generate a tracking bigraphical reactive system: *mov1* and *mov2* depicted in Figure 5a,b, respectively.

**Figure 4.** The initial state of a system in the example of verifying model correctness.

**Figure 5.** Reaction rules for the example of verifying a model. All residue functions are identities. (**a**) Reaction rule *mov*1. (**b**) Reaction rule *mov*2.

The elements of a tracking transition system for this example are shown in Table 14. If we categorize the task elements as presented in Table 15 then we can transform the TTS from Table 14 into the state space as in Figure 6. However, this will not be a valid state space because no time is taken into account for the object being moved (i.e., it is not treated as a passive or active object as defined in Section 2.1).

#### 3.1.2. Using the Algorithm for Model Verification

Walk *S*<sup>0</sup> *<sup>f</sup>*<sup>1</sup> −→ *<sup>S</sup>*<sup>1</sup> *<sup>f</sup>*<sup>2</sup> −→ *<sup>S</sup>*<sup>2</sup> can be represented as *<sup>W</sup>* <sup>=</sup> {(0, *<sup>f</sup>*1),(1, *<sup>f</sup>*2)}. Assuming that both actions associated to the reaction rules take 1 unit of time to complete, in Phase 2 both elements of set *W* will be transformed to form:

1. (0, *f*1, {(0, 0),(1, 1),(2, 2),(3, 3)}, {(0, 0),(1, 1),(2, 2),(3, 3),(4, 4),(5, 5)}, 6,({1}, 1)) 2. (1, *f*2, {(2, 0),(3, 1),(4, 2),(5, 3)}, {(0, 0),(1, 1),(2, 3),(3, 2),(4, 4),(5, 5)}, 6,({2}, 1))

The method of constructing *mr* and *m f ull* functions that result from *Trans* function in Phase 2 is shown below.

The rule of constructing *mr* function:

$$\forall \mathbf{x} \in \{0, \dots, n\_x - - - 1\} \quad m\_r(\mathbf{x}) = par^{-1}(m\_{full}(\mathbf{x})),$$

where *par*−<sup>1</sup> is the inverse function to *par* being an element of *ttra*. In this case, the functions *f*1, *f*2,..., *f*<sup>8</sup> correspond to the subsequent rows in Table 14.

The rule of constructing *m f ull* function:

$$\forall \mathbf{x} \in \{0, \dots, n\_{\mathbf{x}} - - - 1\} \quad m'\_{full}(\mathbf{x}) = r \mathbf{res}^{-1}(m\_{full}(\mathbf{x})) $$

*res*−<sup>1</sup> is the inverse function of *res* which is an element of *ttra*.

Table 16 lists the successive steps of the algorithm that will lead to a detection of an error in the model. The reason why this model is incorrect is not because the redex of the rule *mov2* is not in the *0* state but because the moved object is categorized as an element of the environment, thus we do not take into account the passage of time for it. As a result, the reaction rules create the appearance of being independent of each other when in fact the execution of *mov2* rule is dependent on the execution of the rule *mov1*. To fix the model, the relocated object needs to be categorized as a passive object and one need to add a reaction rule allowing a robot that is in *AT3* area to wait until the object being moved is in *AT2* area.

**Table 14.** Tracking transition system for the first example.


**Table 15.** Categorization of task elements for the first example. Note that this produces an incorrect model because the moved object is considered an environment element.


**Figure 6.** Incorrect state space for the task from the first example.


**Table 16.** Subsequent steps of the algorithm in the model validation example.

#### *3.2. Example of Scenario States Visualization*

#### 3.2.1. Introduction

The second example will demonstrate the problem of visualizing a scenario and how our algorithm can help in solving it. A task for this example is composed of three areas and two robots of the same type. The initial state of the system is presented in Figure 7. The tracking bigraphical reactive system for the purpose of this example consists of two reaction rules, *r1* and *r2*, shown in Figure 8a,b, respectively. The goal of the task is to move the two robots from the area *AT1* to the area *AT3*.

**Figure 7.** The initial state of a system for the scenario visualization example.

**Figure 8.** Reaction rules for the example of scenario visualization. (**a**) Reaction rule *r*1. *τ* = {(0, 0),(1, 2),(2, 1)}. (**b**) Reaction rule *r*2.*τ* = {(0, 0),(1, 1),(2, 2)}.

Tracking Transition System generated from this TBRS is defined in Table 17. The tracking Transition System from Table 17 can be transformed into a state space as in Figure 9. Now, suppose that a walk chosen for the behavior policy is of the form:

$$S\_0 \xrightarrow{f\_1} S\_1 \xrightarrow{f\_3} S\_2 \xrightarrow{f\_5} S\_4 \xrightarrow{f\_8} S\_5$$

The above walk never "passes" through a state where both robots are in *AT2* area. (that is, through the state *S*3). Such a situation must occur for the following reasons. For a walk representing four activities (because it consists of four arcs), that can correspond only to reaction rules from Figure 8 a course of a mission for each robot must take the form of moving from an AT1 area to an AT2 area and then from AT2 to AT3 area. Since the activities represented by the reaction rules are not cooperative (each of the reaction rules involve only one agent) the movements will be performed in parallel. We also know that the time required to perform both activities will be the same for both agents (because agents are of the same type and perform the same type of activity) so the successive movements will end at the same moment. Because of all that, during a mission there must occur a situation where both robots are at an AT2 area at the same time. Therefore, the algorithm for constructing subsequent scenario states must be able to construct states that are not "on" a provided walk.


**Table 17.** Tracking Transition System for the second example.

**Figure 9.** The state space generated from Tracking Transition System from Table 17.

3.2.2. Using the Algorithm to Construct Scenario States

The walk *S*<sup>0</sup> *<sup>f</sup>*<sup>1</sup> −→ *<sup>S</sup>*<sup>1</sup> *<sup>f</sup>*<sup>3</sup> −→ *<sup>S</sup>*<sup>2</sup> *<sup>f</sup>*<sup>5</sup> −→ *<sup>S</sup>*<sup>4</sup> *<sup>f</sup>*<sup>8</sup> −→ *<sup>S</sup>*<sup>5</sup> can be presented as:

*W* = {(0, *f*1),(1, *f*3),(2, *f*5),(3, *f*8)}

A linear order relation on the set *W* has the form:

$$<\_W = \left\{ \begin{array}{l} ((0, f\_1), (1, f\_3)), ((0, f\_1), (2, f\_5)), ((0, f\_1), (3, f\_8)), \\ & ((1, f\_3), (2, f\_5)), ((1, f\_3), (3, f\_8)), \\ & & ((2, f\_5), (3, f\_8)) \end{array} \right\}$$

Assuming that execution of each reaction rules takes one unit of time, in Phase 2 the consecutive elements of set *W* will be transformed to the following form:


Knowing the above, we can define an extended walk.

$$\mathcal{W}\_{\mathcal{M}} = \{e\_1, e\_2, e\_3, e\_4\}$$

The linear order relation remains unchanged between elements, i.e,:

<*WM* = {(*e*1,*e*2),(*e*1,*e*3),(*e*1,*e*4),(*e*2,*e*3),(*e*2,*e*4),(*e*3,*e*4)}

Steps of the algorithm to construct the subsequent scenario states are presented in Table 18.

**Table 18.** Successive steps of the algorithm in the example of visualizing a scenario.



**Table 18.** *Cont.*


**Table 18.** *Cont.*


**Table 18.** *Cont.*

The result of the algorithm contains a state where both robots are in *AT2* area despite the fact that the walk has not passed through such state. A Gantt diagram for this scenario is shown in Figure 10. Both robots are performing actions *r1* and *r2* in parallel.

**Figure 10.** A Gantt diagram for the scenario from the second example. Activities marked as *t* in the row preceded by *Ox* denote involvement of the element *x* (*x* is the unique identifier of a task element given at its first appearance or at the beginning of a scenario) during the activity *t*. Only elements that are active objects are included in the diagram.

Functions tupled with each state allow to "track" task elements between states. For example, the function *ms* = (0, 0),(1, 1),(2, 2),(3, 3),(4, 4) for the state at time 0 indicates that the object tagged with the unique identifier 2 (the argument of *ms* function) is represented by the vertex with identifier 2 (the value of *ms* function for argument 2). The support of a bigraph itself does not track its elements between transitions, as can be seen by comparing the state of the system at time 0 and time 1. For example, knowing that there is one area of each type, we have no doubt that a vertex with the control *AT1* represents the same object in both states even though the support element assigned to each vertex is different between states. However, we do not have such certainty for vertices with controls

of the type *B*. Unique identifiers point to unique objects between states, even if those objects have changed the controls representing them.

Here is an example based on the elements of set *Sr* from Table 18 how to use a unique identifier mapping. For the state at time 1, the UI with the value of 3 points to the vertex with identifier 1. This means that it is the same task element that in the state at time 0 is represented by the vertex with identifier value of 3 and the same element that at time 2 is represented by the vertex with support 0.

#### *3.3. Example of Verifying the Fulfillment of Non-Functional Requirements*

The last example is intended to demonstrate how non-functional requirements can be defined for systems designed using our methodology and determine whether these requirements have been satisfied.

For this example, we will define a task of relocating items in a warehouse. The goal of this task is for two robots to deploy items of different types from the warehouse to unloading areas. The initial state of the task is depicted in Figure 11. The interpretation of each control is shown in Table 19. Six reaction rules are defined for this system; all of them are listed and described in Table 20. For this example, the graphical representation of reaction rules is omitted because it will not be relevant.

**Figure 11.** The initial state of a system in the example of checking whether non-functional requirements are met.

**Table 19.** Interpretation of controls in the example of checking whether non-functional requirements are satisfied.



**Table 20.** System reaction rules for the example of checking whether non-functional requirements are satisfied. A value in the third column is the amount of time required to execute a rule.

The state space for the system consists of 666 states (vertices) and 5325 transitions (arcs). Due to the size of this example, the graphical representation of the state space and elements of the tracking transition system will not be presented. It is worth discussing here the increase in the size of a state space as the number of system elements increases. If one were to expand the current system to three robots, two type 1 objects, and three type 2 objects, the number of states increases to 5765 and the number of transitions to 70,701. Such a significant increase in the size of a system suggests that it is reasonable to consider ways of limited construction of a state space that will remain useful in later stages of the development of behavior policies.

Moving on to behavior policies for the agents in the task above. First walks solving the task are 15 steps long. However, these solutions are using only one robot, as can be observed in the action schedule presented in Figure 12. A mission performed using behavior policy based on such a walk takes 21 units of time.


**Figure 12.** Schedule of actions for a scenario based on a walk of the length of 15 arcs.

3.3.1. Non-Functional Requirement—Length of a Mission

Now let us assume that one of the non-functional requirements imposed on the task is to limit the length of a mission to the maximum of 20 units of time. There is no walk of the length 15 that satisfies this requirement. Knowing that the current solutions use only one robot we can try to improve them by extending the walks to 18 steps. This way the second robot can move one of the objects to an unloading area. A schedule of actions constructed with a walk of 18 steps is presented in Figure 13.


**Figure 13.** A schedule of actions for a scenario created with a walk of the length of 18 arcs.

It is important to note that simply lengthening a walk does not guarantee an improved result. For example, if the walk underlying the schedule in Figure 12 is lengthened by three transition functions, all corresponding to the reaction rule *stay*, it will not yield any improvement in the quality of a solution.

Checking whether non-functional requirements are fulfilled should be done in Phase 1 after Phase 3 has been successfully completed. This step is not shown on Scheme 4 but this is how it was implemented in [39].

#### 3.3.2. Non-Functional Requirement—Collision Avoidance

Another example of a non-functional requirement will be related to safety of mission execution. This time we impose a requirement that there should be no collisions between robots that are in the process of moving objects.

One of the advantages of using bigraphs is that they allow one to define patterns to be found in other bigraphs. These patterns are of "minimal satisfying phenomenon" type. One cannot define an "all but" type pattern in bigraph notation. In other words, you can define a pattern like "minimum three people in a room" but you cannot define a (single) pattern that detects "less than three people in a room".

Let us assume that a collision-free mission will be guaranteed if the robots moving the objects are not in the same area. Such a requirement can be defined as "if two robots, at least one of which is moving an object, are in the same area then the scenario is unacceptable". Bigraph patterns able to detect such a situation are shown in Figure 14.

Identical to the previous non-functional requirement, this requirement can be verified in Phase 1 after a successful completion of the Phase 3.

**Figure 14.** Bigraph patterns to detect whether a collision between robots may occur during a scenario. The two patterns differ only in type of the relocated object. (**a**) The first pattern. (**b**) The second pattern.

#### *3.4. Memory Complexity*

As we have already mentioned, the size of a system grows much faster than the number of task elements. The same is true for the memory complexity of matrix multiplication operations described in Section 2.4. We have tested how limiting the number of results of convolution operation affects memory usage of the tool [40]. All measurements were done using multi-threaded F# implementation on a PC with 64 bit Ubuntu 20.04 operating system installed and the previous example regarding non-functional requirements was used for testing. We carried out three different tests reducing the number of results to 500, 10,000, and leaving the number of results unlimited. In the first case, the peak memory usage was about 700 MB before walks of the length of 15 arcs were found. The second case resulted in memory consumption around 15 GB before similar walks were found. The last case did not succeed on a machine with 64GB of RAM.

To deal with the memory complexity, we propose three methods to reduce the number of results:


In the case of *best N* method, there is a need is to define an evaluation function for partial solutions. We propose a SAT configuration evaluation function based on the involvement of task objects. The evaluation function returns a higher score the more objects are involved equally. The formula for calculating the evaluation function value can be expressed as below:

$$E(i) = m = 0, \forall\_{(\circ\_{id}, t) \in i} m = m + \frac{t}{t\_{\max}} \qquad i \in I$$

*tmax* −−− The largest engagement of any object.

Table 21 shows the values of the proposed evaluation function for a few example SAT configurations.


**Table 21.** Partial solution evaluation function values for random SAT configurations.

The prepared tool [40] for walk construction offers six strategies for finding solutions:


We summarized all of the above strategies in Table 22.


**Table 22.** Summary of the proposed strategies for finding walks in a state space. The second column denoted as MNoR stands for Maximum Number of Results. The value of *N* is equal to a number of results of the same length. The value of *L* is equal to the maximum length of a result.

#### **4. Discussion**

In this paper, we presented an algorithm to verify multi-agent system models based on tracking bigraphical reactive systems. Our algorithm can detect incorrectness of a model and unfulfillment of non-functional requirements. The algorithm considers a model to be incorrect if activities planned to be executed in parallel are not independent of each other. In this article, we presented two examples of utilizing the algorithm to check if a behavior policy meets non-functional requirements regarding time and safety of task execution. We also demonstrated how to generate successive states of a scenario, which is a task realization using a selected behavior policy, based on the the behavior policy. Finally, we discussed memory complexity of operations essential to behavior policies generation and proposed a few ways to reduce it. One of the suggested methods is to limit results to a certain number of the best ones. We gave an example of an evaluation function that allows ranking partial results (in our case, these are behavior policies that when executed do not meet functional requirements). The evaluation function is applicable to tasks of any kind and size.

This work complements our previous publication, which focused solely on designing multi-agent systems with tracking bigraphs. The methodology enables the design of a broad range of systems from warehouse robots to drone swarms performing a task without human intervention. One can also consider designing software systems where programs act as agents and operations performed by these programs could represent transition functions. The functional programming paradigm intuitively fits this kind of design.

The main drawback of our methodology is the lack of adaptability of behavior policies. This means there can be no deviation from scheduled actions when executing a behavior policy. It also means that agents in a modeled system have to be fully controllable in the real world. The biggest drawback of the algorithm presented in this article is that it verifies the correctness of a model looking for errors in a single behavior policy. Thus, the more behavior policies that are checked, the more confident we are that the model is correct.

As for directions of further development, the primary goal should be to improve the generation speed of tracking reactive systems as it is the main limitation of the methodology right now. One way to achieve it is to develop a method of partial construction of a tracking bigraphical reactive system that consists of bigraphs necessary to manufacture a good quality walk in state space. If the method of reducing the number of states is automatic, i.e., it will not require the designer to specify bigraphical patterns, it is going to significantly speed up the development of behavior policies. Right now our method can only be applied to relatively small systems because the explosion of states makes it impossible to efficiently search for walks in the state space of a modeled system.

**Author Contributions:** Conceptualization, P.C. and Z.Z.; methodology, P.C. and Z.Z.; software, P.C.; validation, P.C.; formal analysis, P.C.; investigation, P.C.; resources, P.C.; data curation, P.C.; writing original draft preparation, P.C.; writing—review and editing, Z.Z.; visualization, P.C.; supervision, Z.Z.; project administration, Z.Z. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **UX in AR-Supported Industrial Human–Robot Collaborative Tasks: A Systematic Review**

**Riccardo Karim Khamaisi 1, Elisa Prati 1, Margherita Peruzzini 1,\*, Roberto Raffaeli <sup>2</sup> and Marcello Pellicciari <sup>2</sup>**


**\*** Correspondence: margherita.peruzzini@unimore.it

**Abstract:** The fourth industrial revolution is promoting the Operator 4.0 paradigm, originating from a renovated attention towards human factors, growingly involved in the design of modern, human-centered processes. New technologies, such as augmented reality or collaborative robotics are thus increasingly studied and progressively applied to solve the modern operators' needs. Humancentered design approaches can help to identify user's needs and functional requirements, solving usability issues, or reducing cognitive or physical stress. The paper reviews the recent literature on augmented reality-supported collaborative robotics from a human-centered perspective. To this end, the study analyzed 21 papers selected after a quality assessment procedure and remarks the poor adoption of user-centered approaches and methodologies to drive the development of humancentered augmented reality applications to promote an efficient collaboration between humans and robots. To remedy this deficiency, the paper ultimately proposes a structured framework driven by User eXperience approaches to design augmented reality interfaces by encompassing previous research works. Future developments are discussed, stimulating fruitful reflections and a decisive standardization process.

**Keywords:** User eXperience; human–robot interaction; human–robot collaboration; human-centered design; augmented reality; human factors

#### **1. Introduction**

The creation of intelligent, assisted, and automated machines is characterizing the modern factory aiming at two main aspects: a more conscious distribution of roles between machines and humans, and a more flexible process control to achieve an efficient and optimized production. In this context, high standards of quality, production flexibility, and innovation push towards human-centered design (HCD) approaches, focused on the centrality of the human factors (HF). HF refers to environmental, organizational, and job-related aspects, as well as human individual characteristics, which can highly affect health and safety during the interaction with current technologies. Introducing HF in the design process is the scope of HCD, which is defined as "*an approach to systems design and development that aims to make interactive systems more usable by focusing on the use of the system and applying human factors/ergonomics and usability knowledge and techniques*" [1]. Today, HCD can be generically used for any type of applications to guarantee the satisfaction of user needs and the coherence with the ergonomics principles while designing any type of human–system interaction. HCD enables new ways to define requirements and recommendations to properly design complex systems according to a user-oriented approach. The final goal is to guarantee a valuable User eXperience (UX), which involves "*the user's perceptions and responses that result from the use and/or anticipated use of a system product or service*" [1], including usability in terms of "*the achievement of specified goals with effectiveness,*

**Citation:** Khamaisi, R.K.; Prati, E.; Peruzzini, M.; Raffaeli, R.; Pellicciari, M. UX in AR-Supported Industrial Human–Robot Collaborative Tasks: A Systematic Review. *Appl. Sci.* **2021**, *11*, 10448. https://doi.org/10.3390/ app112110448

Academic Editors: António Paulo Moreira, Pedro Neto and Félix Vidal

Received: 12 October 2021 Accepted: 5 November 2021 Published: 7 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

*efficiency and satisfaction in a specified context of use*", but also considering users' emotions and affections [2].

The current frameworks related to the application of HF and HCD in system design need to be further developed with the advent of Operator 4.0 (O4.0) concept, framing a smart and skilled operator performing highly specialized tasks aided by emerging technologies as and if needed [3], in order to reshape the industrial tasks based on the human-machine partnership and to renovate the industrial systems according to Industry 4.0 paradigm. Indeed, the O4.0 idea is introducing new assistive technologies, such as augmented reality (AR), virtual reality (VR), or mixed reality (MR) in modern industries, making them enabling technologies for the design and development of an effective human– machine cooperation. However, to achieve such challenging objectives, technologies must be centered on the figure of the modern Operator 4.0 according to new framework, able to focus on the interface design for collaborative tasks, involving humans and robots. Primarily, a precise distinction among such technologies can be summarized as follows:


Secondly, attention has to be paid to the technological development of modern companies, where novel forms of support and training can be introduced to enrich the operator's knowledge and encouraging the proper use of new, emerging tools, such as robots [7]. Considering all that, the proper design of AR and VR interfaces becomes crucial to promote the new-born paradigm of the Operator 4.0. In order to achieve higher task precision and market responsiveness, industrial collaborative robots and AR devices are gradually entering the shop floor level to assist operators [8]. Contextually, designing a collaborative working environment for O4.0 requires the adoption of the HCD approach in order to consider the O4.0 know-how and know-to-cooperate: the former refers to human capability to run the process, whilst the latter deals with their attitude to cooperate with other agents [9]. Hence, agents' intentions, action's adaptability, and safety concerns are steadily part of human–robot interaction (HRI). The latter is a field of study concentrated on the design of robotic systems for use by or with humans which seeks to improve the human–machine collaboration while developing innovative and usable user interfaces. Finally, the analysis of the state of the art highlights the need for defining a new HCD framework tackling the new O4.0 requirements to improve the design of AR interfaces, according to UX interface design, but applied them specifically for HRI scopes. Therefore, this review also proposes a framework to design AR interfaces as a natural outcome of this review work, due to the lack in the existing literature.

The review moves from the analysis of the different levels of HRI, namely coexistence, cooperation, and collaboration [10]. Coexistence refers to humans and robots sharing common workspace and time, but using different resources. Cooperation is characterized by a common workspace, time, and shared aim, with sequential or simultaneous tasks, on the same resources, but does not involve a direct contact between humans and robots. Finally, collaboration is the highest level of interaction that envisages common workspace, time, and shared aim, with sequential or simultaneous tasks on the same resources, involving a direct physical contact between humans and robots.

This distinction demonstrates how specific UX issues can be identified for each level of HRI. Thus, as shown in Figure 1 and according to the distinction just made, technologies such as AR could be selectively used to support specific targeted tasks of the human–robot interaction. Moreover, regardless of HRI nature, in [10] the authors suggest to integrate a human-centered view to the robot-centered and robot cognition-centered views, meaning to harmonize the HF and human–machine interaction principles with technological and decisional capability aspects. Based on [11], it could be stated that:


**Figure 1.** Main AR applications areas according to the three levels of HRI.

Such considerations are valid even if multiple humans and robots are involved in the interaction. All these terminological and conceptual distinctions demonstrate the intrinsic complexity of a HRI task and the need of a structured approach to appropriately encompass all its peculiarities.

However, in practice, collaborative robots (i.e., "*cobots*") are currently regulated by the ISO 10218 technical specification document [12], providing a precise interpretation of their roles and natures, and defining the safety requirements for industrial collaborative systems. The main focus is actually for safety as addressed by ISO 15066 [13], neglecting other human implications. We could say that the focus is on the robot-centered view, and only marginally on the robot cognition-centered view where the human-centered aspects are only considered for safety purposes.

In this context, the use of AR technologies as task support tools force us to pay attention to the human-centered view by supporting HRI at different levels (i.e., coexistence, cooperation, and collaboration), thanks to the creation of specific, contextual human knowledge on the ongoing process, and the promotion of know-how development and know-to-cooperate abilities [9]. Indeed, an effective AR interface should define their contents according to the level of interaction realized between the operator and the robots, since each level is characterized by a different form of interaction that requires specific features in the AR interface to properly support the tasks. Indeed, AR introduces digitized information into the real working environment to augment the UX, promoting system usability and object visibility, while reducing the operator's physical and cognitive workload. Current AR applications are provided mainly through tablets since head-mounted display are still far from being industrially reliable, especially as for ergonomic aspects. The state of art in literature regarding AR-supported HRI highlighted different application areas: visualizing robot actions or faults to support troubleshooting [14]; controlling robots combining head and eye gaze; visual simultaneous localization and mapping algorithms [15]; understanding the impact of AR cues on human attention [16]; supporting human–robot

collaborative assembly [17]; providing workspace and robot's volume monitoring [8,18]; improving interaction efficiency by reducing the physical strength (especially in heavy-duty industries) or letting older people to continue working in production facilities [19]; and by helping operators to have an immediate comprehension of the robot intentions in a quick and intuitive way (e.g., making visible the robot's planned motion and task state) [20,21] or adapting the AR contents to the specific environmental or task conditions [22,23].

However, the majority of existing AR solutions looks at technology and robots, while neglecting the human aspects [10]. The main problem in AR-supported HRI is the lack of user friendly and intuitive interfaces implemented in accordance with the interaction design principles to guide users. While there are some attempts to design user-centered AR interfaces for different applications [24–28], for robotics applications, AR interfaces are usually developed by technology experts and not by UX designers. As a result, interfaces are technology-driven and not user-driven and they usually appear not fully centered on the users' perspective [10].

Only recently, a limited number of papers focused on the need to apply structured HCD methods to the design of HRI, focusing on the understanding and satisfaction of human needs. For instance, a UX-oriented methodology has been recently defined to investigate the human–robot dialogue and map the interaction with robots in performing shared tasks, eliciting the requirements for a valuable HRI design [10]. Similarly, another study has considered the role and relevance of UX in HRI and defined the actual trends concerning the inclusion of UX related to socially interactive robots [29]. Another work also proposes an innovative user-centered design tool to design AR platforms for maintenance operations [30]. Despite this, they did not specifically focus on the design of AR interfaces to support human–robot collaborative tasks, where a limited attention is paid to user perception, ergonomics, and usability issues. Contrarily, the nature of AR and the role that such applications can assume in the context of O4.0 requires great attention to human aspects. Interdisciplinary research is also advisable to achieve high-quality HRI.

In this context, the paper provides two main contributions:


#### **2. Methodology**

#### *2.1. Systematic Literature Review*

A systematic literature review (SLR) approach has been adopted to investigate the literature relevance of HCD and UX-based methodologies applied to HRI in collaborative tasks. Replicability and objectivity have been considered as basic principles in carrying out the research: the review follows the PICOC framework proposed by [18], thanks to its systematicity and completeness. PICOC [31] stands for a list of items to consider in the analysis, respectively *population, intervention, comparison, outcomes* and *context.* It has been chosen to outline the key concepts of the research. For this review, hereafter the considered items:


#### *2.2. Research Questions*

The goal of the study is to provide a comprehensive overview of how UX has been used in the field of AR-supported collaborative applications for industry. Bearing this in mind, the authors formulated three research questions (Qi) according to the PICOC results:


#### *2.3. Search and Selection Process*

The search was conducted on the Scopus database, since it encompasses different digital libraries, such as IEEE or ACM, and provides high quality, indexed papers. The inclusion criteria are:


According to [18], the following exclusion criteria have been defined:


Seeing the high specificity of the "*user experience*" and "*user interface*" keywords, the initial search returned 27 papers. No secondary documents nor patents were found. No further papers' selections in terms of field of application of AR-supported collaborative tasks were provided, since the main interest is analyzing the current general sensibility towards the UX approaches.

After the above-mentioned selection process, only 21 papers were admitted. The results from inclusion and exclusion criteria, and the number of papers found at each step, are shown in Table 1.


**Table 1.** Search and selection results.

Afterwards, the selected papers have been evaluated by a structured quality assessment procedure using three quality criteria (QC) similarly to [18]:

• QC1: It reflects the quality of the journal on which the paper is published, where Qi refers to the quartile score, according to Scimago Journal Ranking [32]. A score of 1 was assigned to Q1 journals, 0.5 to Q2, and 0.25 to Q3. If the journal belongs to Q4, or if it does not belong to a specific quartile yet or it is part of a conference proceeding, a "/" is assigned counting as 0;


#### QC3(*i*) = *c*(*i*)/*mc*

The final aim is to focus the review attention on papers which match at their best the intentions of the authors and to allow the reader to select the best referenced articles according to such criteria.

#### **3. Review Results**

The results of the quality assessment are shown in Table 2, listed from the highestquality paper to the lowest-quality paper: the final quality score is the direct sum of the results obtained according to the three considered criteria. The table highlights how there is still little attention given to the current topic, which has been tackled starting from the last few years. As for publications in the last year, it must be considered that stillongoing studies have to be published and papers' impact in terms of citations need more time to be evaluated. The majority of collected studies are quiet below the half of the maximum allowed quality score, indicating that, according to quality criteria, there is still need of further research in this field. Considering QC2 scores, nearly half of the selected research papers do not include any reference to any of the HCD keywords, remarking the abovementioned lack of a UX sensibility.

As shown in Figure 2, in the last five years the attention towards AR solutions applied to robotics has noticeably increased. It can be related to the introduction on the market of several proprietary software development kits (e.g., ARKit, ARCore) from major vendors in 2017, which stimulated a general enthusiasm in AR applications development, and the contemporary commercialization of the Microsoft Hololens, from late 2016. These two facts pushed the academic and industrial interest on AR topics and potentials. In fact, previous tools did not provide effective augmented–cognitive interaction and lack in proactively supporting operators on receiving only the relevant information at their smart devices from nearby machines [33].

**Figure 2.** Papers by publication year (**on the left**) and total citations per year (**on the right**).


**Table 2.** Quality assessment on selected papers.

Figure 3 depicts the main subject areas dealt by the selected papers. One can infer that the design of AR applications supporting collaborative tasks do not merely involve engineering considerations on technologies or infrastructure's deployment, but also other field of study, such as psychology, neurosciences, and social sciences (i.e., tackling interaction issues from the human point of view or determining the most useful physiological parameters to consider in the evaluation of a specific interface).

**Figure 3.** Papers by subject's area.

In conclusion, the relevant works being selected have been carefully considered against the research questions (Q1, Q2, and Q3) as presented in the following sections.

#### *3.1. What Are the State of the Art UX Approaches in AR-Supported Collaborative Solutions?*

The current trends in the design of collaborative tasks supported by AR technologies do not systematically show a great attention to UX topics. Hietanen et al. [8] proposed an interactive user interface to assist O4.0 in performing robot-assisted tasks comparing two separate implementations of the same system: a projection-mirror setup, and a wearable device (i.e., Microsoft Hololens). No prior UX assessment was proposed; as part of the subjective evaluation, a final questionnaire including 13 questions divided into six categories (respectively: safety, information processing, ergonomics, autonomy, competence, and relatedness) was submitted to roughly understand mental and physical stress. Comments from users were collected to deepen the subjective impression, without using any structured method to collect the perceived workload, as used for instance in different contexts. A more structured approach is presented by Papanastasiou et al. [34]: the paper emphasizes the need of a seamless integration between the human operator and his robotic counterpart by monitoring both working entities through sensors and wearable devices. This led to the re-design of the workplace from the human point of view to promote both the robot's operability and operator's mobility, without any barrier to separate them; a multi-stage iterative process has been followed, starting from technical and functional specifications as well as safety requirements. A digital simulation is included for supporting cell setup and risk assessment.

De Pace et al. [18] placed attention on AR devices' usability as enabling tools. The authors reported how usability, workload, and likability can be investigated thanks to the standardized questionnaire (e.g., NASA-TLX [52], System Usability Scale (SUS) [53], AttrakDiff [54]). The same intention is expressed by Huy et al. [35], who introduced a novel AR handheld device inspired by the abovementioned multimodality perceptive interface, incorporating hand gesture mapping, haptic buttons, and laser pointers. The system can suggest available options to the operator and wait for a response instead of traditional inputs by keyboard or mouse; a usability investigation is eventually foreseen to improve the interface effectiveness with the help of user's feedbacks.

Materna et al. [36] evaluated the idea of spatial augmented reality (SAR) through a UX study. The outlined approach works towards avoiding continuous switching of attention during demanding tasks, thanks to a correct distribution of information along the operations and a shared workplace, to be usable also for non-expert users. Process simplification was also addressed by Aschenbrenner et al. [37] to reduce the installation time of hybrid robot–human production lines, and by De Tommaso et al. [38] that defined a

new process of skill transfer between human workers and robots. Similarly, Fuste et al. [49] presented a holistic UX framework (called "Kinetic AR") for visual programming of robotic motions using AR: the goal was to guarantee a low entry barrier to intricate spatial hardware programming. The UX approach was achieved through interviews to robotic system integrators, manufacturers, and end-users with different expertise, to finally identify the goals and requirements to be accomplished. Communications and interactions were also investigated by Bazzano et al. [39], using 3D immersive simulation to support the design and validation of natural HRI in generic usage contexts, comparing an AR interface and a non-AR one. Among others, subjective observations were gathered through the SASSI methodology [55] to evaluate speech interaction in both interfaces. Information on completion times, overall satisfaction, ease of use, perceived time requested, and support information were collected, and their statistical relevance was given by running an independent sample *t*-test.

As a result of the review, one can state that there are few preliminary attempts to include UX in the design of AR applications for HRI purposes, as summarized in this paragraph, but a reference, ready to use model that is able to integrate the users' subjective evaluation and the analysis of the quantitative human–robot performance is still missing. The main weaknesses of the current attempts are:


These results highlight the need of a structured framework to design AR interfaces for HRI and pushes towards its definition.

#### *3.2. What Are the Main Benefits of Adopting UX Approaches in Designing AR-Supported Collaborative Solutions?*

After the first analysis, the review focused on the analysis of the benefits related to the adoption of UX-based approaches in the design of AR applications for HRI: these approaches generally turn into a detailed evaluative UX phase, where subjective questionnaires represent the main source of information. Table 3 summarizes the most significant papers dealing with such an aspect, also reporting the main areas of applications.

Within the context of laboratory object manipulation tests, Frank et al. [45] focused on the user interaction effectiveness of a mobile augmented interface and on virtual graphics appearing as task's visual cues to reduce cognitive burden on end-users. The proposed system can automatically intercept an operator's intention on virtual objects (i.e., drag and drop of models in the space), thus reducing the human involvement while operating with the collaborative companion. No defined UX approach was adopted: a revision of the overall interface was conducted through a final questionnaire after the user-testing phase to identify possible criticalities. A concurrent interface simplification without losing its functionalities in the human–robot collaboration is indeed of extreme importance, in opposition to what has been defined by [42], where high cognitive functionalities are purposely omitted from the proposed interface.

A further critical point in AR-supported collaborative tasks is the choice of the correct interface to use, which is usually conducted without a precise validation tool or methodology. In De Pace et al. [18], a series of interesting UI studies resembling HCD approaches were collected concerning whether exocentric or egocentric interfaces are the best in limiting the level of mental and physical involvement in controlling the manipulator. Another study by Chan et al. [50] reconsidered AR-based interfaces for human–robot collaboration on large-scale labor-intensive manufacturing tasks (carbon-fiber-reinforced-polymer production) where the accent is on the perceived workload and efficiency. Indeed, as stated in other studies [18], the lowest physical and temporal demand is registered with appropriately designed AR solutions, reducing user's effort and sense of frustration while cutting down operational time. Such an approach does not explicitly make reference to a structured and systematic HCD methodology, but it relies on NASA-TLX questionnaire results. Similar conclusions were reported by Diehl et al. [51], where application circumstances for the choice of best device are examined, starting from users' feedback on robot's time and area of manipulation up to user sense of safety.

In Xin et al. [48], a collaborative task concerning playing board games was explored and evaluated by examining various interaction opportunities arising when humans and robots collaborate. This interesting analysis was related to two contrasting robotic behavioral conditions which have been tested: a human-centric condition where robot behavior is more accustomed to human obedience, and a robot-centric one where suggestions coming from the operator are neglected. Statistical results on the final user testing phase related to a custom questionnaire allows for a reinforced idea of the centrality of a human-centric condition to increase the sense of collaboration of O4.0.

Moreover, Palmarini et al. [56] stressed that safety is deemed as one of the most relevant aspects in human–robot collaborative systems and context-awareness information is unavoidably important to enhance user perception. Analogously, Quintero et al. [57] proposed two separate approaches to draw AR paths, respectively, a free space and a surface trajectory one. Such proposals could be effectively integrated to optimize robot's programming phases with a UX sensibility, reducing programming time, and allowing the worker to selectively inspect different robot trajectories and to work on them in a user-friendly interface. For an optimal collaboration, robot intention is another source of essential information within a HCD approach: a general indifference on the topic emerges from actual selection, although Liu et al. [58] described a temporal and-or graph (T-AOG) to allow the human understanding of the robot's internal decision-making process, to supervise its action planner, or to monitor its latent states (i.e., forces and moments exerted while interacting).


**Table 3.** Papers focusing on added value related to adoption of UX approaches in HRI.

As emerged from the review findings, several benefits derived from a UX-based approach when implementing AR-supported collaborative tasks, both objective and subjective: a systematic cognitive and physical relief on the operator, an increased working efficiency, a reduction in operational time and sense of frustration when interacting with shop floor interfaces, and an improved sense of safety and inclusiveness while collaborating with the robotic counterpart. Such conclusions were mainly reported after a user testing campaign in which standard or customized questionnaires were designated to collect final tester impressions to be subsequently reanalyzed by the papers' authors.

#### *3.3. What Are the Main Challenges in Designing AR-Supported Collaborative Solutions?*

The literary review highlighted that the design, development, and use of AR technologies to improve HRI in industrial contexts is a hot topic from a technological point of view; however, there is a lack of models to deepen the UX and only a limited number of papers have proposed the adoption of UX methods to support the design of AR application in this field, according to user-centered principles. As reported by recent market forecasts, the mixed reality market size (including both augmented and virtual reality technologies) is expected to grow by USD 125.19 billion during 2020–2024 [59], up to USD 1.45 trillion to by 2030 [60]. This rapid growth entails big challenges from both a technical and technologies viewpoint and a human viewpoint. Some issues, just considered so far, need to be investigated and faced: from privacy problems to safety requirements. This means that the design of AR-supported applications in the context of HRI will consider how to manage the robots' and operators' data collected and how to assure the proper privacy and safety levels. Considering current applications [61], one can reflect on both critical success factors and challenges related to future robust industrialization. If compared to industrial software systems, current AR hardware readiness seems to still be far from a mature adoption in industry. Thus, human-centered design methods are required to balance industrial system requirements with human needs and social concerns; in this sense AR is so close to human abilities, also affecting and empowering them. Another challenge is the integration of AR devices within modern manufacturing systems: data exchange to and from the AR application should be compliant with robotics and automation standards to assure a full adoption in industry. In regard to this topic, only few research attempts have been made (e.g., AutomationML [62]) which are still far from the inclusion of AR data.

Moreover, a proper UX evaluation framework for AR-supported collaborative tasks needs to be defined. A first attempt has been made considering UX analysis in the design of HRI applications using a structured approach [10], but not including AR tools. On this topic, the main challenge is to define a systematic and coherent way to interpret data coming from different equipment and returning AR digital contents to the O4.0, in an adaptive and intelligent way, considering the UX, and further enhancing the human physical, sensorial, and cognitive capabilities by means of human cyber–physical system integration [63]. In this direction, a further challenge is promoting socially embedded human–robot collaborations where human communications can be used to adapt service robots to the user needs accordingly: it consists of giving the robots the concept of emotional tuning and to emphatically communicate with machines [64].

Moreover, the estimation of those variables affecting trust in HRI is necessary to design new, effective AR interfaces providing situational awareness and spatial dialog, and to determine functional elements to improve human confidence in robots. This evaluation should be included in a comprehensive approach considering validated metrics for an overall UX assessment [65]. Contextually, the assessment of human cognitive and physical efforts in developing collaborative tasks has an absolute relevance.

Finally, in the context of AR-supported human–robot collaborative operations, user testing needs a more statistically reliable base, including both academic and industrial studies and increasing the number and typology of people involved to assess the effectiveness of AR in HRI tasks [18]. The results mainly imply the definition of new UX-based methods to design AR interfaces from a multiple users' point of view, involving novice and expert users, and the benchmark of the most suitable wearable interfaces to be used together with industrial robots.

#### **4. Discussion on Review Results**

From the current analysis, a substantial lack of structured methods to design usercentered AR applications for HRI have emerged. Despite several attempts in other various contexts (tourism, mobile application games, etc. [24–28]), UX-driven methodology are poorly adopted in HRI, and this led to the design of interfaces which are far from real users' needs.

On this base, the authors believe that it is crucial to promote UX-driven design processes to develop successful AR interfaces, especially for collaborative tasks involving humans and robots, to fully support the O4.0. Only the adoption of a proper HCD framework allows considering the real needs of the operators in a specific context of use, focusing not only on safety and ergonomics issues, but on the overall UX aspects (e.g., usability, intuitiveness, satisfaction, cognitive load, and emotional response) with the final aim to have a renovated human–robot relationship where both subjects are actively participating and transmitting knowledge to the equivalent counterpart, according to a win–win approach.

For these purposes, the review results suggest defining a structured UX-driven process to design AR interfaces for HRI tasks. The review represents a nonlinear and iterative process aiming at assisting the AR interface designer in implementing a valuable communication with robots, during the execution of HRI tasks. As depicted in Figure 4, the process must be made up of three main steps:


**Figure 4.** The need of a UX-driven process for AR interface design to support HRI tasks.

The process starts with the need to define a new AR interface for collaborative tasks, and brings to the interface definition, before the AR interface development. Such a process goes a step further with respect to a few recent studies [10,34].

The first step to be investigated consists of requirement gathering from user research. It is a vital part of any UX-driven processes because it is the act of understanding the users and their needs, making sense of who the user is, what he/she wants, and how he/she will perform a certain task. It mainly consists of a deep, accurate user analysis based on the context research to be carried out in a not invasive way using a set of UX design techniques. The most suitable techniques for user understanding are user observation, focus groups, and interviews [66]. User observation consists of observing users in their natural environment without affecting their normal behaviors and performance, with the aim to understand the users' needs and widely used when users belong to specific categories. It can also be done using video analysis, but in some industrial contexts video recording is not always possible. Such analysis is frequently combined with focus groups or interviews, which can provide a deeper insight about the users' habits and behaviors since they actively involve users to collect, in different ways, qualitative data about their

needs, expectations, or fears. The mixed approach combining different techniques is very useful since it allows combining quantitative and qualitative data. After that, the list of executed tasks can be easily defined, reporting also the actors involved for each task (e.g., humans, robots) and the time span. These actions finally led to the definition of design specifications considering functional, technical, and safety aspects. A different set of UX design techniques can be validly adopted, from user scenarios and personas to experience maps, as proposed by [10]. User scenarios are stories to show how users might act to achieve a goal in a system or environment. They are valuable aids for designers to visualize aspects of their solutions which users might appreciate most in their contexts of use and with their unique needs and motivations. Personas represent the target users by a set of probable users and flesh out their experiences to reflect realistic situations. Finally, experience maps represent a synthetic visualization of an entire end-to-end experience that generic users (i.e., personas) go through to accomplish a certain goal, and they allow a better understanding of human behaviors.

The second aspect to be investigated is the AR interface design and prototyping. Design consists of the definition of the interface functions as well as the items, while prototyping presents the design in a concrete way by representing the interface in action with the simulation of the final interaction between the user and the system. In this context, wireframes are very powerful as a visual representation of the interface pages; clickable wireframes are the simplest form of interactive prototype, created by linking static wireframes together. Moreover, using digital tools, wireframes can be updated and easily reused and layouts can be easily changed based on user feedback to repeat the testing process. Low-fidelity prototypes allow one to easily define the following information architecture, also comparing possible alternative solutions, and to optimize the design itself in an iterative way.

Finally, the third aspect to include consists of the UX assessment. Different evaluation techniques exist to investigate more closely the user's behavior and perception on a final prototype or products. One of the most spread is user testing, that can lead to both quantitative and qualitative data [67]. Quantitative tests carry out measurements (e.g., execution time, number of errors, and number of tasks completed) while performing a specific task on the user interface. User testing sessions often include post-test evaluation questionnaires (e.g., System Usability Scale (SUS) [53], UEQ Questionnaire [68], or meCUE questionnaire [69]) and allow the gathering of many opinions in a short time, as well as being adaptable to multiple application areas. In addition, physiological measurements allow us to investigate in real time the level of physical and cognitive workload as well as stress of the operator during the interaction. Examples of adoption of these tools for the design of modern systems is provided by [70].

Based on the review results, it is also possible to identify some trends of future implementation of a successful UX-driven design process, as depicted above. First of all, the UX assessment needs to be included in the analysis of the real human activity during collaborative task execution, to understand the level of attention and physical and cognitive responsiveness through wearable devices, in order to understand the real UX. Secondly, non-intrusive sensors and smart interfaces are required to carry out a bi-directional communication flow from-to machines and robots and create a synergic collaboration, in order to overcome the current vision based on separate entities with incompatible characteristics. Further, flexible data management is necessary to manage the crescent complexity and the larger amount of data coming from the shop floor and the operator, in order to successfully integrate AR-supported collaborative tasks in the overall productive chain. Finally, it is worth understanding the impact of AR on user perceptions, ergonomics, and human–robot interactions.

#### **5. Conclusions**

This paper reviews the overall condition of industrial collaborative tasks supported by AR technologies, pushed by the growing interest of industry in the AR market registered

in the last 5 years and the need to define new ways to make the Operator 4.0 successfully work in the factories of the future, interacting with robots. In this context, AR interfaces can help to improve human–robot communication and interaction at different levels, thanks to the possibility to show contextual and digital information and data when and where needed. However, there is a lack of a proper framework to design AR interface, including the operators' UX, specifically designed for HRI. The paper starts with a review of the state of the art, focusing on the inclusion of HF and UX design principles in the design of AR interfaces to support HRI. In the paper, a SLR approach was used to collect the most interesting papers in this field, considering the recent scientific literature. After identifying the focus of the current study, 27 papers were gathered and assessed according to proper quality control parameters previously defined. At the end of the selection process, 21 papers were deemed suitable to answer the three research questions: *Q1: What is the state of art related to UX approaches in AR-supported collaborative solutions?; Q2: What are the main benefits of adopting UX approaches in designing AR-supported collaborative solutions?; Q3: What are the main challenges in designing AR-supported collaborative solutions?*

As a result, the research highlighted the lack of reliable and systematic user-centered methodologies to design AR applications for human–robot collaborative tasks. This fact is limiting the acceptance of such solutions and slowing down the technological integration of smart devices within the Operator 4.0 paradigm. Several added values of AR application to Operator 4.0 scenario are then presented, starting from the reduction of the worker 's cognitive workload thanks to the interface simplification and adequate usability tests, up to the realization of a shared workplace where a synergic collaboration could take place, in which both actors can reciprocally be understood and learn from the corresponding counterpart. From the discussion of the review results, the paper finally highlights the need of structured UX-driven processes to design successful AR interfaces for human–robot collaborative tasks, made up of different phases organized in an iterative cycle, including typical UX design tools and techniques for interface design, that are not currently used in AR interface design for industrial purposes. The research also defined the main trends of development for future applications, considering the need of non-intrusive human monitoring devices and smart tools to enable fruitful communication between operators and the on-going process at the shop floor.

**Author Contributions:** Conceptualization, M.P. (Margherita Peruzzini) and R.K.K.; methodology, M.P. (Margherita Peruzzini) and E.P.; formal analysis, R.R.; data curation, R.K.K.; writing original draft preparation, R.K.K. and M.P. (Margherita Peruzzini); writing—review and editing, M.P. (Marcello Pellicciari) and R.R.; supervision, M.P. (Marcello Pellicciari). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All available data is contained within the paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Applied Sciences* Editorial Office E-mail: applsci@mdpi.com www.mdpi.com/journal/applsci

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel: +41 61 683 77 34 www.mdpi.com

ISBN 978-3-0365-6555-2