*Letter*

**A Framework for Human-Robot-Human Physical Interaction Based on N-Player Game Theory**

### **Rui Zou †, Yubin Liu \*, Jie Zhao and Hegao Cai**

State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, China; 18b908014@stu.hit.edu.cn (R.Z.); jzhao@hit.edu.cn (J.Z.); hgcai@hit.edu.cn (H.C.)


Received: 31 July 2020; Accepted: 1 September 2020; Published: 3 September 2020

**Abstract:** In order to analyze the complex interactive behaviors between a robot and two humans, this paper presents an adaptive optimal control framework for human-robot-human physical interaction. N-player linear quadratic differential game theory is used to describe the system under study. However, N-player differential game theory cannot be applied directly in real scenarios, since the robot cannot know the humans' control objectives in advance. To provide the robot with this knowledge, the paper presents an online estimation method that identifies the unknown humans' control objectives based on the recursive least squares algorithm. The Nash equilibrium solution of the human-robot-human interaction is obtained by solving the coupled Riccati equations, so that adaptive optimal control is achieved during the physical interaction. The effectiveness of the proposed method is demonstrated by rigorous theoretical analysis and simulations. The simulation results show that the proposed controller achieves adaptive optimal control during the interaction between the robot and two humans and outperforms an LQR controller.

**Keywords:** physical human-robot interaction; game theory; adaptive optimal control; robot control

### **1. Introduction**

In the past decade, physical human-robot interaction has attracted the attention of the research community due to the urgent requirement for robot technology in unstructured environments [1–4]. Physical human-robot interaction combines the advantages of humans and robots: humans are good at reasoning and problem solving with high flexibility, while robots perform well in terms of execution and guarantee the accuracy of task execution [5,6]. The combination of these advantages has led to the wide application of physical human-robot interaction, such as teleoperation [7,8], collaborative assembly [9,10], and collaborative transportation [11–13].

Two types of specific human-robot interaction strategies have been widely studied: the co-activity type of interaction strategy and the master-slave control strategy [14,15]. The co-activity type of interaction strategy is used in typical rehabilitation robots that assist limb movement training or in intelligent industrial systems that support heavy objects against gravity, where the robot completely ignores the human users' behaviors [16,17]. In contrast, the master-slave control strategy is used in teleoperated robots or force-extender exoskeletons, where the robot completely follows the control of the human users [18]. However, these strategies can only be used for specific interactive behaviors; a general framework for analyzing the various interactive behaviors between robots and humans is still missing [19,20].

It has been pointed out that game theory can be used as a general framework to analyze complex interactive behaviors between multiple agents, because different combinations of individual cost functions and different optimization objectives can describe various interactive behaviors [21]. In [22], the human and the robot were regarded as two agents and game theory was used to analyze the performance of the two agents. In [23], the optimal control was obtained for a given game with a linear system cost function by solving the coupled Riccati equations. In [24], an optimal control algorithm was developed for human-robot collaboration by solving the Riccati equation in each loop. In [25–28], policy iteration was used to solve for the Nash equilibrium in order to improve the calculation speed. In [29], cyber-physical human systems were modeled via an interplay between reinforcement learning and game theory. In [30], haptic shared control for human-robot collaboration was modeled by a game-theoretic approach. In [31], human-like motion planning was studied based on game-theoretic decision making. In [32], a cooperative game was used for human-robot collaborative manufacturing. In [33], a Bayesian framework was proposed for Nash equilibrium inference in human-robot parallel play. In [19], non-cooperative differential game theory was used to model the human-robot interaction system, resulting in a variety of interaction strategies. However, the above studies only consider two agents, that is, the interaction between one human and one robot. Therefore, the aforementioned methods are not suitable for human-robot-human physical interaction, where more than one human interacts with one robot physically. It is worth noting that the physical interaction between one robot and two humans brings greater advantages, such as operating larger loads and improving the flexibility and robustness of the system [28,34–37].
These advantages arise from the team collaboration between the robot and the two humans. To the authors' knowledge, no previous work has studied the problem of the physical interaction between one robot and two humans based on game theory.

In this paper, a general adaptive optimal control framework for human-robot-human physical interaction is proposed based on N-player game theory. Accordingly, the robot and two humans can interact with each other optimally by learning each other's control. N-player differential game theory is used to model the human-robot-human interaction system in order to analyze the complex interactive behaviors between the robot and two humans. In N-player differential game theory, the humans' control objectives are assumed to be known [38,39]. However, N-player differential game theory cannot be applied directly in real scenarios, since the robot cannot know the humans' control objectives in advance. To provide the robot with this knowledge, the paper presents an online estimation method that identifies the unknown humans' control objectives based on the recursive least squares algorithm. Subsequently, the Nash equilibrium solution of the multi-human robot physical interaction is obtained by solving the coupled Riccati equations to achieve coupled optimization. Finally, the effectiveness of the proposed method is demonstrated by rigorous theoretical analysis and simulation experiments. This paper makes the following four contributions.


The remainder of this paper is organized as follows: Section 2 models the human-robot-human physical interaction system based on N-player differential game theory. Section 3 establishes an adaptive optimal control law, and the control performance of the system is analyzed theoretically. Section 4 verifies the effectiveness of the proposed method through simulation experiments. Finally, Section 5 concludes this work.

### **2. Problem Formulation**

### *2.1. System Description*

The system considered contains two humans and one robot. An example scenario is shown in Figure 1, where the robot and the humans collaborate to perform an object transporting task. In this shared control task, when the humans' control objectives change, the robot should recognize the new objectives and respond adaptively and optimally. The forces exerted by the humans on the object are measured by force sensors at the interaction points. It is worth noting that the humans' control objectives are unknown to the robot.

**Figure 1.** A scenario where the humans and the robot collaborate to perform an object transporting task.

The forward kinematics of the robot are described as

$$
x(t) = \phi(q(t))\tag{1}
$$

where $x(t) \in \mathbb{R}^m$ and $q(t) \in \mathbb{R}^n$ are the positions in Cartesian space and joint space, respectively, and $m$ and $n$ are the corresponding degrees of freedom. Differentiating Equation (1) with respect to time yields

$$
\dot{x}(t) = J(q(t))\dot{q}(t) \tag{2}
$$

where $J(q(t)) \in \mathbb{R}^{m \times n}$ is the Jacobian matrix.

The following impedance model is given in Cartesian space

$$M\_d \ddot{x}(t) + C\_d \dot{x}(t) = u(t) + f\_1(t) + f\_2(t) \tag{3}$$

where $M\_d \in \mathbb{R}^{m \times m}$ is the desired inertia matrix, $C\_d \in \mathbb{R}^{m \times m}$ is the damping matrix, $u(t) \in \mathbb{R}^m$ is the control input in Cartesian space [40–42], $f\_1(t) \in \mathbb{R}^m$ is the contact force between the object and human 1, and $f\_2(t) \in \mathbb{R}^m$ is the contact force between the object and human 2.

To track a common and fixed target $x\_d \in \mathbb{R}^m$ ($\dot{x}\_d \in \mathbb{R}^m$) in the cooperative object transporting task, Equation (3) can be transformed as follows

$$M\_d(\ddot{x}(t) - \ddot{x}\_d) + C\_d(\dot{x}(t) - \dot{x}\_d) = u(t) + f\_1(t) + f\_2(t). \tag{4}$$

In order to ease the design of the control, Equation (4) can be rewritten in the following state-space form

$$\begin{aligned} \dot{z} &= Az + B\_1 u + B\_2 f\_1 + B\_3 f\_2\\ z &= \begin{bmatrix} x(t) - x\_d\\ \dot{x}(t) \end{bmatrix}, \quad A = \begin{bmatrix} 0\_m & 1\_m\\ 0\_m & -M\_d^{-1} C\_d \end{bmatrix} \\ B\_1 &= B\_2 = B\_3 = B = \begin{bmatrix} 0\_m\\ M\_d^{-1} \end{bmatrix} \end{aligned} \tag{5}$$

where $0\_m$ and $1\_m$ denote the $m \times m$ zero and identity matrices, respectively.
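As a concrete illustration, the state-space matrices of Equation (5) can be assembled as follows for a single Cartesian degree of freedom ($m = 1$); the numerical values of $M\_d$ and $C\_d$ below are illustrative placeholders, not the paper's simulation settings:

```python
import numpy as np

# Illustrative 1-DOF example (m = 1); M_d, C_d values are placeholders
M_d = np.array([[6.0]])   # desired inertia matrix
C_d = np.array([[0.2]])   # damping matrix

m = M_d.shape[0]
Minv = np.linalg.inv(M_d)

# State z = [x - x_d, xdot]^T; matrices of Equation (5)
A = np.block([[np.zeros((m, m)), np.eye(m)],
              [np.zeros((m, m)), -Minv @ C_d]])
B = np.vstack([np.zeros((m, m)), Minv])   # B_1 = B_2 = B_3 = B
```

For $m > 1$ the same `np.block`/`np.vstack` construction applies with matrix-valued $M\_d$ and $C\_d$.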

### *2.2. Problem Formulation*

According to non-cooperative differential game theory, the interaction between the robot and the humans is described in this paper as a game between N players (here, $N = 3$) [43]. In the game, each player minimizes their respective cost function

$$\begin{aligned} \Gamma &\equiv \int\_{t\_0}^{\infty} \left(z^T Q z + u^T u\right) dt \\ \Gamma\_1 &\equiv \int\_{t\_0}^{\infty} \left(z^T Q\_1 z + f\_1^T f\_1\right) dt \\ \Gamma\_2 &\equiv \int\_{t\_0}^{\infty} \left(z^T Q\_2 z + f\_2^T f\_2\right) dt \\ Q &= \begin{bmatrix} Q\_{01} & 0\_{n \times n} \\ 0\_{n \times n} & Q\_{02} \end{bmatrix}, \quad Q\_1 = \begin{bmatrix} Q\_{11} & 0\_{n \times n} \\ 0\_{n \times n} & Q\_{12} \end{bmatrix}, \quad Q\_2 = \begin{bmatrix} Q\_{21} & 0\_{n \times n} \\ 0\_{n \times n} & Q\_{22} \end{bmatrix} \end{aligned} \tag{6}$$

where $\Gamma$, $\Gamma\_1$, $\Gamma\_2$ are the cost functions of the robot, human 1, and human 2, respectively, and $Q$, $Q\_1$, $Q\_2$ are the corresponding state weight matrices. Each player achieves the cooperative object transporting task by minimizing the error to the target while minimizing their own cost. $Q$, $Q\_1$, $Q\_2$ each contain two components, corresponding to position regulation and velocity: $Q\_{01}$, $Q\_{11}$, $Q\_{21}$ correspond to position regulation and $Q\_{02}$, $Q\_{12}$, $Q\_{22}$ correspond to velocity.

In [27], the N-player game has been studied for the case where the cost functions are known. However, $\Gamma\_1$, $\Gamma\_2$ are unknown to the robot because they are determined by the humans. Therefore, a method is proposed in this paper to estimate $\Gamma\_1$, $\Gamma\_2$ in order to achieve adaptive optimal control and, thus, accomplish the human-robot-human cooperative object transporting task.

### *2.3. N-Player Differential Game Theory*

Based on the differential game theory of linear systems, for an $N$-player game the following linear differential equation [43] is considered:

$$\dot{z} = Az + B\_1 u\_1 + \cdots + B\_N u\_N, \quad z(0) = z\_0. \tag{7}$$

Each player has a quadratic cost function that they want to minimize:

$$
\Gamma\_i = \int\_0^\infty \left(z^T Q\_i z + u\_i^T u\_i\right) dt, \quad i = 1, \cdots, N \tag{8}
$$

Different types of multi-agent behaviors are defined in game theory, which can be achieved through different concepts of game equilibrium [44,45]. In this paper, Nash equilibrium is considered. In the sense of Nash equilibrium, each player minimizes their cost function:

$$u\_i = -\eta\_i z, \quad \eta\_i = B\_i^T P\_i$$

$$\Big(A - \sum\_{j \neq i}^{N} B\_j \eta\_j\Big)^T P\_i + P\_i \Big(A - \sum\_{j \neq i}^{N} B\_j \eta\_j\Big) + Q\_i - P\_i B\_i B\_i^T P\_i = 0, \quad i = 1, \cdots, N\tag{9}$$

where $N$ is equal to 3 in this paper. In the sense of Nash equilibrium, the humans and the robot each minimize their own cost function:

$$\begin{aligned} u &= -\alpha z \\ \alpha &= B^T P\_r \end{aligned} \tag{10a}$$

$$\begin{aligned} f\_1 &= -\beta z \\ \beta &= B^T P\_1 \end{aligned} \tag{10b}$$

$$\begin{aligned} f\_2 &= -\gamma z \\ \gamma &= B^T P\_2 \end{aligned} \tag{10c}$$

$$A\_r^T P\_r + P\_r A\_r + Q - P\_r B B^T P\_r = 0\_{2n}, \quad A\_r = A - B\beta - B\gamma \tag{10d}$$

$$A\_1^T P\_1 + P\_1 A\_1 + Q\_1 - P\_1 B B^T P\_1 = 0\_{2n}, \quad A\_1 = A - B\alpha - B\gamma \tag{10e}$$

$$A\_2^T P\_2 + P\_2 A\_2 + Q\_2 - P\_2 B B^T P\_2 = 0\_{2n}, \quad A\_2 = A - B\alpha - B\beta \tag{10f}$$

where $\alpha \equiv [\alpha\_e, \alpha\_v]$ is the feedback gain of the robot, $\beta \equiv [\beta\_e, \beta\_v]$ is the feedback gain of human 1, and $\gamma \equiv [\gamma\_e, \gamma\_v]$ is the feedback gain of human 2. $\alpha\_e$, $\beta\_e$, $\gamma\_e$ are the position error gains and $\alpha\_v$, $\beta\_v$, $\gamma\_v$ are the velocity gains. $P\_r$, $P\_1$, $P\_2$ are the solutions of the well-known coupled Riccati equations (10d–f). The robot and the humans influence each other through $A\_r$, $A\_1$, and $A\_2$ in order to achieve interactive control and coupled optimization.
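A minimal numerical sketch of solving the coupled equations (10d–f) is given below. Each single-agent Riccati equation is solved with the standard Hamiltonian eigenvector method, and the coupling is handled by a damped fixed-point iteration over the three gains; both the model values and the iteration scheme are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def care(A, B, Q):
    # Solve A^T P + P A + Q - P B B^T P = 0 (Hamiltonian eigenvector method)
    n = A.shape[0]
    H = np.block([[A, -B @ B.T], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stab = V[:, w.real < 0]                      # basis of the stable subspace
    return np.real(stab[n:, :] @ np.linalg.inv(stab[:n, :]))

def nash_gains(A, B, Qr, Q1, Q2, iters=400, relax=0.5):
    # Damped fixed-point iteration on Equations (10d-f): each agent re-solves
    # its Riccati equation against the closed loop formed by the other two.
    k = B.shape[1]
    alpha = beta = gamma = np.zeros((k, A.shape[0]))
    for _ in range(iters):
        a = B.T @ care(A - B @ beta - B @ gamma, B, Qr)   # robot,   Eq. (10d)
        b = B.T @ care(A - B @ alpha - B @ gamma, B, Q1)  # human 1, Eq. (10e)
        c = B.T @ care(A - B @ alpha - B @ beta, B, Q2)   # human 2, Eq. (10f)
        alpha = relax * alpha + (1 - relax) * a
        beta = relax * beta + (1 - relax) * b
        gamma = relax * gamma + (1 - relax) * c
    return alpha, beta, gamma

# Illustrative 1-DOF model from Equation (5): M_d = 6, C_d = 0.2
A = np.array([[0.0, 1.0], [0.0, -0.2 / 6.0]])
B = np.array([[0.0], [1.0 / 6.0]])
Q = np.diag([100.0, 0.0])
alpha, beta, gamma = nash_gains(A, B, Q, Q, Q)
```

With identical weights the game is symmetric, so the three gains coincide at the fixed point; with differing $Q$, $Q\_1$, $Q\_2$ the same routine returns the asymmetric Nash gains.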

*β*, *γ* are unknown to the robot. Therefore, we aim to propose a method to estimate them in the following section.

### **3. Adaptive Optimal Control**

A recursive least squares algorithm with forgetting factors is used in this paper to obtain the estimates $\hat{\beta}$, $\hat{\gamma}$ of $\beta$, $\gamma$, in order to estimate the feedback gains of the humans in real time and avoid the data saturation phenomenon caused by the standard least squares algorithm [46]. Subsequently, the estimates $\hat{Q}\_1$, $\hat{Q}\_2$ of $Q\_1$, $Q\_2$ can be obtained using Equation (10e,f).

Equation (10b) is used as the model for identification. For convenience, we let $\theta\_1 = -\beta^T$, $y\_1 = f\_1^T$, $W = z^T$. Subsequently, Equation (10b) can be rewritten as

$$y\_1 = \mathcal{W}\theta\_1.\tag{11}$$

The feedback gain of human 1 is estimated by minimizing the total prediction error

$$J\_1 = \int\_0^t \exp(-\lambda\_1 t) \| y\_1(s) - \mathcal{W}(s)\hat{\theta}\_1 \|^2 ds \tag{12}$$

where $\lambda\_1$ is the constant forgetting factor. The update rule for the estimate $\hat{\theta}\_1$ is obtained as

$$\begin{aligned} \dot{\hat{\theta}}\_1 &= -PW^T e\_1 \\ \dot{P} &= \lambda\_1 P - PW^T W P \\ e\_1 &= \hat{y}\_1 - y\_1, \quad \hat{y}\_1 = W\hat{\theta}\_1. \end{aligned} \tag{13}$$

The estimation error $e\_{\theta\_1} = \hat{\theta}\_1 - \theta\_1$ satisfies

$$
e\_{\theta\_1}(t) = \exp(-\lambda\_1 t) P(t) P^{-1}(0) e\_{\theta\_1}(0). \tag{14}
$$

Thus, the estimate $\hat{\beta}$ can be obtained as

$$
\hat{\beta} = -\hat{\theta}\_1^T.\tag{15}
$$
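The continuous-time update law in Equations (13) and (15) can be sketched with a simple forward-Euler discretization. The true gain, the excitation signal, and the step sizes below are illustrative assumptions, used only to show the estimator converging on noiseless data:

```python
import numpy as np

dt, lam = 0.005, 0.95                       # step size and forgetting factor
theta_true = np.array([[30.0], [13.0]])     # plays the role of -beta^T (illustrative)
theta_hat = np.zeros((2, 1))                # initial estimate
P = 100.0 * np.eye(2)                       # RLS gain/covariance matrix

for k in range(4000):                       # 20 s of persistently exciting data
    t = k * dt
    z = np.array([[np.sin(0.5 * t)], [np.cos(0.5 * t)]])   # excitation state
    W = z.T                                 # regressor W = z^T
    y = W @ theta_true                      # measured output y_1 = f_1^T
    e = W @ theta_hat - y                   # prediction error e_1 = y_hat - y_1
    theta_hat = theta_hat - dt * (P @ W.T @ e)        # Equation (13)
    P = P + dt * (lam * P - P @ W.T @ W @ P)          # Equation (13)

beta_hat = -theta_hat.T                     # Equation (15)
```

The forgetting term $\lambda\_1 P$ keeps $P$ from collapsing to zero, which is what avoids the data-saturation phenomenon mentioned above.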

Similarly, we let $\theta\_2 = -\gamma^T$, $y\_2 = f\_2^T$, $W = z^T$. Afterwards, Equation (10c) can be rewritten as

$$
y\_2 = W\theta\_2.\tag{16}
$$

The feedback gain of human 2 is estimated by minimizing the total prediction error

$$J\_2 = \int\_0^t \exp(-\lambda\_2 t) \| y\_2(s) - W(s)\hat{\theta}\_2 \|^2 ds \tag{17}$$

where $\lambda\_2$ is the constant forgetting factor. The update rule for the estimate $\hat{\theta}\_2$ is obtained as

$$\begin{aligned} \dot{\hat{\theta}}\_2 &= -PW^T e\_2\\ \dot{P} &= \lambda\_2 P - PW^T W P\\ e\_2 &= \hat{y}\_2 - y\_2, \quad \hat{y}\_2 = W\hat{\theta}\_2. \end{aligned} \tag{18}$$

The estimation error $e\_{\theta\_2} = \hat{\theta}\_2 - \theta\_2$ satisfies

$$e\_{\theta\_2}(t) = \exp(-\lambda\_2 t) P(t) P^{-1}(0) e\_{\theta\_2}(0). \tag{19}$$

Thus, the estimate *γ*ˆ can be obtained as

$$
\hat{\gamma} = -\hat{\theta}\_2^T.\tag{20}
$$

Equations (13), (15), (18) and (20) are critical, because they enable each agent to recognize their partners' control objectives and use Equation (10a–f) to adjust their own control.

In order to ensure the performance of cooperative object transporting task, we let

$$Q + Q\_1 + Q\_2 \equiv C \tag{21}$$

where $C$ is the total weight. The cooperative object transporting task fixes the task performance through the total weight $C$ and uses Equation (21) to share the effort between the two humans and the robot. Equation (21) enables the proposed controller to adjust the contributions of the humans and the robot and lets them take complementary roles.
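The effort-sharing rule of Equation (21) is a one-line computation: the robot sets its own weight to whatever share of the fixed total the estimated human weights leave over. The numerical values here are illustrative:

```python
import numpy as np

C_total = np.diag([300.0, 0.0])   # fixed total task weight C of Equation (21)
Q1_hat = np.diag([80.0, 0.0])     # estimated weight of human 1 (illustrative)
Q2_hat = np.diag([70.0, 0.0])     # estimated weight of human 2 (illustrative)

# The robot takes up the remaining share of the task effort
Q_robot = C_total - Q1_hat - Q2_hat
```

When the humans' estimated weights rise, `Q_robot` falls and the robot yields effort; when they drop, the robot compensates, which is the complementary role-taking described above.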

The control architecture is shown in Figure 2.

**Figure 2.** Control Architecture.

A pseudo-code summary of the implementation procedure of the proposed method is given as Algorithm 1.


**Algorithm 1** Adaptive optimal control for human-robot-human physical interaction

**Input:** Current state $z$, target $x\_d$.

**Output:** Robot's control input $u$; estimated state weights $\hat{Q}\_1$, $\hat{Q}\_2$ of the humans' cost functions in Equation (10e,f).

**Begin**

Define $x\_d$; initialize $Q$, $\hat{Q}\_1$, $\hat{Q}\_2$, $u$, $f\_1$, $f\_2$, $\hat{z}$, $\alpha$, $\hat{\beta}$, $\hat{\gamma}$, $P\_r$, $\hat{P}\_1$, $\hat{P}\_2$; set $\lambda\_1$ in Equation (13), $\lambda\_2$ in Equation (18), $C$ in Equation (21), and the terminal time $t\_f$ of one trial.

**While** $t < t\_f$ **do**

Measure the position $x(t)$ and velocity $\dot{x}(t)$, and form $z$.

Update $\hat{\beta}$ using Equations (13) and (15); update $\hat{\gamma}$ using Equations (18) and (20).

Solve the Riccati equation (10d) to obtain $P\_r$, and calculate the robot's control input $u$.

Calculate the estimated state weights $\hat{Q}\_1$, $\hat{Q}\_2$ of the humans' cost functions from the Riccati equations (10e,f).

Compute the robot's cost function state weight $Q$ according to Equation (21).

**End While**

**End**

**Theorem 1.** *Consider the robot dynamics shown in Equation (5). If the robot and the humans estimate the parameters of their partners' controllers and adjust their own control according to Equations (10a–f), (13), (15), (18), (20) and (21), then the following conclusions hold:*


**Proof of Theorem 1.** $\hat{\beta}$, $\hat{\gamma}$ influence $u$, $f\_1$, $f\_2$, $z$ as follows:

$$
\dot{\hat{z}} = A\hat{z} + B\hat{u} + B\hat{f}\_1 + B\hat{f}\_2. \tag{22}
$$

By subtracting Equation (5) from Equation (22), we have

$$
\dot{e}\_z = Ae\_z + B(\hat{u} - u) + Be\_{f\_1} + Be\_{f\_2} \tag{23}
$$

where $e\_z = \hat{z} - z$. By considering Equation (10a–c), we have

$$
\dot{e}\_z = (A - B\alpha)e\_z + Be\_{\theta\_1}^T z + Be\_{\theta\_2}^T z. \tag{24}
$$

Consider the Lyapunov function candidate as following

$$\mathcal{W} = \frac{1}{2}z^T z + \frac{1}{2}e\_{\theta\_1}^T e\_{\theta\_1} + \frac{1}{2}e\_{\theta\_2}^T e\_{\theta\_2} + \frac{\chi}{2}e\_z^T e\_z \tag{25}$$

where $\chi = \min\left(\frac{2(\lambda\_1-\rho)\pi}{\phi^2 \|B\|^2}, \frac{2(\lambda\_2-\rho)\pi}{\phi^2 \|B\|^2}\right)$, with $\rho$ being the upper bound of the maximum eigenvalue of $\dot{P}P^{-1}$, $\pi$ being the lower bound of the minimum eigenvalue of $B\alpha - A$, and $\phi$ being the upper bound of $\|z\|$.

When considering the function $V = \frac{1}{2} z^T z$ and differentiating $V$ with respect to time, we obtain

$$\dot{V} = z^T \dot{z} = -z^T (B\alpha + B\beta + B\gamma - A)z. \tag{26}$$

According to Equation (10d), $B\alpha + B\beta + B\gamma - A$ is positive definite if $Q$ is positive definite, and it follows that $\lim\_{t\to\infty} z = 0$. Therefore, $z$ is bounded, and we define $\phi$ as the upper bound of $\|z\|$. By differentiating Equation (25) with respect to time and considering Equations (14), (19) and (24), we obtain

$$\begin{aligned} \dot{W} &= z^T \dot{z} + e\_{\theta\_1}^T \dot{e}\_{\theta\_1} + e\_{\theta\_2}^T \dot{e}\_{\theta\_2} + \chi e\_z^T \dot{e}\_z \\ &= -z^T(B\alpha + B\beta + B\gamma - A)z - \lambda\_1 e\_{\theta\_1}^T e\_{\theta\_1} + e\_{\theta\_1}^T \dot{P}P^{-1} e\_{\theta\_1} - \lambda\_2 e\_{\theta\_2}^T e\_{\theta\_2} + e\_{\theta\_2}^T \dot{P}P^{-1} e\_{\theta\_2} \\ &\quad - \chi e\_z^T(B\alpha - A)e\_z + \chi e\_z^T B e\_{\theta\_1}^T z + \chi e\_z^T B e\_{\theta\_2}^T z \\ &\leq -z^T(B\alpha + B\beta + B\gamma - A)z - (\lambda\_1 - \rho)\|e\_{\theta\_1}\|^2 - (\lambda\_2 - \rho)\|e\_{\theta\_2}\|^2 - \chi\pi\|e\_z\|^2 \\ &\quad + \chi\phi\|B\|\|e\_z\|\|e\_{\theta\_1}\| + \chi\phi\|B\|\|e\_z\|\|e\_{\theta\_2}\| \\ &= -z^T(B\alpha + B\beta + B\gamma - A)z - \left(\sqrt{\lambda\_1 - \rho}\,\|e\_{\theta\_1}\| - \sqrt{\tfrac{\chi\pi}{2}}\,\|e\_z\|\right)^2 - \left(\sqrt{\lambda\_2 - \rho}\,\|e\_{\theta\_2}\| - \sqrt{\tfrac{\chi\pi}{2}}\,\|e\_z\|\right)^2 \\ &\quad + \left(\chi\phi\|B\| - 2\sqrt{\tfrac{(\lambda\_1 - \rho)\chi\pi}{2}}\right)\|e\_z\|\|e\_{\theta\_1}\| + \left(\chi\phi\|B\| - 2\sqrt{\tfrac{(\lambda\_2 - \rho)\chi\pi}{2}}\right)\|e\_z\|\|e\_{\theta\_2}\| \\ &\leq 0 \end{aligned} \tag{27}$$

According to Equations (26) and (27), we have $\lim\_{t\to\infty} z = 0$ and $\lim\_{t\to\infty} e\_z = 0$. Therefore, $z(t)$ is bounded and $\lim\_{t\to\infty} \dot{e}\_z = 0$. According to Equation (27), we also have $\lim\_{t\to\infty} e\_{\theta\_1} = 0$ and $\lim\_{t\to\infty} e\_{\theta\_2} = 0$. Because $e\_{\theta\_1} = \hat{\theta}\_1 - \theta\_1 = (-\hat{\beta})^T - (-\beta)^T = \beta^T - \hat{\beta}^T$ and $e\_{\theta\_2} = \hat{\theta}\_2 - \theta\_2 = (-\hat{\gamma})^T - (-\gamma)^T = \gamma^T - \hat{\gamma}^T$, we obtain $\lim\_{t\to\infty} (\beta^T - \hat{\beta}^T) = 0$ and $\lim\_{t\to\infty} (\gamma^T - \hat{\gamma}^T) = 0$. $\beta$, $\gamma$ are assumed to be bounded, since they are the feedback gains of the humans. Therefore, $\hat{\beta}$, $\hat{\gamma}$ are also bounded. According to Equation (10a–c), $P\_1$, $P\_2$ are also bounded. According to Equation (10d), $A\_r$ is bounded. Therefore, $P\_r$, $\alpha$, and $u$ are bounded.

According to Equation (10e,f), we can calculate the estimation errors $e\_{Q\_1} = \hat{Q}\_1 - Q\_1$, $e\_{Q\_2} = \hat{Q}\_2 - Q\_2$. $e\_{Q\_1}$, $e\_{Q\_2}$ are due to the errors $e\_P$, $e\_{P\_1}$, $e\_{P\_2}$. Because $e\_P$, $e\_{P\_1}$, $e\_{P\_2}$ converge to zero, we have $\lim\_{t\to\infty} e\_{Q\_1} = 0$ and $\lim\_{t\to\infty} e\_{Q\_2} = 0$, that is, $\lim\_{t\to\infty} \hat{Q}\_1 = Q\_1$ and $\lim\_{t\to\infty} \hat{Q}\_2 = Q\_2$.

Multiplying Equation (10d) by $\hat{z}^T$ on the left and by $\hat{z}$ on the right, and considering Equation (13), we have

$$\begin{aligned} 0 &= \hat{z}^T Q \hat{z} + \hat{z}^T P\_r B B^T P\_r \hat{z} + \hat{z}^T P\_r \dot{\hat{z}} \\ &\quad + \dot{\hat{z}}^T P\_r \hat{z} + \hat{z}^T P\_r H e\_z + e\_z^T H^T P\_r \hat{z} \\ &\equiv \sigma. \end{aligned} \tag{28}$$

Considering $\lim\_{t\to\infty} e\_z = 0$ and $\lim\_{t\to\infty} \dot{e}\_z = 0$, we can obtain

$$\begin{aligned} \lim\_{t \to \infty} \sigma &= \lim\_{t \to \infty} \left(z^T Q z + z^T P\_r B B^T P\_r z + z^T P\_r \dot{z} + \dot{z}^T P\_r z\right) \\ &= 0. \end{aligned} \tag{29}$$

Similarly, we can obtain

$$\begin{aligned} \lim\_{t \to \infty} \sigma\_1 &= \lim\_{t \to \infty} \left(z^T Q\_1 z + z^T P\_1 B B^T P\_1 z + z^T P\_1 \dot{z} + \dot{z}^T P\_1 z\right) = 0 \\ \lim\_{t \to \infty} \sigma\_2 &= \lim\_{t \to \infty} \left(z^T Q\_2 z + z^T P\_2 B B^T P\_2 z + z^T P\_2 \dot{z} + \dot{z}^T P\_2 z\right) = 0. \end{aligned} \tag{30}$$

$\lim\_{t\to\infty} \sigma = 0$, $\lim\_{t\to\infty} \sigma\_1 = 0$, and $\lim\_{t\to\infty} \sigma\_2 = 0$ indicate that the Nash equilibrium is achieved for the human-robot-human interaction system.

### **4. Simulations and Results**

### *4.1. Experimental Design and Simulation Settings*

With the development of robot technology, robots will in the future enter our homes and become members of the family in our daily lives. In daily life, we often need to carry various objects. Some objects (e.g., objects with smaller size and lower weight) can be carried by one human; some objects (e.g., objects with medium size and medium weight) need to be carried by two humans; and some objects (e.g., objects with larger size and higher weight) can only be carried by three or more humans. Consider one scenario: in our home, we have an object (such as a table with a relatively large size and high weight) that needs to be carried by three humans. However, there are only two humans in the home. In this case, we can let the robot help us carry the object together with the two humans, with the robot playing the same role as one human. A simulation is conducted with CoppeliaSim in order to verify the control performance of the controller proposed in this paper. The version of CoppeliaSim used is CoppeliaSim 4.0.0 (CoppeliaSim Edu, Windows). Figure 3 shows the CoppeliaSim simulation scenario of the cooperative object transporting task. The humans cooperate with the robot to transport the object back and forth between −10 cm and +10 cm along the horizontal direction.

**Figure 3.** Simulation of cooperative object transporting task. The humans cooperate with the robot to transport the object back and forth between −10 cm and +10 cm along the horizontal direction. The forces that are exerted by the humans on the object are measured by force sensors at the interaction point.

The controller proposed in this paper implements interactive control because every agent considers the control of the other partners. In order to present the advantages of the proposed controller, we compare it with the linear quadratic regulator (LQR) optimal controller. The LQR controller can be obtained by setting $A\_r = A$, $A\_1 = A$, $A\_2 = A$ in Equation (10d–f). The LQR controller allows each agent to form its own control input optimally, but it ignores the controls of the other partners. Let $Q = Q\_1 = Q\_2 = \mathrm{diag}(100, 0)$.
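For reference, the LQR baseline gain used in this comparison can be computed by solving a single, uncoupled Riccati equation with $A\_r = A$. The sketch below reuses the illustrative 1-DOF model (our own values, not the paper's CoppeliaSim setup):

```python
import numpy as np

def care(A, B, Q):
    # Solve A^T P + P A + Q - P B B^T P = 0 (Hamiltonian eigenvector method)
    n = A.shape[0]
    H = np.block([[A, -B @ B.T], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stab = V[:, w.real < 0]
    return np.real(stab[n:, :] @ np.linalg.inv(stab[:n, :]))

# Illustrative 1-DOF model of Equation (5): M_d = 6, C_d = 0.2
A = np.array([[0.0, 1.0], [0.0, -0.2 / 6.0]])
B = np.array([[0.0], [1.0 / 6.0]])
Q = np.diag([100.0, 0.0])

# LQR baseline: set A_r = A, i.e. each agent ignores its partners' control
alpha_lqr = B.T @ care(A, B, Q)
```

Each human's LQR gain is obtained the same way from $Q\_1$ or $Q\_2$; because no agent discounts the others' contribution, these gains come out larger than the coupled Nash gains of Equation (10d–f), which is the comparison made in Section 4.2.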

The cost functions of the humans usually change during the physical human-robot-human interaction. The robot needs to identify the change and adaptively adjust its own cost function in order to complete the cooperative object transporting task. In order to verify the ability of the robot to adaptively interact with two humans when the humans' cost functions change, we simulated a scenario where the robot cooperated with the humans to perform an object transporting task. The task performance is set through the value of $C$ in Equation (21). Let $C = \mathrm{diag}(300, 0)$. The cost functions of human 1 and human 2 change randomly according to $Q\_1 = \mathrm{diag}(50, 0) + \rho \cdot \mathrm{diag}(50, 0)$, $Q\_2 = \mathrm{diag}(50, 0) + \rho \cdot \mathrm{diag}(50, 0)$, where $\rho$ is a uniformly distributed random number in $[0, 1]$.

The human-robot-human cooperative object transporting task can be fulfilled with less effort using the proposed controller. To support this claim, we made a comparison with a human-robot cooperative object transporting task. In the simulation of the human-robot-human cooperative object transporting task, we let $Q = Q\_1 = Q\_2 = \mathrm{diag}(100, 0)$. In the simulation of the human-robot cooperative object transporting task, we let $Q = \mathrm{diag}(100, 0)$, $Q\_1 = \mathrm{diag}(100, 0)$, $Q\_2 = \mathrm{diag}(0, 0)$.

We assume that the humans and the robot do not have prior knowledge of each other (thus, initially $\hat{\alpha} \equiv 0$, $\hat{\beta} \equiv 0$, $\hat{\gamma} \equiv 0$). The control input of the robot is generated by Equations (5), (10a–f), (13), (15), (18) and (20). The simulated interaction forces $f\_1$, $f\_2$ of human 1 and human 2 are generated by a similar set of equations. The simulation time is 40 s. Let the inertia of the robot be $M\_d = 6$ kg and the damping of the robot be $C\_d = -0.2$ N·s·m$^{-1}$ [19]; the forgetting factors of the real-time least squares algorithm are $\lambda\_1 = \lambda\_2 = 0.95$. The simulation time step is 0.005 s.

### *4.2. Results*

Figure 4 depicts the change in position of the end effector with respect to time. The result plotted in Figure 4 is a smooth curve that resembles a sinusoidal signal. This smooth curve is determined by Equation (3), in which $u(t)$, $f\_1(t)$, $f\_2(t)$ are iteratively calculated by our proposed controller based on game theory. Because the humans and the robot do not transport the object at a constant speed with our method, the end effector follows a curved rather than a straight-line trajectory. As can be seen from Figure 4, the end effector reaches the target position with the proposed controller, which means that the cooperative object transporting task is successfully fulfilled. In contrast, the end effector cannot reach the target position with the LQR controller, which means that the cooperative object transporting task is not successfully fulfilled. The reason the task can be fulfilled with the proposed controller but not with the LQR controller is that the proposed controller considers the interaction with the other partners. When one partner decreases effort, the other partners gradually increase their efforts to ensure the successful fulfillment of the cooperative object transporting task. In contrast, the LQR controller does not consider the interaction with the other partners, so the successful fulfillment of the task cannot be guaranteed.

**Figure 4.** The end effector position value.

In Figure 5, we can see that the estimated humans' feedback gains converge to the real values within a few seconds, which means that the humans' feedback gains can be successfully estimated by the proposed method.
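The convergence seen in Figure 5 is characteristic of recursive least squares with exponential forgetting. Below is a minimal sketch with *λ* = 0.95 as in the simulation, assuming a linear force model *f* = −*βe*·*e* − *βv*·*v*; the "true" gains, the sinusoidal regressor, and the noise level are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.95                        # forgetting factor (lambda_1 = lambda_2 = 0.95)
beta_true = np.array([6.0, 2.0])  # "unknown" human gains to recover (illustrative)
theta = np.zeros(2)               # current estimate, initially zero
Pcov = 1e3 * np.eye(2)            # estimate covariance

for k in range(2000):
    t = 0.005 * k
    phi = -np.array([np.sin(t), np.cos(t)])            # regressor: -[e(t), v(t)]
    f = phi @ beta_true + 0.01 * rng.standard_normal() # measured human force
    # Standard RLS update with exponential forgetting.
    g = Pcov @ phi / (lam + phi @ Pcov @ phi)          # gain vector
    theta = theta + g * (f - phi @ theta)              # innovation correction
    Pcov = (Pcov - np.outer(g, phi @ Pcov)) / lam
```

Because the regressor is persistently exciting, `theta` converges to the true gains within a small fraction of the 10 s of data, mirroring the few-second convergence reported for Figure 5.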

**Figure 5.** Control gains of humans. (**a**) the position error feedback gain of the human 1. (**b**) the velocity feedback gain of the human 1. (**c**) the position error feedback gain of the human 2. (**d**) the velocity feedback gain of the human 2.

Figure 6 demonstrates that fulfilling the cooperative object transporting task requires larger control gains *β*, *γ* with the LQR controller than with the controller proposed in this paper. In other words, accomplishing the same task requires less effort with the proposed controller. This is because the proposed controller considers the interaction with the other partners and computes the minimal effort for the humans and the robot to complete the task. In contrast, the LQR controller does not consider the interaction with the other partners, so the humans and the robot each minimize only their own cost function and may therefore expend larger effort.

**Figure 6.** Humans' control gains. (**a**) the position error feedback gain of the human 1. (**b**) the velocity feedback gain of the human 1. (**c**) the position error feedback gain of the human 2. (**d**) the velocity feedback gain of the human 2.

The feedback gains are affected by the state weights of the cost functions. In order to verify the advantages of the proposed controller when the state weights vary, we let *Q*1 vary from 0 to 10*Q* with *Q*2 = *diag*(100, 0), and let *Q*2 vary from 0 to 10*Q* with *Q*1 = *diag*(100, 0). It can be seen from Figure 7 that accomplishing the same task always requires less effort with the proposed controller. We can also see that the gap between the control gains of the proposed controller and those of the LQR controller becomes smaller as *Q*1/*Q* or *Q*2/*Q* increases, because the robot's relative influence decreases.

From Figures 4–7, we can conclude that the human-robot-human cooperative object transporting task can be fulfilled with less effort and the system can be kept stable using the proposed controller.

It can be seen from Figure 8 that, when the cost functions of human 1 and human 2 change, the cost function of the robot also changes adaptively. When the sum of the humans' state weights *Q*1 + *Q*2 increases, the state weight of the robot *Q* decreases accordingly; conversely, when *Q*1 + *Q*2 decreases, *Q* increases accordingly. The robot can adapt in this way because of the constant *C* introduced in Equation (21), which enables the proposed controller to adjust the contributions between the humans and the robot and makes them take complementary roles.
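A hedged reading of this mechanism: if Equation (21) keeps the total state weight at a constant level *C*, the robot's weight is simply the remainder after the estimated human weights are subtracted. The rule below, including the value of *C* and the sample weights, is an illustrative assumption, not the paper's exact equation.

```python
# Complementary-role rule (illustrative): robot weight fills the gap
# between a constant total C and the estimated human weights.
C = 300.0  # assumed constant total state weight
for q1_hat, q2_hat in [(100.0, 100.0), (150.0, 120.0), (50.0, 40.0)]:
    q_robot = max(C - (q1_hat + q2_hat), 0.0)  # robot's position-error weight
    print(q1_hat, q2_hat, q_robot)
```

Under this rule, the robot's weight moves opposite to the humans' combined weight, matching the behavior in Figure 8: it drops when the humans contribute more and rises when they contribute less.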

Figure 9 shows that, using the proposed controller, the adaptive cooperative object transporting task can be fulfilled and the system can be kept stable.

**Figure 7.** Control gains for different values of humans' state weights. (**a**,**b**) the state weight of the human 1 varies. (**c**,**d**) the state weight of the human 2 varies.

**Figure 8.** Humans' state weights. (**a**) the state weight of the human 1. (**b**) the state weight of the human 2. (**c**) the sum of the state weights of the human 1 and human 2. (**d**) the state weight of the robot.

From Figures 8 and 9, we can conclude that the adaptive cooperative object transporting task can be fulfilled with the proposed controller. During the physical interaction, the robot can successfully identify the change of each human's cost function, and then adaptively adjust its own cost function to achieve interactive optimal control.

Figure 10 demonstrates that fulfilling the human-robot-human cooperative object transporting task requires smaller control gains *βe*, *βv* than the human-robot cooperative object transporting task. In other words, accomplishing the same task requires less effort through human-robot-human physical interaction. This is because in the human-robot-human task the controller considers the interaction with more partners (two partners) and computes the minimal effort for the humans and the robot to complete the task. In contrast, the human-robot task involves interaction with fewer partners (only one partner), so the human and the robot may therefore require larger effort.

**Figure 9.** The end effector position value. (**a**) The end effector position value in Trial 1. (**b**) The end effector position value in Trial 2. (**c**) The end effector position value in Trial 3. (**d**) The end effector position value in Trial 4.

**Figure 10.** Humans' control gains. The dashed lines correspond to the human-robot cooperative object transporting task. The solid lines correspond to the human-robot-human cooperative object transporting task. (**a**) the position error feedback gain of the human 1. (**b**) the velocity feedback gain of the human 1. (**c**) the position error feedback gain of the human 2. (**d**) the velocity feedback gain of the human 2.

### **5. Conclusions**

In this paper, the human-robot-human physical interaction problem has been studied. An adaptive optimal control framework for human-robot-human physical interaction has been proposed based on N-player game theory. The recursive least squares algorithm with a forgetting factor has been used to identify the humans' unknown control parameters online. The performance of the proposed controller has been verified by simulations of a cooperative object transporting task. The simulation results show that the proposed controller can achieve adaptive optimal control during the interaction between the robot and two humans and keep the system stable. Compared with the LQR controller, the proposed controller achieves superior performance. Compared with human-robot physical interaction, accomplishing the same cooperative object transporting task requires less effort through human-robot-human physical interaction based on the proposed approach. Although this paper only conducts simulations of the physical interaction between one robot and two humans, it is worth mentioning that the proposed framework has the potential to be generalized to situations where multiple robots physically interact with multiple humans. As future work, we will extend the framework to the interaction between multiple robots and multiple humans.

**Author Contributions:** R.Z. conceived the original ideas, designed all the experiments, and subsequently drafted the manuscript. Y.L. provided supervision and funding support for the project. J.Z. provided supervision and funding support for the project. H.C. provided supervision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Major Research Plan of the National Natural Science Foundation of China under Grant 91948201.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


*Sensors* Editorial Office E-mail: sensors@mdpi.com www.mdpi.com/journal/sensors
