
A Framework for Human-Robot-Human Physical Interaction Based on N-Player Game Theory

State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
Current address: 92 Xidazhi Street, Nangang District, State Key Laboratory of Robotics and Systems, Harbin Institute of Technology, Harbin 150001, China.
Sensors 2020, 20(17), 5005; https://doi.org/10.3390/s20175005
Submission received: 31 July 2020 / Revised: 31 August 2020 / Accepted: 1 September 2020 / Published: 3 September 2020
(This article belongs to the Special Issue Human-Robot Collaborations in Industrial Automation)

Abstract

In order to analyze the complex interactive behaviors between a robot and two humans, this paper presents an adaptive optimal control framework for human-robot-human physical interaction. N-player linear quadratic differential game theory is used to describe the system under study. However, N-player differential game theory cannot be applied directly in practical scenarios, since the robot cannot know the humans' control objectives in advance. To provide the robot with this knowledge, the paper presents an online estimation method that identifies the unknown humans' control objectives based on the recursive least squares algorithm. The Nash equilibrium of the human-robot-human interaction is obtained by solving the coupled Riccati equations, so that adaptive optimal control is achieved during the physical interaction. The effectiveness of the proposed method is demonstrated by rigorous theoretical analysis and simulations. The simulation results show that the proposed controller achieves adaptive optimal control during the interaction between the robot and two humans and outperforms an LQR controller.

1. Introduction

In the past decade, physical human-robot interaction has attracted the attention of the research community due to the urgent need for robot technology in unstructured environments [1,2,3,4]. Physical human-robot interaction combines the advantages of humans and robots: humans are good at reasoning and problem solving with high flexibility, while robots excel at execution and guarantee the accuracy of task execution [5,6]. The combination of these advantages has led to the wide application of physical human-robot interaction, such as teleoperation [7,8], collaborative assembly [9,10], and collaborative transportation [11,12,13].
Two types of specific human-robot interaction strategies have been widely studied: the co-activity strategy and the master-slave control strategy [14,15]. The co-activity strategy is used in typical rehabilitation robots that assist limb movement training or in intelligent industrial systems that support heavy objects against gravity, where the robot completely ignores the human users' behavior [16,17]. In contrast, the master-slave control strategy is used in teleoperated robots or force-extender exoskeletons, where the robot completely follows the control of the human user [18]. However, these strategies can only handle specific interactive behaviors; a general framework for analyzing various interactive behaviors between a robot and humans is still missing [19,20].
It has been pointed out that game theory can serve as a general framework for analyzing complex interactive behaviors between multiple agents, because different combinations of individual cost functions and optimization objectives can describe various interactive behaviors [21]. In [22], the human and the robot were regarded as two agents and game theory was used to analyze their performance. In [23], the optimal control for a given game with a linear system cost function was obtained by solving the coupled Riccati equations. In [24], an optimal control algorithm was developed for human-robot collaboration by solving the Riccati equation in each loop. In [25,26,27,28], policy iteration was used to solve for the Nash equilibrium in order to improve the calculation speed. In [29], cyber-physical human systems were modeled via an interplay between reinforcement learning and game theory. In [30], haptic shared control for human-robot collaboration was modeled by a game-theoretical approach. In [31], human-like motion planning was studied based on game-theoretic decision making. In [32], a cooperative game was used for human-robot collaborative manufacturing. In [33], a Bayesian framework was proposed for Nash equilibrium inference in human-robot parallel play. In [19], non-cooperative differential game theory was used to model the human-robot interaction system, resulting in a variety of interaction strategies. However, the above studies only consider two agents, that is, the interaction between one human and one robot. Therefore, the aforementioned methods are not suitable for human-robot-human physical interaction, where more than one human interacts physically with one robot. It is worth noting that the physical interaction between one robot and two humans brings greater advantages, such as handling larger loads and improving the flexibility and robustness of the system [28,34,35,36,37]. These advantages stem from the team collaboration between the robot and the two humans. To the authors' knowledge, no previous work has studied the physical interaction between one robot and two humans based on game theory.
In this paper, a general adaptive optimal control framework for human-robot-human physical interaction is proposed based on N-player game theory. Accordingly, the robot and two humans can interact with each other optimally by learning each other's control. N-player differential game theory is used to model the human-robot-human interaction system in order to analyze the complex interactive behaviors between the robot and two humans. In N-player differential game theory, the humans' control objectives are assumed to be known [38,39]. However, this theory cannot be applied directly in practical scenarios, since the robot cannot know the humans' control objectives in advance. To provide the robot with this knowledge, the paper presents an online estimation method that identifies the unknown humans' control objectives based on the recursive least squares algorithm. Subsequently, the Nash equilibrium of the human-robot-human physical interaction is obtained by solving the coupled Riccati equations to achieve coupled optimization. Finally, the effectiveness of the proposed method is demonstrated by rigorous theoretical analysis and simulation experiments. This paper makes the following four contributions.
(1)
N-player differential game theory is used for the first time to model the human-robot-human interaction system.
(2)
An online estimation method to identify unknown humans’ control objectives based on the recursive least squares algorithm is presented.
(3)
A general adaptive optimal control framework for human-robot-human physical interaction is proposed based on (1) and (2).
(4)
The effectiveness of the proposed method is demonstrated by rigorous theoretical analysis and simulation experiments.
The remainder of this paper is organized as follows: Section 2 models the human-robot-human physical interaction system based on N-player differential game theory. Section 3 establishes an adaptive optimal control law and analyzes the control performance of the system theoretically. Section 4 verifies the effectiveness of the proposed method through simulation experiments. Finally, Section 5 concludes this work.

2. Problem Formulation

2.1. System Description

The system considered contains two humans and one robot. An example scenario is shown in Figure 1, where the robot and the humans collaborate to perform an object transporting task. In this shared control task, when the humans' control objectives change, the robot should recognize them and respond adaptively and optimally. The forces exerted by the humans on the object are measured by force sensors at the interaction points. It is worth noting that the humans' control objectives are unknown to the robot.
The forward kinematics of the robot are described as
x ( t ) = ϕ ( q ( t ) )
where x(t) ∈ R^m and q(t) ∈ R^n are the positions in Cartesian space and joint space, respectively, and m and n are the corresponding degrees of freedom. Differentiating Equation (1) with respect to time yields
x ˙ ( t ) = J ( q ( t ) ) q ˙ ( t )
where J(q(t)) ∈ R^{m×n} is the Jacobian matrix.
The following impedance model is given in Cartesian space
M d x ¨ ( t ) + C d x ˙ ( t ) = u ( t ) + f 1 ( t ) + f 2 ( t )
where M_d ∈ R^{m×m} is the desired inertia matrix, C_d ∈ R^{m×m} is the damping matrix, u(t) ∈ R^m is the control input in Cartesian space [40,41,42], f_1(t) ∈ R^m is the contact force between the object and human 1, and f_2(t) ∈ R^m is the contact force between the object and human 2.
To track a common and fixed target x_d ∈ R^m (ẋ_d ∈ R^m) in the cooperative object transporting task, Equation (3) can be transformed as follows:
$$M_d \big( \ddot{x}(t) - \ddot{x}_d \big) + C_d \big( \dot{x}(t) - \dot{x}_d \big) = u(t) + f_1(t) + f_2(t).$$
In order to ease the controller design, Equation (4) can be rewritten in the following state-space form:
$$\dot{z} = A z + B_1 u + B_2 f_1 + B_3 f_2, \qquad z = \begin{bmatrix} x(t) - x_d \\ \dot{x}(t) \end{bmatrix}, \qquad A = \begin{bmatrix} 0_m & 1_m \\ 0_m & -M_d^{-1} C_d \end{bmatrix}, \qquad B_1 = B_2 = B_3 = B = \begin{bmatrix} 0_m \\ M_d^{-1} \end{bmatrix}$$
where 0 m and 1 m denote m × m zero and unit matrices, respectively.
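For concreteness, the matrices in Equation (5) can be assembled directly from the impedance parameters. The short sketch below (Python; the function name and the one-dimensional example are our own illustrative assumptions, not part of the paper) builds A and B for the desired inertia and damping used later in the simulations.
```python
import numpy as np

def impedance_state_space(M_d, C_d):
    """Assemble A, B of Eq. (5) from the desired inertia M_d and damping C_d.

    M_d and C_d are (m x m) arrays; the state is z = [x - x_d, x_dot].
    Illustrative helper, not code from the paper.
    """
    m = M_d.shape[0]
    M_inv = np.linalg.inv(M_d)
    A = np.block([[np.zeros((m, m)), np.eye(m)],
                  [np.zeros((m, m)), -M_inv @ C_d]])
    B = np.vstack([np.zeros((m, m)), M_inv])   # B1 = B2 = B3 = B
    return A, B

# One-DOF example with the values used in Section 4 (M_d = 6 kg, C_d = 0.2)
A, B = impedance_state_space(np.array([[6.0]]), np.array([[0.2]]))
```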

2.2. Problem Formulation

According to non-cooperative differential game theory, the interaction between the robot and the humans is described in this paper as a game between N players (here, N = 3) [43]. In the game, each player minimizes its own cost function:
$$\Gamma = \int_{t_0}^{\infty} \left( z^T Q z + u^T u \right) dt, \qquad \Gamma_1 = \int_{t_0}^{\infty} \left( z^T Q_1 z + f_1^T f_1 \right) dt, \qquad \Gamma_2 = \int_{t_0}^{\infty} \left( z^T Q_2 z + f_2^T f_2 \right) dt$$
$$Q = \begin{bmatrix} Q_{01} & 0_{n \times n} \\ 0_{n \times n} & Q_{02} \end{bmatrix}, \qquad Q_1 = \begin{bmatrix} Q_{11} & 0_{n \times n} \\ 0_{n \times n} & Q_{12} \end{bmatrix}, \qquad Q_2 = \begin{bmatrix} Q_{21} & 0_{n \times n} \\ 0_{n \times n} & Q_{22} \end{bmatrix}$$
where Γ, Γ_1, Γ_2 are the cost functions of the robot, human 1, and human 2, respectively, and Q, Q_1, Q_2 are their state weight matrices. Each player achieves the cooperative object transporting task by minimizing the error to the target while minimizing its own cost. Q, Q_1, Q_2 each contain two components, corresponding to position regulation and velocity: Q_01, Q_11, Q_21 weight the position error, and Q_02, Q_12, Q_22 weight the velocity.
In [27], the N-player game has been studied under the assumption that the cost functions are known. However, Γ_1 and Γ_2 are unknown to the robot because they are determined by the humans. Therefore, a method is proposed in this paper to estimate Γ_1, Γ_2 in order to achieve adaptive optimal control and thus accomplish the human-robot-human cooperative object transporting task.

2.3. N-Player Differential Game Theory

Based on differential game theory for linear systems, the following linear differential equation is considered for the N-player game [43]:
$$\dot{z} = A z + B_1 u_1 + \cdots + B_N u_N, \qquad z(0) = z_0.$$
Each player has a quadratic cost function that they want to minimize:
$$\Gamma_i = \int_0^{\infty} \left( z^T Q_i z + u_i^T u_i \right) dt, \qquad i = 1, \ldots, N$$
Different types of multi-agent behaviors are defined in game theory, which can be achieved through different concepts of game equilibrium [44,45]. In this paper, Nash equilibrium is considered. In the sense of Nash equilibrium, each player minimizes their cost function:
$$u_i = -\eta_i z, \qquad \eta_i = B_i^T P_i$$
$$\Big( A - \sum_{j \neq i}^{N} B_j \eta_j \Big)^T P_i + P_i \Big( A - \sum_{j \neq i}^{N} B_j \eta_j \Big) + Q_i - P_i B_i B_i^T P_i = 0, \qquad i = 1, \ldots, N
$$
where N = 3 in this paper. In the sense of the Nash equilibrium, the humans and the robot each minimize their own cost functions:
$$u = -\alpha z, \qquad \alpha = B^T P_r$$
$$f_1 = -\beta z, \qquad \beta = B^T P_1$$
$$f_2 = -\gamma z, \qquad \gamma = B^T P_2$$
$$A_r^T P_r + P_r A_r + Q - P_r B B^T P_r = 0_{2n}, \qquad A_r = A - B\beta - B\gamma$$
$$A_1^T P_1 + P_1 A_1 + Q_1 - P_1 B B^T P_1 = 0_{2n}, \qquad A_1 = A - B\alpha - B\gamma$$
$$A_2^T P_2 + P_2 A_2 + Q_2 - P_2 B B^T P_2 = 0_{2n}, \qquad A_2 = A - B\alpha - B\beta$$
where α ≜ [α_e, α_v] is the feedback gain of the robot, β ≜ [β_e, β_v] is the feedback gain of human 1, and γ ≜ [γ_e, γ_v] is the feedback gain of human 2. α_e, β_e, γ_e are the position error gains, and α_v, β_v, γ_v are the velocity gains. P_r, P_1, P_2 are the solutions of the coupled Riccati equations (10d)–(10f). The robot and the humans influence each other through A_r, A_1, and A_2, which realizes the interactive control and the coupled optimization.
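When the weights Q, Q_1, Q_2 are all known, one practical way to obtain P_r, P_1, P_2 from the coupled equations (10d)–(10f) is a fixed-point iteration: hold two players' gains fixed, solve the remaining single-player Riccati equation, and repeat until the gains stop changing. The sketch below uses scipy.linalg.solve_continuous_are; the numerical procedure and all names are our own illustrative assumptions (the paper only states that the coupled equations are solved).
```python
import numpy as np
from scipy.linalg import solve_continuous_are

def coupled_riccati(A, B, Q, Q1, Q2, iters=50, tol=1e-9):
    """Fixed-point iteration for the coupled Riccati equations (10d)-(10f).

    Returns the feedback gains alpha, beta, gamma (u = -alpha z, f1 = -beta z,
    f2 = -gamma z). Illustrative sketch; convergence is assumed, not proven here.
    """
    n, m = A.shape[0], B.shape[1]
    R = np.eye(m)                                  # control weights are identity in Eq. (6)
    alpha = np.zeros((m, n))
    beta = np.zeros((m, n))
    gamma = np.zeros((m, n))
    for _ in range(iters):
        P_r = solve_continuous_are(A - B @ beta - B @ gamma, B, Q, R)    # Eq. (10d)
        P_1 = solve_continuous_are(A - B @ alpha - B @ gamma, B, Q1, R)  # Eq. (10e)
        P_2 = solve_continuous_are(A - B @ alpha - B @ beta, B, Q2, R)   # Eq. (10f)
        new_alpha, new_beta, new_gamma = B.T @ P_r, B.T @ P_1, B.T @ P_2
        change = max(np.max(np.abs(new_alpha - alpha)),
                     np.max(np.abs(new_beta - beta)),
                     np.max(np.abs(new_gamma - gamma)))
        alpha, beta, gamma = new_alpha, new_beta, new_gamma
        if change < tol:
            break
    return alpha, beta, gamma
```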
β , γ are unknown to the robot. Therefore, we aim to propose a method to estimate them in the following section.

3. Adaptive Optimal Control

A recursive least squares algorithm with a forgetting factor is used in this paper to obtain the estimates β̂, γ̂ of β, γ, so that the humans' feedback gains can be estimated in real time while avoiding the data saturation caused by the standard least squares algorithm [46]. Subsequently, the estimates Q̂_1, Q̂_2 of Q_1, Q_2 can be obtained using Equations (10e) and (10f).
Equation (10b) is used as the model for identification. For convenience, we let $\theta_1 = -\beta^T$, $y_1 = f_1^T$, and $W = z^T$. Then, Equation (10b) can be rewritten as
y 1 = W θ 1 .
The feedback gain of human 1 is estimated by minimizing the total prediction error
$$J_1 = \int_0^{t} \exp\!\big(-\lambda_1 (t - s)\big) \left\| y_1(s) - W(s)\hat{\theta}_1 \right\|^2 ds$$
where λ 1 is the constant forgetting factor. The update rule of the parameter θ 1 can be obtained as
$$\dot{\hat{\theta}}_1 = -P W^T e_1, \qquad \dot{P} = \lambda_1 P - P W^T W P, \qquad e_1 = \hat{y}_1 - y_1.$$
The estimation error of $\hat{\theta}_1$ satisfies
$$e_{\theta_1}(t) = \exp(-\lambda_1 t)\, P(t) P^{-1}(0)\, e_{\theta_1}(0).$$
Thus, the estimate β ^ can be obtained as
$$\hat{\beta} = -\hat{\theta}_1^T.$$
Similarly, we let $\theta_2 = -\gamma^T$, $y_2 = f_2^T$, and $W = z^T$. Then, Equation (10c) can be rewritten as
y 2 = W θ 2 .
The feedback gain of human 2 is estimated by minimizing the total prediction error
$$J_2 = \int_0^{t} \exp\!\big(-\lambda_2 (t - s)\big) \left\| y_2(s) - W(s)\hat{\theta}_2 \right\|^2 ds$$
where λ 2 is the constant forgetting factor. The update rule of the parameter θ 2 can be obtained as
$$\dot{\hat{\theta}}_2 = -P W^T e_2, \qquad \dot{P} = \lambda_2 P - P W^T W P, \qquad e_2 = \hat{y}_2 - y_2.$$
The estimation error of $\hat{\theta}_2$ satisfies
$$e_{\theta_2}(t) = \exp(-\lambda_2 t)\, P(t) P^{-1}(0)\, e_{\theta_2}(0).$$
Thus, the estimate γ ^ can be obtained as
$$\hat{\gamma} = -\hat{\theta}_2^T.$$
Equations (13), (15), (18) and (20) are critical because they enable each agent to recognize its partners' control objectives and to use Equations (10a)–(10f) to adjust its own control.
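For illustration, the continuous-time estimator of Equations (13) and (15) can be implemented with a simple Euler discretization at the control rate. The class below is our own sketch (all names are assumptions); it estimates θ̂_1 = −β̂^T from measurements of z and f_1 and returns β̂, and the same class serves for human 2 via Equations (18) and (20).
```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares with a forgetting factor, Euler-discretized.

    Implements the update of Eq. (13) for y = W theta with W = z^T, y = f^T and
    theta = -beta^T, so the human gain estimate is beta_hat = -theta_hat^T (Eq. (15)).
    Illustrative sketch; in practice the covariance P should be bounded when the
    excitation fades.
    """
    def __init__(self, n_state, n_out, lam=0.95, p0=100.0):
        self.theta = np.zeros((n_state, n_out))    # theta_hat
        self.P = p0 * np.eye(n_state)              # covariance matrix
        self.lam = lam                             # forgetting factor lambda

    def update(self, z, f, dt):
        W = z.reshape(1, -1)                       # W = z^T
        y = f.reshape(1, -1)                       # y = f^T (measured human force)
        e = W @ self.theta - y                     # e = y_hat - y
        self.theta = self.theta - dt * (self.P @ W.T @ e)
        self.P = self.P + dt * (self.lam * self.P - self.P @ W.T @ W @ self.P)
        return -self.theta.T                       # beta_hat = -theta_hat^T
```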
In order to ensure the performance of the cooperative object transporting task, we let
$$Q + Q_1 + Q_2 = C$$
where C is the total weight. The cooperative object transporting task fixes the task performance through the total weight C and uses Equation (21) to share the effort between the two humans and the robot. Equation (21) enables the proposed controller to adjust the contributions of the humans and the robot, so that the humans and the robot take complementary roles.
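In code, Equation (21) simply makes the robot's weight the remainder of the total budget; the clipping that keeps Q positive semidefinite when the estimated human weights momentarily exceed C is our own assumption.
```python
import numpy as np

def robot_weight(C, Q1_hat, Q2_hat):
    """Complementary role assignment, Eq. (21): Q = C - Q1_hat - Q2_hat."""
    return np.clip(C - Q1_hat - Q2_hat, 0.0, None)   # clipping is our own safeguard
```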
The control architecture is shown in Figure 2.
The pseudo-code in Algorithm 1 summarizes the implementation of the proposed method.
Algorithm 1 Adaptive optimal control algorithm based on N-player game
Input: Current state z, target x d .
Output: Robot’s control input u, estimated the humans’ cost function state weight Q ^ 1 , Q ^ 2 in Equation (10e,f).
Begin
  Define x d , initialize Q , Q ^ 1 , Q ^ 2 , u , f 1 , f 2 , z ^ , α , β ^ , γ ^ , P r , P ^ 1 , P ^ 2 , set λ 1 in Equation (13), λ 2 in Equation (18), C in
  Equation (21), the terminal time t f of one trial.
  While  t < t f do
    Measure the position x ( t ) , velocity x ˙ ( t ) , and form z.
    Update β̂ using Equations (13) and (15); update γ̂ using Equations (18) and (20).
    Solve the Riccati equation in Equation (10d) to obtain P_r, and calculate the robot's control input u.
    Calculate the estimates Q̂_1, Q̂_2 of the humans' cost function state weights from Equations (10e) and (10f).
    Compute the robot's cost function state weight Q according to Equation (21).
  End While
End
Theorem 1. 
Consider the robot dynamics shown in Equation (5). If the robot and the humans estimate the parameters of their partners' controllers and adjust their own control according to Equations (10a)–(10f), (13), (15), (18), (20) and (21), then the following conclusions hold:
  • The closed-loop system is stable, and z , α , β ^ , γ ^ , u are bounded.
  • $\lim_{t\to\infty} \hat{Q}_1 = Q_1$ and $\lim_{t\to\infty} \hat{Q}_2 = Q_2$, which indicates that Q̂_1, Q̂_2 converge to the true values Q_1, Q_2 if z is persistently exciting.
  • The Nash equilibrium is achieved for the human-robot-human interaction system.
Proof of Theorem 1. 
β̂, γ̂ influence u, f_1, f_2, and z as follows:
$$\dot{\hat{z}} = A \hat{z} + B \hat{u} + B \hat{f}_1 + B \hat{f}_2.$$
By subtracting Equation (5) from Equation (22), we have
$$\dot{e}_z = A e_z + B (\hat{u} - u) + B e_{f_1} + B e_{f_2}$$
where $e_z = \hat{z} - z$, $e_{f_1} = \hat{f}_1 - f_1$, and $e_{f_2} = \hat{f}_2 - f_2$. Considering Equations (10a)–(10c), we have
$$\dot{e}_z = (A - B\alpha) e_z + B e_{\theta_1}^T z + B e_{\theta_2}^T z.$$
Consider the following Lyapunov function candidate:
$$W = \frac{1}{2} z^T z + \frac{1}{2} e_{\theta_1}^T e_{\theta_1} + \frac{1}{2} e_{\theta_2}^T e_{\theta_2} + \frac{\chi}{2} e_z^T e_z$$
where $\chi = \min\!\left( \frac{2(\lambda_1 - \rho)\pi}{\varphi^2 \|B\|^2}, \frac{2(\lambda_2 - \rho)\pi}{\varphi^2 \|B\|^2} \right)$, with ρ the upper bound of the maximum eigenvalue of $\dot{P} P^{-1}$, π the lower bound of the minimum eigenvalue of $B\alpha - A$, and φ the upper bound of $\|z\|$.
Considering the function $V = \frac{1}{2} z^T z$ and differentiating it with respect to time, we obtain
$$\dot{V} = z^T \dot{z} = -z^T (B\alpha + B\beta + B\gamma - A) z.$$
According to Equation (10d), $B\alpha + B\beta + B\gamma - A$ is positive definite if Q is positive definite, so it follows that $\lim_{t\to\infty} \|z\| = 0$. Therefore, z is bounded, and we define φ as its upper bound. Differentiating Equation (25) with respect to time and considering Equations (14), (19) and (24), we obtain
$$\begin{aligned}
\dot{W} &= z^T \dot{z} + e_{\theta_1}^T \dot{e}_{\theta_1} + e_{\theta_2}^T \dot{e}_{\theta_2} + \chi e_z^T \dot{e}_z \\
&= -z^T (B\alpha + B\beta + B\gamma - A) z - \lambda_1 e_{\theta_1}^T e_{\theta_1} + e_{\theta_1}^T \dot{P} P^{-1} e_{\theta_1} - \lambda_2 e_{\theta_2}^T e_{\theta_2} + e_{\theta_2}^T \dot{P} P^{-1} e_{\theta_2} \\
&\quad - \chi e_z^T (B\alpha - A) e_z + \chi e_z^T B e_{\theta_1}^T z + \chi e_z^T B e_{\theta_2}^T z \\
&\leq -z^T (B\alpha + B\beta + B\gamma - A) z - \lambda_1 \|e_{\theta_1}\|^2 + \rho \|e_{\theta_1}\|^2 - \lambda_2 \|e_{\theta_2}\|^2 + \rho \|e_{\theta_2}\|^2 \\
&\quad - \chi \pi \|e_z\|^2 + \chi \varphi \|B\| \|e_z\| \|e_{\theta_1}\| + \chi \varphi \|B\| \|e_z\| \|e_{\theta_2}\| \\
&= -z^T (B\alpha + B\beta + B\gamma - A) z - \Big( \sqrt{\lambda_1 - \rho}\, \|e_{\theta_1}\| - \sqrt{\tfrac{\chi\pi}{2}}\, \|e_z\| \Big)^2 - 2\sqrt{(\lambda_1 - \rho)\tfrac{\chi\pi}{2}}\, \|e_{\theta_1}\| \|e_z\| + \chi \varphi \|B\| \|e_z\| \|e_{\theta_1}\| \\
&\quad - \Big( \sqrt{\lambda_2 - \rho}\, \|e_{\theta_2}\| - \sqrt{\tfrac{\chi\pi}{2}}\, \|e_z\| \Big)^2 - 2\sqrt{(\lambda_2 - \rho)\tfrac{\chi\pi}{2}}\, \|e_{\theta_2}\| \|e_z\| + \chi \varphi \|B\| \|e_z\| \|e_{\theta_2}\| \\
&\leq -z^T (B\alpha + B\beta + B\gamma - A) z + \Big( \chi \varphi \|B\| - 2\sqrt{(\lambda_1 - \rho)\tfrac{\chi\pi}{2}} \Big) \|e_z\| \|e_{\theta_1}\| + \Big( \chi \varphi \|B\| - 2\sqrt{(\lambda_2 - \rho)\tfrac{\chi\pi}{2}} \Big) \|e_z\| \|e_{\theta_2}\| \leq 0.
\end{aligned}$$
According to Equations (26) and (27), we have $\lim_{t\to\infty} \|z\| = 0$ and $\lim_{t\to\infty} \|e_z\| = 0$. Therefore, z(t) is bounded and $\lim_{t\to\infty} \dot{e}_z = 0$. According to Equation (27), we also have $\lim_{t\to\infty} \|e_{\theta_1}\| = 0$ and $\lim_{t\to\infty} \|e_{\theta_2}\| = 0$. Since $e_{\theta_1} = \hat{\theta}_1 - \theta_1 = (-\hat{\beta})^T - (-\beta)^T = \beta^T - \hat{\beta}^T$ and $e_{\theta_2} = \hat{\theta}_2 - \theta_2 = (-\hat{\gamma})^T - (-\gamma)^T = \gamma^T - \hat{\gamma}^T$, we obtain $\lim_{t\to\infty} \|\beta^T - \hat{\beta}^T\| = 0$ and $\lim_{t\to\infty} \|\gamma^T - \hat{\gamma}^T\| = 0$. β and γ are assumed to be bounded, since they are the feedback gains of the humans; therefore β̂ and γ̂ are also bounded. According to Equations (10a)–(10c), P_1 and P_2 are also bounded. According to Equation (10d), A_r is bounded. Therefore, P_r, α and u are bounded.
According to Equations (10e) and (10f), we can calculate the estimation errors $e_{Q_1} = \hat{Q}_1 - Q_1$ and $e_{Q_2} = \hat{Q}_2 - Q_2$. These errors are due to the errors $e_P$, $e_{P_1}$, $e_{P_2}$. Because $e_P$, $e_{P_1}$, $e_{P_2}$ converge to zero, we have $\lim_{t\to\infty} e_{Q_1} = 0$ and $\lim_{t\to\infty} e_{Q_2} = 0$, that is, $\lim_{t\to\infty} \hat{Q}_1 = Q_1$ and $\lim_{t\to\infty} \hat{Q}_2 = Q_2$.
Multiplying Equation (10d) by $\hat{z}^T$ on the left and by $\hat{z}$ on the right, and considering Equation (13), we have
$$0 = \hat{z}^T Q \hat{z} + \hat{z}^T P_r B B^T P_r \hat{z} + \hat{z}^T P_r \dot{\hat{z}} + \dot{\hat{z}}^T P_r \hat{z} + \hat{z}^T P_r H e_z + (H e_z)^T P_r \hat{z} \triangleq \hat{\sigma}.$$
Considering $\lim_{t\to\infty} e_z = 0$ and $\lim_{t\to\infty} \dot{e}_z = 0$, we can obtain
$$\lim_{t\to\infty} \sigma \triangleq \lim_{t\to\infty} \left( z^T Q z + z^T P_r B B^T P_r z + z^T P_r \dot{z} + \dot{z}^T P_r z \right) = 0.$$
Similarly, we can obtain
$$\lim_{t\to\infty} \sigma_1 \triangleq \lim_{t\to\infty} \left( z^T Q_1 z + z^T P_1 B B^T P_1 z + z^T P_1 \dot{z} + \dot{z}^T P_1 z \right) = 0, \qquad \lim_{t\to\infty} \sigma_2 \triangleq \lim_{t\to\infty} \left( z^T Q_2 z + z^T P_2 B B^T P_2 z + z^T P_2 \dot{z} + \dot{z}^T P_2 z \right) = 0.$$
$\lim_{t\to\infty} \sigma = 0$, $\lim_{t\to\infty} \sigma_1 = 0$ and $\lim_{t\to\infty} \sigma_2 = 0$ indicate that the Nash equilibrium is achieved for the human-robot-human interaction system. ☐
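The quantities σ, σ_1, σ_2 in Equations (29) and (30) also provide a convenient numerical check: evaluated along the simulated trajectory after the estimates have converged, they should be close to zero. A small helper (our own illustration, not part of the paper) could be:
```python
import numpy as np

def nash_residual(z, z_dot, P, Q, B):
    """sigma = z^T Q z + z^T P B B^T P z + z^T P z_dot + z_dot^T P z (Eqs. (29)-(30)).

    Returns a scalar that should approach zero for each player once the
    Nash equilibrium of the human-robot-human interaction is reached.
    """
    return float(z @ Q @ z + z @ P @ B @ B.T @ P @ z + 2.0 * z @ P @ z_dot)
```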

4. Simulations and Results

4.1. Experimental Design and Simulation Settings

With the development of robot technology, robots will enter our homes and become members of the family in our daily lives. In daily life, we often need to carry various objects. Some objects (e.g., those of smaller size and lower weight) can be carried by one person; some (e.g., those of medium size and weight) require two people; and some (e.g., those of larger size and higher weight) require three or more people. Consider one scenario: at home, there is an object (such as a table of relatively large size and weight) that needs to be carried by three people, but only two people are present. In this case, the robot can help carry the object together with the two humans, playing the same role as a third person. A simulation is conducted with CoppeliaSim in order to verify the control performance of the proposed controller; the version used is CoppeliaSim 4.0.0 (CoppeliaSim Edu, Windows). Figure 3 shows the CoppeliaSim simulation scenario of the cooperative object transporting task. The humans cooperate with the robot to transport the object back and forth between −10 cm and +10 cm along the horizontal direction.
The controller proposed in this paper implements interactive control because every agent considers the control of its partners. To highlight the advantages of the proposed controller, we compare it with the linear quadratic regulator (LQR) optimal controller, which can be obtained by setting A_r = A, A_1 = A, A_2 = A in Equations (10d)–(10f); the LQR controller allows each agent to form its control input optimally but ignores the controls of its partners, as illustrated in the sketch below. Let Q = Q_1 = Q_2 = diag(100, 0).
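As a sketch (assuming the same scipy-based solver used in the earlier illustrations), the LQR baseline gain for the one-DOF model can be computed by dropping the coupling terms:
```python
import numpy as np
from scipy.linalg import solve_continuous_are

# One-DOF model from Section 4 (M_d = 6 kg, C_d = 0.2)
A = np.array([[0.0, 1.0], [0.0, -0.2 / 6.0]])
B = np.array([[0.0], [1.0 / 6.0]])

# LQR baseline: each agent ignores its partners, i.e. A_r = A_1 = A_2 = A in Eqs. (10d)-(10f).
Q_lqr = np.diag([100.0, 0.0])
P_lqr = solve_continuous_are(A, B, Q_lqr, np.eye(1))
alpha_lqr = B.T @ P_lqr    # to be compared with the game-theoretic gain alpha
```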
The cost functions of the humans usually change during physical human-robot-human interaction. The robot needs to identify these changes and adaptively adjust its own cost function in order to complete the cooperative object transporting task. To verify the robot's ability to adapt when the humans' cost functions change, we simulated a scenario where the robot cooperates with the humans to perform an object transporting task. The task performance is fixed by the value of C in Equation (21); let C = diag(300, 0). The cost functions of human 1 and human 2 change randomly according to Q_1 = diag(50, 0) + ρ·diag(50, 0) and Q_2 = diag(50, 0) + ρ·diag(50, 0), where ρ is a uniformly distributed random number in [0, 1].
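For reference, the random weight changes of this scenario can be generated as follows (a sketch; the seed and the use of independent draws for the two humans are our own assumptions):
```python
import numpy as np

rng = np.random.default_rng(0)                 # seed is our assumption
C = np.diag([300.0, 0.0])                      # total weight budget in Eq. (21)
rho1, rho2 = rng.uniform(0.0, 1.0, size=2)     # uniformly distributed in [0, 1]
Q1 = np.diag([50.0, 0.0]) + rho1 * np.diag([50.0, 0.0])
Q2 = np.diag([50.0, 0.0]) + rho2 * np.diag([50.0, 0.0])
```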
The human-robot-human cooperative object transporting task can be fulfilled with less effort using the proposed controller. To support this claim, we compare it with a human-robot cooperative object transporting task. In the simulation of the human-robot-human task, we let Q = Q_1 = Q_2 = diag(100, 0); in the simulation of the human-robot task, we let Q = diag(100, 0), Q_1 = diag(100, 0), and Q_2 = diag(0, 0).
We assume that the humans and the robot have no prior knowledge of each other (thus, initially α̂ = 0, β̂ = 0, γ̂ = 0). The control input of the robot is generated by Equations (5), (10a)–(10f), (13), (15), (18) and (20). The simulated interaction forces f_1, f_2 of human 1 and human 2 are generated by a similar set of equations. The simulation time is 40 s. The inertia of the robot is M_d = 6 kg, the damping of the robot is C_d = 0.2 N·m^{-1} [19], and the forgetting factors of the real-time least squares algorithm are λ_1 = λ_2 = 0.95. The simulation time step is 0.005 s.

4.2. Results

Figure 4 depicts the change in the end-effector position with respect to time. The trajectory plotted in Figure 4 is a smooth curve that resembles a sinusoidal signal. This smooth curve is determined by Equation (3), in which u(t), f_1(t), f_2(t) are iteratively calculated by the proposed game-theory-based controller. Because the humans and the robot do not transport the object at a constant speed with our method, the end effector follows a curved trajectory rather than a straight line. As can be seen from Figure 4, the end effector reaches the target position with the proposed controller, which means that the cooperative object transporting task is successfully fulfilled. In contrast, the end effector cannot reach the target position with the LQR controller, so the task is not fulfilled. The reason is that the proposed controller considers the interaction with the other partners: when one partner decreases its effort, the other partners gradually increase their efforts to ensure the successful fulfillment of the task. The LQR controller does not consider the interaction with the other partners, so successful fulfillment of the cooperative object transporting task cannot be guaranteed.
In Figure 5, we can see that the estimated humans’ feedback gains converge to the real values in a few seconds. This means that the humans’ feedback gains can be successfully estimated by the proposed method.
Figure 6 shows that fulfilling the cooperative object transporting task requires larger control gains β, γ with the LQR controller than with the controller proposed in this paper, which means that accomplishing the same task requires less effort with the proposed controller. This is because the proposed controller considers the interaction with the other partners and computes the minimal effort for the humans and the robot to complete the task. In contrast, the LQR controller does not consider the interaction with the other partners, so the humans and the robot only minimize their own cost functions and may therefore require larger effort.
The feedback gains are affected by the state weights of the cost functions. To verify the advantages of the proposed controller when the state weights vary, we let Q_1 vary from 0 to 10Q with Q_2 = diag(100, 0), and let Q_2 vary from 0 to 10Q with Q_1 = diag(100, 0). It can be seen from Figure 7 that accomplishing the same task always requires less effort with the proposed controller. We can also see that the difference between the control gains of the proposed controller and those of the LQR controller becomes smaller as Q_1/Q or Q_2/Q increases, because the robot's relative influence decreases.
From Figure 4, Figure 5, Figure 6 and Figure 7, we can conclude that the human-robot-human cooperative object transporting task can be fulfilled with less effort and the system can be kept stable using the proposed controller.
It can be seen from Figure 8 that, when the cost functions of human 1 and human 2 change, the cost function of the robot also changes adaptively. When the sum of the humans' state weights Q_1 + Q_2 increases, the state weight of the robot Q decreases accordingly; conversely, when Q_1 + Q_2 decreases, Q increases accordingly. The robot can adapt in this way because the constant total weight C is set in Equation (21), which enables the proposed controller to adjust the contributions of the humans and the robot and makes them take complementary roles.
Figure 9 shows that, using the proposed controller, the adaptive cooperative object transporting task can be fulfilled and the system can be kept stable.
From Figure 8 and Figure 9, we can conclude that the adaptive cooperative object transporting task can be fulfilled with the proposed controller. During the physical interaction, the robot can successfully identify the change of each human’s cost function, and then adaptively adjust its own cost function to achieve interactive optimal control.
Figure 10 shows that fulfilling the human-robot-human cooperative object transporting task requires smaller control gains β_e, β_v than the human-robot cooperative object transporting task, which means that accomplishing the same task requires less effort by means of human-robot-human physical interaction. This is because the human-robot-human task considers the interaction with more partners (two partners) and computes the minimal effort for the humans and the robot to complete the task. In contrast, the human-robot task considers the interaction with fewer partners (only one), so the human and the robot may require larger effort.

5. Conclusions

In this paper, the human-robot-human physical interaction problem has been studied. An adaptive optimal control framework for human-robot-human physical interaction has been proposed based on N-player game theory. A recursive least squares algorithm with a forgetting factor has been used to identify the unknown control parameters of the humans online. The performance of the proposed controller has been verified by simulations of a cooperative object transporting task. The simulation results show that the proposed controller achieves adaptive optimal control during the interaction between the robot and two humans and keeps the system stable. Compared with the LQR controller, the proposed controller has superior performance. Compared with human-robot physical interaction, accomplishing the same cooperative object transporting task requires less effort by means of human-robot-human physical interaction based on the proposed approach. Although this paper only conducts simulations of the physical interaction between one robot and two humans, the proposed framework has the potential to be generalized to situations where multiple robots physically interact with multiple humans. As future work, we will extend the framework to the interaction between multiple robots and multiple humans.

Author Contributions

R.Z. conceived the original ideas, designed all the experiments, and subsequently drafted the manuscript. Y.L. provided supervision and funding support for the project. J.Z. provided supervision and funding support for the project. H.C. provided supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Major Research Plan of the National Natural Science Foundation of China under Grant 91948201.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. De Santis, A.; Siciliano, B.; De Luca, A.; Bicchi, A. An atlas of physical human-robot interaction. Mech. Mach. Theory 2008, 43, 253–270. [Google Scholar] [CrossRef] [Green Version]
  2. Carolina, P.; Angelika, P.; Martin, B. A survey of environment-, operator-, and task-adapted controllers for teleoperation systems. Mechatronics 2010, 20, 787–801. [Google Scholar]
  3. Losey, D.P.; McDonald, C.G.; Battaglia, E.; O’Malley, M.K. A review of intent detection, arbitration, and communication aspects of shared control for physical human-robot interaction. Appl. Mech. Rev. 2018, 70, 010804. [Google Scholar] [CrossRef] [Green Version]
  4. Aslam, P.; Jeha, R. Safe physical human robot interaction-past, present and future. J. Mech. Sci. Technol. 2008, 22, 469. [Google Scholar]
  5. Li, Y.; Ge, S.S. Human–robot collaboration based on motion intention estimation. IEEE-ASME Trans. Mechatron. 2013, 19, 1007–1014. [Google Scholar] [CrossRef]
  6. Li, Y.; Ge, S.S. Force tracking control for motion synchronization in human-robot collaboration. Robotica 2016, 34, 1260–1281. [Google Scholar] [CrossRef] [Green Version]
  7. Sandra, H.; Martin, B. Human-oriented control for haptic teleoperation. Proc. IEEE 2012, 100, 623–647. [Google Scholar]
  8. Chen, Z.; Huang, F.; Yang, C.; Yao, B. Adaptive fuzzy backstepping control for stable nonlinear bilateral teleoperation manipulators with enhanced transparency performance. IEEE Trans. Ind. Electron. 2019, 67, 746–756. [Google Scholar] [CrossRef]
  9. Liu, C.; Masayoshi, T. Modeling and controller design of cooperative robots in workspace sharing human-robot assembly teams. In Proceedings of the IROS 2014, Chicago, IL, USA, 14–18 September 2014; pp. 1386–1391. [Google Scholar]
  10. Zanchettin, A.M.; Casalino, A.; Piroddi, L.; Rocco, P. Prediction of human activity patterns for human-robot collaborative assembly tasks. IEEE Trans. Ind. Inform. 2018, 15, 3934–3942. [Google Scholar] [CrossRef]
  11. Alexander, M.; Martin, L.; Ayse, K.; Metin, S.; Cagatay, B.; Sandra, H. The role of roles: Physical cooperation between humans and robots. Int. J. Robot. Res. 2012, 31, 1656–1674. [Google Scholar]
  12. Costa, M.J.; Dieter, C.; Veronique, L.; Johannes, C.; El-Houssaine, A. A structured methodology for the design of a human-robot collaborative assembly workplace. Int. J. Adv. Manuf. Technol. 2019, 102, 2663–2681. [Google Scholar]
  13. Daniel, N.; Jan, K. A problem design and constraint modelling approach for collaborative assembly line planning. Robot. Comput. Integr. Manuf. 2019, 55, 199–207. [Google Scholar]
  14. Selma, M.; Sandra, H. Control sharing in human-robot team interaction. Annu. Rev. Control 2017, 44, 342–354. [Google Scholar]
  15. Mahdi, K.; Aude, B. A dynamical system approach to task-adaptation in physical human-robot interaction. Auton. Robot. 2019, 43, 927–946. [Google Scholar]
  16. Roberto, C.; Vittorio, S. Rehabilitation Robotics: Technology and Applications. In Rehabilitation Robotics; Colombo, R., Sanguineti, V., Eds.; Academic Press: London, UK, 2018; pp. xix–xxvi. [Google Scholar]
  17. Colgate, J.E.; Decker, P.F.; Klostermeyer, S.H.; Makhlin, A.; Meer, D.; Santos-Munne, J.; Peshkin, M.A.; Robie, M. Methods and Apparatus for Manipulation of Heavy Payloads with Intelligent Assist Devices. U.S. Patent 7,185,774, 6 March 2007. [Google Scholar]
  18. Zoss, A.B.; Kazerooni, H.; Chu, A. Biomechanical design of the Berkeley lower extremity exoskeleton (BLEEX). IEEE-ASME Trans. Mechatron. 2006, 11, 128–138. [Google Scholar] [CrossRef]
  19. Li, Y.; Carboni, G.; Gonzalez, F.; Campolo, D.; Burdet, E. Differential game theory for versatile physical human-robot interaction. Nat. Mach. Intell. 2019, 1, 36–43. [Google Scholar] [CrossRef] [Green Version]
  20. Li, Y.; Tee, K.P.; Yan, R.; Chan, W.L.; Wu, Y. A framework of human-robot coordination based on game theory and policy iteration. IEEE Trans. Robot. 2016, 32, 1408–1418. [Google Scholar] [CrossRef]
  21. Nathanaël, J.; Themistoklis, C.; Etienne, B. A framework to describe, analyze and generate interactive motor behaviors. PLoS ONE 2012, 7, e49945. [Google Scholar]
  22. Li, Y.; Tee, K.P.; Yan, R.; Chan, W.L.; Wu, Y.; Limbu, D.K. Adaptive optimal control for coordination in physical human-robot interaction. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–3 October 2015; pp. 20–25. [Google Scholar]
  23. Kirk, D.E. Optimal control theory: An introduction. In Optimal Control Theory; Dover Publications: Mineola, NY, USA, 2004. [Google Scholar]
  24. Li, Y.; Tee, K.P.; Chan, W.L.; Yan, R.; Chua, Y.; Limbu, D.K. Continuous role adaptation for human-robot shared control. IEEE Trans. Robot. 2015, 31, 672–681. [Google Scholar] [CrossRef]
  25. Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50. [Google Scholar] [CrossRef]
  26. Vamvoudakis, K.G.; Lewis, F.L. Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations. Automatica 2011, 47, 1556–1569. [Google Scholar] [CrossRef]
  27. Zhang, H.; Wei, Q.; Liu, D. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica 2011, 47, 207–214. [Google Scholar] [CrossRef]
  28. Liu, D.; Li, H.; Wang, D. Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics. IEEE Trans. Syst. Man Cybern. Syst. 2014, 44, 1015–1027. [Google Scholar] [CrossRef]
  29. Albaba, B.M.; Yildiz, Y. Modeling cyber-physical human systems via an interplay between reinforcement learning and game theory. Annu. Rev. Control 2019, 48, 1–21. [Google Scholar] [CrossRef] [Green Version]
  30. Music, S.; Hirche, S. Haptic Shared Control for Human-Robot Collaboration: A Game-Theoretical Approach. In Proceedings of the 21st IFAC World Congress, Berlin, Germany, 12–17 July 2020. [Google Scholar]
  31. Turnwald, A.; Wollherr, D. Human-like motion planning based on game theoretic decision making. Int. J. Soc. Robot. 2019, 11, 151–170. [Google Scholar] [CrossRef] [Green Version]
  32. Liu, Z.; Liu, Q.; Xu, W.; Zhou, Z.; Pham, D.T. Human-robot collaborative manufacturing using cooperative game: Framework and implementation. Procedia CIRP 2018, 72, 87–92. [Google Scholar] [CrossRef]
  33. Bansal, S.; Xu, J.; Howard, A.; Isbell, C. A Bayesian Framework for Nash Equilibrium Inference in Human-Robot Parallel Play. arXiv 2020, arXiv:2006.05729. [Google Scholar]
  34. Antonelli, G.; Chiaverini, S.; Marino, A. A coordination strategy for multi-robot sampling of dynamic fields. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; pp. 1113–1118. [Google Scholar]
  35. Yan, Z.; Jouandeau, N.; Cherif, A.A. A survey and analysis of multi-robot coordination. Int. J. Adv. Robot. Syst. 2013, 10, 399. [Google Scholar] [CrossRef]
  36. Martina, L.; Alessandro, M.; Stefano, C. A distributed approach to human multi-robot physical interaction. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, 6–9 October 2019. [Google Scholar]
  37. Kim, W.; Marta, L.; Balatti, P.; Wu, Y.; Arash, A. Towards ergonomic control of collaborative effort in multi-human mobile-robot teams. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Macau, China, 3–8 November 2019. [Google Scholar]
  38. Starr, A.W.; Ho, Y.-C. Nonzero-sum differential games. J. Optim. Theory Appl. 1969, 3, 184–206. [Google Scholar] [CrossRef]
  39. Fudenberg, D.; Tirole, J. Noncooperative game theory for industrial organization: An introduction and overview. Handb. Ind. Organ. 1989, 1, 259–327. [Google Scholar]
  40. Hogan, N. Impedance Control: An Approach To Manipulation: Part I-Theory Part II-Implementation Part III-Applications. J. Dyn. Syst. Meas. Control 1985, 107, 1–24. [Google Scholar] [CrossRef]
  41. Blank, A.A.; Okamura, A.M.; Whitcomb, L.L. Task-dependent impedance and implications for upper-limb prosthesis control. Int. J. Robot. Res. 2014, 33, 827–846. [Google Scholar] [CrossRef]
  42. Vogel, J.; Haddadin, S.; Jarosiewicz, B.; Simeral, J.D.; Bacher, D.; Hochberg, L.R.; Donoghue, J.P.; van der Smagt, P. An assistive decision-and-control architecture for force-sensitive hand–arm systems driven by human–machine interfaces. Int. J. Robot. Res. 2015, 34, 763–780. [Google Scholar] [CrossRef]
  43. Basar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory, 2nd ed.; Society for Industrial and Applied Mathematics; The Math Works Inc.: Natick, MA, USA, 1999. [Google Scholar]
  44. Shima, T.; Rasmussen, S. UAV cooperative decision and control: Challenges and practical approaches. In UAV Cooperative Decision and Control; SIAM: Philadelphia, PA, USA, 2009. [Google Scholar]
  45. Hudas, G.; Vamvoudakis, K.G.; Mikulski, D.; Lewis, F.L. Online adaptive learning for team strategies in multi-agent systems. J. Def. Model. Simul. 2012, 9, 59–69. [Google Scholar] [CrossRef]
  46. Tan, H.J.; Chan, S.C.; Lin, J.Q.; Sun, X. A New Variable Forgetting Factor-Based Bias-Compensated RLS Algorithm for Identification of FIR Systems With Input Noise and Its Hardware Implementation. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 67, 198–211. [Google Scholar] [CrossRef]
Figure 1. A scenario where the humans and the robot collaborate to perform an object transporting task.
Figure 2. Control Architecture.
Figure 3. Simulation of cooperative object transporting task. The humans cooperate with the robot to transport the object back and forth between −10 cm and +10 cm along the horizontal direction. The forces that are exerted by the humans on the object are measured by force sensors at the interaction point.
Figure 4. The end effector position value.
Figure 5. Control gains of humans. (a) the position error feedback gain of the human 1. (b) the velocity feedback gain of the human 1. (c) the position error feedback gain of the human 2. (d) the velocity feedback gain of the human 2.
Figure 6. Humans' control gains. (a) the position error feedback gain of the human 1. (b) the velocity feedback gain of the human 1. (c) the position error feedback gain of the human 2. (d) the velocity feedback gain of the human 2.
Figure 7. Control gains for different values of the humans' state weights. (a,b) the state weight of the human 1 varies. (c,d) the state weight of the human 2 varies.
Figure 8. Humans’ state weights. (a) the state weight of the human 1. (b) the state weight of the human 2. (c) the sum of the state weights of the human 1 and human 2. (d) the state weight of the robot.
Figure 9. The end effector position value. (a) The end effector position value in Trial 1. (b) The end effector position value in Trial 2. (c) The end effector position value in Trial 3. (d) The end effector position value in Trial 4.
Figure 10. Humans’ control gains. The dashed lines correspond to the human-robot cooperative object transporting task. The solid lines correspond to the human-robot-human cooperative object transporting task. (a) the position error feedback gain of the human 1. (b) the velocity feedback gain of the human 1. (c) the position error feedback gain of the human 2. (d) the velocity feedback gain of the human 2.
