1. Introduction
Humanoid robots, as highly articulated floating-base robotic systems, have become a focal point of whole body control (WBC) research over the past decade. Underactuated leg structures lacking ankle roll joints can simplify robot design and reduce costs, but also impose significant control challenges for WBC. Whole body plan (WBP) methods based on operational space (OS) control [
1], and null space (NS) techniques [
2] have been widely applied to biped humanoid robots for tasks such as dynamic walking [
3], jumping [
4], and manipulation [
5]. These methods present two main advantages for humanoid robots: First, they enable the design of NS-based multi-priority tasks, which are crucial for scenarios like walking that involve stance leg contact, swing leg planning, and center of mass (CoM) maintenance [
6], thus ensuring orderly trajectory planning. Second, the mapping operations between OS and joint space Jacobians can reduce the influence of a specific task space degree of freedom (DOF) on joint space, enhancing control reliability over underactuated leg structures [
7,
8]. Furthermore, walking task control needs to meet the reliability requirements of high-frequency control commands, actual physical constraints, and accurate ground reaction forces [
9,
10]. WBC based on contact force constraints and dynamic constraints has also been widely applied in humanoid robots. A WBC approach involving ground reaction forces and whole-body dynamic constraints was proposed to calculate desired joint torques [
11]. A hybrid approach combining NS kinematic planning and friction cone-constrained whole-body dynamic optimization has been shown to effectively control humanoid gait [
12].
Although WBC can generate high-frequency, real-time joint control commands to drive humanoid or quadruped robots, the solutions produced by WBC are typically local optima [
13]. Model predictive control (MPC), as an optimization-based approach with a finite prediction horizon, can fully exploit the robot’s kinematic range and torque limits within this horizon [
14,
15]. Furthermore, MPC’s optimization result adheres to the Bellman optimality principle, ensuring that each update yields a globally optimal solution with continuity [
16]. However, for high-dimensional humanoid or quadruped robots, the combination of a long prediction horizon and system complexity significantly increases the dimensionality of the optimization variables, which is especially burdensome for nonlinear MPC [
17], leading to computational challenges that undermine real-time performance. Common solutions include simplifying the humanoid robot model to an inverted pendulum [
18,
19,
20] or a single rigid-body dynamic (SRBD) model [
21,
22,
23], using linearized convex MPC models [
24,
25], and employing discretization techniques such as forward Euler (FE) or matrix exponential (ME) methods [
26,
27,
28,
29]. However, simplifying the robot model or MPC model, or applying discretization inevitably reduces the accuracy of MPC solutions, making it infeasible to directly use the output for robot control in complex systems.
Despite these advancements, three key challenges remain: underactuated Robot Control, optimization numerical stability, and integrated optimally planning and control. This study aims to develop an improved hierarchical optimization control strategy for underactuated humanoid robots by integrating the global optimal control capabilities of MPC with the high-frequency, precise control capabilities of WBC. First, for the high-level optimization, a convex MPC based on the SRBD model (as in MIT Cheetah 3 [
21]) is employed to generate optimal CoM trajectories and estimated ground reaction forces. To ensure numerical stability in high-stiffness systems and reduce computational load, the ME method is utilized instead of the FEr method, allowing for accurate predictions even with larger discretization steps. Second, for the low-level optimization, a multi-priority WBP is designed to generate desired trajectories from operational space to joint space for an underactuated humanoid robot without ankle roll joints. Since the simplified MPC model cannot directly provide accurate ground reaction forces for WBC, further optimization of these forces is necessary. As described in [
30], the foot contact surface can be modeled as a polygon, and the resultant wrench from multiple contact points must satisfy the contact wrench cone (CWC) constraints to ensure feasible contact forces and a stable contact mode. Thus, a WBC strategy is constructed with constraints on whole-body dynamics, joint torques, and CWC. This ensures that the output joint torques comply with physical constraints while achieving precise trajectory tracking. The WBC’s optimization objective focuses on incremental ground reaction forces and joint accelerations. Using the upstream planner’s output as the initial solution to the QP problem enables warm-starting, which accelerates convergence, improves continuity, and enhances smoothness of the output.
To address the challenging problem of whole-body complex planning and control for underactuated humanoid robots, the proposed method has the following three innovations:
The proposed convex MPC strategy employs ME discretization method, ensuring numerical stability even with large discretization steps. This approach enables globally optimal trajectory prediction while reducing computational costs.
The proposed complementary hierarchical optimization framework integrates the high-level MPC planner and the low-level WBPC, effectively balancing the need for globally optimal trajectories and high-frequency control commands that comply with physical constraints. This enhances system real-time performance, stability and robustness.
The proposed framework is validated on X02: a lightweight underactuated humanoid robot lacking ankle roll DOF. Walking experiments verify the effectiveness of the hierarchical optimization strategy. The subsequent sections are organized as follows:
Section 2 introduces the preliminaries, including MPC, SRBD, and WBP;
Section 3 presents the proposed hierarchical optimization framework;
Section 4 conducts walking simulation experiments on the X02 robot based on MuJoCo;
Section 5 discusses the experimental results; and
Section 6 concludes with a summary and future prospects.
3. Improved Hierarchical Optimization Framework
An improved hierarchical optimization framework is proposed for high-frequency real-time whole-body motion control of humanoid robots. The hierarchical optimization algorithm consists of two parts: (1) a convex MPC high-dimensional planner based on ME discretization, used for planning the centroid trajectory and desired ground contact forces within the optimization finite-time horizon; (2) a low-level WBC based on the output of MPC and WBP. The desired joint torques computed by WBC are combined with the desired joint trajectories obtained by WBP to feed to PVT controller that driving the motor moving.
3.1. High Level Planner Based on Convex MPC
This section introduces a convex MPC control strategy based on ME discretization. The method treats the entire robot as a SRBD subject to forces at the contact patches. For the high-level planning problem of multi-body robots, the influence of leg dynamics can be neglected while maintaining robust system performance. Taking the robot used in this study as an example, due to its excellent structural design, the mass of the arms and legs accounts for only 30% of the total mass. Through model simplification and ME discretization, the algorithm ensures real-time performance while maintaining the accuracy of the model approximation.
The ME method is employed for the discretization of the system, offering a robust approach to handle the inherent stiffness in dynamic models. This method ensures greater numerical stability by accurately capturing both the fast and slow dynamics of the system, even with larger time steps, compared to traditional explicit methods. In particular, humanoid robot dynamics, despite simplifications, exhibit stiffness due to rapid variations in ground reaction forces and frequent gait changes. Explicit discretization methods, such as FE, often require extremely small time steps to maintain stability in stiff systems, which can significantly reduce computational efficiency. By contrast, the ME method mitigates this challenge, enabling stable and efficient numerical solutions without the need for excessively small time steps.
The convex MPC (
4) problem after discretization using the ME method is
In a linear time-varying system, although the state matrix changes over time, it can still be regarded as locally linear at each discrete time step, and thus these matrices remain linear. The solution to the problem is repeated at every iteration, but only the control input for the first step is used for control. Therefore, the linear time-varying system can still be considered to meet the premise of convex MPC.
The above problem needs to be transformed into the standard QP form for solving. First, by concatenating the state equations at each time step, the system’s state can be expressed as a function of the initial state
and the sequence of control inputs,
where
and
. By combining all of the states
, we obtain the following form:
where
is the condensed form of the state transition matrix, and
is the condensed form of the control input matrix.
Express all the states and control inputs within the prediction horizon in vector form as follows:
where
,
, and
.
According to the method for constructing QP outlined in the reference [
21], the following standard QP form can be obtained:
where
, and
.
Remark 2. Compared to the FE method for discretization, the ME method yields a more accurate numerical solution and effectively addresses numerical stiffness issues. To mitigate the increased computational complexity and the additional computational burden from the integral term in the ME method, the following improvements can be made in numerical calculations: By leveraging the sparse characteristics of the state transition matrix , a block matrix approach can be used to reduce the dimensionality of matrix computations and improve the efficiency of calculating the ME. Additionally, based on the form of the ME, the integral for Γ can be converted into a matrix calculation . With these optimizations, the ME method can maintain excellent accuracy while reducing the computational load.
3.2. Low Level Controller Based on WBPC
Based on the WBP theory, reference trajectories in joint space can be obtained by iteratively solving K OS tasks. The estimated contact forces from the MPC and the joint acceleration from the WBP are used as input commands for the WBC, where an incremental form of the WBC QP problem is constructed.
Using the whole-body dynamics model as equality constraints and CWC as inequality constraints, the joint control torques optimized by WBC satisfy the actual physical constraint. The QP based WBC problem is formulated as follows:
where
represent the optimized increments of contact force and generalized acceleration, respectively.
and
represent the weights of the two optimization variables, respectively.
, and
represent the generalized mass/inertia matrix, the sum of Coriolis/centrifugal forces, and gravity forces, respectively. Equation (14) indicates that the stability of the CoM can be maintained even after adjusting the CoM acceleration and contact forces.
is the centroidal coordinate selection matrix. Equations (15) and (16) indicate that after fine-tuning the joint acceleration and contact forces; the resulting torques must satisfy the physical upper and lower limit constraints.
is the joint space selection matrix.
is the contact Jacobian matrix. Equations (17)–(19) jointly form the CWC.
X and
Y represent the half-length and half-width of the foot contact surface, respectively.
After obtaining
, the torque control command for the joint actuators,
, can be solved using the dynamic model
The joint space controller uses the PVT controller, taking the desired velocity
and desired position
from WBP, as well as the joint feedforward torque
from WBC as inputs. The specific form of the controller is
The pseudocode for the low-level control algorithm based on WBPC and PVT is shown in Algorithm 1.
Algorithm 1 Low-Level Controller Based on WBPC and PVT |
- Require:
▹ The inputs required for the algorithm - Ensure:
▹ The output of the algorithm
- 1:
Initialization: , , , - 2:
for
do - 3:
▹ Compute the error for task i - 4:
Calculate the null space projection matrix - 5:
Compute the mapped Jacobian matrix - 6:
▹ Get the trajectory increments for task i - 7:
end for - 8:
Save the final trajectory: - 9:
Compute the dynamic equality constraints and CWC constraints - 10:
Formulate the QP problem for the objective function - 11:
Solve the QP problem to obtain increments - 12:
Calculate the feedforward torque from the whole-body dynamic model - 13:
Calculate the PVT command: - 14:
Return
|
4. Simulation Experiment
In this section, we validate the effectiveness of the proposed hierarchical optimization control algorithm on the X02 platform, which lacks the ankle roll DOF. First, we design a WBP multi-priority task adapted to the absence of the ankle roll DOF. Next, we evaluate the numerical stability and prediction accuracy of the convex MPC based on the ME method under different step sizes. Finally, we validate the effectiveness of the incremental WBC based on CWC constraints and whole-body dynamics constraints, along with an interpretation of joint trajectory tracking.
4.1. Experimental Setup
The X02 is a unique, lightweight humanoid robot designed by us, characterized by low energy consumption and long endurance. The leg degrees of freedom and sequence of X02 are: {hip_yaw, hip_roll, hip_pitch, knee_pitch, ankle_pitch}. In this experiment, the X02 robot model, shown in
Figure 1, is loaded onto the MuJoCo simulation platform. The robot model parameters and MuJoCo physics engine settings are shown in
Table 1. The control algorithm architecture is modified based on the OpenLoong [
32] framework. To facilitate the deployment of the control algorithm on the physical robot, we have stripped the simulation portion from the original framework. The upper-level control and lower-level drives communicate using Google gRPC. The control program is deployed on a Lenovo Y9000R laptop with an Ubuntu 22.04 operating system, Intel i9-14900HX × 32 CPU, and 32 GB of memory. The upper-level optimization algorithm runs at 1k Hz, while the lower-level drive operates at 2k Hz.
4.2. Implementation Details
The multi-priority tasks based on WBP are prioritized as follows: maintaining_contact, holding_CoM_pose, tracking_swing_leg, in order from high to low priority. In the tracking_swing_leg task, due to the absence of the ankle roll DOF, we need to set the roll row in both the Jacobian and its derivative to 0, as well as set the roll component of the error vector to 0. This eliminates the contribution of the roll direction to the control of each joint when using WBP for joint-space trajectory planning.
4.3. Results of Convex MPC
In this experiment, The whole parameters are designed as follows in
Table 2.
In
Figure 2, the first 5 s represent the simulation’s stable standing phase, and the MPC is introduced at the fifth second. The sampling time step increases sequentially from top to bottom in the figure. From the data, it can be observed that both methods exhibit large errors in the
y-direction before 10 s. This is due to the first step of the robot’s gait planning based on a finite state machine, which drives the swing leg to move laterally. As the sampling time step increases, the FE method shows significant prediction errors caused by numerical instability, while the ME method does not. From the MSE metrics in the table and the trend in the figure, it can be seen that both methods achieve the smallest prediction error at
ms, rather than at
ms. This is because too small a time step increases the computational load and the setup of the QP solver.
Table 3 also shows that the average solution time for ME is longer than for FE. Overall, under all four sampling time steps, the ME discretization method, compared to the FE discretization method, achieves smaller prediction errors and greater numerical stability, albeit at the cost of longer solving times.
4.4. Results of WBPC
The relevant parameters of the WBC optimization problem are presented in
Table 4. The joint torque limiting sequence is as follows: hip yaw, hip roll, hip pitch, knee pitch, and ankle pitch.
From
Figure 3, it can be observed that the WBC further optimizes the contact forces predicted by the MPC based on the CWC constraints and dynamic constraints in joint space. The figure illustrates the optimized contact forces for several complete gait cycles of both the left and right feet. The foot–ground contact coordinate system aligns with the global coordinate system, following the right-hand convention. For the force in the
y-direction, the output by the WBC for the left foot is negative, while it is positive for the right foot, aligning better with the physical scenario. For the
z-direction force, the WBC outputs smoother results that satisfy the CWC constraints.
Figure 4 and
Figure 5 show the joint torques performance and joint trajectories response, respectively. In the figure, the abbreviations in the legend are defined as follows: LHY represents “left hip yaw”, LHP represents “left hip pitch”, LHR represents “left hip roll”, LKP represents “left knee pitch”, and LAP represents “left ankle pitch”. Similarly, R indicates the right, and the other abbreviations follow the same naming convention. From
Figure 4, it can be seen that the optimized torques output by the WBC satisfy the physical constraints of the joint torque limits. The hip yaw joint contributes minimally during forward walking, requiring only a small torque for stabilization. Notably, the hip yaw motors of X02 rotate in opposite directions; to maintain stability in the yaw direction, the torque control commands exhibit a synchronized trend. During the swing phase, the knee pitch experiences nearly zero torque, while during the support phase, it generates significant torque. The pitch joints of the hip, knee, and ankle exhibit periodic contributions during the transition between the stance and swing legs. From the right subplot of
Figure 5, it can be observed that since the motor angles are zero when X02 is standing upright, the knee pitch angle during walking approximately equals the sum of the hip pitch and ankle pitch angles to maintain the stability of the center of mass.
5. Discussion
In the experimental section of this study, we compare the proposed ME-based MPC algorithm with the FE-based MPC algorithm used in previous research. The experimental results show that under large discrete step sizes, the ME approach can still ensure numerical stability. The MSE metric for ME is generally an order of magnitude lower than that of FE, reaching the level of . However, due to the integral nature of the ME approach, its computational load increases, leading to longer solution times compared to the FE approach. To reduce the computational cost of the ME method, we leverage the derivative property of the exponential function to convert the matrix exponential integration into , thus shortening the solution time. From the average solution-time comparison data, the ME approach is only about 0.2 ms slower than the FE approach, which is fully acceptable for large discrete step sizes. Therefore, when solving MPC problems with large step sizes, the ME approach provides both numerical accuracy and computational efficiency.
In the WBC algorithm, ground reaction force is key to driving joint motion and maintaining the stability of the floating base. In the traditional WBC algorithm [
9], the lack of prior estimation for foot contact forces means that the initial value is typically set to zero, often requiring multiple iterations to find the optimal solution—particularly costly when gait switches occur frequently during walking. In this study, we use the ground reaction forces computed by the MPC as the initial guess for the WBC optimization in an incremental optimization manner, allowing a rapid and accurate determination of the foot contact forces needed to drive joint motion. From the experimental data, although the ground reaction forces output by the MPC are not fully accurate or continuous, they converge quickly and become smooth and continuous after WBC optimization in each gait cycle for both feet.
6. Conclusions and Future Work
This study proposes an improved optimization control algorithm based on MPC and WBPC to address the numerical stability, computational efficiency, and real-time requirements of humanoid robot walking tasks. By comparing two discretization-based MPC algorithms under different step sizes with respect to convergence time and error accuracy, the results indicate that the ME method is more suitable for solving the MPC problem of a simplified humanoid robot model using large discrete step sizes, ensuring numerical stability and reducing computational demands. Moreover, to meet the high-frequency real-time requirements and physical constraints of humanoid walking, the ground reaction forces estimated by the MPC are utilized as the initial guess for the WBC optimization problem, combined with CWC constraints in an incremental optimization scheme for joint trajectories and feedforward torques. This reduces the number of optimization iterations while enhancing the accuracy and real-time performance of the numerical solutions. Simulation-based walking experiments verify the feasibility of the proposed algorithm. By integrating the global optimal solution from MPC with the high-frequency real-time characteristics of WBC, the proposed hierarchical optimization framework improves overall system robustness. Leveraging a discretization approach with stronger numerical stability and a hierarchical optimization strategy, this framework effectively balances computational load and solution accuracy, making it well-suited for deployment on edge computing platforms and providing theoretical support for physical implementation. In the future, we plan to further investigate gait planning algorithms with strong disturbance rejection to enhance the stability of this hierarchical optimization framework and achieve faster walking speeds. Additionally, capitalizing on the existing X02 communication framework allows rapid deployment of the algorithm from simulation to physical hardware, and we will validate its performance on a real robot in a forthcoming work.