1. Introduction
With the continuous advancement of tunnel modernization, the intelligentization and automation of multi-arm rock drilling robots have gradually become key industry trends. Robotic arm motion planning is a critical research area in robotics [1]. When a multi-arm rock drilling robot performs drilling operations, its robotic arms must autonomously plan paths to the target position while simultaneously ensuring accurate localization, optimal and fast path planning, and safe obstacle avoidance.
Robotic arm motion planning consists of two key components: path planning and trajectory planning, with path planning being central to overall performance. For the path planning of multi-joint robotic arms in 3D space, the RRT (rapidly exploring random tree) algorithm is widely used due to its strong capability in high-dimensional spatial search. However, the RRT algorithm, proposed by LaValle [2] in 1998, selects sampling points randomly, which often results in inefficient and suboptimal paths.
To address the shortcomings of the RRT algorithm, researchers have proposed various solutions [3,4,5,6]. Kuffner et al. [7] proposed the RRT-Connect algorithm, which grows two trees from the start and goal points, respectively; this speeds up the path search but does not guarantee path optimality. Karaman et al. [8] proposed the RRT* algorithm, which optimizes the path by reselecting parent nodes and rewiring, addressing the poor asymptotic optimality of RRT, but its search efficiency is low. Gammell et al. [9] proposed the Informed RRT* algorithm, which accelerates the search for the optimal solution by restricting sampling to an ellipsoidal state-space region; however, it still relies on undirected exploration in the initial planning phase. Zhang et al. [10] proposed the IPQ-RRT* Connect algorithm for multi-objective point motion planning of assembly robotic arms. Owing to the inherent characteristics of the RRT algorithm, the low search efficiency caused by random selection of sampling points remains unsolved.
With the rapid development of deep learning, researchers at home and abroad have begun to apply deep reinforcement learning (DRL) to the motion planning of robotic arms [11,12,13,14,15]. A DRL model is trained through the interaction between the robotic arm and the environment to learn the optimal motion strategy in different states.
Yang et al. [16] proposed a deep Q-learning framework incorporating dynamic action evaluation and an object separation reward network to effectively realize robotic arm grasping. Zhao et al. [17] proposed the HM-DDPG method, which utilizes joint-space forward kinematics planning and full retention of experience, combined with bias-free hierarchical memory, to improve the efficiency and accuracy of textile robotic arm path planning. Andrychowicz et al. [18] proposed a technique called “hindsight experience replay” (HER), which can be combined with arbitrary off-policy reinforcement learning algorithms and enables efficient sample learning from sparse binary rewards, avoiding complex reward design. Kim et al. [19] proposed a motion planning algorithm that combines the twin delayed deep deterministic policy gradient (TD3) with hindsight experience replay to address path smoothness for a robotic arm in a continuous action space. Hao et al. [20] proposed a visual servoing method based on target detection and KAN-BiLSTM trajectory prediction, which effectively predicts the trajectory of robotic arm motion and realizes dynamic planning. Liu et al. [21] proposed a DRL path planning system incorporating a safety verification mechanism, which significantly improves the response speed and safety of collaborative robots around dynamic obstacles.
In complex tasks, the performance of model-free DRL applied to path planning alone is limited, and the current trend is to combine DRL with traditional path planning methods [22,23]. Gao et al. [24] proposed an incremental training model that pre-trains the DRL model in a 2D environment and migrates it to a 3D environment, effectively improving the robot’s path planning efficiency and generalization ability. Cai et al. [25] proposed a path planning framework integrating deep reinforcement learning with the RRT* algorithm and improved the TD3 algorithm to enhance robotic arm path planning in 2D and 3D environments. Liu et al. [26] proposed DDPG-RRT, a multi-arm path planning algorithm incorporating deep reinforcement learning, which improves the traditional RRT algorithm by introducing the dynamic step-size setting of the DDPG algorithm to achieve collision-free path search with synchronized departure of multiple arms.
In this study, a DRL model is integrated into the sampling point selection of the RRT* algorithm to form the MSS-RRT* algorithm, which is designed to plan a short and safe motion path for the robotic arm in a short time.
The main contributions of this paper are as follows:
A dual Jacobi iterative inverse solution method based on target correction is proposed, which effectively improves the positioning accuracy of motion planning through the dynamic correction of target position.
The MSS-RRT* planning algorithm is proposed, incorporating a multi-strategy sampling mechanism that combines DRL position ball sampling, spatial random sampling, and goal-oriented sampling. By dynamically switching sampling strategies across planning stages, it effectively reduces invalid sampling and significantly improves planning efficiency compared to algorithms such as Informed RRT* and Quick-RRT*.
In DRL model training, the HER-OAT experience replay mechanism is employed to convert failure data into successful outcomes by adjusting the target joint values and obstacle arm positions. This approach effectively enhances the performance of DRL position ball sampling during the initial stage of planning.
This paper is organized as follows: In Section 2, the robotic arm is first modeled using forward kinematics, followed by an analysis of the positioning error relationship between forward and inverse kinematics and the establishment of the forward solution error model. In Section 3, the dual Jacobi iterative inverse solution algorithm based on target correction is proposed. In Section 4, the multi-strategy sampling mechanism is introduced, the process of building and training the DRL model is explained, and the specific approach for MSS-RRT* motion planning is detailed. In Section 5, the effectiveness of the proposed method is validated through simulations and experiments.
3. The Dual Jacobi Iterative Inverse Solution Algorithm Based on Target Correction
In robotic motion planning, solving inverse kinematics is critical. The primary approaches include analytical methods, numerical methods, and intelligent algorithms. The analytical method [27] directly computes joint variables through mathematical derivation, offering real-time performance but being limited by the structural complexity of the manipulator. Intelligent algorithms [28,29] employ data-driven kinematic mapping to handle complex scenarios, yet they depend on extensive training data and incur high computational costs.
The Jacobi iterative method belongs to the numerical solution methods [30]. It provides a general and flexible framework for inverse kinematics problems through differential kinematics mapping and iterative approximation, which is especially suitable for multi-DOF robotic arms.
3.1. Mathematical Foundations of the Jacobi Iterative Method
The relationship between the generalized velocity vector $\dot{X}$ and the joint velocity vector $\dot{q}$ is $\dot{X} = J(q)\dot{q}$, where $\dot{X} = [v^{T}, \omega^{T}]^{T}$. Since the robotic arm has seven joints, $J(q)$ is a 6 × 7 matrix. Expanding $\dot{X} = J(q)\dot{q}$ gives Equation (1):

$$\begin{bmatrix} v \\ \omega \end{bmatrix} = \begin{bmatrix} J_{1} & J_{2} & \cdots & J_{7} \end{bmatrix} \begin{bmatrix} \dot{q}_{1} \\ \dot{q}_{2} \\ \vdots \\ \dot{q}_{7} \end{bmatrix} \quad (1)$$

In the above equation, $v$ represents the end effector’s translational linear velocity, $\omega$ indicates its rotational angular velocity, and $J(q)$ denotes the arm’s Jacobi matrix, whose columns $J_{i}$ correspond to the individual joints.
The relationship between the differential transform in the end coordinate system and the differential transform in the base coordinate system is shown in Equation (2).
Since the rotation axis of each rotary joint is its Z-axis, the differential rotation vector has a non-zero component only in the Z-direction, and the corresponding Jacobi matrix column vector is shown in Equation (3).
The translational joints move along their Z-axes, so only the Z-axis component of their differential translation vectors is non-zero, and the corresponding Jacobi matrix column vector is shown in Equation (4).
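To make the structure of the 6 × 7 Jacobi matrix concrete, the sketch below assembles a geometric Jacobian column by column in the base frame (revolute columns $[z \times (p_e - p);\ z]$, prismatic columns $[z;\ 0]$), which is equivalent up to a change of frame to the end-frame differential form of Equations (3) and (4). The function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def geometric_jacobian(joint_axes, joint_origins, p_end, joint_types):
    """Assemble a 6xN geometric Jacobian in the base frame.

    joint_axes:    unit Z-axis direction z of each joint frame, shape (3,)
    joint_origins: origin p of each joint frame in the base frame, shape (3,)
    p_end:         end-effector position in the base frame, shape (3,)
    joint_types:   'R' for a revolute joint, 'P' for a prismatic joint
    """
    cols = []
    for z, p, jt in zip(joint_axes, joint_origins, joint_types):
        if jt == 'R':  # revolute: linear part z x (p_e - p), angular part z
            cols.append(np.hstack([np.cross(z, p_end - p), z]))
        else:          # prismatic: linear part z, no angular contribution
            cols.append(np.hstack([z, np.zeros(3)]))
    return np.column_stack(cols)  # 6 x 7 for the seven-joint drilling arm
```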
3.2. Implementation of Dual Jacobi Iteration Algorithm Based on Target Correction
The forward solution error compensation model can accurately determine the actual position of the robotic arm’s end, but it does not by itself eliminate the positioning deviation between the actual end position and the target end position. This forward kinematic localization error, which is known but uncompensated, becomes the primary factor limiting the accuracy of the inverse kinematics. The robotic arm position error is illustrated in Figure 3, where the gray arm represents the target end position, the yellow arm represents the actual position, and the red line segment denotes the end position error.
The dual Jacobi iterative inverse solution algorithm based on target correction effectively solves this problem. The joint values obtained from the first inverse solution are passed through the DH forward solution and the forward solution error compensation model to obtain the end position error. The target position is corrected according to this error, and the final accurate joint value combination is obtained by a second, small-scope Jacobi iteration starting from the first inverse solution’s joint values, which improves the localization accuracy of the inverse solution.
The symbol reference table for this section is shown in Table 2. The flowchart of the algorithm is shown in Figure 4.
The specific process is as follows:
Firstly, the first Jacobi iteration is performed based on the current joint values $q_{c}$, the initial target pose matrix $T_{1}$, and the number of iterations $N$ to obtain the first set of inverse solution joint values $q_{1}$.
A DH forward solution is then performed on $q_{1}$ to obtain the theoretical end position $P_{the}$ of the robotic arm, and the end position error compensation amount $\Delta P$ is obtained through the forward solution error compensation model.
$P_{act} = P_{the} + \Delta P$: adding $P_{the}$ and $\Delta P$ yields the actual end position $P_{act}$ for the joint values $q_{1}$.
$E = P_{t1} - P_{act}$: the end position error $E$ is the difference between the target end position $P_{t1}$ and the actual end position $P_{act}$.
$P_{t2} = P_{t1} + E$: the corrected end target position $P_{t2}$ (second target position) is obtained by adding the end position error $E$ to the target end position $P_{t1}$.
Finally, the second Jacobi iteration is entered: the current joint values are updated to $q_{1}$, the target pose matrix $T_{2}$ is updated according to the corrected end target position $P_{t2}$, and the second Jacobi iteration completes the inverse solution, yielding the accurate inverse solution joint values $q_{2}$.
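A minimal sketch of this two-pass scheme is given below. It assumes a damped least-squares variant of the Jacobi iteration on the position part only and treats `fk_dh` (DH forward solution), `jac` (Jacobian evaluation), and `error_model` (forward solution error compensation) as externally supplied callables; the names, the damping constant, and the reduced iteration budget of the second pass are illustrative choices, not the authors’ implementation.

```python
import numpy as np

def jacobi_ik(q0, p_target, fk, jac, n_iter=100, damping=0.01):
    """Damped least-squares Jacobi iteration toward a target end position."""
    q = q0.copy()
    for _ in range(n_iter):
        err = p_target - fk(q)              # 3-vector end position error
        J = jac(q)[:3, :]                   # position rows of the 6x7 Jacobian
        # dq = J^T (J J^T + lambda^2 I)^-1 err  (damped pseudo-inverse step)
        dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(3), err)
        q = q + dq
    return q

def dual_jacobi_ik(q_cur, p_target, fk_dh, jac, error_model, n_iter=100):
    # First pass: iterate from the current joints toward the original target
    q1 = jacobi_ik(q_cur, p_target, fk_dh, jac, n_iter)
    # Actual end position = DH forward solution + error compensation amount
    p_actual = fk_dh(q1) + error_model(q1)
    # Correct the target by the remaining end position error
    p_corrected = p_target + (p_target - p_actual)
    # Second, small-scope pass starting from q1 toward the corrected target
    return jacobi_ik(q1, p_corrected, fk_dh, jac, n_iter // 4)
```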
4. MSS-RRT* Motion Planning Algorithm
By combining deep reinforcement learning with the RRT* algorithm, this approach leverages the intelligent decision-making capability of the DRL model and the powerful search ability of RRT*, offering a novel solution for fast and efficient motion planning.
The middle arm is selected as the main robotic arm, denoted as $R_{m}$, with $S_{init}$ representing its initial state, $S_{cur}$ its current state, and $S_{goal}$ its target state. The left and right arms are designated as obstacle robotic arms, denoted as $R_{o}$.
The symbol reference table for this section is shown in Table 3.
4.1. Multi-Strategy Sampling Mechanism
The large size of the rock drilling robot arm results in a wide search space. At the same time, the random sampling strategy of the RRT* algorithm makes it difficult to accumulate enough sampling points near the optimal path in a short period of time, which limits the effectiveness of the RRT* algorithm in rewiring and reselecting parent nodes, thus affecting the optimality of the path. In addition, the non-directional sampling strategy of the RRT* algorithm in the initial stage generates a large number of invalid sampling points, which is the key reason for the low efficiency of the algorithm.
Introducing the DRL model to guide the RRT* algorithm for sampling point selection can effectively overcome the disadvantage of lower search efficiency in the initial stage. However, relying on DRL model sampling alone will lose the original advantages of RRT*. In order to solve the above problems, this paper proposes an RRT* algorithm with a multi-strategy sampling mechanism, namely the MSS-RRT* algorithm.
The MSS-RRT* algorithm replaces the sampling strategy of the traditional RRT* with three different sampling methods: DRL position ball sampling (DRL-PBS), spatial random sampling (SRS), and goal-oriented sampling (GOS). The core idea is to let DRL position ball sampling dominate in the early planning stage, increase the weight of spatial random sampling when approaching obstacles, and let goal-oriented sampling dominate after obstacles have been successfully avoided. The principle of the multi-strategy sampling mechanism is shown in Figure 5.
In the stage where $R_{m}$ approaches $R_{o}$, DRL position ball sampling is emphasized because it carries directionality, which avoids generating invalid sampling points caused by a wrong starting direction. In the stage where $R_{m}$ moves away from $R_{o}$, spatial random sampling is emphasized, providing a basic guarantee for the RRT* algorithm to search for the optimal path. In the stage from $R_{m}$ to $S_{goal}$, goal-oriented sampling is emphasized so as to reach the target position quickly and accelerate the path search.
Based on parameter optimization over several experiments, this study sets the probability of the primary sampling method to 0.8 and the probability of each secondary sampling method to 0.1. The probabilities of the three sampling strategies (DRL-PBS, SRS, GOS) at the beginning of planning are thus set to (0.8, 0.1, 0.1). A critical minimum distance is set in advance. From the initial state to the critical state, the collision detection method is used to calculate the minimum distance between $R_{m}$ and the left $R_{o}$ and the minimum distance between $R_{m}$ and the right $R_{o}$, which are used to update the probabilities of DRL-PBS and SRS, as shown in Equation (5). When the first critical threshold point is reached, the probabilities of the three sampling strategies become (0.1, 0.8, 0.1) and are maintained until the second critical threshold point appears. From the second critical threshold point to the target position, the probabilities of SRS and GOS are updated by calculating the Euclidean distance between the end position of $R_{m}$ and the target end position and combining it with the Euclidean distance between the end position of $R_{m}$ in the critical state and the target end position, as shown in Equation (6). As the target position is approached, the probabilities of the three sampling strategies become (0.1, 0.1, 0.8). The multi-strategy sampling mechanism imposes the constraint on the three sampling probabilities shown in Equation (7).
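The stage logic reduces to a small switching rule. The sketch below keeps only the hard switches between the (0.8, 0.1, 0.1) configurations described above; the distance-proportional updates of Equations (5) and (6) are omitted, and all names are illustrative.

```python
import random

def strategy_probs(d_obstacle, d_critical, passed_obstacle):
    """Return (p_DRL-PBS, p_SRS, p_GOS) for the current planning stage."""
    if not passed_obstacle and d_obstacle > d_critical:
        return (0.8, 0.1, 0.1)  # early stage: DRL position ball sampling dominates
    if not passed_obstacle:
        return (0.1, 0.8, 0.1)  # near the obstacle arms: spatial random sampling
    return (0.1, 0.1, 0.8)      # after avoidance: goal-oriented sampling dominates

def pick_strategy(probs):
    """Draw one strategy; the three probabilities sum to 1, as in Equation (7)."""
    return random.choices(['DRL-PBS', 'SRS', 'GOS'], weights=probs, k=1)[0]
```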
4.2. DRL Model Construction and Training Results
The key components in building a DRL model for a multi-arm rock drilling robot include establishing the foundational framework (such as algorithm selection, defining state and action spaces, and setting the reward function), designing a multi-arm collision detection scheme, and incorporating HER-OAT, an experience replay mechanism that enhances training efficiency.
4.2.1. DRL Model Foundation Framework Construction
The TD3 algorithm is employed to train the three-arm deep reinforcement learning model for motion planning within the PyTorch framework. TD3 utilizes double Q-learning with two independent Q networks, selecting the smaller value as the Q-value estimate for policy updates to mitigate overestimation bias. A delayed update strategy is applied to the policy network, which is updated once after several updates of the critic networks, reducing instability caused by value estimation errors. The framework of the TD3 algorithm used in this study is shown in Figure 6.
The state space consists of the current joint values of the main robotic arm, its target joint values, and the joint values of the two obstacle robotic arms, for a total of 28 state variables.
The action space is configured according to the range of motion of each joint of the main robotic arm. To avoid drastic changes in the joint angles, a Tanh activation function is added at the end of the action model so that the output value is limited to the interval from −1 to 1 without adding noise. Meanwhile, to ensure the consistency of the weights between the rotational and translational joints for the subsequent calculations, the unit of the translational joints is set to decimeters.
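As an illustration of this bounded action design, the sketch below rescales a Tanh-bounded network output in [−1, 1] to per-joint increments; the bound arrays are placeholders rather than the actual joint limits of the drilling arm.

```python
import torch

def scale_action(raw_action: torch.Tensor, inc_low, inc_high):
    """Map a Tanh-bounded action in [-1, 1] to per-joint increments.

    inc_low/inc_high are per-joint increment bounds; translational joints are
    expressed in decimeters so their scale matches the rotational joints.
    """
    low = torch.as_tensor(inc_low, dtype=raw_action.dtype)
    high = torch.as_tensor(inc_high, dtype=raw_action.dtype)
    return low + (raw_action + 1.0) * 0.5 * (high - low)
```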
The distinction between superior and inferior paths during DRL model training is directly shaped by the reward function. In this paper, the reward function consists of three parts: a reward and penalty based on the difference between the joint values of two consecutive states, a collision penalty, and an arrival reward, as shown in Equations (8)–(11).
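Equations (8)–(11) are not reproduced here, so the following is only a generic sketch of such a three-part reward: a progress term from the joint-value difference between consecutive states, a collision penalty, and an arrival reward. All weights and thresholds are illustrative, not the paper’s values.

```python
import numpy as np

def reward(q, q_next, q_goal, collided, reach_tol=0.05,
           w_progress=1.0, r_collision=-50.0, r_arrive=100.0):
    """Three-part reward: progress between consecutive states, a collision
    penalty, and an arrival reward (all coefficients illustrative)."""
    progress = np.linalg.norm(q - q_goal) - np.linalg.norm(q_next - q_goal)
    r = w_progress * progress      # positive if the joints moved closer
    if collided:
        r += r_collision           # collision penalty
    if np.linalg.norm(q_next - q_goal) < reach_tol:
        r += r_arrive              # arrival reward
    return r
```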
4.2.2. Collision Detection
The single-arm self-collision detection and multi-arm mutual collision detection of the rock drilling robot are transformed into the problem of finding the minimum distance between the envelope boxes of spatial cylinders. Based on the working conditions of the robotic arm, the collision-prone regions are analyzed and divided into three sections, each simplified as a cylindrical envelope box: the upper arm (CEB1), the lower arm (CEB2), and the beam (CEB3). Single-arm collisions are most likely to occur between CEB1 and CEB2, as well as between CEB1 and CEB3. Collisions between arms may occur at any two links, which must be checked sequentially. The envelope box model of the robotic arm is shown in Figure 7.
The detection processes for single-arm and multi-arm collisions are as follows:
Calculate the spatial coordinate positions: according to the link coordinate systems and the actual dimensions of the robotic arm, calculate the spatial coordinates P1, P2, Q1, and Q2 of the endpoints of each cylinder’s main axis segment.
Calculate the closest distance: compute the closest distance between the two spatial line segments.
Collision detection: subtract the radii of the two cylinders from the closest distance to detect single-arm and multi-arm collisions.
The safety status of the robotic arm can thus be determined quickly by collision detection, providing a safety guarantee for subsequent motion planning.
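The closest-distance step is the standard closest-point computation between two 3D segments (the cylinders’ main axes); the sketch below follows the well-known clamped formulation and then applies the radius test from the last step. Function names are illustrative.

```python
import numpy as np

def segment_distance(p1, p2, q1, q2, eps=1e-9):
    """Closest distance between segments P1P2 and Q1Q2 (cylinder main axes)."""
    d1, d2, r = p2 - p1, q2 - q1, p1 - q1
    a, e = d1 @ d1, d2 @ d2
    b, c, f = d1 @ d2, d1 @ r, d2 @ r
    denom = a * e - b * b
    s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > eps else 0.0
    t = (b * s + f) / e if e > eps else 0.0
    if t < 0.0:            # clamp t and recompute the matching s
        t, s = 0.0, (np.clip(-c / a, 0.0, 1.0) if a > eps else 0.0)
    elif t > 1.0:
        t, s = 1.0, (np.clip((b - c) / a, 0.0, 1.0) if a > eps else 0.0)
    return np.linalg.norm((p1 + s * d1) - (q1 + t * d2))

def cylinders_collide(p1, p2, r_a, q1, q2, r_b):
    """Envelope boxes collide if the axis distance is below the radius sum."""
    return segment_distance(p1, p2, q1, q2) - (r_a + r_b) < 0.0
```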
4.2.3. HER-OAT Experience Replay Mechanism
To improve training efficiency, this paper proposes the HER-OAT method to replay data and convert some failed planning episodes into successful ones. The HER method significantly improves the probability of successful planning by reselecting the target joint values. Building on this, the HER-OAT method applies a reasonable movement to the obstacle robotic arm $R_{o}$, thereby avoiding collisions between robotic arms and forming an effective path. First, based on the collision position data returned by the collision detection, it is determined whether the collision occurs in the Y or Z direction. Next, using the joint values of both the obstacle and main robotic arms, the optimal movement direction for $R_{o}$ is identified: the Z-axis movement direction is determined by comparing the pitch values of the two robotic arms, while the Y-axis movement direction is determined by comparing their swing values. To preserve path optimality, $R_{o}$ undergoes up to 10 movement cycles, with each cycle adjusting the joint value by ±1°. If collisions persist after all cycles, the episode remains classified as a collision state. The HER-OAT method is shown in Figure 8.
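A sketch of this obstacle-arm adjustment step is shown below. It assumes the episode is replayed against candidate obstacle joint values via a `collision_free` predicate; the joint indices, comparison signs, and helper names are hypothetical, chosen only to mirror the pitch/swing rule described above.

```python
import numpy as np

PITCH_IDX, SWING_IDX = 4, 1   # hypothetical indices of the pitch and swing joints

def her_oat_relabel(episode, q_obs, collision_axis, pitch_cmp, swing_cmp,
                    collision_free, max_cycles=10, step_deg=1.0):
    """Try to turn a failed episode into a success by nudging the obstacle arm.

    collision_axis: 'Y' or 'Z', from the collision detection result
    pitch_cmp/swing_cmp: +1 or -1, from comparing the two arms' pitch/swing
    collision_free(episode, q): replays the episode against obstacle joints q
    """
    q = q_obs.copy()
    # Z-direction collisions are resolved via the pitch joint, Y via the swing
    joint, direction = ((PITCH_IDX, pitch_cmp) if collision_axis == 'Z'
                        else (SWING_IDX, swing_cmp))
    for _ in range(max_cycles):              # up to 10 cycles of +/- 1 degree
        q[joint] += direction * np.deg2rad(step_deg)
        if collision_free(episode, q):
            return q                         # relabel the episode as a success
    return None                              # still colliding: keep as failure
```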
The pseudo-code of the TD3 algorithm combined with HER-OAT is shown below (Algorithm 1).

Algorithm 1: TD3 (HER-OAT)
Initialize the value networks $Q_{\theta_1}$, $Q_{\theta_2}$ and the policy network $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, $\phi$
Initialize the target networks $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, $\phi' \leftarrow \phi$
Initialize the replay buffer $B$
for $t = 1$ to $T$ do
(loop body given as an image in the original)
end
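Since the loop body of Algorithm 1 survives only as an image, the following PyTorch sketch shows the generic TD3 update that the header implies: clipped double-Q targets, target policy smoothing, a delayed actor update, and soft target updates. It is the standard TD3 step, not the authors’ exact training loop, and the hyperparameter values are illustrative.

```python
import torch
import torch.nn.functional as F

def td3_update(actor, critic1, critic2, actor_t, critic1_t, critic2_t,
               opt_actor, opt_critics, batch, step, gamma=0.99, tau=0.005,
               policy_noise=0.2, noise_clip=0.5, policy_delay=2):
    """One TD3 optimization step on a replayed minibatch (s, a, r, s', done)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        noise = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        a2 = (actor_t(s2) + noise).clamp(-1.0, 1.0)        # target policy smoothing
        q_next = torch.min(critic1_t(s2, a2), critic2_t(s2, a2))  # clipped double Q
        q_target = r + gamma * (1.0 - done) * q_next
    q_loss = F.mse_loss(critic1(s, a), q_target) + F.mse_loss(critic2(s, a), q_target)
    opt_critics.zero_grad(); q_loss.backward(); opt_critics.step()

    if step % policy_delay == 0:                           # delayed actor update
        actor_loss = -critic1(s, actor(s)).mean()
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
        for net, tgt in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.data.mul_(1.0 - tau).add_(tau * p.data)  # soft target update
    return q_loss.item()
```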
4.2.4. DRL Model Training Results
The training reward curves of the TD3 model with and without the HER-OAT method are shown in Figure 9.
The reward of the model using the HER-OAT method stabilizes after 8000 iterations, whereas the model without it requires 16,000 iterations to approach stability, and its reward never reaches the maximum value. This demonstrates that the HER-OAT method significantly enhances both training efficiency and performance. However, as shown in Figure 9, some instability remains: in later iterations, collisions may still occur, and the planned path may not be globally optimal.
The planning results using the DRL model alone are shown in Figure 10, where the brown curve indicates the optimal path and the blue curve indicates the planned path.
As shown in Figure 10, when relying on the DRL model alone for motion planning, the result is susceptible to environmental noise, making it difficult to adequately ensure path safety and optimality. Especially in the vicinity of obstacles, the performance of the DRL model degrades; it tends to fall into local fluctuations and deviate far from the optimal path. In the initial planning stage and the goal-approaching stage, however, the DRL model performs very well.
4.3. MSS-RRT* Motion Planning Solution
The MSS-RRT* motion planning solution integrates multiple advanced algorithms, adopts the target-corrected dual Jacobi iteration method to achieve accurate inverse solution, leverages the cylindrical envelope box method for efficient collision detection, uses a multi-strategy sampling mechanism to enhance the performance of RRT* path planning, and utilizes the B-spline interpolation method for path smoothing.
The MSS-RRT* motion planning process is shown in Figure 11, with the following detailed steps:
First, the accurate target joint values are calculated using the target-corrected dual Jacobi iteration inverse solution. The intermediate joint values obtained from the two Jacobi iterations are then checked for collisions. If no collision is detected, dynamic trajectory planning is performed directly. If a collision is detected, the MSS-RRT* method is employed for motion planning.
Configure the sampling method probabilities based on the current position. DRL position ball sampling constructs the state using the current joint values, target joint values, and obstacle joint values. The state is input into the DRL model to obtain the action, which determines the end position based on the joint value changes. This position is then used as the center of a ball for sampling within the ball. Goal-oriented sampling randomly samples around the target position. Spatial random sampling randomly selects sampling points within the robot arm’s effective space.
Collision detection is performed on the new sampling point, which is discarded if a collision occurs and retained if no collision occurs.
Perform the reselection of parent nodes and rewiring operations of the MSS-RRT* algorithm.
Check whether the target joint values have been reached. If not, repeat the sampling and collision detection steps. Once the target joint values are reached, perform path backtracking and smoothing, followed by dynamic trajectory planning to complete the path planning task.
If the maximum number of iterations is reached and the target joint value has not been achieved, the planning is considered a failure.
The MSS-RRT* algorithm pseudo-code is shown below (Algorithm 2).

Algorithm 2: MSS-RRT*
Input: the start and target joint values, the obstacle arm states, the sampling strategy probabilities, and the planner parameters
Output: the planned collision-free path
(the algorithm body is given as an image in the original)
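With Algorithm 2 preserved only as an image, the sketch below reconstructs the loop from the steps listed above: stage-dependent sampling, collision filtering, RRT* parent reselection and rewiring, and path backtracking. The callbacks `probs_fn`, `sample_drl`, and `collision_free` stand in for the mechanisms of Sections 4.1 and 4.2.2, and every name and constant is illustrative.

```python
import numpy as np

def mss_rrt_star(q_start, q_goal, probs_fn, sample_drl, collision_free,
                 max_iter=500, step=0.1, near_radius=0.5, goal_tol=0.1):
    """Minimal joint-space MSS-RRT* loop (a sketch, not the paper's code).

    probs_fn(q) -> (p_drl, p_srs, p_gos)  stage-dependent strategy weights
    sample_drl(q) -> joint sample from the DRL position ball (hypothetical)
    collision_free(q) -> envelope-box collision check for configuration q
    """
    q_start, q_goal = np.asarray(q_start, float), np.asarray(q_goal, float)
    nodes, parents, costs = [q_start], [-1], [0.0]
    dim = len(q_start)
    for _ in range(max_iter):
        p_drl, p_srs, p_gos = probs_fn(nodes[-1])
        u = np.random.rand()
        if u < p_drl:
            q_rand = sample_drl(nodes[-1])                   # DRL position ball
        elif u < p_drl + p_gos:
            q_rand = q_goal + 0.1 * np.random.randn(dim)     # goal-oriented
        else:
            q_rand = np.random.uniform(-np.pi, np.pi, dim)   # spatial random
        i_near = int(np.argmin([np.linalg.norm(n - q_rand) for n in nodes]))
        d = q_rand - nodes[i_near]
        q_new = nodes[i_near] + step * d / (np.linalg.norm(d) + 1e-9)
        if not collision_free(q_new):
            continue                                         # discard the sample
        near = [i for i, n in enumerate(nodes)
                if np.linalg.norm(n - q_new) < near_radius]
        # RRT* parent reselection: cheapest neighbor within the radius
        best = min(near or [i_near],
                   key=lambda i: costs[i] + np.linalg.norm(nodes[i] - q_new))
        nodes.append(q_new)
        parents.append(best)
        costs.append(costs[best] + np.linalg.norm(nodes[best] - q_new))
        # RRT* rewiring: reroute neighbors through q_new if that lowers cost
        for i in near:
            c_through = costs[-1] + np.linalg.norm(nodes[i] - q_new)
            if c_through < costs[i]:
                parents[i], costs[i] = len(nodes) - 1, c_through
        if np.linalg.norm(q_new - q_goal) < goal_tol:        # path backtracking
            path, i = [], len(nodes) - 1
            while i != -1:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None                                              # planning failure
```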
A rough change curve for each joint is obtained by path backtracking, and the curve is finally smoothed by the B-spline interpolation method.
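As an illustration of this smoothing step, SciPy’s B-spline routines can fit and resample the backtracked joint curves; the parameter values below are illustrative.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def smooth_joint_path(path, num_points=200, smoothing=0.0, degree=3):
    """Smooth a joint-space path (a list of 7-vectors) by B-spline interpolation."""
    waypoints = np.asarray(path).T              # shape (7, n_waypoints)
    tck, _ = splprep(waypoints, s=smoothing, k=degree)
    u = np.linspace(0.0, 1.0, num_points)
    return np.array(splev(u, tck)).T            # shape (num_points, 7)
```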
5. Experiments and Simulations
This section sequentially presents experiments on inverse solution localization accuracy, path planning, and joint smoothing, followed by trajectory planning simulations in the ROS system.
5.1. Inverse Solution Localization Accuracy Experiment
To verify the accuracy of the target-corrected dual Jacobi iteration inverse solution method, 50 sets of inverse solution experiments were conducted. The proposed method is compared with the single Jacobi iteration and the ASWO-BP neural network inverse solution method, where the ASWO-BP neural network establishes a mapping between the actual position of the end effector and the robotic arm joint values, resulting in higher inverse solution accuracy [31]. The experimental results are shown in Figure 12.
The average localization errors in the X, Y, and Z directions for the single Jacobi iteration are 54.43 mm, 53.16 mm, and 60.44 mm. In contrast, for the target-corrected dual Jacobi iteration, the average localization errors in the three directions are 4.56 mm, 3.78 mm, and 4.75 mm, representing improvements of 91.62%, 92.89%, and 92.14%. These errors are also lower than those of the ASWO-BP neural network inverse solution method, whose three-direction average localization errors are 11.84 mm, 13.78 mm, and 14.95 mm. The box plots of the spatial position errors show means and standard deviations of 106.09 ± 37.82 mm, 6.21 ± 3.29 mm, and 24.75 ± 7.55 mm for the three methods, respectively. The spatial position error of the proposed method (6.21 ± 3.29 mm) is significantly lower than that of the other two methods, with less dispersion. In addition, the proposed method performs best in terms of both the maximum and the median error.
5.2. Path Planning and Joint Smoothing
First, the planning processes of the four algorithms are visualized through path planning illustrations. Then, based on 30 sets of path planning data, the performance metrics of the four algorithms are compared and analyzed. Finally, the results of joint smoothing and the end-effector trajectory are presented.
5.2.1. Path Planning Illustration
A simulation platform for robotic arm path planning is constructed in MATLAB 2023b, running on an Intel Core i7-14650HX CPU. To verify the effectiveness of the MSS-RRT* algorithm in path planning, the RRT* algorithm (the baseline, set to stop once it finds a path), the Quick-RRT* algorithm (which accelerates optimization to obtain a shorter planning time), and the Informed-RRT* algorithm (which produces an asymptotically optimal path) are chosen for comparison. The path planning results of the four algorithms are shown in Figure 13.
The settings for $S_{init}$ and the obstacle arms are (0°, 0°, 0 mm, 0°, 0°, 0°, 0 mm), and the setting for $S_{goal}$ is (37°, 2°, 933 mm, 0°, −2°, −37°, 309 mm). In the figure, the gray robotic arm represents $R_{m}$, the red robotic arm represents $R_{o}$, the purple point denotes the start position, the red point denotes the target position, the green points represent the nodes, the brown lines indicate the node connections, and the blue line represents the feasible path planned by the algorithm.
As shown in Figure 13, the undirected sampling in the initial stage of the RRT* algorithm generates a large number of invalid sampling points, while the limited number of valid sampling points produces more inflection points and a zigzagging path. The Quick-RRT* algorithm achieves faster path planning by accelerating and optimizing sampling expansion, parent node selection, and rewiring, but its paths are not optimal. The Informed-RRT* algorithm, after finding an initial path, reduces the generation of invalid sampling points by sampling within a bounded elliptical region; this yields a better path but requires a longer planning time. In contrast, the MSS-RRT* algorithm avoids invalid directional sampling in the first segment through DRL position ball sampling, optimizes the path near obstacles in the middle segment through spatial random sampling (mitigating the failures and collisions associated with DRL position ball sampling), and quickly reaches the target in the final segment through goal-oriented sampling, resulting in an overall path that closely approximates the optimal solution.
5.2.2. Statistical Analysis of 30 Sets of Path Planning
Thirty sets of $S_{init}$ and $S_{goal}$ were randomly selected. Due to the stochastic nature of sampling-based path planning algorithms, each set was planned 10 times. The length of the robotic arm’s end-effector path, the number of sampling points, the planning time, and the number of failures were recorded. (The MSS-RRT*, Quick-RRT*, and RRT* algorithms are considered to have failed if they exceed 500 iterations.)
The four key data sets in Table 4 were statistically analyzed, and p-values as well as 95% confidence intervals were calculated to assess statistical significance, practical relevance, and reliability.
The comparison of path lengths between the MSS-RRT* and RRT* algorithms shows a mean difference of 1.15 m (95% CI [0.96, 1.34], p < 0.001), indicating a stable and statistically significant difference between the two groups.
The comparison of path lengths between the MSS-RRT* and Informed-RRT* algorithms shows a mean difference of 0.09 m (95% CI [−0.08, 0.26], p = 0.289), indicating no statistically significant difference between the two groups.
The comparison of planning times between the MSS-RRT* and RRT* algorithms shows a mean difference of 2.7 s (95% CI [2.35, 3.05], p < 0.001), confirming a stable and statistically significant difference between the two groups.
The comparison of planning times between the MSS-RRT* and Quick-RRT* algorithms shows a mean difference of 0.3 s (95% CI [0.025, 0.485], p = 0.031), indicating a small but stable difference between the two groups.
As shown in Table 4, in terms of path optimality, the average path length of MSS-RRT* over the 30 sets of experiments is 3.67 m, close to Informed-RRT*’s 3.58 m, indicating that MSS-RRT* plans a near-optimal path in only one-fifth of Informed-RRT*’s planning time. In terms of sampling points, the MSS-RRT* algorithm uses an average of 172 points, fewer than the 267 of RRT*, the 258 of Quick-RRT*, and the 836 of Informed-RRT*. The number of sampling points directly affects the planning time: MSS-RRT* takes an average of 5.2 s, slightly lower than Quick-RRT*’s 5.5 s and significantly lower than RRT*’s 7.9 s. Across the 30 sets of experiments, the MSS-RRT* algorithm failed only once, while the Quick-RRT* and RRT* algorithms failed three and six times, respectively.
5.2.3. Joint Smoothing and End Trajectory
Six path planning experiments were conducted to record the change data of each joint of the robotic arm (the beam flip joint changes only slightly, so it is ignored here). After smoothing by the B-spline interpolation method, the change curves of the six joints and the spatial trajectory of the robotic arm’s end in the base coordinate system are shown in Figure 14. The red point in the figure is the starting point of the path, and the blue point is the end point.
As shown in Figure 14, there are no obvious inflection points in the change curves of the joints, indicating that the B-spline interpolation method effectively ensures the smoothness of joint motion. Smooth joint motion in turn yields a smooth end-effector trajectory, effectively avoiding shaking of the robotic arm and ensuring its stable operation.
5.3. ROS System Trajectory Planning Simulation
Based on the URDF file of the rock drilling robot arm, a three-arm simulation model of the rock drilling robot is constructed to faithfully reproduce the geometric dimensions and physical properties of the experimental object. The discrete joint position information is input into the ROS system and interpolated to calculate velocity and acceleration information. The TimeParameterization module in MoveIt is used to time-parameterize the joint trajectory and optimize its time allocation, ensuring that the motion of the robotic arm is smooth and executable while meeting the velocity and acceleration constraints. Finally, the motion trajectory of the robotic arm is visualized in RViz.
In the same state, multiple experiments were conducted using the MSS-RRT* and RRT* algorithms to verify the stability of each method, as shown in Figure 15. The joint settings are the same as those in the path planning experiment. In the figure, the red robotic arms are the obstacle arms $R_{o}$, the yellow robotic arm is $R_{m}$, the gray robotic arms represent the planning process of $R_{m}$ from the initial pose to the target pose, and the white curve represents the end-effector trajectory of $R_{m}$. To clearly observe the running state of the robotic arm, the planning is shown from both the side view and the front view.
As shown in Figure 15, the MSS-RRT* algorithm is clearly better than the RRT* algorithm in terms of end-trajectory length and joint inflection points. In terms of stability, the RRT* algorithm produces paths with significant variation, resulting in longer paths in some cases, whereas the MSS-RRT* algorithm generates more consistent paths, effectively avoiding obstacles and staying closer to the optimal trajectory.
6. Conclusions
The dual Jacobi iteration method based on target correction improves spatial localization accuracy by 94.15% compared to the single Jacobi iteration method and by 74.95% compared to the ASWO-BP neural network inverse solution method. The proposed HER-OAT method significantly enhances DRL model training efficiency. The multi-strategy sampling mechanism of the MSS-RRT* algorithm strategically applies three sampling methods at different stages of motion planning, effectively reducing the generation of invalid sampling points and guiding paths toward optimal solutions in a short time. The experimental results demonstrate that MSS-RRT* plans a near-optimal path in only 20.71% of the planning time required by Informed-RRT*. With similar planning times, MSS-RRT* reduces the path length by 21.58% compared to Quick-RRT* while offering superior safety and stability.
The method proposed in this paper provides an innovative solution for high-precision motion planning of multi-arm systems and shows significant application potential in complex task scenarios. However, this study still has a limitation: in dynamic obstacle environments, the real-time performance of robotic arm motion planning is limited by the computational efficiency of the current algorithm, which struggles to meet the real-time response requirements of highly dynamic scenes. Future research should focus on overcoming the real-time computational bottleneck of dynamic motion planning, which can be explored in the following directions:
Accelerating the sampling process with GPU parallel computing architecture to improve planning efficiency.
Establishing a real-time dynamic response mechanism based on prediction-correction, enhancing system real-time performance through feedforward compensation and feedback adjustment.
Developing a multimodal obstacle motion state estimation system by integrating vision and depth sensor information.