A Trajectory Optimisation-Based Incremental Learning Strategy for Learning from Demonstration
Abstract
1. Introduction
- A straightforward idea is to create a new demonstration for the new situation. However, re-demonstration is time-consuming, as the experimental environment needs to be set up again [3]. To mitigate this problem, it is beneficial to reuse part of the original demonstration to adaptively generate a new trajectory (namely, an extended demonstration) for the new target;
- DMP is a one-shot learning model, which means that when the extended demonstration is learned, the knowledge gained from the previous demonstration is forgotten. Therefore, it is imperative to develop an incremental learning mechanism for the DMP model to improve its generalisation capability for various situations.
- Segmentation and extended demonstration: 1D-SEG is combined with G-PRM to generate an extended demonstration by incorporating the features of the original demonstration, so that fewer demonstrations are required to minimise the cost of data collection;
- DMP modelling and incremental learning update: The BLS learns the difference between the extended demonstration and the original demonstration by incrementally increasing the number of network nodes (hereafter referred to as additional enhancement nodes). The force item of the previously constructed DMP model is updated with the results generated by the BLS.
- Electric vehicle (EV) battery disassembly cases and an experimental platform were used to verify the effectiveness of the developed approach. Based on the approach, the successful disassembly of nuts and battery cells was achieved.
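The DMP-plus-incremental-update idea summarised in the contributions above can be sketched in code. The following is an illustrative one-degree-of-freedom discrete DMP with a residual-style weight update, not the paper's Bi-DMP/BLS implementation: the class name `DMP1D`, the gain values, and the `update` method are our assumptions.

```python
import numpy as np

class DMP1D:
    """Minimal one-DoF discrete DMP (illustrative sketch; gains and
    method names are assumptions, not the paper's Bi-DMP)."""

    def __init__(self, n_basis=30, alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
        self.az, self.bz, self.ax = alpha_z, beta_z, alpha_x
        # Gaussian basis centres spaced to match the decaying phase x.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.h = 1.0 / (np.gradient(self.c) ** 2)   # basis widths
        self.w = np.zeros(n_basis)

    def _forcing_target(self, y, dt):
        # Invert the transformation system: the forcing the demo implies.
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        return self.tau ** 2 * ydd - self.az * (self.bz * (self.g - y) - self.tau * yd)

    def _regress(self, x, f_target):
        # Locally weighted regression, one weight per basis function.
        w = np.zeros_like(self.w)
        for i in range(len(w)):
            psi = np.exp(-self.h[i] * (x - self.c[i]) ** 2)
            w[i] = (psi * x * f_target).sum() / ((psi * x ** 2).sum() + 1e-10)
        return w

    def fit(self, y, dt):
        """One-shot learning of the forcing term from a demonstration y(t)."""
        self.y0, self.g = y[0], y[-1]
        self.tau = (len(y) - 1) * dt
        x = np.exp(-self.ax * np.arange(len(y)) * dt / self.tau)
        self.w = self._regress(x, self._forcing_target(y, dt))

    def update(self, y_new, dt):
        """Incremental update: learn only the residual forcing implied by a
        new (extended) demonstration with the same start/goal and add it to
        the stored weights, instead of retraining from scratch."""
        x = np.exp(-self.ax * np.arange(len(y_new)) * dt / self.tau)
        f_new = self._forcing_target(y_new, dt)
        f_cur = np.array([self._forcing(xi) for xi in x])
        self.w += self._regress(x, f_new - f_cur)

    def _forcing(self, x):
        psi = np.exp(-self.h * (x - self.c) ** 2)
        return (psi @ self.w) * x / (psi.sum() + 1e-10)

    def rollout(self, n_steps, dt):
        """Euler integration of the canonical and transformation systems."""
        y, z, x = self.y0, 0.0, 1.0
        traj = [y]
        for _ in range(n_steps - 1):
            zd = (self.az * (self.bz * (self.g - y) - z) + self._forcing(x)) / self.tau
            z += zd * dt
            y += (z / self.tau) * dt
            x += (-self.ax * x / self.tau) * dt
            traj.append(y)
        return np.array(traj)

# Usage: learn a smooth 0 -> 1 reach, reproduce it, then absorb a
# perturbed "extended demonstration" incrementally.
t = np.linspace(0.0, 1.0, 201)
demo = 3 * t ** 2 - 2 * t ** 3            # minimum-jerk-like demonstration
dmp = DMP1D()
dmp.fit(demo, dt=t[1] - t[0])
repro = dmp.rollout(n_steps=201, dt=t[1] - t[0])

demo2 = demo + 0.1 * np.sin(np.pi * t)    # extended demo, same endpoints
dmp.update(demo2, dt=t[1] - t[0])
repro2 = dmp.rollout(n_steps=201, dt=t[1] - t[0])
```

The `update` method refits only the residual forcing implied by the new demonstration, echoing in spirit how the BLS appends enhancement nodes rather than retraining the whole model.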
2. Review of Related Work
2.1. Trajectory Generation
2.2. DMP-Based Optimisation
- Mimicking the geometric features of the original demonstration proves challenging. There is a lack of research on designing an efficient sampling strategy and similarity measurement that avoid sharp turns in generated trajectories and enhance their resemblance to the demonstrations.
- These approaches neglect to integrate an incremental learning function into DMP. When presented with a new demonstration, the DMP model has to be retrained. Meanwhile, there is a shortage of industrial applications leveraging incremental learning.
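One concrete similarity measurement for comparing a generated trajectory against a demonstration is the discrete Fréchet distance of Eiter and Mannila (listed in the references). A minimal dynamic-programming sketch, with the function name our own:

```python
import numpy as np
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polylines P and Q (sequences of
    points), via the classic Eiter-Mannila recurrence: the coupling that
    minimises the maximum pointwise distance while walking both curves
    monotonically from start to end."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)

    @lru_cache(maxsize=None)
    def c(i, j):
        d = float(np.linalg.norm(P[i] - Q[j]))
        if i == 0 and j == 0:
            return d
        if i == 0:                       # can only advance along Q
            return max(c(0, j - 1), d)
        if j == 0:                       # can only advance along P
            return max(c(i - 1, 0), d)
        # Advance along P, along Q, or along both.
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)

    return c(len(P) - 1, len(Q) - 1)
```

For two parallel straight segments offset by 1, the distance is 1.0; for identical curves it is 0. (The memoised recursion is fine for short trajectories; long ones would warrant an iterative table.)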
3. Research Methodology
3.1. Segmentation of Demonstration
3.2. Extended Demonstration via G-PRM
3.2.1. Generation of Sampling Points
3.2.2. Bias Optimisation
3.3. DMP Modelling and Incremental Learning Updating
3.3.1. Modelling of DMP
3.3.2. Bi-DMP for Incremental Learning Updating
3.3.3. Iterative Update of Force Item under Multi-Demonstrations
4. Experiments and Case Studies
4.1. Extended Demonstration Based on 1D-SEG and G-PRM
4.2. Modelling of DMP
4.3. Bi-DMP for Extended Demonstration
4.4. Case Study 1—Pick-and-Place-Based Bi-DMP
4.5. Case Study 2—Unscrewing Nuts and Battery Cell Disassembly
4.5.1. Experiment Platform and Problem Descriptions
4.5.2. Design and Use of End-Effectors
4.5.3. Demonstration and Implementation
5. Discussion
5.1. Time Complexity Analysis
5.2. Trajectory Generation Analysis among Different Algorithms
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zaatari, S.E.; Wang, Y.; Li, W.; Peng, Y. iTP-LfD: Improved task parametrised learning from demonstration for adaptive path generation of cobot. Robot. Comput. Integr. Manuf. 2021, 69, 102109.
2. Wang, Y.Q.; Hu, Y.D.; Zaatari, S.E.; Li, W.D.; Zhou, Y. Optimised Learning from Demonstrations for Collaborative Robots. Robot. Comput. Integr. Manuf. 2021, 71, 102169.
3. Zhu, J.; Gienger, M.; Kober, J. Learning Task-Parameterized Skills From Few Demonstrations. IEEE Robot. Autom. Lett. 2022, 7, 4063–4070.
4. Chi, M.; Yao, Y.; Liu, Y.; Zhong, M. Learning, Generalization, and Obstacle Avoidance with Dynamic Movement Primitives and Dynamic Potential Fields. Appl. Sci. 2019, 9, 1535.
5. Zhai, D.-H.; Xia, Z.; Wu, H.; Xia, Y. A Motion Planning Method for Robots Based on DMPs and Modified Obstacle-Avoiding Algorithm. IEEE Trans. Autom. Sci. Eng. 2022, 20, 2678–2688.
6. Davchev, T.; Luck, K.S.; Burke, M.; Meier, F.; Schaal, S.; Ramamoorthy, S. Residual Learning from Demonstration: Adapting DMPs for Contact-Rich Manipulation. IEEE Robot. Autom. Lett. 2022, 7, 4488–4495.
7. Lu, Z.; Wang, N.; Shi, D. DMPs-based skill learning for redundant dual-arm robotic synchronized cooperative manipulation. Complex Intell. Syst. 2022, 8, 2873–2882.
8. Yang, C.; Chen, C.; He, W.; Cui, R.; Li, Z. Robot Learning System Based on Adaptive Neural Control and Dynamic Movement Primitives. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 777–787.
9. Liao, Z.; Jiang, G.; Zhao, F.; Wu, Y.; Yue, Y.; Mei, X. Dynamic Skill Learning from Human Demonstration Based on the Human Arm Stiffness Estimation Model and Riemannian DMP. IEEE/ASME Trans. Mechatron. 2023, 28, 1149–1160.
10. Arguz, S.H.; Ertugrul, S.; Altun, K. Experimental Evaluation of the Success of Peg-in-Hole Tasks Learned from Demonstration. In Proceedings of the 2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), Istanbul, Turkey, 17–20 May 2022; pp. 861–866.
11. Peng, J.-W.; Hu, M.-C.; Chu, W.-T. An imitation learning framework for generating multi-modal trajectories from unstructured demonstrations. Neurocomputing 2022, 500, 712–723.
12. Li, X.; Gao, X.; Zhang, W.; Hao, L. Smooth and collision-free trajectory generation in cluttered environments using cubic B-spline form. Mech. Mach. Theory 2022, 169, 104606.
13. Hüppi, M.; Bartolomei, L.; Mascaro, R.; Chli, M. T-PRM: Temporal Probabilistic Roadmap for Path Planning in Dynamic Environments. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 10320–10327.
14. Saveriano, M.; Abu-Dakka, F.J.; Kramberger, A.; Peternel, L. Dynamic movement primitives in robotics: A tutorial survey. Int. J. Robot. Res. 2023, 42, 1133–1184.
15. Si, W.; Wang, N.; Yang, C. Composite dynamic movement primitives based on neural networks for human–robot skill transfer. Neural Comput. Appl. 2023, 35, 23283–23293.
16. Noohian, A.; Raisi, M.; Khodaygan, S. A Framework for Learning Dynamic Movement Primitives with Deep Reinforcement Learning. In Proceedings of the 2022 10th RSI International Conference on Robotics and Mechatronics (ICRoM), Tehran, Iran, 15–18 November 2022; pp. 329–334.
17. Kim, W.; Lee, C.; Kim, H.J. Learning and Generalization of Dynamic Movement Primitives by Hierarchical Deep Reinforcement Learning from Demonstration. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3117–3123.
18. Lu, Z.; Wang, N.; Li, M.; Yang, C. Incremental Motor Skill Learning and Generalization From Human Dynamic Reactions Based on Dynamic Movement Primitives and Fuzzy Logic System. IEEE Trans. Fuzzy Syst. 2022, 30, 1506–1515.
19. Lu, Z.; Wang, N.; Li, Q.; Yang, C. A trajectory and force dual-incremental robot skill learning and generalization framework using improved dynamical movement primitives and adaptive neural network control. Neurocomputing 2023, 521, 146–159.
20. Chen, C.L.P.; Liu, Z. Broad Learning System: An Effective and Efficient Incremental Learning System without the Need for Deep Architecture. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 10–24.
21. Hu, J.; Xiong, R. Trajectory generation with multi-stage cost functions learned from demonstrations. Robot. Auton. Syst. 2019, 117, 57–67.
22. Zhou, X.; Wang, X.; Xie, Z.; Li, F.; Gu, X. Online obstacle avoidance path planning and application for arc welding robot. Robot. Comput. Integr. Manuf. 2022, 78, 102413.
23. Park, D.H.; Hoffmann, H.; Pastor, P.; Schaal, S. Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In Proceedings of the 2008 8th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2008), Daejeon, Republic of Korea, 3 December 2008; pp. 91–98.
24. Ijspeert, A.J.; Nakanishi, J.; Hoffmann, H.; Pastor, P.; Schaal, S. Dynamical Movement Primitives: Learning Attractor Models for Motor Behaviors. Neural Comput. 2013, 25, 328–373.
25. Teng, T.; Gatti, M.; Poni, S.; Caldwell, D.; Chen, F. Fuzzy dynamical system for robot learning motion skills from human demonstration. Robot. Auton. Syst. 2023, 164, 104406.
26. Ding, G.; Liu, Y.; Zang, X.; Zhang, X.; Liu, G.; Zhao, J. A Task-Learning Strategy for Robotic Assembly Tasks from Human Demonstrations. Sensors 2020, 20, 5505.
27. Si, W.; Yue, T.; Guan, Y.; Wang, N.; Yang, C. A Novel Robot Skill Learning Framework Based on Bilateral Teleoperation. In Proceedings of the 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), Mexico City, Mexico, 20–24 August 2022; pp. 758–763.
28. Iturrate, I.; Roberge, E.; Ostergaard, E.H.; Duchaine, V.; Savarimuthu, T.R. Improving the Generalizability of Robot Assembly Tasks Learned from Demonstration via CNN-based Segmentation. In Proceedings of the 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), Vancouver, BC, Canada, 22–26 August 2019; pp. 553–560.
29. Fan, J.; Chen, X.; Liang, X. UAV trajectory planning based on bi-directional APF-RRT* algorithm with goal-biased. Expert Syst. Appl. 2023, 213, 119137.
30. Weinkauf, T.; Gingold, Y.; Sorkine, O. Topology-based Smoothing of 2D Scalar Fields with C1-Continuity. Comput. Graph. Forum 2010, 29, 1221–1230.
31. Ichter, B.; Schmerling, E.; Lee, T.-W.E.; Faust, A. Learned Critical Probabilistic Roadmaps for Robotic Motion Planning. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–15 August 2020; pp. 9535–9541.
32. Eiter, T.; Mannila, H. Computing Discrete Fréchet Distance. 1994. Available online: https://www.researchgate.net/profile/Thomas-Eiter-2/publication/228723178_Computing_Discrete_Frechet_Distance/links/5714d93908aebda86c0d1a7b/Computing-Discrete-Frechet-Distance.pdf (accessed on 2 June 2024).
33. Wang, R.; Wu, Y.; Chan, W.L.; Tee, K.P. Dynamic Movement Primitives Plus: For enhanced reproduction quality and efficient trajectory modification using truncated kernels and local biases. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 3765–3771.
34. Khansari-Zadeh, S.M.; Billard, A. LASA Handwriting Dataset, Stable Estimator of Dynamical Systems (SEDS). 24 March 2015. Available online: https://cs.stanford.edu/people/khansari/download.html#LearningLyapunovFunctions (accessed on 2 June 2024).
35. Avaei, A.; Van Der Spaa, L.; Peternel, L.; Kober, J. An Incremental Inverse Reinforcement Learning Approach for Motion Planning with Separated Path and Velocity Preferences. Robotics 2023, 12, 61.
36. Xing, H.; Torabi, A.; Ding, L.; Gao, H.; Li, W.; Mushahwar, V.K.; Tavakoli, M. Human-Robot Collaboration for Heavy Object Manipulation: Kinesthetic Teaching of the Role of Wheeled Mobile Manipulator. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 2962–2969.
37. Liang, Y.C.; Li, W.D.; Lu, X.; Wang, S. Fog computing and convolutional neural network enabled prognosis for machining process optimization. J. Manuf. Syst. 2019, 52, 32–42.
38. Curry, D.; Dagli, C. Computational complexity measures for multi-objective optimization problems. Procedia Comput. Sci. 2014, 36, 185–191.
39. Bianchini, M.; Scarselli, F. On the complexity of neural network classifiers: A comparison between shallow and deep architectures. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 1553–1565.
| | No. 1 | No. 2 | No. 3 | No. 4 | No. 5 | No. 6 | No. 7 |
|---|---|---|---|---|---|---|---|
| Error (mm) | 0.404 | 0.260 | 0 | 0.989 | 1.023 | 1.069 | 1.487 |

| | No. 8 | No. 9 | No. 10 | No. 11 | No. 12 | No. 13 | No. 14 |
|---|---|---|---|---|---|---|---|
| Error (mm) | 1.431 | 1.751 | 1.760 | 1.787 | 2.760 | 2.743 | 2.730 |
| | No. 9 | No. 10 | No. 11 | No. 12 | No. 13 | No. 14 |
|---|---|---|---|---|---|---|
| Error (mm) | 1.477 | 1.248 | 1.477 | 1.477 | 1.248 | 1.477 |
| Algorithms | Time Complexity |
|---|---|
| CNN-based | |
| TP-GMM | |
| GMM | |
| BLS | |
| DMP | |
| RRT | |
| PRM | |

Notes: the complexity expressions are stated in terms of the number of samples, the dimension of the input data (feature number), the size of the convolution kernel, the number of convolution kernels, the number of layers, the number of iterations of the expectation–maximisation (EM) algorithm, the number of Gaussian clusters, the number of sampling knots, the number of sampling points, and the number of BLS nodes.
| Algorithms | Success Rate | Running Time (s) | Maximum Error (mm) | Difference Range (mm) |
|---|---|---|---|---|
| GMM | 7.14% | 0.97 | 11.21 | 132.15–180.41 |
| TP-GMM | 14.29% | 1.36 | 105.34 | 201.96–970.53 |
| DMP | 57.14% | 0.16 | 2.76 | 128.65–147.88 |
| Bi-DMP | 100% | 0.67 | 1.48 | 133.74–165.79 |
| Algorithms | Changes in Similarity (%) | | | |
|---|---|---|---|---|
| G-PRM | 79.61 | 88.41 | 92.00 | 93.39 |
| PRM | 59.35 | No change | No change | No change |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Li, W.; Liang, Y. A Trajectory Optimisation-Based Incremental Learning Strategy for Learning from Demonstration. Appl. Sci. 2024, 14, 4943. https://doi.org/10.3390/app14114943