Contact State Recognition for Dual Peg-in-Hole Assembly of Tightly Coupled Dual Manipulator

Zhang, Jiawei; Bai, Chengchao; Guo, Jifeng; Cheng, Zhengai; Chen, Ying

doi:10.3390/electronics13183785


by Jiawei Zhang ¹, Chengchao Bai ¹,*, Jifeng Guo ¹, Zhengai Cheng ² and Ying Chen ²

¹ School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
² Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China
* Author to whom correspondence should be addressed.

Electronics 2024, 13(18), 3785; https://doi.org/10.3390/electronics13183785

Submission received: 21 August 2024 / Revised: 16 September 2024 / Accepted: 19 September 2024 / Published: 23 September 2024

(This article belongs to the Special Issue Selected Papers for the 2024 4th International Conference on Autonomous Unmanned Systems (4th ICAUS 2024))


Abstract

Contact state recognition is a critical technology for enhancing the robustness of robotic assembly tasks. There have been many studies on contact state recognition for single-manipulator, single peg-in-hole assembly tasks. However, as the number of pegs and holes increases, the contact state becomes significantly more complex. Additionally, when a tightly coupled multi-manipulator is required, the estimation errors in the contact forces between pegs and holes make contact state recognition challenging. The current state recognition methods have not been tested in such tasks. This paper tested Support Vector Machine (SVM) and several neural network models on these tasks and analyzed the recognition accuracy, precision, recall, and F1 score. An ablation experiment was carried out to test the contributions of force, image, and position to the recognition performance. The experimental results show that SVM has better performance than the neural network models. However, when the size of the dataset is limited, SVM still faces generalization issues. By applying heuristic action, this paper proposes a two-stage recognition strategy that can improve the recognition success rate of the SVM.

Keywords:

state recognition; Support Vector Machines; neural network; peg-in-hole

1. Introduction

Peg-in-hole assembly is one of the basic operations in robotic automatic assembly tasks, which is involved in tasks ranging from on-orbit service [1] to electronic device assembly [2]. Depending on the number of pegs and holes and the number of manipulators, peg-in-hole assembly tasks can be categorized into the following: single peg-in-hole assembly of a single manipulator, multiple peg-in-hole assembly of a single manipulator, single peg-in-hole assembly of multiple manipulators, and multiple peg-in-hole assembly of multiple manipulators (MMPiH) [3]. The MMPiH tasks are primarily used for simultaneously aligning multiple pegs and holes of large-sized parts. This paper focuses on the contact state recognition problem in MMPiH tasks. The MMPiH tasks are shown in Figure 1.

Peg-in-hole assembly can generally be divided into two stages: hole searching and insertion. In the hole-searching stage, positioning uncertainties must be reduced so that the peg and the hole are aligned. The insertion stage begins once the peg and hole are aligned. This paper primarily focuses on contact state recognition during the hole-searching stage. Many methods are currently available for recognizing contact states during assembly; they fall into two groups: methods based on an analytical model and methods based on a statistical model. Analytical models are susceptible to uncertainties in the assembly process and become very complex as the number of pegs and holes increases. Statistical models, on the other hand, depend on the diversity of the samples in the dataset and easily overfit when the dataset is small.

Compared to single peg-in-hole assembly, dual peg-in-hole assembly has more complex contact states, and different contact states can produce similar sensor information. Additionally, when tightly coupled dual manipulators are used for dual peg-in-hole assembly, the contact forces between the pegs and holes cannot be measured directly and must be estimated from the force sensors at the ends of the two manipulators. Because of errors in the gripping position of the end effectors, the estimated contact forces also contain errors. These factors make state recognition for dual-manipulator, dual peg-in-hole assembly more complex. Existing state recognition methods are typically validated on single peg-in-hole assembly tasks with a single manipulator. This paper makes two contributions. First, we tested the recognition performance of SVM and several neural network models on dual-manipulator, dual peg-in-hole assembly tasks. Second, to improve the robustness of the SVM model, we propose a two-stage recognition strategy combined with a heuristic action.

2. Related Works

The basic idea with contact state recognition is to determine the current contact state based on the information from various sensors. The current contact state recognition methods can be divided into two categories: contact state recognition methods based on an analytical model [4] and contact state recognition methods based on a statistical model [5]. The analytical model requires the analysis and modeling of the contact process, while the statistical model learns to estimate contact states directly from the collected dataset.

2.1. Contact State Recognition Based on an Analytical Model

The analytical model of the peg-in-hole assembly process has been studied for many years. Whitney [4] first proposed a quasi-static model to clarify the relationship between forces and geometric constraints in the peg-in-hole assembly process and analyzed two potential failure contact situations: wedging and jamming. Whitney also introduced the concept of jamming diagrams, which can be used to analyze jamming conditions during the entire assembly process. These jamming diagrams were later extended to three-dimensional scenarios [6]. For multiple peg-in-hole assembly processes, Sathirakul et al. [7] were the first to study the possible equilibrium states in a two-dimensional dual-hole assembly task, providing the geometric conditions and force balance equations for each equilibrium state and developing jamming diagrams for dual-hole assembly tasks. Later, Fei et al. [8] analyzed the three-dimensional multiple peg-in-hole assembly process, and Zhang et al. [9] investigated the flexible dual peg-in-hole assembly process.

On the basis of the analytical model, contact states can be recognized by comparing the similarity between the actual information from the sensors and the information obtained through the analytical model. In order to adapt to the errors in the contact model, Kalman filters [10] and particle filters [11] can be used to estimate model parameters, thereby improving the recognition of contact states and state transitions. The analytical models are easily affected by the uncertainties in the assembly process. As the number of pegs and holes increases, the analytical models become increasingly complex and difficult to generalize to new assembly scenarios. In recent years, contact state recognition methods based on statistical models have garnered attention.

2.2. Contact State Recognition Based on a Statistical Model

Contact state recognition with a statistical model typically frames the contact state recognition process as a classification problem for a given set of possible contact states. The methods used for contact state classification include: Fuzzy Classifiers (FCs), Neural Networks (NNs), Support Vector Machines (SVMs), Gaussian Mixture Models (GMMs), and Hidden Markov Models (HMMs).

By establishing membership functions and rules, FCs can solve simple classification problems and have the advantage of short computation times [12,13]. However, establishing the membership functions and fuzzy rules takes a lot of time, which makes FCs difficult to deploy quickly for complex classification problems. NNs can be used to directly fit the nonlinear relationship between the input sensor information and the output contact states [14], eliminating the need for manual feature extraction. However, NNs typically require a large dataset and are prone to overfitting. Additionally, numerous studies have explored combining fuzzy set theory with NNs [15,16].

The idea of SVM is to find a classification hyperplane by maximizing the margin in the feature space. Jakovljevic et al. [17] used a quasi-static contact model to generate the contact data, and the data were analyzed by Discrete Wavelet Transform (DWT) to extract representative features. Then, SVM was used to generate a Fuzzy Inference Mechanism (FIM) to classify the contact state. Yan et al. [18] used SVM with the nonlinear radial basis kernel function to classify the contact state and used adaptive impedance control to realize a robust key unlock process.

GMMs are a clustering method, so it is not necessary to manually label the dataset; instead, it is necessary to establish a mapping between the clustering results and the contact state. Jasim et al. [19] used the Distribution Similarity Measure (DSM) to determine the optimal number of GMM components, which improved the modeling performance and reduced the computing cost. Lee et al. [20] used two GMMs to separately cluster the joint torque information and the position/velocity information for the end effector; subsequently, a rule-based discriminator was used to integrate the clustering results and estimate the contact state. Compared with the above methods, HMMs can take temporal information into account when classifying contact states, giving them advantages in simultaneously identifying contact states and state transitions [21,22].

In summary, statistical-model-based contact state recognition methods can learn recognition models directly from contact data without modeling the complex assembly process, making them better suited for handling uncertainties in the assembly process. However, the performance of these methods is affected by the diversity of the dataset. This paper proposes a two-stage recognition process to improve generalization with a small dataset.

3. Methods

3.1. Introduction to the Problem

During the hole-searching stage of single peg-in-hole assembly, the state between the peg and the hole falls into one of four categories: aligned (A), contact (C), gapped (G), and separated (S). In the gapped state, the peg tip is above the hole surface and the shaft and the hole are not in contact. In the separated state, the peg tip is below the hole surface and the shaft and the hole are not in contact. In contrast, the hole-searching stage of dual peg-in-hole assembly involves 16 possible states, as illustrated in Figure 2, which can be grouped into four categories: success states, failure states, search states, and transition states. Here, $S_{A_{1} A_{2}}$ represents the success state; $S_{S_{1} A_{2}}$, $S_{S_{1} C_{2}}$, $S_{A_{1} S_{2}}$, $S_{C_{1} S_{2}}$, and $S_{S_{1} S_{2}}$ represent the failure states; $S_{C_{1} A_{2}}$, $S_{A_{1} C_{2}}$, $S_{C_{1} C_{2}}$, $S_{G_{1} C_{2}}$, and $S_{C_{1} G_{2}}$ represent the search states; and $S_{G_{1} A_{2}}$, $S_{A_{1} G_{2}}$, $S_{G_{1} G_{2}}$, $S_{S_{1} G_{2}}$, and $S_{G_{1} S_{2}}$ represent the transition states. The transition states typically convert to other states over time. The goal of this paper is to automatically identify success states and failure states and then end the assembly process in time.
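The grouping of the 16 dual-peg states can be checked with a short enumeration. The following is a minimal sketch, not part of the original method: the two-letter codes (for example `"CA"` for $S_{C_{1} A_{2}}$) and the names `CATEGORIES` and `categorize` are illustrative assumptions, and the transition set includes $S_{G_{1} A_{2}}$, which the text names as a transition state when discussing similar observations.

```python
from itertools import product

# Per-peg states: Aligned, Contact, Gapped, Separated (letters as in the subscripts).
PEG_STATES = ("A", "C", "G", "S")

# Category membership copied from the listing in Section 3.1.
# "XY" stands for S_{X_1 Y_2}: peg 1 in state X, peg 2 in state Y.
CATEGORIES = {
    "success": {"AA"},
    "failure": {"SA", "SC", "AS", "CS", "SS"},
    "search": {"CA", "AC", "CC", "GC", "CG"},
    "transition": {"GA", "AG", "GG", "SG", "GS"},
}

def categorize(state: str) -> str:
    """Return the category name of a two-letter dual-peg state code."""
    for name, members in CATEGORIES.items():
        if state in members:
            return name
    raise ValueError(f"unknown state code: {state}")

# Enumerate all 4 x 4 combinations and count how many fall in each category.
ALL_STATES = ["".join(pair) for pair in product(PEG_STATES, repeat=2)]
COUNTS = {name: sum(categorize(s) == name for s in ALL_STATES) for name in CATEGORIES}
```

Every one of the 16 combinations lands in exactly one category, which confirms that the four groups partition the state space.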

To improve the recognition performance, this paper uses the following three types of information as inputs for the state recognition model:

Contact force/torque between pegs and holes. This is estimated from the force/torque sensors installed at the ends of the two manipulators.
Position of the end effectors. This is calculated from the joint angles of the manipulators and the forward kinematics model of the manipulators.
Images from hand–eye cameras. In this paper, the images are processed using a positioning neural network as described in [3], and the estimated distances between the pegs and the holes in the image coordinate system are used as the input of the state recognition model.

In MMPiH tasks, different contact states have similar observation information. For instance, the success state $S_{A_{1} A_{2}}$ and the transition states $S_{G_{1} A_{2}}$, $S_{A_{1} G_{2}}$, and $S_{G_{1} G_{2}}$ have similar force/position/image observations, making it difficult to make decisions based on sensor information at a single moment. During state transitions, the change in sensor information contains richer features, which can be used to recognize the contact state after the transition. Therefore, the goal of this paper is to detect a specific state transition process based on the sensor information over a period of time rather than to detect the current contact state based on the sensor information at a single moment.

3.2. The Two-Stage Recognition Process

In order to improve the recognition performance, this paper proposes a two-stage recognition method, as shown in Figure 3. The first stage of the recognition model takes the contact force between the pegs and the holes, the position of the manipulators’ end effectors, and the image features as input. This stage is used to determine whether the pegs and holes are in a success state, a failure state, or another state.

The contact force between the pegs and holes at time $t$ is denoted by $h_{c}^{t}$, which must be calculated from the force/torque sensors at the end of each manipulator:

$$h_{c}^{t} = G h_{e}^{t} - \begin{bmatrix} m g \\ 0_{3} \end{bmatrix} \tag{1}$$

where $m$ is the mass of the object, $g$ is the gravitational acceleration vector, and $h_{e}^{t} = {[{(h_{e 1}^{t})}^{T}, {(h_{e 2}^{t})}^{T}]}^{T} \in \mathbb{R}^{12}$ stacks the external forces and external torques applied to the two end effectors. Here, $h_{e i} = [f_{e i}^{T}, \mu_{e i}^{T}]^{T}$ denotes the external force and external torque applied to the end effector of manipulator $i$, which can be measured by force/torque sensors, and $G$ is the grasp matrix:

$$G = \begin{bmatrix} I_{3} & 0_{3 \times 3} & I_{3} & 0_{3 \times 3} \\ r_{1} \times & I_{3} & r_{2} \times & I_{3} \end{bmatrix} \tag{2}$$
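Multiplying out Equations (1) and (2) gives a net force $f_{e1} + f_{e2} - mg$ and a net torque $r_{1} \times f_{e1} + \mu_{e1} + r_{2} \times f_{e2} + \mu_{e2}$. The sketch below evaluates these expressions directly. It rests on assumptions not stated in the text: $r_{i}$ is taken to be the vector from the object's reference point to the grasp point of manipulator $i$, all vectors are expressed in one common frame with gravity along negative z, and the numerical values are hypothetical.

```python
def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def contact_wrench(h_e1, h_e2, r1, r2, mass, gravity=(0.0, 0.0, -9.81)):
    """Evaluate h_c = G h_e - [m g; 0_3] without building G explicitly.

    h_e1, h_e2: 6-vectors [force, torque] measured at the two end effectors.
    r1, r2:     grasp offsets (assumed meaning, see lead-in).
    """
    f1, mu1 = h_e1[:3], h_e1[3:]
    f2, mu2 = h_e2[:3], h_e2[3:]
    # First block row of G: sum of the two forces, minus the gravity term.
    force = [f1[k] + f2[k] - mass * gravity[k] for k in range(3)]
    # Second block row of G: moments of the forces plus the measured torques.
    m1, m2 = cross(r1, f1), cross(r2, f2)
    torque = [m1[k] + mu1[k] + m2[k] + mu2[k] for k in range(3)]
    return force + torque

# Hypothetical static case: a 1 kg object held symmetrically, no peg-hole contact.
h_c = contact_wrench(
    h_e1=[0.0, 0.0, -4.905, 0.0, 0.0, 0.0],
    h_e2=[0.0, 0.0, -4.905, 0.0, 0.0, 0.0],
    r1=[0.1, 0.0, 0.0],
    r2=[-0.1, 0.0, 0.0],
    mass=1.0,
)
```

In this static case the estimated contact wrench vanishes. An error in `r1` or `r2` (the gripping position error mentioned in the Introduction) would leave a residual torque even without contact.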

The position of the end effector of manipulator $i$ at time $t$ is denoted as $p_{e_{i}}^{t}$, $l$ denotes the unit vector along the axis of the hole, and the position of the end effector in the direction of the hole axis is denoted as $z_{i}^{t} = p_{e_{i}}^{t} \cdot l$. A schematic of the notation is shown in Figure 4.

The vector $h_{c}^{t_{d}} = {[{(h_{c}^{t - t_{d}})}^{T}, {(h_{c}^{t - t_{d} + \Delta t})}^{T}, \dots, {(h_{c}^{t})}^{T}]}^{T}$ collects all contact forces within the time span $t_{d}$ before the current time $t$; $z_{i}^{t_{d}} = [0, z_{i}^{t - t_{d} + \Delta t} - z_{i}^{t - t_{d}}, \dots, z_{i}^{t} - z_{i}^{t - t_{d}}]$ represents the displacement of the end effector of manipulator $i$ over $t_{d}$; and $\Delta t$ is the sampling time interval. We use the positioning neural network in [3] to estimate the positions of the pegs and the holes in the images; $p_{p_{i}}^{I}$ and $p_{h_{i}}^{I}$ denote the positions of the $i$-th peg and the $i$-th hole in the image coordinate system, respectively. Let $d_{i}^{I} = \| p_{p_{i}}^{I} - p_{h_{i}}^{I} \|$ denote the distance between the $i$-th peg and the $i$-th hole in the image coordinate system at time $t$.
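The quantities defined above can be assembled into one input vector as follows. This is a minimal sketch under stated assumptions: the window holds 50 samples (the fragment size reported in Section 3.4), samples are ordered oldest first, the function names are illustrative, and all numerical values are placeholders rather than recorded data.

```python
import math

def axial_positions(positions, axis):
    """z_i^t = p_{e_i}^t . l for every sample in the window."""
    return [sum(p_k * l_k for p_k, l_k in zip(p, axis)) for p in positions]

def relative_displacement(z_window):
    """z_i^{t_d}: displacement of each sample relative to the oldest sample."""
    z_start = z_window[0]
    return [z - z_start for z in z_window]

def image_distance(p_peg, p_hole):
    """d_i^I: Euclidean distance between peg and hole in image coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_peg, p_hole)))

def flat(wrench_window, z1_window, z2_window, d1, d2):
    """Concatenate all inputs into a single 1-D feature vector."""
    features = [value for wrench in wrench_window for value in wrench]
    features += list(z1_window) + list(z2_window) + [d1, d2]
    return features

# Placeholder window: 50 samples, end effector descending 1 mm per sample.
n = 50
axis = (0.0, 0.0, 1.0)                      # unit vector l along the hole axis
wrenches = [[0.0] * 6 for _ in range(n)]    # h_c at each sample
positions = [(0.3, 0.1, 0.20 - 0.001 * k) for k in range(n)]
z1 = relative_displacement(axial_positions(positions, axis))
z2 = list(z1)
d1 = image_distance((120, 80), (117, 84))
x = flat(wrenches, z1, z2, d1, 0.0)
```

The resulting vector has 6 × 50 + 50 + 50 + 2 = 402 entries, which matches the feature dimension used for the SVM in Section 3.3.1.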

We take $h_{c}^{t_{d}}$, $z_{1}^{t_{d}}$, $z_{2}^{t_{d}}$, $d_{1}^{I}$, and $d_{2}^{I}$ together as the input of the first state recognition model. If the recognition result is a failure state, the task is terminated. If the recognition result is a success state, the heuristic action is started. The heuristic action in this paper makes the pegs carry out an insertion movement along $l$. The displacement, $\Delta z = [\Delta z_{1}, \Delta z_{2}]$, of the end effectors in the $l$ direction before and after the heuristic action is used as the input of the second state recognition model to determine whether the assembly is successful. If the task is judged to be successful, the task is terminated; otherwise, the hole search process is continued.
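The control flow of the two-stage process can be summarized in a few lines. The sketch below only mirrors the decision logic described above; the callables `stage1_classify`, `heuristic_insert`, and `stage2_is_success` are hypothetical placeholders, and `stub_stage2` with its 5 mm threshold is an invented stand-in for the trained second-stage SVM, used purely to make the example executable.

```python
SUCCESS, FAILURE, OTHER = "success", "failure", "other"

def two_stage_step(features, stage1_classify, heuristic_insert, stage2_is_success):
    """Run one recognition cycle and return the resulting decision."""
    label = stage1_classify(features)
    if label == FAILURE:
        return "terminate: failure"
    if label == SUCCESS:
        # Heuristic action: try to insert along the hole axis l, then let
        # the second model judge the measured displacement (dz1, dz2).
        delta_z = heuristic_insert()
        if stage2_is_success(delta_z):
            return "terminate: success"
    # "Other" states and rejected success candidates both keep searching.
    return "continue hole search"

def stub_stage2(delta_z, threshold=0.005):
    """Hypothetical stand-in: accept only if both pegs moved far enough."""
    return all(abs(dz) >= threshold for dz in delta_z)

features = [0.0] * 402
accepted = two_stage_step(features, lambda x: SUCCESS, lambda: (0.010, 0.012), stub_stage2)
rejected = two_stage_step(features, lambda x: SUCCESS, lambda: (0.010, 0.001), stub_stage2)
failed = two_stage_step(features, lambda x: FAILURE, lambda: (0.0, 0.0), stub_stage2)
```

A false success from the first stage costs one heuristic insertion attempt rather than a wrongly terminated task, which is the point of adding the second stage.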

3.3. Recognition Model

3.3.1. The Recognition Model for the First Stage

For the recognition model of the first stage, this paper compares four models: SVM, multi-layer perceptron (MLP), long short-term memory (LSTM) neural network, and 1D convolutional (1D-Conv) neural network. The structures of the four models are shown in Figure 5, and each model is introduced below.

1. Support Vector Machine (SVM)

SVM is a classical binary classification model that tries to partition the feature space by finding a surface with maximum margin. Figure 5a shows a diagram of SVM in a two-dimensional scenario. The solid line in the figure represents the surface that separates the two classes of samples, and the distance between the dashed lines on either side of the solid line is called the margin. The input of SVM is a feature vector $x$, and the output is the binary result $y \in \{+ 1, - 1\}$. The feature vector $x = flat (h_{c}^{t_{d}}, z_{1}^{t_{d}}, z_{2}^{t_{d}}, d_{1}^{I}, d_{2}^{I}) \in \mathbb{R}^{402}$ in this paper is the combination of $h_{c}^{t_{d}}$, $z_{1}^{t_{d}}$, $z_{2}^{t_{d}}$, $d_{1}^{I}$, and $d_{2}^{I}$. The function $flat (\cdot)$ concatenates all inputs into a one-dimensional vector. The dataset used for training is denoted as $T = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{N}, y_{N})\}$. The classification model of SVM can be expressed as follows:

$$\begin{cases} f (x) = sign \left(\sum_{i = 1}^{N} \alpha_{i}^{*} y_{i} K (x, x_{i}) + b^{*}\right) \\ b^{*} = y_{j} - \sum_{i = 1}^{N} \alpha_{i}^{*} y_{i} K (x_{i}, x_{j}), \quad 0 \leq \alpha_{j} \leq C \end{cases} \tag{3}$$

where $K (x_{i}, x_{j})$ is a positive definite kernel function, and $\alpha^{*} = (\alpha_{1}^{*}, \alpha_{2}^{*}, \dots, \alpha_{N}^{*})^{T}$ is the solution of the following convex quadratic programming problem:

$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i = 1}^{N} \sum_{j = 1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K (x_{i}, x_{j}) - \sum_{i = 1}^{N} \alpha_{i} \\ s.t. \quad & \sum_{i = 1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \dots, N \end{aligned} \tag{4}$$

In this paper, a quintic (degree-5) polynomial is used as the kernel function of SVM, and the penalty parameter $C$ is set to 0.8. The model in the first stage needs to solve a three-class problem, so we adopt a one-versus-rest method to combine the binary classifiers.
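To make the decision function in Equation (3) and the one-versus-rest combination concrete, the sketch below evaluates a kernel SVM from given multipliers. Several details are assumptions: the text only says the kernel is a quintic polynomial, so the form $(\gamma \langle x, z \rangle + c_{0})^{5}$ with $\gamma = c_{0} = 1$ is one common choice; picking the class with the largest decision value is a common one-versus-rest convention rather than a rule stated in the text; and the multipliers are toy values, not the solution of Equation (4) for real data.

```python
def poly_kernel(x, z, degree=5, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def decision_value(x, support_vectors, labels, alphas, bias):
    """Argument of sign(.) in Equation (3)."""
    total = sum(a * y * poly_kernel(x, sv)
                for a, y, sv in zip(alphas, labels, support_vectors))
    return total + bias

def one_vs_rest_predict(x, binary_models):
    """Return the class whose binary 'class versus rest' score is largest."""
    scores = {name: decision_value(x, *model) for name, model in binary_models.items()}
    return max(scores, key=scores.get)

# Toy 1-D models. The multipliers respect the constraints of Equation (4):
# sum(alpha_i * y_i) = 0 and 0 <= alpha_i <= C with C = 0.8.
svs = [[1.0], [-1.0]]
models = {
    "success": (svs, [+1, -1], [0.5, 0.5], 0.0),
    "other": (svs, [-1, +1], [0.5, 0.5], 0.0),
}
score = decision_value([0.5], *models["success"])
predicted = one_vs_rest_predict([0.5], models)
```

For the three-class problem of the first stage, `binary_models` would hold three entries (success, failure, other), each trained against the remaining two classes.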

2. Multi-Layer Perceptron (MLP)

The input of the MLP model is the same as that of SVM. The MLP model can be expressed as follows:

$$y = Softmax (f_{\theta} (x)) \tag{5}$$

where $f_{\theta} (x)$ is a multi-layer neural network and $\theta$ represents the parameters of the neural network. In this paper, the number of layers is 3, and the numbers of nodes are 256, 6, and 3. The activation function is the ReLU function.

3. Long Short-Term Memory (LSTM) Neural Network

The information used in this paper includes both temporal and non-temporal information. The contact force/torque and the position of the end effectors are temporal information, while the distance between the pegs and holes is non-temporal information. LSTM is a commonly used model for processing temporal information, and it is used in this paper to encode the contact force and end-effector position information. The classification model can be expressed as follows:

$$\begin{cases} v_{1} = f_{\theta_{1}} (h_{c}^{t_{d}}, z_{1}^{t_{d}}, z_{2}^{t_{d}}) \\ v_{2} = f_{\theta_{2}} (d_{1}^{I}, d_{2}^{I}) \\ y = Softmax (f_{\theta_{3}} (v_{1}, v_{2})) \end{cases} \tag{6}$$

where $f_{\theta_{1}} (\cdot)$ is the LSTM neural network that encodes the temporal data, with parameters $\theta_{1}$, 1 layer, and 256 nodes; $f_{\theta_{2}} (\cdot)$ is the fully connected neural network that encodes the distance data, with parameters $\theta_{2}$, 1 layer, and 64 nodes; and $f_{\theta_{3}} (\cdot)$ is the neural network that integrates the temporal features and the distance features, with parameters $\theta_{3}$, 2 layers, and 256 and 3 nodes. The Softmax function restricts the values of the output vector $y \in \mathbb{R}^{3}$ to the interval $[0, 1]$. The sum of the elements in $y$ is 1, so $y$ can be seen as a probability distribution.
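The Softmax normalization used in Equations (5) and (6) can be written out directly. This is a generic, numerically stable sketch; the logit values and the class order (success, failure, other) are hypothetical and do not come from the trained networks.

```python
import math

def softmax(logits):
    """Map raw scores to values in [0, 1] that sum to 1."""
    shift = max(logits)                      # subtract the max to avoid overflow
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output of f_theta3 for the classes (success, failure, other).
probs = softmax([2.0, -1.0, 0.5])
best = max(range(len(probs)), key=probs.__getitem__)
```

The predicted class is the index with the largest probability, which is the same index as the largest logit because Softmax preserves ordering.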

4. One-Dimensional Convolutional (1D-Conv) Neural Networks

By moving the convolution kernel of the convolutional neural network along the time dimension, the convolutional neural network can be used to process temporal data. We use this kind of 1D-Conv neural network to encode the contact force/torque and the end effector's position. The structure of the classification model is the same as that of the LSTM model, with $f_{\theta_{1}} (\cdot)$ replaced by a 1D-Conv neural network. In this paper, the length of the convolution kernel is 5, the convolution stride is 2, and the number of output channels is 3.
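The effect of a kernel of length 5 moved with a stride of 2 along a 50-sample window can be illustrated as follows. This is a single-channel sketch with one filter and no padding; the padding choice, the uniform weights, and the ramp signal are assumptions, and the network in the paper uses several input channels and 3 output channels.

```python
def conv1d_output_length(n_in, kernel=5, stride=2, padding=0):
    """Number of kernel positions along the time axis."""
    return (n_in + 2 * padding - kernel) // stride + 1

def conv1d(signal, weights, stride=2, bias=0.0):
    """Slide one filter along a 1-D signal (no padding)."""
    k = len(weights)
    return [sum(w * signal[start + j] for j, w in enumerate(weights)) + bias
            for start in range(0, len(signal) - k + 1, stride)]

# Placeholder signal: a ramp of 50 samples, filtered with uniform weights.
signal = [float(i) for i in range(50)]
encoded = conv1d(signal, [1.0] * 5)
```

Under these assumptions, each 50-sample channel is reduced to 23 time steps per output channel.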

3.3.2. The Recognition Model for the Second Stage

The input of the classification model in the second stage is the displacement of the two end effectors caused by the heuristic action, $\Delta z \in \mathbb{R}^{2}$ (Section 3.2). Since the problem in the second stage is relatively simple, only the SVM is used for classification. As in the first stage, a quintic polynomial is used as the kernel function, and the penalty parameter $C$ is set to 0.8.

3.4. Dataset Generation

For the recognition model in the first stage, we collected 8 contact states by manual teaching. The two manipulators used in this paper were Jaco2 (Kinova Robotics, Montreal, QC, Canada). The contact force/torque, the position of the end effectors, and the images were recorded at a frequency of 10 Hz. We used a sliding window to extract the data fragments containing a state transition from the data recorded continuously during the teaching process. The length of the sliding window was 5 s (i.e., 50 consecutive sensor samples), and the sliding step was 0.25 s. Each teaching process yields about 9 fragments containing the state transition process. From the fragments that did not include state transitions, we randomly sampled data to represent other states. Examples of data fragments for each contact state are shown in Figure 6. The sizes of the fragments were as follows: $h_{c}^{t_{d}} \in \mathbb{R}^{6 \times 50}$, $z_{1}^{t_{d}} \in \mathbb{R}^{50}$, $z_{2}^{t_{d}} \in \mathbb{R}^{50}$. The number of fragments for each state in the dataset is shown in Table 1. A total of 80% of the collected fragments were used as the training set, and the remaining fragments were used as the test set.
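The fragment extraction and the 80/20 split can be sketched as below. Two points are assumptions rather than details from the text: a step of 0.25 s at 10 Hz is 2.5 samples, so the start of each window is rounded down to the nearest recorded sample, and the split is done after a seeded random shuffle. The 7 s recording length is a hypothetical example.

```python
import math
import random

def window_starts(n_samples, rate_hz=10.0, window_s=5.0, step_s=0.25):
    """Return the window length in samples and the start index of each window."""
    win = int(round(window_s * rate_hz))
    starts = []
    k = 0
    while True:
        # Round the k-th start time down to an existing sample index.
        start = math.floor(k * step_s * rate_hz)
        if start + win > n_samples:
            break
        starts.append(start)
        k += 1
    return win, starts

def split_train_test(fragments, train_fraction=0.8, seed=0):
    """Shuffle with a fixed seed, then cut into training and test sets."""
    items = list(fragments)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]

# Hypothetical 7 s recording (70 samples at 10 Hz).
win, starts = window_starts(70)
fragments = [(s, s + win) for s in starts]
train, test = split_train_test(range(10))
```

Because consecutive windows overlap heavily, fragments from the same teaching run are strongly correlated; splitting by run instead of by fragment would be a stricter test of generalization.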

In the experiment, we found that the recognition model in the first stage often misclassified the contact states $S_{A_{1} A_{2}}$, $S_{C_{1} A_{2}}$, $S_{A_{1} C_{2}}$, and $S_{C_{1} C_{2}}$. Therefore, we applied 15 heuristic actions in these four contact states and recorded the displacement of each peg for training the recognition model in the second stage.

4. Results

4.1. Training Parameters and Evaluation Metrics

The learning rate of the LSTM neural network, MLP, and 1D-Conv models was 0.0001; each model was trained for 20 epochs with a batch size of 20. Each model was trained five times.

For binary classification problems, the results can be divided into four categories: TP (True Positive), a positive sample correctly predicted as positive; FP (False Positive), a negative sample incorrectly predicted as positive; TN (True Negative), a negative sample correctly predicted as negative; and FN (False Negative), a positive sample incorrectly predicted as negative. For the classification model in the first stage, we compared four metrics on the test dataset: accuracy, precision, recall, and F1 score. Accuracy is the proportion of correctly classified samples out of the total number of samples, $A = (TP + TN) / (TP + FP + TN + FN)$; precision is the proportion of true positive samples out of the samples predicted as positive, $P = TP / (TP + FP)$; recall is the proportion of true positive samples out of the actual positive samples, $R = TP / (TP + FN)$; and the F1 score considers both precision and recall and is calculated as $F1 = 2 PR / (P + R)$. Since the recognition task in the first stage is a multi-class problem, we adopted the weighted average method to account for the imbalance in the number of samples in each class:

$$M_{W} = \sum_{i = 1}^{N} \frac{M_{i} \times S_{i}}{\sum_{j = 1}^{N} S_{j}} \tag{7}$$

where $N$ represents the number of categories, $S_{i}$ is the number of fragments of the $i$-th category, $M_{i}$ is the metric calculated for the $i$-th category, and $M_{W}$ is the metric after the weighted average.
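The weighted average in Equation (7) can be computed as in the following sketch. It is one reading of the procedure: each class is treated as the positive class in turn (one versus rest), the four binary metrics are computed from its TP, FP, TN, and FN counts, and the results are weighted by the number of true samples $S_{i}$ of that class. The labels are a made-up six-sample example, not results from the paper.

```python
def weighted_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1, weighted by class support."""
    n = len(y_true)
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        tn = n - tp - fp - fn
        accuracy = (tp + tn) / n
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = (tp + fn) / n          # S_i divided by the sum of all S_j
        totals["accuracy"] += weight * accuracy
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    return totals

# Made-up labels: 2 success, 1 failure, 3 other.
y_true = ["success", "success", "failure", "other", "other", "other"]
y_pred = ["success", "other", "failure", "other", "other", "success"]
metrics = weighted_metrics(y_true, y_pred)
```

With support weights, the weighted recall always equals the overall fraction of correctly classified samples, so the two numbers can serve as a consistency check.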

4.2. Experimental Results

To evaluate the influence of the three types of sensor information on the recognition results, we conducted ablation experiments testing four cases: (1) with force, image, and position; (2) without image; (3) without position; and (4) without image and position. The average results of five training runs are shown in Figure 7. The results show that SVM performed the best, followed by LSTM, with the MLP and 1D-Conv models performing the worst. In the SVM and 1D-Conv models, the position information had a smaller effect, and the evaluation metrics decreased only slightly after the position information was removed. In the MLP and LSTM models, however, the position information had a more significant influence. Image information played a key role in all models; removing the image information led to a noticeable decline in the metrics.

Considering the differences between the data in the training dataset collected via a manual teaching process and the real assembly process, coupled with the small size of the training dataset, there may have been an overfitting issue. To test the generalization ability of the models and the effectiveness of the two-stage recognition method proposed in this paper, we conducted 20 successful dual-hole assembly processes using the automatic assembly algorithm proposed in [3]. The experiments were performed using two manipulators, as shown in Figure 1. Each manipulator had a force/torque sensor and a monocular hand–eye camera installed at the end effector. We conducted experiments on two different holes, as shown in Figure 8. The inner diameters of the holes were 19.6 mm and 20.2 mm, and the diameters of the pegs were 19 mm and 20 mm. We randomly selected the initial positions of the two pegs at a height of 1 cm from the hole surface and within 3 cm from the hole center.

During the assembly process, we tested the number of times each model correctly recognized the success state. The results are shown in Table 2. The experimental results indicate that although all four models achieved good performance on the test set, their performance significantly decreased during the actual automatic assembly process. This is because the data distribution of the teaching process was inconsistent with that of the real automatic assembly process. The performance of the SVM model in this paper is better than that of the neural network; we think this is due to the properties of the feature space. As can be seen from the curve in Figure 6, when the state changes, the force information changes. There is a relatively obvious boundary between the different states in the feature space, and the SVM model is good at handling such problems. It can be seen from Figure 7 that the position information has little influence on the recognition results of the SVM model. Because the SVM model can only use part of the dimensions of the feature space for recognition, it naturally avoids the influence of interfering features on the recognition results. Compared with the SVM model, the neural network models have more parameters and stronger nonlinearity and are more likely to overfit and be affected by interfering features when the dataset is small. The SVM model has the best generalization ability among the four types of models, but the success rate is still low when only the first stage of the model is used. After adding the second stage recognition, the accuracy rate is significantly improved.

5. Conclusions

It is challenging to accurately recognize the contact state in MMPiH tasks. In this paper, we tested the performance of SVM and several neural network models in these tasks and conducted ablation experiments to test the contribution of three types of sensor information: force/torque, images, and position. The experimental results show that SVM performs better than the neural network models when the size of the dataset is limited, but a generalization problem remains. By adding a heuristic action, we proposed a two-stage recognition strategy that can improve the performance. To reduce the difficulty, the contact states in this paper were divided into three categories (success states, failure states, and other states), and statistical models were used to solve the three-class problem. However, how to accurately recognize all 16 contact states is an unsolved problem that requires further research.

Author Contributions

J.Z. and C.B. conceived the research; C.B. and J.G. designed the algorithm; all authors contributed to the analysis of the results; J.Z. drafted a first version of the manuscript; C.B., J.G., Z.C. and Y.C. revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Civil Aerospace Technology Research Project (D010103).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the reviewers for the comments and suggestions, and the editor who helped to improve this article significantly.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, D.; Zhang, L.; Zhu, W.; Xu, Z.; Tang, Q.; Zhan, W. A survey of space robotic technologies for on-orbit assembly. Space Sci. Technol. 2022, 2022, 9849170. [Google Scholar] [CrossRef]
Chen, R.; Wang, C.; Wei, T.; Liu, C. A Composable Framework for Policy Design, Learning, and Transfer Toward Safe and Efficient Industrial Insertion. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Kyoto, Japan, 23–27 October 2022. [Google Scholar]
Zhang, J.; Bai, C.; Guo, J. Multiple Peg-in-Hole Assembly of Tightly Coupled Multi-manipulator Using Learning-based Visual Servo. arXiv 2024, arXiv:2407.10570. [Google Scholar]
Whitney, D.E. Quasi-static assembly of compliantly supported rigid parts. J. Dyn. Sys. Meas. Control 1982, 104, 65–77. [Google Scholar] [CrossRef]
Jakovljevic, Z.; Petrovic, P.B.; Hodolic, J. Contact states recognition in robotic part mating based on support vector machines. Int. J. Adv. Manuf. Technol. 2012, 59, 377–395. [Google Scholar] [CrossRef]
Xia, Y.; Yin, Y.; Chen, Z. Dynamic analysis for peg-in-hole assembly with contact deformation. Int. J. Adv. Manuf. Technol. 2006, 30, 118–128. [Google Scholar] [CrossRef]
Sathirakul, K.; Sturges, R.H. Jamming conditions for multiple peg-in-hole assemblies. Robotica 1998, 16, 329–345. [Google Scholar] [CrossRef]
Fei, Y.; Zhao, X. An assembly process modeling and analysis for robotic multiple peg-in-hole. J. Intell. Robot. Syst. 2003, 36, 175–189. [Google Scholar] [CrossRef]
Zhang, K.; Xu, J.; Chen, H.; Zhao, J.; Chen, K. Jamming analysis and force control for flexible dual peg-in-hole assembly. IEEE Trans. Ind. Electron. 2019, 6, 1930–1939. [Google Scholar] [CrossRef]
Lefebvre, T.; Bruyninckx, H.; De Schutter, J. Online statistical model recognition and state estimation for autonomous compliant motion. IEEE Trans. Syst. Man Cybern. Part C-Appl. Rev. 2005, 35, 16–29.
Gadeyne, K.; Lefebvre, T.; Bruyninckx, H. Bayesian hybrid model-state estimation applied to simultaneous contact formation recognition and geometrical parameter estimation. Int. J. Robot. Res. 2005, 24, 615–630.
Park, Y.K.; Cho, H.S. A fuzzy rule-based assembly algorithm for precision parts mating. Mechatronics 1993, 3, 433–450.
Lee, H.; Park, J. Contact states estimation algorithm using fuzzy logic in peg-in-hole assembly. In Proceedings of the International Conference on Ubiquitous Robots, Kyoto, Japan, 22–26 June 2020; pp. 355–361.
Brignone, L.M.; Howarth, M. A geometrically validated approach to autonomous robotic assembly. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, 30 September–4 October 2002.
Son, C. Optimal control planning strategies with fuzzy entropy and sensor fusion for robotic part assembly tasks. Int. J. Mach. Tools Manuf. 2002, 42, 1335–1344.
Son, C. A neural/fuzzy optimal process model for robotic part assembly. Int. J. Mach. Tools Manuf. 2001, 41, 1783–1794.
Jakovljevic, Z.; Petrovic, P.B.; Mikovic, V.D.; Pajic, M. Fuzzy inference mechanism for recognition of contact states in intelligent robotic assembly. J. Intell. Manuf. 2014, 25, 571–587.
Yan, C.; Wu, J.; Zhu, Q. Learning-based contact status recognition for peg-in-hole assembly. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Prague, Czech Republic, 27 September–1 October 2021.
Jasim, I.F.; Plapper, P.W.; Voos, H. Contact-state modelling in force-controlled robotic peg-in-hole assembly processes of flexible objects using optimised Gaussian mixtures. Proc. Inst. Mech. Eng. Part B-J. Eng. Manuf. 2017, 231, 1448–1463.
Lee, H.; Park, S.; Jang, K.; Park, J. Contact state estimation for peg-in-hole assembly using Gaussian mixture model. IEEE Robot. Autom. Lett. 2022, 7, 3349–3356.
Hovland, G.E.; McCarragher, B.J. Hidden Markov models as a process monitor in robotic assembly. Int. J. Robot. Res. 1998, 17, 153–168.
Lau, H.Y.K. A hidden Markov model-based assembly contact recognition system. Mechatronics 2003, 13, 1001–1023.

Figure 1. The multiple peg-in-hole assembly of multi-manipulator (MMPiH) tasks studied in this paper.

Figure 2. Possible contact states during hole search. (a) Single peg-in-hole assembly. (b) Dual peg-in-hole assembly, where a green background represents success states, a yellow background represents transition states, a blue background represents search states, and a red background represents failure states.

Figure 3. The flow chart of the two-stage contact state recognition process.

Figure 4. Schematic diagram of dual peg-in-hole assembly of dual manipulator.

Figure 5. Four models for contact state recognition. (a) SVM; (b) MLP neural network; (c) LSTM neural network; (d) 1D-Conv neural network.

Figure 6. Examples of data fragments for 8 contact states. The yellow background area is the observation of any state before the state transition, and the red area is the observation of the state after the state transition. The state names in the subheadings correspond to the red areas. (a) The fragment of $S_{S_{1} S_{2}}$; (b) the fragment of $S_{C_{1} A_{2}}$; (c) the fragment of $S_{S_{1} A_{2}}$; (d) the fragment of $S_{C_{1} S_{2}}$; (e) the fragment of $S_{A_{1} A_{2}}$; (f) the fragment of $S_{A_{1} C_{2}}$; (g) the fragment of $S_{A_{1} S_{2}}$; (h) the fragment of $S_{S_{1} C_{2}}$.

Figure 7. Test results of evaluation metrics for each recognition model.

Figure 8. Two pairs of peg and hole used for the experiment. (a) Hole1; (b) Hole2.

Table 1. The number of fragments for each state in the dataset.

Contact States	$S_{A_{1} A_{2}}$	$S_{S_{1} A_{2}}$	$S_{S_{1} C_{2}}$	$S_{A_{1} S_{2}}$	$S_{C_{1} S_{2}}$	$S_{S_{1} S_{2}}$	$S_{C_{1} A_{2}}$	$S_{A_{1} C_{2}}$	Others
Number	423	90	90	90	90	90	339	297	599
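
The counts in Table 1 are uneven across states. As a rough illustration only, the following sketch computes each state's share of the dataset and the ratio between the largest and smallest classes. The counts are copied from Table 1; the plain-text state labels and the helper name `class_shares` are assumptions made for this example and are not part of the paper's pipeline.

```python
# Fragment counts copied from Table 1 (state labels written in plain text).
fragment_counts = {
    "S_A1A2": 423,
    "S_S1A2": 90,
    "S_S1C2": 90,
    "S_A1S2": 90,
    "S_C1S2": 90,
    "S_S1S2": 90,
    "S_C1A2": 339,
    "S_A1C2": 297,
    "Others": 599,
}


def class_shares(counts):
    """Return each class's fraction of the total number of fragments."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


total_fragments = sum(fragment_counts.values())
shares = class_shares(fragment_counts)
# Ratio between the most and least populated classes.
imbalance_ratio = max(fragment_counts.values()) / min(fragment_counts.values())

print(f"total fragments: {total_fragments}")
for state, share in shares.items():
    print(f"{state:8s} {fragment_counts[state]:4d}  {share:.1%}")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

Under these counts, the five states with 90 fragments each make up a small share of the dataset compared with "Others" and $S_{A_{1} A_{2}}$, which may be worth keeping in mind when reading the per-class metrics.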

Table 2. The number of times each model correctly recognized the success state.

Hole	SVM (Two-Stage)	SVM (First Stage)	MLP	LSTM	Conv
Hole1	17/20	13/20	6/20	0/20	0/20
Hole2	16/20	9/20	4/20	0/20	0/20
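
To make the counts in Table 2 easier to compare, the sketch below converts them into success rates per hole and a pooled rate over both holes. The numbers are copied from Table 2; the function name `success_rates` and the pooled figure are illustrative additions, not quantities reported by the authors.

```python
# Successes out of 20 trials per hole, copied from Table 2.
TRIALS = 20
successes = {
    "SVM (Two-Stage)": {"Hole1": 17, "Hole2": 16},
    "SVM (First Stage)": {"Hole1": 13, "Hole2": 9},
    "MLP": {"Hole1": 6, "Hole2": 4},
    "LSTM": {"Hole1": 0, "Hole2": 0},
    "Conv": {"Hole1": 0, "Hole2": 0},
}


def success_rates(counts, trials=TRIALS):
    """Return per-hole and pooled success rates (fractions of 1) for each model."""
    rates = {}
    for model, per_hole in counts.items():
        model_rates = {hole: n / trials for hole, n in per_hole.items()}
        # Pooled rate: all successes divided by all trials across holes.
        model_rates["pooled"] = sum(per_hole.values()) / (trials * len(per_hole))
        rates[model] = model_rates
    return rates


rates = success_rates(successes)
for model, r in rates.items():
    print(f"{model:18s} Hole1={r['Hole1']:.0%}  Hole2={r['Hole2']:.0%}  pooled={r['pooled']:.1%}")
```

With only 20 trials per hole, each rate moves in steps of 5 percentage points, so small differences between models should be read with caution.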

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
