6.1. Implementation
As shown in Figure 10, all data in the experiment were collected using the IMU mounted on top of the charger. The charger was connected to the cable-driven manipulator through the compensator. Because the charger deformed very little after a collision, the collected data mainly contained vibration information from the compensator and the cable-driven manipulator. The manipulator was controlled using the PI scheme designed in [6], which includes two controllers that control the motors and the cables, respectively. In the collision simulation experiment, the end effector of the manipulator moved in a straight line at a speed of 16.7 mm/s.
Regarding the quantity of collected data, to reduce the impact of the manipulator's repeated positioning errors on the results, each normal point in each group was collected 40 times. For the same reason, each acceptable point and each vulnerable point were collected 30 times. Meanwhile, because the characteristics of contact and free samples differ markedly, as shown in Figure 6, only a limited number of free samples were needed. The ratio of free to contact samples was set to 1:2 in the experiment. The sample distribution in each group is listed in Table 2. To explore the influence of different joint configurations on the results, one group was randomly selected as the testing data set and was never used during training. The remaining three groups were shuffled and divided into the training and validation data sets at a ratio of 8:2.
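The group-wise split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_groups`, the group names and the toy samples are hypothetical, and the held-out group is passed in explicitly here instead of being drawn at random.

```python
import random

def split_groups(groups, test_group, train_ratio=0.8, seed=0):
    """Hold out one group for testing; shuffle the rest and split it 8:2."""
    test = list(groups[test_group])
    # Pool the samples of all remaining groups before shuffling.
    pool = [s for name, samples in groups.items()
            if name != test_group for s in samples]
    rng = random.Random(seed)
    rng.shuffle(pool)
    cut = round(len(pool) * train_ratio)
    return pool[:cut], pool[cut:], test

# Hypothetical toy data: four groups of ten (group, index) samples each.
groups = {g: [(g, i) for i in range(10)] for g in ("G1", "G2", "G3", "G4")}
train, val, test = split_groups(groups, test_group="G3")
print(len(train), len(val), len(test))  # 24 6 10
```

Keeping the test group out of the shuffled pool is what prevents samples from the held-out joint configuration from leaking into training or validation.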
To illustrate the effectiveness of the proposed DCNN–SVM algorithm, we compared its results with those of a long short-term memory (LSTM) model [55] and the plain CNN used in DCNN–SVM. All three are essentially automatic feature extraction methods, and their inputs were generated by normalizing the raw signals. For the DCNN–SVM, whose structure is illustrated in Figure 9, M was set to 1024 and N was set equal to the total number of classes. For the max-pooling layers in DCNN–SVM, the stride was set to 2 and the padding method was set to “same”. For the SVM in DCNN–SVM, a grid search gave a penalty coefficient of 6 and the “rbf” kernel. The learning rate was set to 0.0001, and Adam was used as the optimizer of the extractor part. The parameters of the CNN model were kept consistent with those of the DCNN–SVM; we therefore refer to the plain CNN as DCNN here.
The LSTM model is composed of an LSTM layer, three fully connected layers and a final softmax layer. The LSTM layer was the first layer and had 110 hidden units. It converted the initial input into a high-dimensional output feature matrix, which was flattened and then fed into three fully connected layers of sizes 5000, 500 and N, respectively, where N equals the number of classes. The learning rate was set to 0.0001, and the optimizer was Adam. All three models were trained and tested using the TensorFlow 2.0 library.
In addition to these two comparison models, the results of our proposed model were also compared with SVM and k-nearest neighbors (kNN) models, which both rely on artificial feature extraction. The features selected for these two models are similar to those in [17] and are listed in Table 3. Using the grid search method, the penalty coefficient of this SVM was set to 10. Using the same method, for kNN, the number of neighbors was set to 7, the leaf size was set to 1 and “distance” was chosen as the weights. Both models were implemented and tested with the scikit-learn machine learning library. The hyper-parameters of the methods mentioned above are summarized in Table 4.
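To make the “distance” weighting of the kNN baseline concrete, the following pure-Python sketch shows a distance-weighted vote with 7 neighbors. It is an illustration of the idea only, not the scikit-learn implementation used in the experiments; the function `knn_predict` and the two-dimensional toy features are hypothetical, and the leaf size is not modelled because it only affects the neighbor search structure.

```python
import math
from collections import defaultdict

def knn_predict(train_x, train_y, query, k=7):
    """Distance-weighted kNN vote: each of the k nearest neighbours
    adds 1/distance to the score of its class."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # An exact match (distance 0) decides the label outright.
    for d, y in nearest:
        if d == 0.0:
            return y
    scores = defaultdict(float)
    for d, y in nearest:
        scores[y] += 1.0 / d
    return max(scores, key=scores.get)

# Hypothetical toy features: three "free" points and five "contact" points.
train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (0.9, 1.0), (1.0, 0.9)]
train_y = ["free"] * 3 + ["contact"] * 5

print(knn_predict(train_x, train_y, (0.05, 0.05)))  # free
```

In this toy case the seven nearest neighbors contain four “contact” points and only three “free” points, so an unweighted majority vote would answer “contact”; the inverse-distance weights let the three close neighbors dominate.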
As mentioned in
Section 5.2, the computational complexity of the DCNN model can be expressed as follows:
On the other hand, the computational complexity of LSTM mainly depends on the number of weights per time step and the length of inputs. Given a number of weights
and the length of inputs
, the complexity of the LSTM layer can be expressed as
. Considering the fully connected layers, the complexity of the compared LSTM model can be expressed as
. Furthermore, given the number of training instances
and the dimensionality of training space
, the computational complexity of kNN can be expressed as
[
56].
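As a rough illustration of how the number of weights and the input length combine for the LSTM layer, the sketch below counts the trainable weights of a standard LSTM layer and multiplies them by the sequence length. Only the 110 hidden units come from the text; the input dimension of 6 and the sequence length of 25 are assumptions made for the example, and the product is a coarse operation estimate rather than the paper's complexity expression.

```python
def lstm_weight_count(input_dim, hidden_units):
    """Trainable weights of a standard LSTM layer: four gates, each with
    an input kernel, a recurrent kernel and a bias vector."""
    return 4 * (hidden_units * (input_dim + hidden_units) + hidden_units)

def lstm_layer_cost(input_dim, hidden_units, seq_len):
    """Coarse cost estimate: weights touched per time step times steps."""
    return lstm_weight_count(input_dim, hidden_units) * seq_len

# hidden_units=110 is from the text; input_dim and seq_len are assumed.
weights = lstm_weight_count(input_dim=6, hidden_units=110)
print(weights)                        # 51480
print(lstm_layer_cost(6, 110, 25))    # 1287000
```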
In this experiment, we used three-fold cross-validation to train and validate the models. According to the validation accuracy, we chose the best models for the DCNN–SVM, CNN and LSTM methods. For the SVM and the kNN models, we used the grid search method to obtain the optimal hyper-parameters. For each group, the process was repeated three times, and the results from each testing set were averaged. To explore the influence of on the prediction results, we also created several test sets with various values, which represented the segmented vibration signals with different proportions of . Here, we set 0.0667 s 0.3333 s and the interval was set as 0.0133 s.
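The swept time values can be enumerated as in the sketch below. It assumes that the rounded figures 0.0667 s, 0.3333 s and 0.0133 s stand for 1/15 s, 1/3 s and 1/75 s; this is an inference from the rounding, not something stated in the text, and the variable names are hypothetical. Exact fractions are used so that the end point is not lost to floating-point drift.

```python
from fractions import Fraction

# Assumed exact values behind the rounded figures in the text.
start = Fraction(1, 15)   # about 0.0667 s
stop = Fraction(1, 3)     # about 0.3333 s
step = Fraction(1, 75)    # about 0.0133 s

n_steps = int((stop - start) / step)
durations = [float(start + i * step) for i in range(n_steps + 1)]

print(len(durations))                                   # 21
print(round(durations[0], 4), round(durations[-1], 4))  # 0.0667 0.3333
```

Under this assumption the sweep produces 21 test sets per case.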
6.2. Results and Discussion
To illustrate, we define the situation shown in
Figure 7a and the situation shown in
Figure 7b as Case 1 and Case 2, respectively. The accuracy scores of the testing data sets with different
values are illustrated in
Figure 11. In both cases, the automatic feature extraction methods performed much better than the artificial feature extraction methods, namely the SVM and the kNN methods. The maximum accuracy scores of the two artificial feature extraction methods were lower than 85% in Case 1 and lower than 80% in Case 2. In contrast, the minimum accuracy scores of the three automatic feature extraction methods were higher than 90% in Case 1 and higher than 80% in Case 2. Among the three automatic feature extraction methods, the DCNN–SVM worked best with various
values. As shown in
Figure 11a, when
, the accuracy scores of the three automatic feature extraction methods increased rapidly with the increase in the
value. When
, the accuracy scores tended to be stable. As shown in
Figure 11b, when
, the accuracy scores of the three automatic feature extraction methods increased rapidly with the increase in
values. When
, the accuracy scores tended to be stable. In both cases, during the stage of rapid accuracy growth, the LSTM model performed poorly compared with the DCNN and the DCNN–SVM models, whereas once the accuracy had stabilized, the LSTM model performed similarly to the DCNN and the DCNN–SVM models. To some extent, this means that a feature extractor that does not pay too much attention to time features performs better when collision information is insufficient. The reason for the above results is that the vulnerable and acceptable domains were divided based on spatial location. In the actual collision process, the contact mode of the two domains in the early stage of the collision was highly similar, so the collision signals of the two domains were also extremely similar in a very short period after collision, as shown in Figure 6. Therefore, it is difficult for the artificial feature extraction methods to extract enough effective features to distinguish such similar signals, as corroborated by the results. In contrast, the features extracted by automatic feature extraction methods have higher dimensions, and therefore the possibility of extracting effective features is greater. However, it should be noted that at the early stage of collision, vibration signals vary at high frequency, and sufficient time-related information may not be obtainable at the current sampling frequency. Thus, excessive attention to time-related features may make the model more sensitive to the specific features of some samples, reducing its generalization ability. This may also explain the poorer performance of LSTM compared with the other two models when less collision information is available. On this basis, using DCNN alone is also a viable choice for this CLC problem.
The accuracy scores in different groups achieved by the DCNN–SVM model are shown in
Figure 12. Here, we choose the situation in which
and
. As shown in
Figure 12a,b, there were significant differences in the accuracy of different groups. This means that the joint configuration has an impact on our proposed CLC method. Furthermore, calculating the standard deviation of the accuracy scores of different groups in both cases, we could deduce that when
, the standard deviations were 0.91% and 1.25%, respectively, in Cases 1 and 2, and when
, the standard deviations were 1.3% and 1.42%, respectively, in Cases 1 and 2. This indicates that our method is robust, to some extent, against the influence of different joint configurations. Moreover, the smaller the
was, the smaller the standard deviation was, which reveals that sufficient collision information reduces the influence of the joint configuration on the prediction results of the DCNN–SVM model. When
was the same, the standard deviations also differed between the two cases. This means that the robustness of the DCNN–SVM model to joint configurations varies across CLC tasks. Furthermore, we observed that the model’s performance in Case 1 was better than that in Case 2. This may be because, when collisions occurred at collision points in the same radial direction, the directions of the resultant torque of the elastic compensator were more similar. In contrast, collisions at points in the same circumferential direction did not share this property, which may make them easier to distinguish. Thus, the model had a better CLC effect in the circumferential direction, that is, in Case 1, which is consistent with the actual physical situation.
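The spread of accuracy across groups can be computed as in the following sketch. The accuracy values are placeholders chosen for the example and are not results from the paper; the use of the population standard deviation (`pstdev`) is also an assumption, since the text does not say whether the population or the sample form was used.

```python
from statistics import mean, pstdev

def accuracy_spread(accuracies):
    """Mean and population standard deviation of per-group accuracies."""
    return mean(accuracies), pstdev(accuracies)

# Placeholder per-group accuracies in percent (not taken from the paper).
group_acc = [97.0, 98.0, 99.0, 98.0]
avg, std = accuracy_spread(group_acc)
print(avg, round(std, 4))  # 98.0 0.7071
```

A small standard deviation relative to the mean is what supports the claim of robustness against the joint configuration.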
So far, the prediction results of collision localization and collision classification have been discussed together. Here, we analyze collision classification and collision localization separately. We chose the DCNN–SVM model, which performed best among the compared models, as the object of analysis. The results above were averages, and three models had to be trained to obtain each of them. For illustration, we chose one model from the three. The confusion matrices of the collision localization and the collision classification in Cases 1 and 2 are shown in
Figure 13a,b, respectively. In
Figure 13a, as illustrated in
Section 4, C1 and C2 represent the situation when contact occurs in the acceptable domain, C3-C10 represent the situation when collision occurs in the vulnerable domain, and C0 represents the situation when no contact occurs. In
Figure 13b, R1 and R2 represent the situation when contact occurs in the acceptable domain, R3-R8 represent the situation when collision occurs in the vulnerable domain, and R0 represents the situation when no contact occurs. The prediction precisions of the DCNN–SVM model for the three situations in both cases are listed in
Table 5. The precisions of
free and
vulnerable were higher than 99% in both cases. Low misjudgment rates of
free and
vulnerable can reduce the possibility of stopping the manipulator by mistake. In contrast, the prediction precisions of
acceptable were lower, and the minimum precision was as low as 94.94%. From the confusion matrices in
Figure 13, we can also see that the errors mainly consisted of mispredicting some
vulnerable instances as
acceptable instances. The reason for this result is that our
vulnerable and
acceptable domains were defined based on their geometric position, and the collision modes in these two situations were much more similar at the boundary between the two domains, in contrast to
free and
normal instances. Thus, the likelihood of misjudgment at the boundary was greater. This can also be seen from
Figure 13b, in which more R3 instances were wrongly judged as R2 instances than instances from the rest of the vulnerable domain. This kind of mistake may cause serious damage to the manipulator. Therefore, improving the prediction precision of
acceptable should be the focus of our follow-up research. In collision localization, we neglected the
free instances. The prediction precisions in Cases 1 and 2 are listed in Table 6 and Table 7, respectively. In Case 1, the mean precision of each group was at least 96.82%, and in Case 2, the mean precision of each group was at least 94.12%. This means that, to some extent, our proposed method can effectively deal with collision localization problems on the end effector. Note that in Case 1, the mean precision of each group was higher than that of the same group in Case 2. This means that the DCNN–SVM model performed better in collision localization along the circumferential direction than along the radial direction. This result is consistent with the above overall analysis of CLC.
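The step from localization labels to the three classification situations, and the precision computed for each situation, can be sketched as follows. The label-to-situation mapping follows the description of C0 to C10 and R0 to R8 given above; the helper names and the toy label lists are hypothetical and do not reproduce the paper's confusion matrices.

```python
from collections import Counter

def situation(label):
    """Map a class label such as 'C3' or 'R0' to its situation:
    index 0 is free, 1 and 2 are acceptable, higher indices are vulnerable."""
    idx = int(label[1:])
    if idx == 0:
        return "free"
    return "acceptable" if idx <= 2 else "vulnerable"

def precision_by_situation(true_labels, pred_labels):
    """Precision = correct predictions of a situation / all predictions of it."""
    predicted, correct = Counter(), Counter()
    for t, p in zip(true_labels, pred_labels):
        sp = situation(p)
        predicted[sp] += 1
        if situation(t) == sp:
            correct[sp] += 1
    return {s: correct[s] / predicted[s] for s in predicted}

# Toy labels: one vulnerable sample (C3) is mispredicted as acceptable (C2).
true = ["C0", "C1", "C2", "C3", "C3", "C7"]
pred = ["C0", "C1", "C2", "C2", "C3", "C7"]
print(precision_by_situation(true, pred))
```

In the toy data the single boundary error leaves the precision of free and vulnerable at 1.0 and lowers only the precision of acceptable, which mirrors the error pattern discussed above.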
In order to explore the influence of the kernel size of the convolutional layer on our proposed method, we selected DCNN–SVM with three different convolution kernels to conduct CLC for collision signals with
and
. Our experiments were run under Windows on the following system: processor, Intel(R) Core(TM) i7-10700K CPU @ 3.80 GHz; memory, 31.9 GiB; GPU, NVIDIA GeForce RTX 3080. The results in different cases are listed in
Table 8 and
Table 9, respectively. By comparing the accuracy of CLC with different models, we can see that the model with the
convolution kernel was slightly better than that with the
convolution kernel, while it was significantly better than the model with the
convolution kernel. In terms of run time, the time needed to predict a single sample was similar for models with different convolution kernels. The above results indicate that increasing the convolution kernel size helps to improve the performance of the model. This may be because a larger convolution kernel can fuse vibration signals from more dimensions in a single sampling, which is conducive to the extraction of more effective features.
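A per-sample prediction time of the kind compared above can be measured with a simple wall-clock loop, as sketched below. The helper `mean_predict_time` and the stand-in `dummy_predict` function are hypothetical; in practice the trained model's prediction call would replace the stand-in, and the measured value depends on the hardware and on warm-up effects.

```python
import time

def mean_predict_time(predict, samples, repeats=5):
    """Average wall-clock seconds per single-sample prediction."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            predict(sample)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))

def dummy_predict(sample):
    """Stand-in for a trained model's single-sample prediction."""
    return sum(sample) > 0

samples = [[0.1, -0.2, 0.3]] * 10
print(f"{mean_predict_time(dummy_predict, samples):.2e} s per sample")
```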