Article

Research on the Cable-to-Terminal Connection Recognition Based on the YOLOv8-Pose Estimation Model

1 State Grid Hunan Extra High Voltage Substation Company, Changsha 410029, China
2 Substation Intelligent Operation and Inspection Laboratory of State Grid Hunan Electric Power Co., Ltd., Changsha 410029, China
3 School of Integrated Circuits, Huazhong University of Science and Technology, Wuhan 430074, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8595; https://doi.org/10.3390/app14198595
Submission received: 30 July 2024 / Revised: 4 September 2024 / Accepted: 19 September 2024 / Published: 24 September 2024

Featured Application

The research presented in this document focuses on the development and application of a cable-to-terminal connection recognition technology based on pose estimation, specifically utilizing the YOLOv8-pose model. This technology is designed to enhance the efficiency and accuracy of automated inspection systems in substations, which are critical nodes in power transmission and distribution networks. The technology is directly applicable in the routine inspection of substations where it can automate the process of detecting and diagnosing the connection status of cables and terminals. This real-time monitoring capability helps in early fault detection and prevention, thereby ensuring the reliability and safety of the power grid.

Abstract

Substations, as critical nodes for power transmission and distribution, play a pivotal role in ensuring the stability and security of the entire power grid. With the ever-increasing demand for electricity and the growing complexity of grid structures, traditional manual inspection methods for substations can no longer meet the requirements for efficient and safe operation and maintenance. The advent of automated inspection systems has brought revolutionary changes to the power industry. These systems utilize advanced sensor technology, image processing techniques, and artificial intelligence algorithms to achieve real-time monitoring and fault diagnosis of substation equipment. Among these, the recognition of cable-to-terminal connection relationships is a key task for automated inspection systems, and its accuracy directly impacts the system's diagnostic capabilities and fault prevention levels. However, traditional methods face numerous limitations when dealing with complex power environments, such as inadequate recognition performance under conditions of significant perspective angles and geometric distortions. This paper proposes a cable-to-terminal connection relationship recognition method based on the YOLOv8-pose model. The YOLOv8-pose model combines object detection and pose estimation techniques, significantly improving detection accuracy and real-time performance in environments with small targets and dense occlusions through optimized feature extraction algorithms and enhanced receptive fields. The model achieves an average inference time of 74 milliseconds on the test set, with a precision of 92.8%, a recall of 91.5%, and a mean average precision (mAP) of 90.2%. Experimental results demonstrate that the YOLOv8-pose model performs excellently under different angles and complex backgrounds, accurately identifying the connection relationships between terminals and cables, providing reliable technical support for automated substation inspection systems. This research offers an innovative solution for automated substation inspection, with significant application prospects.

1. Introduction

The application of 3D connection recognition technology in the modern engineering field is becoming increasingly important [1]. With the continuous advancement of industrial automation and intelligent manufacturing, there is a growing demand for precise detection and monitoring of complex equipment and systems [2]. Traditional 2D image processing techniques can no longer meet the requirements for accuracy and real-time performance in some application scenarios, while 3D connection recognition technology based on pose estimation provides a new solution [3].
With the development of Industry 4.0, industrial electrical automation has become an important part of intelligent manufacturing [4]. Electrical automation systems are widely used in industrial settings such as power plants, substations, and manufacturing, making production processes intelligent, digital, and efficient through automation technology. Within industrial electrical automation, automatic inspection is a key element. Automatic inspection systems use sensors, image processing, and data analysis to monitor electrical equipment and diagnose faults in real time [5]. Such a system can check the operating status of equipment periodically or continuously, detect and handle faults in a timely manner, and ensure normal operation. Compared with traditional manual inspections, automatic inspections not only improve efficiency but also reduce human error. For example, in the power system, automatic inspection systems can use drones, robots, and other equipment to inspect substations and power lines, detecting and eliminating faults promptly and improving the operational reliability of the power grid [6].
In the automatic inspection process, terminal connection recognition is an important task. It involves the accurate detection and classification of cable connectors, which is crucial for the normal operation and safety of electrical equipment. At present, many industrial sites still rely on manual inspections, which are not only time-consuming and labor-intensive but also prone to misjudgment and missed inspections. An automated terminal connection recognition system can significantly improve inspection efficiency, reduce human error, and make the inspection process automated and intelligent [7]. In substation inspection, for example, such a system can automatically recognize various types of cable connectors and detect their connection status, ensuring the correct connection and safe operation of electrical equipment. Target recognition is one of the core technologies for terminal connection recognition. By introducing target recognition techniques such as YOLO, DBNet, and CRNN, which have strong feature extraction and pattern recognition capabilities, terminal connection detection can be automated, reducing human intervention and improving inspection efficiency and accuracy.
In machine vision recognition of the connection relationship between terminals and cables, when the camera's shooting angle forms a significant angle with the three-dimensional connection surface of the terminals and cables, it is extremely difficult to judge the connection relationship from the direction of the cables in the two-dimensional image alone, and recognition accuracy is low. As shown in Figure 1, the apparent straight-line direction of cable 1 points to terminal 12, and that of cable 2 points to terminal 17; however, cables 1 and 2 are actually connected to terminals 13 and 18, respectively. This example shows that the true connection relationship between a cable and a terminal cannot be accurately determined from the two-dimensional image alone.
Despite the broad application prospects of target recognition technology in industrial electrical automation, there are still some challenges and issues:
(1)
Complex backgrounds and occlusions: In practical applications, the terminal connection area may have complex backgrounds and occlusions, which can affect the accuracy of the recognition algorithm. For example, during the inspection of power equipment, cable connectors may be partially obscured by other equipment or obstacles, increasing the difficulty of recognition.
(2)
Real-time performance and computational efficiency: In the inspection of large-scale electrical equipment, real-time performance and computational efficiency are key considerations. How to improve computational speed and reduce resource consumption while ensuring high accuracy is a hot and difficult point in current research. In practical applications, it is necessary to quickly process a large amount of image data to ensure that the system can respond and handle faults in real time.
(3)
Diversity and quantity of training data: The performance of target recognition algorithms heavily depends on the diversity and quantity of training data. For specific application scenarios, a large number of annotated data are needed to train the model to ensure its performance in practical applications. However, obtaining a large number of high-quality annotated data in actual operations remains a challenge.
(4)
Robustness and stability of the algorithm: Under different lighting conditions and environments, the recognition algorithm needs to maintain stable performance. This requires the algorithm to have strong robustness to cope with various complex application scenarios. For example, in outdoor environments, lighting conditions change greatly, and the recognition algorithm needs to work stably under both strong and weak light conditions.
At present, research on 3D recognition technology has made significant progress and can be used to solve the problems of complex backgrounds and occlusions encountered in 2D target recognition. Many scholars and research institutions are committed to developing more accurate and efficient pose estimation algorithms. For example: Cao et al. proposed the OpenPose model, which can estimate the 2D pose of multiple people in real time and has been widely used in the field of pose estimation [8]. Güler et al. proposed the DensePose model, which is a pose estimation method that maps every pixel of the human body to the 3D surface, achieving high-precision pose estimation under complex backgrounds [9]. Huang et al. proposed a multi-view convolutional neural network model, which achieves high-precision recognition of 3D shapes by combining image information from different perspectives [10]. Charles et al. proposed the PointNet model, which is a deep learning model based on point sets, used for 3D classification and segmentation tasks, showing superior performance in processing 3D data [11]. These models, through training on large-scale datasets, can achieve high-precision pose estimation and exhibit good robustness and stability in practical applications.
This paper analyzes and finds similarities between the connection relationship of terminals and cables and human posture; therefore, it treats the connection of cables and terminals as a whole and proposes a cable-to-terminal connection relationship recognition method based on the YOLOv8-pose model. With the help of the YOLOv8-pose model, the problem of identifying the connection relationship between terminals and cables through two-dimensional images under conditions of large-angle perspective distortion has been effectively solved.

2. YOLOv8-Pose Estimation Model

The YOLOv8-pose model represents the latest advancements in object detection and pose estimation technology, integrating these two capabilities to achieve simultaneous detection and pose recognition of objects [12]. Building upon the traditional YOLO architecture [13], YOLOv8-pose further introduces a pose estimation module, expanding the application scope of the model.

2.1. YOLOv8-Pose Network Structure

The structure of YOLOv8-pose is divided into four main parts [14], each optimized for specific functions to ensure that the model provides accurate and reliable recognition results under various environmental conditions. The specific network structure is shown in Figure 2.
The input layer (Input) is responsible for pre-processing the raw image data, including scaling transformation and color adjustment, thereby standardizing the input data and enhancing the model’s robustness to different environmental conditions such as lighting changes and occlusions. The pre-processing steps not only improve the model’s stability but also reduce the impact of external environments on the detection results. The backbone network (Backbone) is the core module for feature extraction, composed of depthwise separable convolutional layers. Compared with traditional convolutional layers, depthwise separable convolutional layers significantly reduce the computational burden and speed up the inference speed by dividing the convolution operation into depthwise convolution and pointwise convolution. The calculation formulas for depthwise convolution and pointwise convolution are as follows:
Depthwise Convolution: \((I * K)_c = \sum_{i,j} I_{c,i,j} \, K_{i,j}\)

Pointwise Convolution: \((I * P)_c = \sum_{c'} I_{c'} \, P_{c',c}\)
where I is the input feature map, K is the depthwise convolution kernel, P is the pointwise convolution kernel, and c indexes the channels. The backbone network extracts multi-level image features by applying convolutional kernels of different sizes, providing rich information for precise object detection and pose estimation. The neck structure (Neck) connects the backbone network and the detection head, mainly using the feature pyramid network and the path aggregation network to fuse features at different levels, enhancing the model's ability to detect objects at multiple scales. The feature pyramid network fuses features at various scales, enabling the model to handle both large and small objects simultaneously, thereby improving the accuracy of object detection and pose estimation. The feature fusion formula of the feature pyramid network is as follows:
\(P_i = \mathrm{Conv}(C_i) + \mathrm{Upsample}(P_{i+1})\)
In this context, \(C_i\) represents the backbone feature map of the i-th layer, and \(P_i\) denotes the pyramid feature map of the i-th layer. The Path Aggregation Network further optimizes the transmission and combination of features, ensuring that the model maintains good detection performance in high-density target environments. The formula for feature transmission and combination in the Path Aggregation Network is as follows:
\(P_i' = \mathrm{Conv}(P_i + P_{i-1}')\)
In this context, Pi′ represents the feature map of the i-th layer after being optimized by the Path Aggregation Network. The detection head (Head) is the output layer of the model, responsible for generating classification results, object bounding box coordinates, and pose information. YOLOv8-pose integrates key-point detection tasks in the detection head part, enabling the model to perform object detection and pose estimation simultaneously. By accurately outputting the spatial positions of the target key points, the detection head can effectively identify the connection points between terminals and cables in the power system, which is crucial for accurately judging the connection relationships. The detection head employs an anchor mechanism and a multi-task loss function, which not only detects the target but also accurately estimates the target’s pose, ensuring efficient recognition in various complex scenes. The YOLOv8-pose model, through a comprehensive and optimized network structure, not only improves the efficiency of object detection but also enhances the ability to extract detailed pose information of targets in complex environments. Compared with traditional YOLO models, YOLOv8-pose performs excellently in handling multi-task scenarios, ensuring high-precision detection. The improvements to this model provide more accurate and reliable detection results in various complex environments, such as industrial inspections.
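To make the backbone and neck operations above concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of a depthwise separable convolution block and a single FPN fusion step; module names and layer sizes are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): a depthwise separable
# convolution block and one FPN top-down fusion step, matching the formulas above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise: one kernel per input channel (groups=in_ch), implementing
        # (I * K)_c = sum_{i,j} I_{c,i,j} K_{i,j}.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise: 1x1 convolution mixing channels, implementing
        # (I * P)_c = sum_{c'} I_{c'} P_{c',c}.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

def fpn_fuse(c_i: torch.Tensor, p_next: torch.Tensor,
             lateral: nn.Conv2d) -> torch.Tensor:
    """One FPN step: P_i = Conv(C_i) + Upsample(P_{i+1})."""
    return lateral(c_i) + F.interpolate(p_next, scale_factor=2.0, mode="nearest")
```

Splitting a k × k convolution this way reduces the multiply count per output position from roughly k²·C_in·C_out to k²·C_in + C_in·C_out, which is the source of the speed-up mentioned above.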

2.2. Pose Estimation

The most distinctive feature of pose estimation, compared with other computer vision tasks, is its ability to infer the three-dimensional arrangement of joints from a two-dimensional image through neural networks.
When dealing with the problem of joint occlusion, the key for YOLOv8-pose lies in fully utilizing contextual information. Contextual information includes the positions of detected joints, structural constraints of human posture, and the relative positional relationships between joints.
Any object has a specific anatomical structure, and these structural constraints can help the model perform accurate pose estimation even when some joints are occluded. Suppose we have an object model where the positions of the joints are represented as:
\(J = \{ j_1, j_2, \ldots, j_n \}\)
where \(j_i\) is the position vector of the i-th joint. The structure of human posture can be represented as a set of distance constraints between joints:
\(D_{ij} = \| j_i - j_j \|, \quad i, j \in \{1, 2, \ldots, n\}\)
These distance constraints \(D_{ij}\) can be learned from training data and used to infer the positions of occluded joints during pose estimation.
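As a simple illustration of how such constraints can be used, the sketch below, with hypothetical helper names and made-up coordinates, scores candidate positions for an occluded joint against learned pairwise distances and keeps the most consistent one.

```python
# Illustrative sketch (hypothetical helper, not from the paper): score a
# candidate position for an occluded joint against learned distance
# constraints D_ij = ||j_i - j_j|| estimated from training data.
import numpy as np

def constraint_error(candidate: np.ndarray, visible: dict[int, np.ndarray],
                     learned_dist: dict[int, float]) -> float:
    """Mean deviation of the candidate's distances to visible joints from
    the distances learned for the occluded joint."""
    errors = [abs(np.linalg.norm(candidate - pos) - learned_dist[i])
              for i, pos in visible.items() if i in learned_dist]
    return float(np.mean(errors)) if errors else float("inf")

# Example: joints 0 and 1 are visible; pick the candidate for occluded
# joint 2 whose distances best match the learned constraints.
visible = {0: np.array([10.0, 12.0]), 1: np.array([14.0, 30.0])}
learned = {0: 20.0, 1: 8.0}  # learned ||j_2 - j_0|| and ||j_2 - j_1||
candidates = [np.array([16.0, 31.0]), np.array([22.0, 28.0])]
best = min(candidates, key=lambda c: constraint_error(c, visible, learned))
```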
YOLOv8-pose uses a convolutional neural network (CNN) to extract feature maps from the input image. Suppose the input image is I, and the feature maps extracted by CNN are F:
\(F = \mathrm{CNN}(I)\)
The feature map F contains multi-scale features and contextual information in the image. When predicting joint positions, the model relies not only on local features but also on global features and contextual information. For example, for an occluded joint jk, the model can infer its position through other detected joints and contextual information in the feature map:
\(j_k = f(F, j_1, j_2, \ldots, j_{k-1}, j_{k+1}, \ldots, j_n)\)
In some advanced models, graph convolutional networks (GCNs) can be used to further integrate and utilize contextual information. Suppose the joints form an undirected graph G = (V, E), where V is the set of joints, and E is the connections between joints. Through GCNs, message passing and feature updating for joints can be performed:
\(H_u^{(l+1)} = \sigma\left( \sum_{v \in N(u)} W^{(l)} H_v^{(l)} + b^{(l)} \right)\)
where \(H^{(l)}\) is the feature representation at the l-th layer, \(W^{(l)}\) and \(b^{(l)}\) are the weights and biases, \(\sigma\) is the activation function, \(N(u)\) is the set of neighboring nodes of node u, and v ranges over \(N(u)\). Through the iteration of multiple GCN layers, the feature representations of the joints continuously integrate contextual information, improving the estimation accuracy for occluded joints.
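A minimal sketch of this message-passing update follows, assuming the joint graph is supplied as a binary adjacency matrix; this is an illustration, as the paper does not specify an implementation.

```python
# Minimal sketch of the GCN update above:
# H^{(l+1)}_u = sigma( sum_{v in N(u)} W^{(l)} H^{(l)}_v + b^{(l)} ).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # W^{(l)}
        self.bias = nn.Parameter(torch.zeros(out_dim))        # b^{(l)}

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n, in_dim) joint features; adj: (n, n) binary adjacency.
        # adj @ W(h) sums the transformed features of each node's neighbors.
        return torch.relu(adj @ self.weight(h) + self.bias)
```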

3. Wiring Recognition Methods

Recognizing the connections between objects in complex industrial and automation scenarios is a challenging task. Traditional methods often detect each object separately and then infer their connections through post-processing steps. This approach is susceptible to occlusions, lighting changes, and other environmental factors when facing complex scenes, leading to a decrease in recognition accuracy [15].
In the research of cable-to-terminal connection relationship recognition, it has been observed that the connection structure of terminals and cables bears a resemblance to human body postures. The terminal number can be regarded as the head of the human body, the numbering on the cable can be seen as the trunk of the human body, and the junction between the terminal and the cable can be considered as the joint connecting the trunk and the head (as shown in Figure 3). Drawing on this similarity, this paper innovatively applies posture estimation methods to the recognition of cable-to-terminal connection relationships. This method effectively addresses the issue of misalignment between terminal numbers and cables under conditions of significant perspective distortion, thereby significantly enhancing the accuracy of recognition.

3.1. Data Annotation Strategy

To achieve this innovation, the training data must first undergo special annotation processing. In traditional object detection tasks, each object is annotated with its bounding box and category label separately. In the method of this paper, for each pair of objects that form a connection, we annotate them as a whole. This means that the annotators need to define a new overall label for each connection and annotate its overall bounding box. Specifically, for an image containing multiple connections, the annotation process is as follows:
(1)
Annotate the overall bounding box: First, the overall area of the connected components is determined and annotated with a red bounding box. This bounding box includes all relevant components and their connection points, ensuring that the scope of the connection structure is fully covered.
(2)
Annotate individual objects: Next, we annotate each individual object within the overall connection area. The green bounding boxes in the figure identify the terminals in the two connected wholes, and the blue bounding boxes identify the cables in the two connected wholes. This refined annotation helps to clarify the specific location and shape of each component.
(3)
Annotate connection relationships: Finally, we annotate the structural relationships between the connected wholes. By using yellow lines and marking points, the connection relationships between individual objects are further clarified, revealing their relative positions and interaction modes within the entire structure.
The specific annotation process is shown in Figure 4.
The detailed description of this series of annotation processes helps to deeply understand and accurately record the components and structural relationships in complex connection systems. This annotation method is suitable for application scenarios that require high-precision recognition and management of connection points and their components, such as industrial automation inspection, equipment maintenance, and electrical circuit management.
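For concreteness, a single annotation in the standard Ultralytics YOLO-pose label format might look as follows; the choice of three keypoints per connection (terminal number, junction, cable number) and all coordinate values are illustrative assumptions, not values from the paper's dataset.

```text
# one line per connection: class cx cy w h  kx1 ky1 v1  kx2 ky2 v2  kx3 ky3 v3
# coordinates normalized to [0, 1]; v = 2 visible, 1 occluded, 0 not labeled
0 0.512 0.430 0.180 0.260 0.470 0.335 2 0.505 0.430 2 0.548 0.522 2
```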

3.2. Model Training

After the completion of data annotation, the annotated dataset is input into the YOLOv8-pose model for systematic training. Unlike traditional training methods, the model training process proposed in this paper fully utilizes the holistic annotation information, enabling the model to consider the connection relationships between objects in a comprehensive manner during the learning process. Specifically, by analyzing the holistically annotated data, the model can effectively capture the spatial relationships and connection features between objects, thereby enhancing its ability to recognize connection relationships.
During the training process, the model first uses the pre-processed annotated data for preliminary feature extraction. At this stage, the model extracts multi-level feature representations of objects in the image through a convolutional neural network. On this basis, the model further introduces holistic annotation information for in-depth learning of connection relationships. This holistic learning strategy allows the model not only to recognize individual objects but also to understand and infer complex connection relationships between objects.
To optimize the model’s performance, a multi-task loss function is used during the training process, combining the tasks of object detection and pose estimation. This multi-task learning approach enables the model to simultaneously optimize the detection accuracy of objects and the accuracy of connection relationship recognition. At each stage of training, the model continuously adjusts parameters to gradually improve its ability to recognize connection relationships between objects.
In summary, by introducing holistic annotation information and multi-task learning strategies, the YOLOv8-pose model can fully learn the connection relationship features between objects during the training process, thereby significantly improving its recognition accuracy and robustness in complex scenarios.

3.3. Recognition Process

After obtaining the optimal trained model, it is applied to recognition and inference. In the inference phase, the model performs holistic recognition and pose estimation on the input image. The specific recognition process is shown in Figure 5:
(1)
Object Detection: The model first outputs the bounding box and category label for each connection as a whole.
(2)
Pose Estimation: Through the pose estimation module, the model further outputs the key point positions for each whole, providing detailed spatial location information.
(3)
Connection Relationship Inference: Combining the detection results and pose estimation information, the specific connection relationships between objects are inferred through geometric calculations and rule matching. This holistic recognition method can more accurately identify connection relationships in complex scenes, improving the accuracy and robustness of recognition.
Figure 5. Diagram of the identification process.
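A sketch of step (3), the geometric matching, is shown below; the nearest-centre rule and the distance threshold are illustrative assumptions rather than the paper's exact matching rules.

```python
# Illustrative sketch of connection-relationship inference: pair each detected
# connection's terminal-side keypoint with the nearest terminal-number box
# centre (and likewise for cable numbers). Threshold is a made-up value.
import numpy as np

def match_keypoint_to_boxes(kpt: np.ndarray, boxes: list[np.ndarray],
                            max_dist: float = 40.0) -> int | None:
    """Return the index of the box (x1, y1, x2, y2) whose centre is nearest
    to the keypoint, or None if every centre is farther than max_dist px."""
    if not boxes:
        return None
    centres = np.array([[(b[0] + b[2]) / 2, (b[1] + b[3]) / 2] for b in boxes])
    dists = np.linalg.norm(centres - kpt, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= max_dist else None
```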

4. Experimental Testing and Result Analysis

4.1. Dataset and Experimental Environment

This study relies on an annotated dataset for model training and evaluation, derived from photos taken inside substation electrical panels: over 2500 connections between cables and terminals are labeled across 350 images, and the input image size is 640 × 640. The annotation details are shown in Figure 6, with each connection point clearly marked, ensuring consistent, high-quality labeling during training. To enable effective model training and fair evaluation, the dataset is divided into training, validation, and test sets in a 7:2:1 ratio.
The experimental environment used in this paper is shown in Table 1.
The basic model for this experiment is YOLOv8-pose, with a batch size set to 4 and an initial learning rate of 0.01.
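Under these settings, a training run can be reproduced with the public Ultralytics API roughly as follows; the dataset YAML path and the epoch count are placeholders, since they are not reported in the paper.

```python
# Minimal training sketch using the public Ultralytics API with the
# hyperparameters reported above (batch size 4, initial LR 0.01, 640x640).
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")       # small pose model, as noted in Section 4.3
model.train(
    data="cable_terminal_pose.yaml",  # placeholder: dataset split 7:2:1
    imgsz=640,
    batch=4,
    lr0=0.01,
    epochs=100,                       # illustrative; not reported in the paper
)
```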

4.2. Experimental Evaluation Metrics

The evaluation metrics for YOLOv8-pose include overall and key-point recall (Recall), precision (Precision), mean Average Precision (mAP), and mAP50–95 [16]. Recall represents the proportion of the number of correctly detected targets to the total number of actual targets, and precision represents the proportion of correctly detected targets out of all detected targets. mAP requires the calculation of the average precision for each category, that is, the area under the PR curve, and mAP is the average of APs across all categories. mAP50–95 calculates the average precision at ten different IoU thresholds, from IoU = 0.50 to IoU = 0.95, with a step size of 0.05. In addition, the miss rate (MR) is also considered, which is the proportion of the number of missed targets to the total number of targets.
The formulas for calculating each metric are as follows:
  • Recall:
\(\mathrm{Recall} = \dfrac{TP}{TP + FN}\)
  • Precision:
\(\mathrm{Precision} = \dfrac{TP}{TP + FP}\)
  • mAP (mean Average Precision), where the AP of each category is the area under its PR curve:
\(\mathrm{AP} = \displaystyle\int_0^1 p(r) \, \mathrm{d}r, \qquad \mathrm{mAP} = \dfrac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i\)
  • mAP50–95, averaged over the ten IoU thresholds \(\mathrm{IoU}_k = 0.50, 0.55, \ldots, 0.95\):
\(\mathrm{mAP}_{50\text{–}95} = \dfrac{1}{10} \sum_{k=1}^{10} \mathrm{AP}_{\mathrm{IoU}_k}\)
where:
  • TP represents the number of correctly identified targets;
  • FP represents the number of incorrectly identified non-targets;
  • FN represents the number of actual targets that are incorrectly identified as non-targets [16].
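The count-based metrics translate directly into code; the sketch below uses made-up counts purely for illustration.

```python
# Direct transcription of the count-based metrics above (illustrative values).
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# e.g. 232 correct detections, 18 false alarms, 22 misses
p, r = precision_recall(232, 18, 22)  # p = 0.928, r ≈ 0.913
```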

4.3. Experimental Results and Analysis

After training to obtain the optimal model, we conducted a comprehensive evaluation on an independent test set. The test set included a variety of situations with large perspective distortions to verify the model's adaptability and accuracy in different scenarios. As shown in Figure 7, the model was able to accurately identify the connection relationships between terminals and cables under complex perspective conditions. This indicates that the YOLOv8n-pose model performs well in handling large perspective distortions, effectively addressing the misalignment issues between terminals and cables and thereby enhancing the accuracy and reliability of recognition.
To further quantify the model's performance, we calculated several evaluation metrics, including precision, recall, and mean Average Precision (mAP). These metrics indicate that the YOLOv8n-pose model's performance on the test set achieved the expected effect, with high detection accuracy and robustness. The specific evaluation results are shown in Table 2 below:
To assess the model's performance in practical applications, we also conducted field tests in multiple scenarios, including different lighting conditions, varying background complexities, and various cable and terminal arrangements. The field test results show that the YOLOv8n-pose model's performance in real environments is consistent with the laboratory test results; it can stably and accurately identify the connection relationships between terminals and cables.
In summary, the YOLOv8n-pose model performs excellently in identifying the connection relationships of terminals and cables, especially under conditions of large perspective distortion, providing reliable technical support for automatic substation inspection systems. Future research will further optimize the model structure and training methods to improve performance in more practical application scenarios and explore potential applications in other industrial automation fields.

4.4. Optimization of the YOLOv8-Pose Loss Function

The loss function of the YOLOv8-pose model is at the core of its training [17], ensuring that the model can accurately learn key-point information. It combines an object localization term, a classification term, and a pose estimation term, weighted by the coefficients \(\lambda_{box}\), \(\lambda_{cls}\), and \(\lambda_{pose}\), to optimize the model's predictive performance.
The overall loss function of the YOLOv8-pose model can be described as:
\(L = \dfrac{\lambda_{box}}{N_{pos}} \sum_i L_{cIoU}(b_i, \hat{b}_i) + \dfrac{\lambda_{cls}}{N_{pos}} \sum_i L_{cls}(c_i, \hat{c}_i) + \dfrac{\lambda_{pose}}{N_{pos}} \sum_i L_{pose}(p_i, \hat{p}_i)\)
where:
  • LcIoU is the complete IoU (Intersection over Union) loss, used to measure the consistency between the predicted bounding box and the actual bounding box.
  • Lcls is the classification loss, using cross-entropy to measure the difference between the predicted category and the real category.
  • Lpose is the pose loss, using the Euclidean distance between key point coordinates.
  • Npos represents the number of positive samples, and the λ series are the weight coefficients for each loss component.
  • \(b_i\), \(p_i\), and \(c_i\) represent the ground-truth values for object localization, pose estimation, and classification for each sample, respectively;
  • \(\hat{b}_i\), \(\hat{p}_i\), and \(\hat{c}_i\) are the corresponding predicted values.
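Schematically, the composition of the loss can be written as follows; this is a simplified sketch rather than the Ultralytics implementation, and the default λ values shown are the Ultralytics gain settings, not values reported in the paper.

```python
# Schematic composition of the multi-task loss above (a sketch): per-component
# losses are combined with the lambda weights and averaged over N_pos.
import torch

def total_loss(l_ciou: torch.Tensor, l_cls: torch.Tensor,
               l_pose: torch.Tensor, n_pos: int,
               lam_box: float = 7.5, lam_cls: float = 0.5,
               lam_pose: float = 12.0) -> torch.Tensor:
    """l_* are losses summed over positive samples; the lambda defaults
    follow the Ultralytics gain settings (box=7.5, cls=0.5, pose=12.0)."""
    return (lam_box * l_ciou + lam_cls * l_cls + lam_pose * l_pose) / n_pos
```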
From Figure 7, it can be seen that although the interconnected terminals and cables can be accurately identified, the recognition box does not always fully cover the cable numbers and terminal numbers. Therefore, it is necessary to adjust the training parameters to improve the recognition effect.
To enhance recognition accuracy, a parameter optimization experiment was designed. In the YOLOv8-pose model, the key-point loss plays a central role in locating the connection relationship between terminals and cables. To improve the accuracy of the recognition box for terminal numbers and cable numbers, a series of experiments was designed to adjust the weight of the key-point loss in the loss function.
Firstly, while keeping other parameters unchanged, only the key-point loss weight was changed to observe the performance change of the model. In the initial experiment, the key-point loss weight was set to the model’s default value. Subsequently, the weight was modified from 5 to 30 with a step size of 1, and the changes in the model’s overall and key-point recognition accuracy, recall rate, and mAP in the detection task of terminals and cables were recorded. The experimental results are shown in Figure 8.
Figure 8 and Figure 9 show the experimental results for key-point recognition and for the overall recognition of the connection between terminals and cables, respectively. To further explore the relationship between the key-point loss weight and the model's recognition performance, all the results in Figure 8a–d and Figure 9a–d were normalized, and the sum of the eight normalized values corresponding to each weight was calculated. The results are shown in Figure 10, from which it is evident that the model's comprehensive performance is best when the weight is 22.
In practical detection, the comprehensive recognition effect is likewise best at a weight of 22. If the weight is too small, the recognition accuracy for both cable numbers and terminal numbers is low; if it is too large, overfitting occurs and the recognition rate for terminal numbers drops. A moderate weight yields the highest recognition accuracy.
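The weight-selection procedure described above can be summarized as the following sketch; the data layout of the metric curves is an assumption for illustration.

```python
# Sketch of the selection procedure: min-max normalize each of the eight
# metric curves over the sweep (weights 5..30), sum them per weight, and
# take the argmax.
import numpy as np

weights = np.arange(5, 31)  # key-point loss weights tested, step size 1

def best_weight(metric_curves: list[np.ndarray]) -> int:
    """metric_curves: one array per curve in Figures 8a-d and 9a-d, each of
    length len(weights)."""
    normalized = [(m - m.min()) / (m.max() - m.min() + 1e-12)
                  for m in metric_curves]
    score = np.sum(normalized, axis=0)      # combined score per weight
    return int(weights[np.argmax(score)])   # 22 in the reported sweep
```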
The recognition results after parameter adjustment are shown in Table 3. Compared with the original model, precision, recall, and mean Average Precision (mAP) on the test set all show a measurable improvement.
We also conducted experiments with the AlphaPose and OpenPose models on the same dataset, achieving recognition accuracies of 91.7% and 89.4%, respectively. All pose estimation models achieved relatively high recognition accuracy, indicating that using human pose estimation models to identify the connection status between cables and terminals is effective. The dataset used in this paper's experiments is relatively small, so to prevent overfitting, the smaller YOLOv8n-pose model was adopted. To further improve recognition accuracy, more complex models from the YOLOv8 series could be used, but this would also require a larger dataset for training.

5. Conclusions

In this study, we explored the three-dimensional connection recognition method based on pose estimation and conducted a detailed technical analysis and experimental verification by introducing the YOLOv8-pose model. The results of the study indicate that the technology of three-dimensional connection recognition has significant potential and broad prospects in the field of modern engineering.
Through the optimized design of the YOLOv8-pose model, we successfully achieved an efficient integration of target detection and pose estimation, improving the recognition accuracy and real-time performance in complex backgrounds. Specifically, the model shows excellent performance in the recognition of the connection relationship between terminals and cables, maintaining a high level of accuracy and robustness under various experimental conditions. In addition, by optimizing data pre-processing, annotation strategies, and model training, we significantly enhanced the performance of the model, providing reliable technical support for practical applications in complex industrial scenarios.
Despite the achievements of this study, there are still some challenges to be addressed. First, the accuracy of pose estimation still needs to be further improved under complex backgrounds and occlusion conditions. Second, the real-time performance and computational efficiency of the model still need to be optimized to meet the needs for efficient, low-resource recognition systems in industrial automation and intelligent manufacturing.
Future research directions can focus on the following aspects: further optimizing the pose estimation algorithm to enhance the model’s recognition accuracy in complex scenes; introducing more efficient deep learning techniques and hardware acceleration to enhance the model’s real-time performance; expanding the application range of three-dimensional connection recognition technology to explore its potential application value in other areas of industrial automation.
In summary, the technology of three-dimensional connection recognition based on pose estimation provides a new solution for the intelligent process in the field of engineering. With continuous technological innovation and optimization, this technology is bound to play an important role in more practical applications in the future, promoting the development of industrial automation and intelligent manufacturing.

Author Contributions

Conceptualization, X.Q.; methodology, X.W.; software, X.W. and G.H.; validation, X.W. and G.H.; resources, Y.L.; data curation, X.Q. and Y.L.; writing—original draft preparation, G.H.; writing—review and editing, Y.L. and X.T.; visualization, X.T. and G.H.; supervision, X.T.; project administration, Y.L.; funding acquisition, X.Q. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Science and Technology Project of State Grid Hunan Electric Power Co., Ltd., grant number 5216A321N00H.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The training data and source code are available at https://github.com/RainyMissing/ele_pose.git, accessed on 29 July 2024.

Conflicts of Interest

Authors Xu Qu and Yanping Long are employed by the State Grid Hunan Extra High Voltage Substation Company and the Substation Intelligent Operation and Inspection Laboratory of State Grid Hunan Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Yun, H.; Kim, E.; Kim, D.M.; Park, H.W.; Jun, M.B.G. Machine learning for object recognition in manufacturing applications. Int. J. Precis. Eng. Manuf. 2023, 24, 683–712. [Google Scholar] [CrossRef]
  2. Zheng, P.; Wang, H.; Sang, Z.; Zhong, R.Y.; Liu, Y.; Liu, C.; Mubarok, K.; Yu, S.; Xu, X. Smart manufacturing systems for Industry 4.0: Conceptual framework, scenarios, and future perspectives. Front. Mech. Eng. 2018, 13, 137–150. [Google Scholar] [CrossRef]
  3. Liu, W.; Bao, Q.; Sun, Y.; Mei, T. Recent advances of monocular 2D and 3D human pose estimation: A deep learning perspective. ACM Comput. Surv. 2022, 55, 1–41. [Google Scholar] [CrossRef]
  4. Wollschlaeger, M.; Sauter, T.; Jasperneite, J. The future of industrial communication: Automation networks in the era of the internet of things and industry 4.0. IEEE Ind. Electron. Mag. 2017, 11, 17–27. [Google Scholar] [CrossRef]
  5. Abd Al Rahman, M.; Mousavi, A. A review and analysis of automatic optical inspection and quality monitoring methods in electronics industry. IEEE Access 2020, 8, 183192–183271. [Google Scholar]
  6. Larrauri, J.I.; Sorrosal, G.; González, M. Automatic system for overhead power line inspection using an Unmanned Aerial Vehicle—RELIFO project. In Proceedings of the 2013 International Conference on Unmanned Aircraft Systems (ICUAS); IEEE: Piscataway, NJ, USA, 2013; pp. 244–252. [Google Scholar]
  7. Prieto, F.; Redarce, T.; Lepage, R.; Boulanger, P. An automated inspection system. Int. J. Adv. Manuf. Technol. 2002, 19, 917–925. [Google Scholar] [CrossRef]
  8. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299. [Google Scholar]
  9. Güler, R.A.; Neverova, N.; Kokkinos, I. Densepose: Dense human pose estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7297–7306. [Google Scholar]
  10. Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 17 October 2008. [Google Scholar]
  11. Charles, R.Q.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  12. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190. [Google Scholar] [CrossRef] [PubMed]
  13. Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A modified YOLOv8 detection network for UAV aerial image recognition. Drones 2023, 7, 304. [Google Scholar] [CrossRef]
  14. Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  15. Stjepandić, J.; Sommer, M. Object recognition methods in a built environment. In DigiTwin: An Approach for Production Process Optimization in a Built Environment; Springer: Berlin/Heidelberg, Germany, 2022; pp. 103–134. [Google Scholar]
  16. Magalhães, S.A.; Castro, L.; Moreira, G.; Dos Santos, F.N.; Cunha, M.; Dias, J.; Moreira, A.P. Evaluating the single-shot multibox detector and YOLO deep learning models for the detection of tomatoes in a greenhouse. Sensors 2021, 21, 3569. [Google Scholar] [CrossRef] [PubMed]
  17. Sekharamantry, P.K.; Melgani, F.; Malacarne, J. Deep learning-based apple detection with attention module and improved loss function in YOLO. Remote Sens. 2023, 15, 1516. [Google Scholar] [CrossRef]
Figure 1. Picture of terminal and cable connections.
Figure 2. YOLOv8-pose network architecture diagram.
Figure 3. The analogy between human body postures and wiring conditions.
Figure 4. Picture of data annotation.
Figure 6. Data annotation pictures.
Figure 7. Recognition result image.
Figure 8. Experimental results for key-point recognition. (a) Precision; (b) Recall; (c) mAP50; (d) mAP50–95.
Figure 9. Experimental results for the recognition of cable-to-terminal connections. (a) Precision; (b) Recall; (c) mAP50; (d) mAP50–95.
Figure 10. Normalized metrics sum over experiments.
Table 1. Experimental environment.
Parameter | Experimental Environment
Operating System | Windows 10, 64-bit
CPU | 13th Gen Intel® Core™ i7-13700K @ 3.40 GHz (Intel, Santa Clara, CA, USA)
GPU | GeForce RTX 4060 (16.0 GB)
Memory | 32 GB
Python version | 3.11.8
Deep Learning Framework | PyTorch 2.2.2, CUDA 12.1
Table 2. Detection results of the test set.
Evaluation Metric | Result
Precision | 92.3%
Recall | 90.7%
Mean Average Precision | 89.5%
Average Inference Time (s) | 0.074
Table 3. Detection results of the test set after parameter adjustment.
Evaluation Metric | Result
Precision | 92.8%
Recall | 91.5%
Mean Average Precision | 90.2%
Average Inference Time (s) | 0.074
