Article

Vibrator Rack Pose Estimation for Monitoring the Vibration Quality of Concrete Using Improved YOLOv8-Pose and Vanishing Points

State Key Laboratory of Hydraulic Engineering Intelligent Construction and Operation, Tianjin University, 135 Yaguan Road, Tianjin 300350, China
*
Author to whom correspondence should be addressed.
Buildings 2024, 14(10), 3174; https://doi.org/10.3390/buildings14103174
Submission received: 1 September 2024 / Revised: 2 October 2024 / Accepted: 3 October 2024 / Published: 5 October 2024
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

Abstract

Monitoring the actual vibration coverage is critical for preventing over- or under-vibration and ensuring concrete’s strength. However, the current manual methods and sensor techniques fail to meet the requirements of on-site construction. Consequently, this study proposes a novel approach for estimating the pose of concrete vibrator racks. This method integrates the Large Separable Kernel Attention (LSKA) module into the You Only Look Once (YOLO) framework to accurately detect the keypoints of the rack and then employs the vanishing point theorem to estimate the rotation angle of the rack without any 3D datasets. The method enables the monitoring of the vibration impact range for each vibrator’s activity and is applicable to various camera positions. Given that measuring the rotation angle of a rack in the field is challenging, this study proposes employing a simulation environment to validate both the feasibility and accuracy of the proposed method. The results demonstrate that the improved YOLOv8-Pose achieved a 1.4% increase in accuracy compared with YOLOv8-Pose, and the proposed method monitored the rotation angle with an average error of 6.97° while maintaining a working efficiency of over 35 frames per second. This methodology was successfully implemented at a construction site for a high-arch dam project in China.

1. Introduction

The strength of concrete, which is the primary construction material for dams, directly affects the safety of these structures [1,2]. Many studies have found that the strength of concrete is affected not only by the type of concrete, temperature, and age, but also by adequate vibration [3,4,5,6,7]. An uneven distribution of vibration coverage can likewise undermine the effectiveness of vibration. Vibration coverage, as shown in Figure 1, refers to the impact area of each vibration: over-vibration can lead to a sand layer forming on the concrete’s surface, while under-vibration may result in inadequate compaction. At present, monitoring the range of vibrations at construction sites to prevent over- or under-vibration relies largely on subjective judgment. However, human subjectivity is influenced by factors such as attention and experience, which hinder the systematic evaluation of vibration quality [8].
Numerous studies have been conducted in the field of construction machinery monitoring to ensure construction quality [9,10]. Sensor-based methods are employed for monitoring excavators or vibrators during construction (as shown in Figure 1), but they require preinstallation and meticulous maintenance. Field experiments and related studies [11] indicated that heading-angle sensors are affected by the self-magnetic field generated by construction machines. Therefore, the sensor-based approach cannot monitor the rotation angle of the vibration rack, i.e., the vibration coverage. In recent years, computer vision-based methods have gained popularity for monitoring construction machinery’s activity due to their ability to perform multiple tasks and ease of maintenance. However, training or validating a computer vision-based 3D pose estimation model typically necessitates an extensive collection of 3D datasets or depth information corresponding to 2D images, posing significant costs and challenges at construction sites.
In this study, we propose a monocular camera-based method using computer vision to estimate the 3D pose of a vibration rack. The method is straightforward and includes two main stages. Firstly, we integrated the Large Separable Kernel Attention (LSKA) module into the You Only Look Once v8-Pose (YOLOv8-Pose) model to accurately detect the keypoints of the vibration rack. Then, a rotation angle estimation algorithm was developed based on the vanishing point theorem. The keypoint information from the improved YOLOv8-Pose was analyzed to estimate the rotation angle of the vibration rack. To overcome the challenge of collecting a large number of 3D datasets, we constructed a simulation environment using Unity3D to generate 3D verification datasets. Finally, both simulation experiments and field applications were conducted to validate the effectiveness of the proposed method.
The main contributions of this work are given as follows.
  • A keypoint detection algorithm based on YOLOv8-Pose was improved by LSKA. The algorithm enhanced the accuracy by 1.4% compared with the original algorithm while maintaining real-time detection.
  • A novel framework for estimating the rotation angle of the rack at construction sites was proposed. This framework enabled a direct improvement of 2D pose estimation results to 3D pose estimation results, even in the absence of depth information. The average error achieved by this method was approximately 6.97°.
  • A simulation environment based on Unity3D was proposed for generating a dataset to validate the 3D pose estimation. This methodology effectively circumvented the inherent error of 8.32° associated with manual data acquisition, resulting in a more precise and efficient dataset generation process.

2. Related Works

2.1. Deep Learning-Based Pose Estimation

Pose estimation is an important but challenging task in the field of computer vision [12]. Depending on the model’s outcome, pose estimation methods can be classified as 2D or 3D [13].
2D pose estimation based on deep learning involves feeding RGB images into a neural network [14]. The network performs convolution calculations on the input image to generate a feature map, which is then processed by the feature-map processing module to extract the location information for each keypoint. An accurate position of each keypoint is then obtained through direct regression [15] or a heat map [16,17]. Finally, the deep learning method employs specific strategies to associate keypoints and derives the 2D pose estimation results of the target [18].
Direct 3D pose estimation from a single 2D image presents significant challenges because of the absence of depth information [19]. Current approaches to addressing this issue can be broadly categorized into two types. One approach involves directly regressing the keypoints’ positions from 2D monocular images to predict 3D poses [20]. For example, Wang [21] proposed a distribution-aware single-stage model that represented a 3D human pose as an offset between 2.5D human center points and 3D human keypoints. This model was used to estimate the 3D pose of a target in a 2D image accurately. Pavlakos [22] trained a deep learning network by utilizing the relative depths among keypoints and achieved 3D pose estimation of the target. Tekin [23] inputted 2D image features and 3D information into two convolutional neural networks. The 3D pose was then extracted directly based on the 2D keypoint estimation in the images. Zhang [24] utilized region proposals to estimate the 3D pose of single-depth images and achieved satisfactory results across various benchmark datasets. Moon [25] proposed a depth estimation network called RootNet for monocular 3D pose estimation. This network leveraged camera parameters, the target’s actual area, and the target image area to calculate the absolute depth of the target accurately. The other approach involves utilizing a 2D pose estimator to acquire the 2D information of the keypoints of a human, subsequently enabling the estimation of 3D human pose coordinates based on a 2D representation. For instance, Chen [26] initially conducted 2D pose estimation on an image and then employed nearest-neighbor matching to estimate the 3D pose. Cheng [27] proposed an occlusion-aware deep learning framework that effectively filtered out unreliable estimations of occluded keypoints, which were then fed into both 2D and 3D temporal convolutional networks for further processing. Ultimately, this methodology yielded accurate results for 3D pose estimation.
In summary, all of the aforementioned methods require acquiring depth information or building sizeable repositories of reference data, which makes them costly to implement.

2.2. Application and Development of Simulations in Pose Estimation

Deep learning-based pose estimation methods generally require extensive datasets to train the model [28]. To enhance the robustness, generalization and accuracy of the model, these datasets usually need to cover different viewpoints, subjects, backgrounds and poses. Consequently, numerous researchers have created datasets by capturing indoor photographs from multiple perspectives or conducting field research. However, datasets for construction machinery are not as readily available as those for the human body [29]. Obtaining a diverse range of machine pose data typically requires extended periods on-site, and data collection at construction sites can be challenging owing to limited visibility and high complexity [30]. Thus, capturing sufficient 3D pose information from a mechanical device is a costly and challenging task involving safety concerns.
To address these challenges, researchers have integrated simulation and modeling techniques into the generation of datasets. Tian [31] utilized Cinema4D (C4D), a 3D software package, to produce excavator data for training a pose estimation algorithm. Papaioannidis [32] proposed an image-mapping model that could map real images to synthetic ones; this model was used to train pose estimation models, and compared with models trained solely on real data, the framework enhanced the accuracy of 3D target pose estimation. Liu [33] trained a pose estimation model using purely synthetic 3D human data containing no real samples, and its performance was comparable with models trained using real data. Rogez [34] presented an image synthesis engine that generated synthetic images with 3D pose annotations for training pose estimation models. Although this method required significant effort and cost, it provided a solution to the problem of limited and homogeneous datasets. It has been shown that pose estimation models trained on simulated data are comparable with models trained on real images.
The direct evaluation of a model’s performance in real-world settings can pose challenges owing to the unique operational scenarios encountered by specific targets. Simulations have therefore been used to verify the accuracy, rationality and effectiveness of such models. For example, because of the unique characteristics of a spacecraft’s working environment, directly testing a model’s performance is costly and risky. Han [35] therefore designed a pose estimation algorithm for spacecraft and acquired image data from two spacecraft models built in the 3D software Blender. Qiao [36] proposed a monocular method to estimate the poses of satellites in 3D and, to verify the accuracy of the algorithm, used a simulation method to construct a satellite pose dataset called BUAA-SID-POSE 1.0. In addition to the aforementioned applications, other researchers [37,38] proposed similar simulation methods for constructing image datasets. These datasets can be used to validate the models and solve the problems of high testing costs and difficult experimental conditions.
Therefore, the construction of a validation dataset using a simulation environment is feasible. However, it is currently necessary to develop a simulation environment capable of accurately simulating the mechanical construction process.

2.3. Monitoring Method of Mechanical Construction Processes

Existing methods for monitoring concrete vibrators require various sensors mounted on the machine [9], such as millimeter-wave radar and laser rangefinders. However, sensor-based approaches are costly and suffer various inconveniences regarding power supply, installation and maintenance [39]. In addition, their accuracy is usually affected by multiple factors, such as the magnetic field and installation accuracy [11]. Computer vision methods have been introduced in engineering to address these issues. For example, Chen [40] identified the operational status of an excavator by analyzing the position and size of a target bounding box, achieved by inputting the target recognition and tracking results into a 3D ResNET network. Kim [41] utilized a convolutional neural network to identify the location of the transporter and unloading point, enabling the recognition of the unloading activity. However, such target detection models cannot accurately determine the pose of the target because of the large size of the target bounding box.
Many deep learning-based methods for pose estimation have been designed for human subjects. Therefore, modifications to the model are required for its application in the pose estimation of construction machinery [42]. Tian [31] proposed a model that enhanced pose estimation from 2D to 3D for real-time excavator monitoring during operation to prevent collisions between excavators and workers. Assadzadeh [43] utilized pose estimation methods to monitor excavators’ activity; considering the challenge of obtaining complete 3D excavator pose datasets, that study suggested using mixed (simulated and real) datasets for network training. Zhao [39] introduced the YOLOv5-FastPose (YFP) model for the pose monitoring and safety management of construction machinery. Wen [44] proposed utilizing the dynamic constraints between robot arms together with 2D pose estimation results to achieve 3D excavator pose estimation, with a keypoint location error margin of approximately 0.66 m. However, these approaches require extensive 3D datasets for training or validation, which poses significant costs and challenges for the construction industry. Li [45] proposed an excavator pose estimation method based on monocular RGB images with an error margin of 3°. This method, however, treats angle estimation as a regression problem, which yields higher accuracy but necessitates retraining the network whenever the camera’s position changes.
Therefore, the objective of this study was to develop a methodology that can effectively reduce the workload associated with data acquisition and field maintenance, aiming to accurately estimate the rotation angle of the vibration rack.

3. Methodology

The primary research framework, presented in Figure 2, comprises two parts. Firstly, a framework integrating keypoint detection using the improved YOLOv8-Pose and angle reasoning based on the vanishing point theorem was developed to monitor the operation of a vibration rack. The proposed method uses a monocular camera to estimate the rotation angle of a 3D vibration rack, eliminating the need for depth information acquisition and camera parameter measurements. Secondly, a 3D simulation method based on Unity3D and a control algorithm were designed to validate both the feasibility and accuracy of the proposed method. The simulation method utilized 3D modeling software to recreate an arch dam construction scene and Unity3D 5.6 software to simulate the relevant equipment’s movement. In addition, this study utilized the Unity Graphics User Interface (UGUI) to create a vibrator interaction module, which could be used by the operator to control the vibrator model. The pose of the vibrator was also displayed in the UGUI.

3.1. Rotation Angle Estimation Model Based on Improved YOLOv8-Pose

3.1.1. Selection of Keypoints

This study selected 10 keypoints on the target, as illustrated in Figure 3. The reference points are the two keypoints (Points 1 and 2 in Figure 3) on either side of the bucket. The line connecting these two points is parallel to the upper and lower planes of the vibration rack. The other eight points were positioned between the outermost vibration rods and the rack at the upper and lower junctions. These eight points can be considered the vertices of a cuboid, implying that the lines connecting any upper connection points run parallel to the lower plane of the rack, whereas the lines connecting the corresponding upper and lower connection points, such as Points 3 and 7, are perpendicular to the vibration rack plane.

3.1.2. YOLOv8-Pose Model

Existing pose estimation models that exhibit satisfactory performance include multiresolution cascade networks, such as DEKR [46] and HigherHRNet [47], and improved pose estimation models based on target detection models, such as YOLO [48] and Faster R-CNN [49]. Maji et al. [50] compared these algorithms using the COCO2017 dataset. It was shown that the YOLO-based pose estimation models facilitate end-to-end training, enabling simultaneous detection of the target bounding box and corresponding 2D pose in a single forward pass, thereby surpassing the speed and accuracy achieved by most existing methods.
YOLOv8 builds on the successful lineage of previous YOLO models and uses Ultralytics as its framework. Ultralytics, an open-source library, exhibits exceptional extensibility and versatility by supporting various tasks, such as classification, segmentation and pose estimation. This remarkable feature offers developers the utmost convenience in terms of customization and deployment. Moreover, YOLOv8 demonstrated outstanding object detection accuracy across the COCO and Roboflow datasets, surpassing most contemporary models.
YOLOv8-Pose consists of three main parts: the backbone, neck and head networks. The backbone network conducts convolutional calculations and feature fusion on the input images at multiple scales to obtain feature maps at different scales. The neck network up-samples the feature maps at various scales and combines them with the original feature maps. After passing through the C2f module, some of these results are directly fed into the head network, while the remaining portion is down-sampled and combined with other feature maps before being fed into the head network. In this module, the YOLOv8 model replaces the C3 structure in the YOLOv5 backbone and neck networks with a C2f structure, incorporating additional feature map splices. This replacement significantly enhances the performance of the model. The head network uses these inputs to calculate the loss function using the fully connected layers.
The YOLOv8-Pose loss function consists of four components: classification, box, keypoint and keypoint confidence losses. Furthermore, YOLOv8 replaces the original coupled head with a decoupled head and introduces a Distribution Focal Loss (DFL) function into the box loss function. Thus, the box loss function of YOLOv8 consists of two parts: CIoU and DFL.

3.1.3. Keypoint Detection Model

The keypoint detection task shares fundamental principles with object detection, as both require convolutional neural networks to capture broad contextual information within an image. However, as the network’s depth increases, down-sampling operations inevitably result in the loss of some relevant information, which can negatively impact the model’s accuracy. To address this issue, this study integrates the Large Separable Kernel Attention (LSKA) [51] module into the YOLOv8-Pose model.
The LSKA module decomposes the 2D convolution kernel into a cascade of 1D convolution kernels, allowing for the separate extraction of feature map characteristics in the horizontal and vertical directions. It captures contextual information through a spatial dilation convolution layer, thereby enhancing the model’s understanding of the spatial relationships in the image. This enhancement leads to improved keypoint detection accuracy with reduced computational complexity and memory requirements. The structure of the LSKA module is illustrated in Figure 4.
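To make this structure concrete, the following is a minimal PyTorch sketch of an LSKA-style attention block. The kernel size, dilation and exact cascade order are illustrative assumptions based on the description above, not the precise configuration of [51] or of the network used in this study.
```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Sketch of a Large Separable Kernel Attention block: cascaded 1D
    depthwise convolutions (horizontal/vertical), dilated 1D depthwise
    convolutions for long-range context, and a 1x1 convolution whose
    output re-weights the input features."""

    def __init__(self, channels: int, k: int = 7, dilation: int = 3):
        super().__init__()
        pad = k // 2
        # Local context: separable 1D depthwise convolutions.
        self.conv_h = nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels)
        self.conv_v = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels)
        # Long-range context: separable dilated 1D depthwise convolutions.
        dpad = pad * dilation
        self.dconv_h = nn.Conv2d(channels, channels, (1, k), padding=(0, dpad),
                                 dilation=dilation, groups=channels)
        self.dconv_v = nn.Conv2d(channels, channels, (k, 1), padding=(dpad, 0),
                                 dilation=dilation, groups=channels)
        # Channel mixing to form the attention map.
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.conv_v(self.conv_h(x))
        attn = self.dconv_v(self.dconv_h(attn))
        attn = self.pointwise(attn)
        return x * attn  # attention-weighted features


if __name__ == "__main__":
    feat = torch.randn(1, 64, 40, 40)   # a backbone feature map
    print(LSKA(64)(feat).shape)         # torch.Size([1, 64, 40, 40])
```
Because every convolution is depthwise and one-dimensional, the block adds only a modest number of parameters while still covering a large effective receptive field, which is the motivation given above for adopting LSKA.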
In this study, the LSKA (Large Separable Kernel Attention) module was integrated into the backbone network of YOLOv8-Pose to enhance the processing of both vertical and horizontal information within the image. The first LSKA module was positioned downstream of the second C2f module in the backbone network, thereby enhancing the initial contextual information. The second LSKA module was inserted after the Concat module in the SPPF (Spatial Pyramid Pooling Fast) module, enabling horizontal and vertical deepening of all the preceding information. This effectively strengthened the model’s ability to aggregate features at multiple scales. The new SPPF module was named LSKA-SPPF, which was introduced to replace the original SPPF module. The improved network structure is illustrated in Figure 5.

3.1.4. Estimation of the Rotation Angle of the Vibration Rack

The camera captures an image using central projection, resulting in the convergence of parallel lines from the real world into a single point within the image. This point is called the vanishing point (e.g., Figure 6). The main inferences related to the vanishing points are shown as follows.
  • In the same image, the set of parallel lines originally situated in the 3D space converges towards a common vanishing point.
  • The vanishing points of all lines parallel to the same plane can be connected to form a vanishing line.
  • For three sets of parallel lines that are mutually perpendicular in 3D space, the lines connecting the camera’s optical center to their vanishing points are also mutually perpendicular.
In this study, the rotation angle of the vibration rack was estimated on the basis of this principle. The specific procedure is illustrated in Figure 7.
First, the improved YOLOv8-Pose network detected the keypoints selected in Section 3.1.1, such as those shown in Figure 3. Subsequently, the line connecting Point 5 to Point 6, as well as the line connecting Point 3 to Point 4, shared the common vanishing point, B. Similarly, on the basis of the connection mode shown in Figure 8a, we identified two additional vanishing points, A and C.
The subsequent step involved determining the precise location of the camera’s optical center by utilizing the image coordinates of the three vanishing points. According to the third inference above, the lines connecting the optical center of the camera to the three vanishing points (AO, BO and CO) are mutually perpendicular. Therefore, the projection of the camera’s optical center onto the image plane was precisely located at the orthocenter D of the triangle ABC. The coordinates of the three points A, B and C were known, making it straightforward to determine the location of the orthocenter D according to the geometric properties of the triangle ABC. Subsequently, by utilizing the coordinates of A, B and D and the foot E of the perpendicular from D to side AB, we determined the focal length f of the camera using Equation (1).
$f = \sqrt{l_{AE} \times l_{BE} - l_{DE}^{2}}$  (1)
where $l_{AE}$, $l_{BE}$ and $l_{DE}$ are the lengths of the line segments AE, BE and DE, respectively.
Because the line connecting the two reference points is parallel to the top surface of the vibration rack, its intersection point F with line AB could be determined and used to define the direction line against which the rack’s rotation is measured. In this section, the optical center O was assigned the coordinates (0, 0, 0), and all the points located in the image plane were assigned the coordinates (x, y, f), where f is the focal length.
Finally, the angle between the FO and AO or BO was determined. This angle was approximately equal to the angle of rotation of the vibration rack.
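As an illustration of this procedure, the following is a minimal NumPy sketch under stated assumptions: the vanishing points A, B and C have already been obtained by intersecting the keypoint connections of Figure 8a, the reference keypoints (Points 1 and 2) are available from the detector, and the rotation angle is reported relative to the BO axis. The helper function and variable names are hypothetical, not part of the published implementation.
```python
import numpy as np

def vanishing_point(p1, p2, q1, q2):
    """Intersection of image lines (p1, p2) and (q1, q2); with two parallel
    3D edges of the rack as inputs, the intersection is their vanishing point.
    All points are NumPy arrays of pixel coordinates."""
    d1, d2 = p2 - p1, q2 - q1
    t, _ = np.linalg.solve(np.column_stack([d1, -d2]), q1 - p1)
    return p1 + t * d1

def rack_rotation_angle(A, B, C, ref1, ref2):
    """Estimate the rack rotation angle (degrees) from three mutually
    orthogonal vanishing points A, B, C and the two reference keypoints."""
    # Orthocenter D of triangle ABC: the projection of the optical centre
    # onto the image plane (altitude conditions AD ⟂ BC and BD ⟂ AC).
    M = np.vstack([C - B, C - A])
    rhs = np.array([np.dot(A, C - B), np.dot(B, C - A)])
    D = np.linalg.solve(M, rhs)
    # Foot E of the perpendicular from D onto AB, then focal length (Eq. (1)).
    u = (B - A) / np.linalg.norm(B - A)
    E = A + np.dot(D - A, u) * u
    f = np.sqrt(np.linalg.norm(A - E) * np.linalg.norm(B - E)
                - np.linalg.norm(D - E) ** 2)
    # Back-project an image point to a unit ray from the optical centre O,
    # taking D as the principal point and f as the focal length.
    def ray(p):
        v = np.array([p[0] - D[0], p[1] - D[1], f])
        return v / np.linalg.norm(v)
    # F: intersection of the reference line (Points 1-2) with line AB,
    # i.e. the vanishing point of the reference direction.
    F = vanishing_point(ref1, ref2, A, B)
    # Rotation angle ≈ angle between rays OF and OB.
    cos_t = np.clip(np.dot(ray(F), ray(B)), -1.0, 1.0)
    return np.degrees(np.arccos(cos_t))
```
In the full pipeline, A, B and C would themselves be obtained by calling vanishing_point on the keypoint pairs detected by the improved YOLOv8-Pose, following the connection scheme of Figure 8a.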

3.2. Construction of the Simulation Environment

Currently, the rotation angle of a vibration rack is primarily determined through manual and sensor measurements. However, the sensor-based approach raises concerns regarding the power supply’s stability, the reliability of data transmission and potential magnetic field interference, which may affect the precision of the sensors’ measurements. Moreover, manual measurement with angle-measuring instruments is time-consuming and labor-intensive. Unity3D software was therefore employed to construct a simulation platform for vibratory devices to validate the proposed approach’s efficacy and precision. The following steps outline this process.
Modeling the construction blocks of arch dams: This study utilized 3D modeling software to generate a comprehensive model of an arch dam block and its corresponding construction machinery, which was then imported into Unity3D. In Unity3D, the initial positions of all the construction equipment and character models were randomized. Each model autonomously generated a target point during the experiment and moved at a predetermined velocity to simulate movements of humans and equipment in the construction environment, creating diverse contextual backgrounds.
Sensor simulation: The readings of each sensor in this experiment were simulated based on mathematical calculations, considering the relative positional relationships of the models (Figure 9). Equation (2) was used to calculate the angles of the boom, bucket rod and bucket against the horizontal plane to simulate the inclination sensors’ readings.
$\theta_b = \tan^{-1}\dfrac{z'_a - z'_b}{\sqrt{(y'_a - y'_b)^{2} + (x'_a - x'_b)^{2}}}$  (2)
where $\theta_b$ is the angle between the boom and the horizontal plane in the simulation environment, $(x'_a, y'_a, z'_a)$ are the coordinates of the hinge point of the arm in the simulation environment and $(x'_b, y'_b, z'_b)$ are the coordinates of the hinge point of the bucket in the simulation environment.
The calculations derived from Equation (3) were employed to simulate the acquisition of the rotational angle information for the vibration rack.
$\theta_r = \theta_v - \tan^{-1}\dfrac{y_1 - y_2}{x_1 - x_2}$  (3)
where $(x_1, y_1)$ and $(x_2, y_2)$ are the positions of the vibration rods on both sides of the same row on the vibration rack, $\theta_v$ is the heading angle of the concrete vibrator and $\theta_r$ is the rotational angle of the vibration rack.
Because reading coordinates directly from the simulation yields fluctuation-free values, Gaussian noise was added to the simulated perception data to reproduce the fluctuations of real sensor readings. Two cameras were set up in the simulation environment. The first camera was set on the right side of the cab of the concrete vibrator to simulate the camera’s actual installation position. Another camera was placed above the concrete vibrator to observe the movement. The simulation information for each sensor was integrated and displayed on the Unity Graphics User Interface (UGUI), as shown in Figure 10.
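As a concrete illustration of Equations (2) and (3) and the added noise, the following is a minimal Python sketch; angles are expressed in degrees and the noise level sigma is an illustrative value, not the one used in the Unity3D implementation.
```python
import numpy as np

def boom_inclination(hinge_arm, hinge_bucket):
    """Angle of the boom against the horizontal plane (Eq. (2)).
    Inputs are (x', y', z') hinge coordinates in the simulation frame."""
    dx, dy, dz = np.asarray(hinge_arm) - np.asarray(hinge_bucket)
    return np.degrees(np.arctan2(dz, np.hypot(dx, dy)))

def rack_rotation(rod1_xy, rod2_xy, heading_deg):
    """Rack rotation relative to the vibrator heading (Eq. (3)).
    rod1_xy, rod2_xy: (x, y) of the two outer rods in the same row."""
    (x1, y1), (x2, y2) = rod1_xy, rod2_xy
    return heading_deg - np.degrees(np.arctan2(y1 - y2, x1 - x2))

def noisy_reading(value, sigma=0.5):
    """Add zero-mean Gaussian noise so the simulated reading fluctuates
    like a real sensor; sigma is an illustrative standard deviation."""
    return value + np.random.normal(0.0, sigma)
```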
Simulation of the vibration process: This study presents the basic vibration logic for a vibrator model to simulate the construction process of a real concrete vibrator. There are three steps involved in the process. First, the vibration rack was aligned directly above the concrete. Second, the vibration rack was lowered vertically, and vibration rods were inserted into the concrete to a specific depth. Subsequently, it maintained its position for complete vibration. Finally, after completing the vibration process, the vibration rack was raised vertically, and the vibration rods were removed from the concrete. The overall motion of the vibrator was achieved using the proportional-integral-derivative (PID) control algorithm. The control deviation was the inclination angle between each vibrating arm and the horizontal plane.
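The following is a minimal sketch of such a PID loop; the gains and the interpretation of the output as a joint angular-velocity command are illustrative assumptions rather than the values used in the Unity3D simulation.
```python
class PIDController:
    """Minimal PID controller; the deviation is the inclination angle of a
    vibrating arm against the horizontal plane (illustrative gains)."""

    def __init__(self, kp=1.0, ki=0.05, kd=0.2):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, target_deg, measured_deg, dt):
        error = target_deg - measured_deg
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        # Interpreted here as an angular-velocity command for the arm joint.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```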

4. Experiment and Applications

The image data used in this section to train the keypoint detection algorithm were gathered from a high-arch dam construction site in southwest China. The cameras were strategically positioned at various locations on the concrete vibrator during image acquisition to ensure a diverse dataset. In total, 3425 images were collected for this study.
The model training in this study was performed on a Windows 10 system with the following configurations: CUDA11.1, CUDNN8005, a consumer graphics processing unit (GPU) (NVIDIA GeForce GTX 1660Ti), a central processing unit (CPU) (11th Gen Intel(R) Core(TM) i7-11700F @ 2.50 GHz) and 16 GB RAM. The model was subsequently trained for 120 epochs. The batch size was two, and the patience was set to five. The training time of the model was approximately 10 h.

4.1. Simulation

Before the simulation experiments, the proposed model was trained using transfer learning on a dataset comprising 648 simulated scenarios. The rationality of the motion of the vibrator model was verified first. By selecting the “single vibration” button on the UGUI, the vehicle body autonomously executed the predefined control logic to accomplish the desired behavior. The Unity platform recorded the positions of individual keypoints during the movement. The simulation results were compared with the trajectory of the vibrator’s movement recorded by the sensor, as shown in Figure 11. The comparison showed that the simulated motion trajectory of the vibrator closely approximated the actual motion trajectory.
During the simulation experiments, the camera captured images continuously and transmitted them to the vibration rack angle estimation module at a frequency of two images per second. The accuracy of the angle estimation algorithm was validated through multiple iterations, wherein adjustments were made to the initial position and pose. The data obtained from the three final validations are listed in Table 1. The table shows that the rotational angle detection method proposed in this study has an average estimated angle of 92.57° between lines OA and OB, as shown in Figure 6, with an average error of 2.57°. The algorithm detected an average rotation angle error of 6.97°, with a maximum error of 7.63° and a minimum error of 6.43° for the vibration rack. The statistics of the final three validation results are shown in Figure 12.
Figure 13 depicts a single cycle of a concrete vibration task completed in the simulation environment. The adjustment process for the vibration rack is illustrated in Figure 13a,b: the vibration rack was gradually moved above the target point while the angle of rotation was changed. Figure 13b,c show the process of inserting the vibration rods vertically into the concrete; typically, the angle of rotation was not adjusted further during this step. Figure 13c,d illustrate the vibration step, during which the vibration rods were held in place without any other movement. Figure 13d,e illustrate the process of extracting the vibration rods, which remained vertical and maintained a constant rotation angle. Figure 13e,f show the adjustment phase of the next cycle.
The detection results for the vibration rack’s rotation angle during this cycle are presented in Table 2. The results presented in Table 2 demonstrate that the algorithm proposed in this study exhibits a precision of ±2.51° (2.51° is half of the difference between the maximum value and the minimum value) when the vibration rack is kept constant.

4.2. Case Study

An enhanced intelligent vibration monitoring system was proposed, based on the method of monitoring the rotation angle of a vibration rack and IoT, as shown in Figure 14.
The system utilized the Global Positioning System (GPS) to acquire the vibrator’s position, employed an inclinometer to obtain the pitch angle of each robot arm and used an industrial camera to capture images of the rack. Additionally, to prevent obstructions during the rack’s rotation, this study proposed using a dual laser range finder for measuring the insertion depth of the vibration rods. The collected information was transmitted to the vehicle terminal via RS232 serial communication components or a direct connection and subsequently relayed to the remote server through radio transmission. The data then underwent processing based on specific algorithms and were ultimately visually displayed on the monitoring system’s screen.
The system was implemented at the construction site of a high-arch dam in southwest China, as shown in Figure 15. At the construction site, a camera was mounted on a platform on the right side of the vibrator’s cab. Other intelligent monitoring devices were installed at the corresponding positions on the concrete vibrator.
Figure 16 shows a cycle of concrete vibration during the field construction process. The real-time estimation method presented in this study encompasses target detection, keypoint localization and rotational angle estimation. Figure 16a–f depict the sequential processes involved in the adjustment of the vibration rack, the insertion and extraction of the vibration rods and the subsequent cycle adjustment stage. These processes correspond to those shown in Figure 13.
The estimation results of the rack rotation angle fluctuated by approximately ±4° when the rotation angle of the vibration rack itself remained relatively stable.
The rotation angle estimate for each second was obtained by averaging the x per-frame estimates (where x is the number of images transmitted within that second), aiming to mitigate significant errors that may arise during practical application. The angle estimation results under different fps during the process of Figure 16d,e are depicted in Figure 17. These findings demonstrated that a higher fps leads to a more stable angle estimation. This method fully utilized the high-fps advantage of YOLOv8-Pose, and using the average value improved the detection accuracy of the algorithm and brought it closer to the real results. At the construction site, taking into account factors such as the construction conditions, the information transmission volume and the airborne control terminal load, the data acquisition rate chosen was 4 fps.

5. Discussion

5.1. Comparison of Pose Estimation Algorithms

This study selected and compared several state-of-the-art pose estimation methods with significant current efficacies. During the comparison, the relevant parameters of each algorithm were adjusted to values suitable for detecting the keypoints of the vibration racks. The results of these comparisons are summarized in Table 3. In terms of detection accuracy (AP.5) and recall (AR), the improved YOLOv8-Pose had the best performance, with an improvement of 6.3%, 3.8%, 2.3% and 1.4% in accuracy compared with HRNet, HigherHRNet, YOLOv5-Pose and YOLOv8-Pose, respectively. In terms of inference speed, HRNet and HigherHRNet both exhibited an inference time exceeding 100 ms for a single photo, whereas YOLOv5-Pose stood out as the swiftest with a mere 17.1 ms, closely followed by YOLOv8-Pose at only 21 ms and the improved YOLOv8-Pose at 25.4 ms. The accuracy of rotation angle estimation for a vibration rack relies heavily on the keypoint detection algorithm, thus emphasizing the significance of precise keypoint detection in achieving accurate angle estimation. In terms of the number of parameters, the improved YOLOv8-Pose demonstrated improved detection accuracy with a relatively small increase in parameters while maintaining an acceptable inference speed (YOLOv5-Pose achieved a frame rate of 58 fps, whereas the improved YOLOv8-Pose achieved 39.4 fps, still meeting the real-time requirement of 35 fps). In summary, the method proposed in this study is a more reasonable choice for keypoint detection, given that it satisfies the real-time frame rate requirement.
Figure 18 displays the Precision–recall (P-R) curves for the algorithms, providing a comprehensive evaluation of their accuracy and recall rate. The area under the curve (AUC) between the P-R curves and the two coordinate axes serves as an indicator of algorithmic performance; a larger AUC indicates superior performance. The results demonstrate that the improved YOLOv8-Pose outperforms other algorithms.

5.2. Evaluation of the Angle Estimation Algorithm

Owing to the failure of the heading angle sensor, observers usually estimated the angle of the vibration rack visually and manually input it into the monitoring system. However, this approach is time-consuming and labor-intensive, making it challenging to achieve continuous detection. In this study, a group of 30 volunteers was recruited from a construction site to visually estimate the rotation angle of the vibration rack from the camera’s perspective in the simulated environment. The resulting estimation errors are presented in Table 4. The experimental results showed that the average error of all manual visual estimations of the rack’s rotation angle was approximately 8.32°, the maximum error was 23.5° and the minimum error was 0°. Additionally, the average error of each individual was computed; the most accurate individual had an average error of 3.28°, while the least accurate had an average error of 15.5°. Manual estimation of the rotation angle of a vibration rack is thus subject to personal subjective factors, leading to uneven accuracy across individuals.
A survey of the volunteers found that, when manually estimating the rotation angle of a vibration rack, individuals tended to snap their estimates to salient values such as 0°, 30°, 45°, 60° and 90°. The guesses for pictures whose true angles fell within ±2° of these salient values were selected for statistical analysis. The statistics indicated that the volunteers’ guessing accuracy for pictures near these values was higher than for the other pictures; in reality, however, the likelihood of the true angle being close to a salient value is only 17.7%. In summary, the manual estimation method has a margin of error, arising from various factors, that is difficult to reduce. Additionally, during the construction process, it is not feasible for a recorder to document the rotation angle of the vibration rack every second. Considering the large number of concrete vibrators present on-site, achieving a one-to-one correspondence among data, vehicles, time and position is another significant challenge for systematically evaluating the vibration quality at the site.
Table 5 lists the performance evaluations of this method and several comparable methods for comparative analysis.
Method 1, trained on simulated datasets, achieved a direct leap from 2D to 3D pose mapping. This approach is distinctive in that it is not constrained by the camera’s installation location. However, this advantage comes with a high dependency on a substantial amount of 3D data, and its estimation error is approximately 9.63°.
Method 2 was trained directly on 2D images to generate pose estimation results. In the application scenario of this method, the excavator’s arm moved in only two dimensions relative to the camera, making the process essentially a 2D pose estimation task. This is one of the reasons why the method’s error is only 1.7°. Furthermore, the applicability of Method 2 is limited to specific mechanical device types and camera mounting configurations, and any changes require a retraining of the method.
Compared with the manual estimation error, the proposed method in this study exhibited smaller maximum and average errors. However, the minimum error was not as good as the manual error.
The results demonstrate that the proposed method outperforms existing 3D estimation methods in terms of accuracy. The method’s advantage is further highlighted by its ability to record the rotation angle of the vibration rack in real time and establish one-to-one correspondences among the collected data, the vehicle, the timestamp and the position. Moreover, the method avoids the complex steps of constructing 3D pose datasets and calibrating the cameras. There is no need to retrain the algorithm when the camera’s installation position is adjusted, and no additional assistance from other sensors is required. These features reduce the cost of applying the method in construction and greatly facilitate its deployment on construction sites.

5.3. Implementation of the Analysis and Limitations of This Work

This study proposed a method based on keypoint detection and the vanishing point theorem that enables the estimation of the rotation angle of a vibration rack. Compared with the manual estimation approach, this method allows the vibration coverage of each individual vibration to be recorded. In terms of maintenance, a camera requires less upkeep than other sensors owing to its placement on the vibrator’s body rather than on the robot arm, where a sensor’s power supply wire is often damaged. Nevertheless, this method encounters several challenges in practical implementation. Firstly, it is essential for the camera to possess an optimal field of view to comprehensively capture the entire sequence of actions, namely “insert–vibrate–pull out”, involving the vibration rack. Consequently, we conducted multiple experiments regarding the positioning of the camera during practical applications and ultimately decided to install it on the side of the driver’s cabin. Moreover, the YOLO-based algorithm operates as a multitarget recognition system. When multiple vibration racks are present within the camera’s field of view, the algorithm must contend with potential confusion among the targets. Thus, our approach filters the detected targets so that only the closest one, i.e., the one with the largest detection box, is included in the final results.
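A minimal sketch of this filtering rule is shown below; the dictionary fields are hypothetical and stand in for whatever box representation the detector returns.
```python
def select_nearest_rack(detections):
    """Keep only the nearest vibration rack, taken to be the detection
    with the largest bounding-box area (fields 'w' and 'h' are assumed)."""
    if not detections:
        return None
    return max(detections, key=lambda d: d["w"] * d["h"])
```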
Although advanced, the proposed method still has limitations. Firstly, the accuracy of angle estimation relies heavily on the detection precision of the keypoints. However, occlusion during detection is inevitable because vertices were selected as keypoints, and despite the ability of YOLOv8-Pose to preserve the positional relationships among the keypoints, the detection results will inevitably exhibit deviations. Secondly, the algorithm exhibited a small number of detection failures during the experiments; according to the statistical analysis, the probability of a detection failure was approximately 0.3%. This situation is mainly attributed to the intrinsic characteristics of the camera: when the lines connecting the keypoints are perpendicular to the camera’s optical axis, they no longer converge in the image, so the corresponding vanishing point cannot be derived and the rotation angle cannot be estimated.
The following aspects can be improved in future research. (1) Enhancing the accuracy of the keypoint detection algorithm and its ability to detect occlusion points necessitates further comprehension and improvement of deep learning algorithms. (2) Further refinement of the angle estimation algorithm by replacing the projection method with neural networks or other algorithms and establishing a relationship between the keypoint position and angle through extensive data training would enhance the algorithm’s robustness and generalization.

6. Conclusions

To enable the monitoring of the vibration coverage of a vibrator, this study proposed a computer vision-based estimation method for the rotation angle of the rack. The YOLOv8-Pose model improved by LSKA was utilized to detect the keypoints of the vibration rack. A self-developed estimation algorithm based on the vanishing point theorem then processed the positional information of the keypoints and derived an estimate of the rotation angle of the vibration rack. To reduce the labor cost of the verification stage, a simulation environment built in Unity3D was employed to provide datasets for validation. This method relied on mathematical calculations in 3D space, eliminating the 8.32° error associated with manual data acquisition and significantly reducing both the labor and financial costs involved in data collection. Numerous simulation experiments and practical tests were conducted to evaluate this model. The YOLOv8-Pose model improved by LSKA achieved an AP.5 of 92.5% with an inference time of 25.4 ms. Compared with YOLOv8-Pose, our proposed algorithm demonstrated a 1.4% increase in accuracy while only slightly increasing the inference time by 4.4 ms. Our rotation angle estimation method demonstrated an average error of 6.97°, which was 2.66° lower than that of Mahmood’s method [52]. In addition, the model performed well at construction sites, demonstrating that it is a good method to monitor vibration coverage.

Author Contributions

Conceptualization, J.W. and B.R.; methodology, X.Z.; software, X.Z. and J.W.; validation, B.R., T.G. and X.Z.; formal analysis, B.R. and T.G.; investigation, X.Z.; resources, J.W. and B.R.; data curation, X.Z.; writing—original draft preparation, B.R. and X.Z.; writing—review and editing, B.R. and X.Z.; visualization, B.R. and T.G.; supervision, B.R.; project administration, B.R. and T.G.; funding acquisition, B.R. and T.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 52222907 (Bingyu Ren) and grant number 52379131 (Tao Guan).

Data Availability Statement

Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gao, X.; Ji, M.; Hu, Y.; Wu, C.; Ruan, W.; Tan, Y.; Zheng, J. Determination of dam concrete strength parameters considering the effects of ambient environment, member size and aggregate size: A case study of Baihetan Dam. Constr. Build. Mater. 2024, 421, 135707. [Google Scholar] [CrossRef]
  2. Aniskin, N.A.; Shaytanov, A.M. Optimization of the temperature and thermo-stressed state of a concrete dam constructed from particularly lean roller-compacted concrete. Buildings 2023, 13, 914. [Google Scholar] [CrossRef]
  3. Vembu, P.R.S.; Ammasi, A.K.A. Comprehensive review on the factors affecting bond strength in concrete. Buildings 2023, 13, 577. [Google Scholar] [CrossRef]
  4. Cao, G.; Bai, Y.; Shi, Y.; Li, Z.; Deng, D.; Jiang, S.; Xie, S.; Wang, H. Investigation of vibration on rheological behavior of fresh concrete using CFD-DEM coupling method. Constr. Build. Mater. 2024, 425, 135908. [Google Scholar] [CrossRef]
  5. Chen, L.; Chen, Z.; Xie, Z.; Wei, L.; Hua, J.; Huang, L.; Yap, P.-S. Recent developments on natural fiber concrete: A review of properties, sustainability, applications, barriers, and opportunities. Dev. Built Environ. 2023, 16, 100255. [Google Scholar] [CrossRef]
  6. Zhou, F.; Li, W.; Hu, Y.; Huang, L.; Xie, Z.; Yang, J.; Wu, D.; Chen, Z. Moisture diffusion coefficient of concrete under different conditions. Buildings 2023, 13, 2421. [Google Scholar] [CrossRef]
  7. Torres, P.P.; Ghorbel, E.; Wardeh, G. Towards a new analytical creep model for cement-based concrete using design standards approach. Buildings 2021, 11, 155. [Google Scholar] [CrossRef]
  8. Baek, J.; Kim, D.; Choi, B. Deep learning-based automated productivity monitoring for on-site module installation in off-site construction. Dev. Built Environ. 2024, 18, 100382. [Google Scholar] [CrossRef]
  9. Wang, D.; Ren, B.; Cui, B.; Wang, J.; Wang, X.; Guan, T. Real-time monitoring for vibration quality of fresh concrete using convolutional neural networks and IoT technology. Autom. Constr. 2021, 123, 103510. [Google Scholar] [CrossRef]
  10. Vahdatikhaki, F.; Hammad, A.; Siddiqui, H. Optimization-based excavator pose estimation using real-time location systems. Autom. Constr. 2015, 56, 76–92. [Google Scholar] [CrossRef]
  11. Ye, F.; Shi, F.; Lai, Y.; Zhou, X.; Li, K. Heading angle estimation using rotating magnetometer for mobile robots under environmental magnetic disturbances. Intell. Serv. Robot. 2020, 13, 459–477. [Google Scholar] [CrossRef]
  12. Gong, W.; Zhang, X.; Gonzalez, J.; Sobral, A.; Bouwmans, T.; Tu, C.; Zahzah, E.-H. Human pose estimation from monocular images: A comprehensive survey. Sensors 2016, 16, 1966. [Google Scholar] [CrossRef] [PubMed]
  13. Chen, Y.; Tian, Y.; He, M. Monocular human pose estimation: A survey of deep learning-based methods. Comput. Vis. Image Underst. 2020, 192, 102897. [Google Scholar] [CrossRef]
  14. Zhang, Y.; Wen, G.; Mi, S.; Zhang, M.; Geng, X. Overview on 2D human pose estimation based on deep learning. J. Softw. 2022, 33, 4173–4191. [Google Scholar]
  15. Sun, X.; Shang, J.; Liang, S.; Wei, Y. Compositional human pose regression. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  16. Hua, G.; Li, L.; Liu, S. Multipath affinage stacked-hourglass networks for human pose estimation. Front. Comput. Sci. 2020, 14, 144701. [Google Scholar] [CrossRef]
  17. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  18. Jin, S.; Liu, W.; Ouyang, W.; Qian, C. Multi-person articulated tracking with spatial and temporal embeddings. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  19. Mehta, D.; Sotnychenko, O.; Mueller, F.; Xu, W.; Sridhar, S.; Pons-Moll, G.; Theobalt, C. Single-shot multi-person 3D pose estimation from monocular RGB. In Proceedings of the 6th International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018. [Google Scholar]
  20. Mehta, D.; Rhodin, H.; Casas, D.; Fua, P.; Sotnychenko, O.; Xu, W.; Theobalt, C. Monocular 3D human pose estimation in the wild using improved CNN supervision. In Proceedings of the International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017. [Google Scholar]
  21. Wang, Z.; Nie, X.; Qu, X.; Chen, Y.; Liu, S. Distribution-aware single-stage models for multi-person 3D pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  22. Pavlakos, G.; Zhou, X.; Daniilidis, K. Ordinal depth supervision for 3D human pose estimation. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  23. Tekin, B.; Marquez-Neila, P.; Salzmann, M.; Fua, P. Learning to fuse 2D and 3D image cues for monocular body pose estimation. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  24. Zhang, Y.; Mi, S.; Wu, J.; Geng, X. Simultaneous 3D hand detection and pose estimation using single depth images. Pattern Recognit. Lett. 2020, 140, 43–48. [Google Scholar] [CrossRef]
  25. Moon, G.; Chang, J.Y.; Lee, K.M. Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  26. Chen, C.-H.; Ramanan, D. 3D human pose estimation = 2D pose estimation plus matching. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  27. Cheng, Y.; Yang, B.; Wang, B.; Yan, W.; Tan, R.T. Occlusion-aware networks for 3D human pose estimation in video. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  28. Cano-Ortiz, S.; Iglesias, L.L.; Ruiz del Arbol, P.M.; Castro-Fresno, D. Improving detection of asphalt distresses with deep learning-based diffusion model for intelligent road maintenance. Dev. Built Environ. 2024, 17, 100315. [Google Scholar] [CrossRef]
  29. Lee, J.G.; Hwang, J.; Chi, S.; Seo, J. Synthetic image dataset development for vision-based construction equipment detection. J. Comput. Civ. Eng. 2022, 36, 04022020. [Google Scholar] [CrossRef]
  30. Wang, J.; Tan, S.; Zhen, X.; Xu, S.; Zheng, F.; He, Z.; Shao, L. Deep 3D human pose estimation: A review. Comput. Vis. Image Underst. 2021, 210, 103225. [Google Scholar] [CrossRef]
  31. Tian, Z.; Yu, Y.; Xu, F.; Zhang, Z. Dynamic hazardous proximity zone design for excavator based on 3D mechanical arm pose estimation via computer vision. J. Constr. Eng. Manag. 2023, 149, 04023048.
  32. Papaioannidis, C.; Mygdalis, V.; Pitas, I. Domain-translated 3D object pose estimation. IEEE Trans. Image Process. 2020, 29, 9279–9291.
  33. Liu, S.; Sehgal, N.; Ostadabbas, S. Adapted human pose: Monocular 3D human pose estimation with zero real 3D pose data. Appl. Intell. 2022, 52, 14491–14506.
  34. Rogez, G.; Schmid, C. Image-based synthesis for deep 3D human pose estimation. Int. J. Comput. Vis. 2018, 126, 993–1008.
  35. Han, H.; Kim, H.; Bang, H. Monocular pose estimation of an uncooperative spacecraft using convexity defect features. Sensors 2022, 22, 8541.
  36. Qiao, S.; Zhang, H.; Meng, G.; An, M.; Xie, F.; Jiang, Z. Deep-learning-based satellite relative pose estimation using monocular optical images and 3D structural information. Aerospace 2022, 9, 768.
  37. Yang, X.; Sun, G. Simulation analysis of two kinds of algorithm of pose estimation based on hand-eye vision. Comput. Simul. 2012, 29, 168–170, 222.
  38. Tang, H.-J.; Wen, J.; Ma, C.-W.; Zhou, R.-K. A comparative study on model-based pose estimation of flying objects with different feature descriptors. In Proceedings of the International Symposium on Photoelectronic Detection and Imaging 2011—Space Exploration Technologies and Applications, Beijing, China, 24–26 May 2011.
  39. Zhao, J.; Cao, Y.; Xiang, Y. Pose estimation method for construction machine based on improved AlphaPose model. Eng. Constr. Archit. Manag. 2022, 31, 976–996.
  40. Chen, C.; Zhu, Z.; Hammad, A. Automated excavators activity recognition and productivity analysis from construction site surveillance videos. Autom. Constr. 2020, 110, 103045.
  41. Kim, H.; Bang, S.; Jeong, H.; Ham, Y.; Kim, H. Analyzing context and productivity of tunnel earthmoving processes using imaging and simulation. Autom. Constr. 2018, 92, 188–198.
  42. Zhang, J.; Gong, K.; Wang, X.; Feng, J. Learning to augment poses for 3D human pose estimation in images and videos. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10012–10026.
  43. Assadzadeh, A.; Arashpour, M.; Li, H.; Hosseini, R.; Elghaish, F.; Baduge, S. Excavator 3D pose estimation using deep learning and hybrid datasets. Adv. Eng. Inform. 2023, 55, 101875.
  44. Wen, L.; Kim, D.; Liu, M.; Lee, S. 3D excavator pose estimation using projection-based pose optimization for contact-driven hazard monitoring. J. Comput. Civ. Eng. 2023, 37, 04022048.
  45. Li, J.; Liu, Y.; Wang, L.; Sun, Y. A vision-based end pose estimation method for excavator manipulator. Multimed. Tools Appl. 2024, 83, 68723–68741.
  46. Geng, Z.; Sun, K.; Xiao, B.; Zhang, Z.; Wang, J. Bottom-up human pose estimation via disentangled keypoint regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021.
  47. Cheng, B.; Xiao, B.; Wang, J.; Shi, H.; Huang, T.S.; Zhang, L. HigherHRNet: Scale-aware representation learning for bottom-up human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
  48. McNally, W.; Vats, K.; Wong, A.; McPhee, J. Rethinking keypoint representations: Modeling keypoints and poses as objects for multi-person human pose estimation. In Proceedings of the 17th European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
  49. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015.
  50. Maji, D.; Nagori, S.; Mathew, M.; Poddar, D. YOLO-Pose: Enhancing YOLO for multi-person pose estimation using object keypoint similarity loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
  51. Lau, K.W.; Po, L.-M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in CNN. Expert Syst. Appl. 2024, 236, 121352.
  52. Mahmood, B.; Han, S.; Seo, J. Implementation experiments on convolutional neural network training using synthetic images for 3D pose estimation of an excavator on real images. Autom. Constr. 2022, 133, 103996.
Figure 1. The concrete vibrator and the positions of individual sensors.
Figure 2. The framework for estimating the rotation angle of a vibration rack.
Figure 3. A vibration rack's keypoints. Points 5, 6, 9 and 10 are located on the opposite side of the vibration rack, corresponding to Points 4, 3, 8 and 7, respectively.
Figure 4. Large Separable Kernel Attention.
Figure 5. Improved YOLOv8-Pose framework.
Figure 6. Vanishing points and vanishing lines in central projection.
Figure 7. Logic of rotation angle estimation.
Figure 8. Principle of recognizing the rotation angle of the vibration rack: (a) vanishing point diagram; (b) angle solution diagram.
Figure 9. Relative positional relationships of the models.
Figure 10. Simulation environment.
Figure 11. Comparison between the actual vibration rack's position recorded by the sensor and the vibration rack's position in the simulation.
Figure 12. Experimental data statistics: (a) the first test; (b) the second test; (c) the third test; (d) all the tests.
Figure 13. Vibration rack rotation angle detection process in the simulation: (a) move; (b) stop; (c) insert; (d) vibrate; (e) pull out; (f) move.
Figure 14. Enhanced intelligent vibration monitoring system.
Figure 15. The construction site.
Figure 16. Vibration rack rotation angle detection process in reality: (a) move; (b) stop; (c) insert; (d) vibrate; (e) pull out; (f) move.
Figure 17. Angle estimation results at different frame rates (fps).
Figure 18. Precision–recall (P-R) curves.
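As a rough illustration of the geometry referred to in Figures 6-8, the sketch below intersects two image lines drawn through keypoint pairs lying on parallel rack edges to obtain a vanishing point, then back-projects it with the camera intrinsics to read off a rotation (yaw) angle. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the keypoint coordinates and the intrinsic matrix K are hypothetical, and the sign and zero reference of the recovered angle depend on the camera pose.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(p1, q1, p2, q2):
    """Intersection (homogeneous coordinates) of the lines through the keypoint
    pairs (p1, q1) and (p2, q2), assumed to lie on parallel rack edges."""
    return np.cross(line_through(p1, q1), line_through(p2, q2))

def yaw_from_vanishing_point(v, K):
    """Back-project the vanishing point to a 3D direction d ~ K^-1 v and read
    the rotation about the vertical (camera y) axis from its x-z components."""
    d = np.linalg.inv(K) @ v
    return np.degrees(np.arctan2(d[0], d[2]))

# Hypothetical intrinsics and keypoint pairs on two parallel rack edges.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
v = vanishing_point((400, 620), (900, 560), (420, 900), (950, 820))
print(f"estimated yaw: {yaw_from_vanishing_point(v, K):.1f} deg")
```

In practice, the keypoint pairs would come from the rack keypoints detected by the improved YOLOv8-Pose model (Figure 3) rather than from hand-picked coordinates.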
Table 1. Validation results of the rotation angle detection algorithm for the vibration rack (all errors in degrees).

Experiment No. | ∠AOB Average Error | ∠AOB Max Error | ∠AOB Min Error | Rack Rotation Average Error | Rack Rotation Max Error | Rack Rotation Min Error
1       | 2.93 | 8.47 | 0.64 | 7.63 | 17.00 | 0.53
2       | 2.35 | 7.65 | 0.05 | 6.90 | 13.63 | 0.76
3       | 2.45 | 7.43 | 0.52 | 6.43 | 12.66 | 0.56
Overall | 2.57 | 8.47 | 0.05 | 6.97 | 17.00 | 0.53
Table 2. The algorithm's results for detecting the rotation angle of the vibration rack during a single-cycle process.

                    | Image a | Image b | Image c | Image d | Image e | Image f
Real angle (°)      | 0       | 22      | 22      | 22      | 22      | 29
Estimated angle (°) | 1.85    | 21.80   | 23.02   | 23.72   | 26.82   | 29.79
Difference (°)      | 1.85    | −0.2    | 1.02    | 1.72    | 4.82    | 0.79
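For reference, the per-image differences in Table 2, and summary statistics of the kind reported in Table 1, can be reproduced directly from these values; the short sketch below does so with NumPy. It only restates the arithmetic, and the assumption that Table 1 aggregates absolute errors in exactly this way over the full simulation runs is ours.

```python
import numpy as np

# Real and estimated rack rotation angles from Table 2 (degrees).
real_deg = np.array([0.0, 22.0, 22.0, 22.0, 22.0, 29.0])
est_deg = np.array([1.85, 21.80, 23.02, 23.72, 26.82, 29.79])

diff = est_deg - real_deg        # signed differences, as listed in Table 2
abs_err = np.abs(diff)           # absolute errors used for summary statistics

print("differences (deg):", np.round(diff, 2))
print(f"average error: {abs_err.mean():.2f} deg, "
      f"max: {abs_err.max():.2f} deg, min: {abs_err.min():.2f} deg")
```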
Table 3. Comparison of the results of pose estimation algorithms.

Algorithm               | Input Size (pixels) | Weight Size (MB) | #Params (M) | AP@0.5 (%) | AR (%) | Speed (ms)
HRNet-W32               | 640 | 55.3 | 28.4 | 86.2 | 83.7 | >100
HigherHRNet-W32         | 640 | 55.5 | 28.6 | 88.7 | 86.3 | >100
YOLOv5m-Pose            | 640 | 41.6 | 21.3 | 90.2 | 90.1 | 17.1
YOLOv8m-Pose            | 640 | 50.7 | 26.4 | 91.1 | 91.5 | 21.0
LSKA-YOLOv8-Pose (ours) | 640 | 64.7 | 33.7 | 92.5 | 92.6 | 25.4
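The AP@0.5 and AR columns in Table 3 follow the keypoint evaluation used in the YOLO-Pose line of work [50], which scores matches with Object Keypoint Similarity (OKS). The sketch below shows a generic COCO-style OKS computation for illustration only; the per-keypoint falloff constants for the ten rack keypoints are not given here, so the values used are placeholders.

```python
import numpy as np

def oks(pred_xy, gt_xy, visible, area, k):
    """COCO-style Object Keypoint Similarity between one predicted and one
    ground-truth pose. pred_xy/gt_xy are (N, 2) pixel coordinates, visible is
    a boolean mask of labelled keypoints, area is the object area in px^2,
    and k holds per-keypoint falloff constants."""
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=1)
    sim = np.exp(-d2 / (2.0 * area * k ** 2 + np.finfo(float).eps))
    return float(sim[visible].mean())

# Placeholder example with 10 rack keypoints; a detection counts toward
# AP@0.5 when its OKS against the matched label is at least 0.5.
rng = np.random.default_rng(0)
gt = rng.uniform(100, 500, size=(10, 2))
pred = gt + rng.normal(0, 4, size=(10, 2))
k = np.full(10, 0.05)            # hypothetical falloff constants
print(f"OKS = {oks(pred, gt, np.ones(10, bool), area=400 * 400, k=k):.3f}")
```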
Table 4. Manual estimations of the rack rotation angle by volunteers (all errors in degrees).

Angle Type             | Personal Max Error | Personal Min Error | Personal Ave Max Error | Personal Ave Min Error | Ave Error
All                    | 23.5 | 0 | 15.5  | 3.28 | 8.32
Sensitive values (±2°) | 16   | 0 | 9.33  | 2.5  | 5
Other values           | 23.5 | 0 | 12.83 | 3.15 | 8.47

"All" denotes the statistics over every guessed rotation angle of the vibration rack. "Sensitive values" refers to the volunteers' guesses for pictures taken at angles of 0°, 30°, 45°, 60° and 90°, while "other values" refers to their guesses for pictures taken at the remaining (non-sensitive) angles.
Table 5. Comparison of the end pose estimation methods.

Method     | Source              | Camera Position | Additional Sensors | Average Error (°) | Pose Estimation | 3D Training Dataset
Manual     | /                   | /               | No                 | 8.32              | 3D              | No
Method 1   | Mahmood et al. [52] | Outside car     | No                 | 9.63              | 3D              | Yes
Method 2   | Li et al. [45]      | On car          | Yes                | 1.70              | 2D              | No
This study | /                   | On car          | No                 | 6.97              | 3D              | No