1. Introduction
According to the Ministry of Transport of the People’s Republic of China, the total number of highway bridges surpassed one million at the end of 2022 in China [
1]. These bridges include large-scale structures that span seas, rivers, and lakes, as well as bridges of various sizes. As the operation time of these bridges increases, issues related to their structural health and carrying capacity have become increasingly prominent [
2,
3]. Regular operation and maintenance management of bridges are essential to ensure traffic safety, extend the lifespan of bridges, and reduce maintenance costs. There are several methods available to obtain structural dynamic parameters that can reflect the health condition of bridges. One commonly used approach is the eigen perturbation method [
4], which enables the extraction of modal parameters. In addition to analytical methods, manual inspections remain essential for certain tasks such as crack inspection, settlement detection, and bridge alignment checks. However, traditional manual maintenance methods are expensive, time-consuming, and subjective, and they rely on inspectors to obtain all information accurately [
5].
Recently, the three-dimensional (3D) laser scanning technique has opened up new possibilities for bridge health monitoring and maintenance [
6]. A terrestrial laser scanner (TLS) can acquire 3D point cloud data (PCD) of a bridge in a short period of time. PCD provides a comprehensive and accurate representation of a bridge, allowing for a detailed assessment of its current condition. By analyzing the acquired PCD, valuable insights can be gained to guide maintenance efforts and ensure the longevity and safety of the bridge structure. Furthermore, PCD can be leveraged to create building information modelling (BIM) models. A BIM model provides a digital representation of a bridge and can be valuable for bridge maintenance and digital twins [
5,
7]. To obtain PCD of the entire bridge, it is necessary to perform multiple scans from different locations because of the limited field of view of a TLS. However, each scan has its own coordinate system, making it essential to align all the scans into a unified coordinate system in order to generate an accurate 3D model of the bridge. Because the accuracy of the 3D model is heavily influenced by the registration accuracy, PCD registration is a crucial preprocessing step for TLS-based bridge health monitoring and maintenance.
Although some researchers have studied registration methods of bridge PCD, they mainly focused on pairwise registration [
8,
9]. However, to align multiple scans of a bridge to a unified coordinate system, the utilization of multi-view registration techniques is essential. In engineering practice, artificial markers such as target spheres [
10] and target papers [
11] have been commonly used to assist registration and improve registration accuracy when dealing with multiple scans. Although the registration results using these artificial markers are highly reliable, placing artificial markers is time-consuming and costly [
12], especially for large infrastructure like bridges. Extensive research has been conducted on marker-free multi-view registration techniques. However, when it comes to applying these techniques to the registration of bridge PCD, there are two notable challenges that need to be addressed. The first challenge involves recovering the overlaps between unordered multiple scans. This typically requires extensive pairwise matching and the creation of a fully connected graph that encompasses all the scans [
13]. However, when dealing with TLS scans of large bridges, extensive pairwise matching can become quite time-consuming. The second challenge is related to determining the merging order of the scans. Typically, the creation of a fully connected graph that encompasses all the scans is necessary to establish the merging order [
14]. However, when dealing with a large number of scans, creating such a fully connected graph can be inefficient. To tackle this challenge, Wu et al. [
12] proposed a method to simplify this heavy registration task by subdividing all the scans into several scan-blocks. In this way, the number of pairwise registration procedures is significantly reduced. However, they constructed scan-blocks according to the scanning order, which is sometimes impractical in bridge scenes. Because bridges are essential components of urban transportation infrastructure, minimizing the disruption to urban traffic during the scanning process is crucial. As a result, the PCD of bridge scans cannot always be registered in a sequential order. Manual specification of the registration sequence becomes necessary in such cases. However, for large bridges, where dozens of scans may be acquired, the manual specification process can become quite tedious and inefficient.
To address these issues, this paper proposes a template-guided hierarchical multi-view registration framework that can register unordered bridge terrestrial laser scanning data without any artificial targets. Firstly, the overlaps of multiple scans are recovered using a template-guided initial pose estimation method, which avoids extensive pairwise matching. Next, all the scans are partitioned into different scan-blocks based on their locations and overlaps. Subsequently, pairwise coarse registration is conducted in each scan-block, and the transformations are obtained using an intelligent optimization algorithm. Fine registration is then performed to further refine the poses within each block. Finally, the above steps are repeated between blocks until all scans are merged into a unified coordinate system. The main contributions of the proposed method are as follows:
- (1) A marker-free multi-view registration framework is proposed to hierarchically align unordered bridge terrestrial laser scanning data.
- (2) A template-based initial pose estimation method is proposed to recover the overlaps of unordered PCD, which avoids extensive pairwise matching and improves the efficiency.
- (3) To group scans with high overlaps into the same block, a graph partition algorithm based on the overlaps and scanning locations is utilized to construct scan-blocks.
3. Methodology
The proposed method aims to address the challenge of automatically aligning unordered scans of bridges without any markers. An overview of the registration framework is shown in
Figure 1. For unordered scans, a template-guided approach is employed in
Section 3.1 to recover the overlaps. Subsequently, all the scans are partitioned into different scan-blocks based on their scanning locations and overlaps in
Section 3.2. Within each block, pairwise coarse registration is conducted using an intelligent optimization algorithm, as described in
Section 3.3. Finally, the fine registration is performed to further refine the poses within each block in
Section 3.4. These steps are iteratively executed between blocks until all scans are merged into a unified common coordinate system.
3.1. Template-Guided Initial Pose Estimation
Typically, bridges are long and symmetrical structures, and their geometric features in the side view are distinct and can be leveraged to approximate the location of each scan. By comparing the side-view geometric features of an individual scan with those of the entire bridge, the approximate position of that scan along the bridge’s length can be determined. Therefore, using the as-designed side view as the template, an initial pose estimation method is proposed.
3.1.1. Acquisition of Side View Geometric Features
The side view geometric features of each scan can be obtained by projecting the PCD into a binary image along the direction perpendicular to the traffic direction of a bridge. First, the traffic direction of a bridge can be determined using the principal component analysis (PCA) algorithm. Considering the non-uniformity of scans, only the points that form the minimum convex hull of two-dimensional (2D) PCD on the
xOy plane are utilized in the PCA algorithm. Second, the PCD is rotated so that its traffic direction aligns with the
x axis, as shown in
Figure 2a. Then, the 2D PCD can be obtained by projecting the PCD into the
xOz plane. Next, the 2D PCD is voxelized using the grid sizes
δx and
δz (their calculation is given in Section 3.1.2), and then it is converted to a binary image with each grid cell corresponding to a pixel. If the number of points within a grid cell is more than one, the grey value of the corresponding pixel is set to 1; otherwise, it is set to 0. The binary image of the PCD in
Figure 2a is shown in
Figure 2b.
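The projection step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the PCA is run on all xOy points rather than only on the convex-hull points described above, and the occupancy threshold `min_points=2` encodes the "more than one point" rule.

```python
import math

def principal_direction_xy(points):
    """Angle (radians) of the first principal axis of the points in the xOy plane."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form orientation of the dominant eigenvector of a 2x2 covariance matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def side_view_image(points, dx, dz, min_points=2):
    """Rotate the scan so its main direction follows x, project to xOz, rasterize."""
    theta = principal_direction_xy(points)
    c, s = math.cos(-theta), math.sin(-theta)
    # Rotation about the z axis by -theta; the y coordinate is dropped afterwards.
    xz = [(c * x - s * y, z) for x, y, z in points]
    x0 = min(p[0] for p in xz)
    z0 = min(p[1] for p in xz)
    cols = int((max(p[0] for p in xz) - x0) / dx) + 1
    rows = int((max(p[1] for p in xz) - z0) / dz) + 1
    counts = [[0] * cols for _ in range(rows)]
    for x, z in xz:
        counts[int((z - z0) / dz)][int((x - x0) / dx)] += 1
    # Row 0 holds the lowest z values; a pixel is 1 when its cell is occupied enough.
    return [[1 if n >= min_points else 0 for n in row] for row in counts]

# Four toy points lying along a 45-degree line in the xOy plane.
scan = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (3.0, 3.0, 1.0)]
image = side_view_image(scan, dx=1.5, dz=1.0)
```

PCA leaves the sign of the traffic direction undetermined, which is why a mirrored variant of each binary image also has to be considered during matching.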
3.1.2. Unified Scale of the Template and Binary Images
We employ an as-designed side view as the registration template. To ensure that the scale of the template is the same as that of the binary images generated from the PCD, the grid sizes δx and δz are determined using the following two formulas:
$$\delta_x = \frac{L}{N_x} \tag{1}$$
$$\delta_z = \frac{H}{N_y} \tag{2}$$
where L and H represent the length and height of the bridge, respectively, and Nx and Ny are the width and height (in pixels) of the minimum bounding box of the non-blank region in the template.
3.1.3. Image Matching
Image matching algorithms can be utilized to obtain the approximate positions of scans relative to the bridge. Based on the utilized primitives, image-matching algorithms can be classified as area-based matching (ABM) or feature-based matching (FBM) algorithms [
24]. ABM is based on the idea that grey values of pixels of conjugate points have similar radiometric characteristics [
25], while FBM is based on feature extraction, feature description, and correspondence feature matching. We opted for the ABM for image matching, which has high efficiency and reliability [
26]. Given two images, ABM considers one as the reference image and the other as the matching image. The matching image slides within a search window on the reference image, and a similarity measure is calculated at each position. The location of the matching image is assumed to be the position of the best agreement [
27]. In our study, the registration template serves as the reference image, the binary images are the matching images, and the search window is the entire template. An example of image matching is shown in
Figure 3a. Considering that the image matching step can only determine the positions of PCDs but cannot adjust their orientations, the matching score between the horizontally mirrored image of the binary image and the registration template is also calculated. The case with the higher matching score is selected as the final result. Based on the matching relationship between each scan and the registration template, the relative positional relationships on the
xOz plane between different scans can be obtained, as shown in
Figure 3b. The alignment in the
y-axis direction is determined based on the centres of the scans.
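The sliding-window search, including the mirrored variant, can be sketched as follows. The similarity measure used here (the number of coinciding foreground pixels) is an assumption made for illustration, since the measure is not specified above, and the function names are hypothetical.

```python
def match_score(template, image, row, col):
    """Count foreground pixels of `image` that coincide with foreground pixels
    of `template` when the image's top-left corner is placed at (row, col)."""
    return sum(
        image[r][c] & template[row + r][col + c]
        for r in range(len(image))
        for c in range(len(image[0]))
    )

def best_match(template, image):
    """Slide the image and its horizontal mirror over the whole template.

    Returns (score, row, col, mirrored) for the placement with the highest score.
    """
    h, w = len(image), len(image[0])
    mirrored_image = [row[::-1] for row in image]
    best = (-1, 0, 0, False)
    for mirrored, img in ((False, image), (True, mirrored_image)):
        for row in range(len(template) - h + 1):
            for col in range(len(template[0]) - w + 1):
                score = match_score(template, img, row, col)
                if score > best[0]:
                    best = (score, row, col, mirrored)
    return best

# Toy template (reference image) and a small binary image of one scan.
template = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1],
]
scan_image = [[1, 1], [0, 1]]
result = best_match(template, scan_image)
```

In this toy case the mirrored image fits the template better than the original, so the mirrored placement is returned. A normalized measure such as cross-correlation could be substituted without changing the search loop.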
3.1.4. Correction of False Matching
In some cases, the same geometric pattern may appear multiple times in a bridge, which may result in false matching. As shown in
Figure 4, the geometric elements within the red boxes appear four times. When the coverage area of a scan is small, these repeated patterns may cause false matching.
To address this issue, a method for identifying and correcting false matching is proposed. A good scanning plan should provide a uniform coverage of a bridge while avoiding over-coverage in any particular area. When a scan is matched to a false location, the scan is considered “redundant” for the false location because it results in an over-coverage for the false location. Therefore, we can quantify the redundancy of a scan by comparing it with its adjacent scans. For the
ith scan, its projected image is denoted as
Ii, and its areas intersecting with all the other scans are calculated based on the locations obtained in the template matching step. The
n scans with the largest intersection areas are selected, and their projected images, denoted as I1, I2, …, In, are utilized to calculate the redundancy score Si for the ith scan using Formula (3).
where
A and
B are the height and width of
Ii, respectively; (
) denotes the Hadamard product. The coefficient of variation of the scores of all the scans is then calculated. A higher coefficient of variation indicates a greater dispersion of the scores, meaning that false matching may be present. The 20% of scans with the highest scores are selected as potentially mismatched PCDs, awaiting further validation.
Considering that there may be differences between the as-designed side view and the actual state of the bridge, the similarity between a scan and its adjacent scans at the right location will be higher than the similarity between the scan and its adjacent scans at a false location. Therefore, whether a scan is mismatched can be further validated based on its positional relationship with adjacent scans. For the
ith scan, which is labelled as potentially mismatched, the three locations with the highest matching scores in the template matching step are considered as candidate positions. Each candidate position is the pixel with the highest matching score within a certain neighbourhood. For each candidate position, the intersection areas with other scans that have not been labelled as mismatched are calculated. The three scans with the largest intersection areas are selected, and their projected images are added to the registration template with a proportion of 5% to modify the template, as shown in
Figure 5. The matching score between the modified registration template and the
ith scan is computed at each candidate position. The candidate position with the highest matching score is selected as the final position for the
ith scan.
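The statistical screening of redundancy scores can be sketched as below. The scores are taken as given inputs, the use of the population standard deviation is an assumption, and the function names and toy values are hypothetical.

```python
import math
import statistics

def coefficient_of_variation(scores):
    """Ratio of the (population) standard deviation to the mean of the scores."""
    return statistics.pstdev(scores) / statistics.mean(scores)

def flag_suspects(scores, fraction=0.2):
    """Indices of the scans whose scores fall in the top `fraction` of all scores."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy redundancy scores for ten scans; scans 4 and 9 overlap their neighbours
# far more than the rest, as a falsely matched scan would.
scores = [0.10, 0.12, 0.11, 0.09, 0.60, 0.10, 0.11, 0.12, 0.10, 0.55]
cv = coefficient_of_variation(scores)
suspects = flag_suspects(scores)
```

The flagged scans are only candidates: each one still has to be re-evaluated at its candidate positions against the modified template before its position is changed.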
3.2. Overlap-Based Scan-Block Construction
The main focus of our work lies in the efficiency of processing large bridges, such as suspension bridges, which may require dozens of TLS scans. The fully connected graph that arises from all these scans can be quite large, making the registration task computationally intensive. To address this challenge, a hierarchical registration strategy can be employed. This strategy can recursively subdivide and fuse the heavy registration task into smaller, more manageable subsets. By breaking down the registration process into hierarchical levels, the number of pairwise registration procedures can be significantly reduced [
12]. In more detail, the hierarchical registration strategy partitions all the scans into different scan-blocks, locally aligns the scans in each scan-block, and finally performs the global block-to-block registration. The local registration accuracy of each scan-block plays a vital role in determining the overall registration accuracy of all the scans. To improve the registration accuracy, it is crucial to ensure significant overlaps within each block. Therefore, the block partitioning step plays a critical role in improving both registration precision and efficiency.
Partitioning scan-blocks must take into account both the scanning locations and the overlaps between scans, and the overlap of two scans can be quantified by the overlap of their bounding boxes. The normalized cut (
Ncut) algorithm [
28] is utilized in our work for scan-block construction. First, a fully connected graph is created with each scan treated as a node. The edge weight
wij between the nodes
vi and
vj is defined as
where Δ
x represents the ratio between the difference in
x-coordinate values of the scanner positions and the width of the bridge, and Δ
z the ratio between the interpolated
z-coordinate values and the height of the bridge.
IoU is defined as the intersection over union of the bounding boxes of two PCDs, i.e.,
$$\mathrm{IoU} = \frac{|B_i \cap B_j|}{|B_i \cup B_j|} \tag{5}$$
where Bi and Bj in Equation (5) represent the bounding boxes of the two scans.
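The bounding-box overlap can be computed as in the following minimal sketch, which assumes axis-aligned 3D bounding boxes given as (min corner, max corner) pairs. The box representation and function names are illustrative choices, not taken from the source.

```python
import math

def bounding_box(points):
    """Axis-aligned bounding box of a point set as (min corner, max corner)."""
    lo = tuple(min(p[k] for p in points) for k in range(3))
    hi = tuple(max(p[k] for p in points) for k in range(3))
    return lo, hi

def box_volume(box):
    """Volume of a box; degenerate or inverted boxes count as zero."""
    lo, hi = box
    return math.prod(max(0.0, h - l) for l, h in zip(lo, hi))

def bounding_box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    lo = [max(a, b) for a, b in zip(box_a[0], box_b[0])]
    hi = [min(a, b) for a, b in zip(box_a[1], box_b[1])]
    inter = box_volume((lo, hi))
    union = box_volume(box_a) + box_volume(box_b) - inter
    return inter / union if union > 0 else 0.0

# Two 2 m cubes shifted by 1 m along x: they share a third of their joint volume.
box_a = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
box_b = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))
iou = bounding_box_iou(box_a, box_b)
```

Because the boxes are built after template matching, only the rough poses are needed to evaluate this overlap, so no pairwise point-level matching is involved.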
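The objective of the Ncut algorithm is to minimize the cut between different blocks while maximizing the sum of the edge weights within each block. The objective function can be formulated as
$$\mathrm{Ncut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)} \tag{6}$$
where Ai represents the ith scan-block, $\bar{A}_i$ the complement of Ai, k the total number of blocks, and cut(Ai, Aj) the cut between Ai and the jth scan-block Aj, i.e.,
$$\mathrm{cut}(A_i, A_j) = \sum_{u \in A_i,\ v \in A_j} w_{uv} \tag{7}$$
vol(Ai) represents the sum of the degrees of the nodes within Ai, i.e.,
$$\mathrm{vol}(A_i) = \sum_{v \in A_i} d_v \tag{8}$$
where $d_v = \sum_{u} w_{vu}$ is the degree of node v.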
Based on the normalized Laplacian matrix, the objective function can be simplified, and the Rayleigh–Ritz theorem can be employed for the solution. Finally, the partitioning results can be obtained using the
k-means clustering algorithm [
29]. Using the TLS scans of a suspension bridge as an example, the scan-block construction result is shown in
Figure 6, where the numbers within the purple circles indicate scanner positions above the bridge deck, while the numbers within the black circles indicate scanner positions below the bridge deck. It can be observed that our block construction method is capable of grouping scans with closer proximity and higher overlap into the same block.
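To make the partitioning criterion concrete, the sketch below evaluates a normalized-cut value for a given partition, using the common form in which each block contributes the weight of its outgoing edges divided by the total degree of its nodes. It only scores candidate partitions. The spectral solution (Laplacian eigenvectors followed by k-means) is not reproduced, and the edge weights are toy values.

```python
def ncut(weights, blocks):
    """Normalized-cut value of a partition of a weighted, undirected graph.

    weights: symmetric matrix (list of lists) of edge weights, zero diagonal.
    blocks:  list of lists of node indices, together covering every node once.
    """
    n = len(weights)
    degree = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    total = 0.0
    for block in blocks:
        members = set(block)
        # Weight of all edges leaving the block.
        cut = sum(weights[u][v] for u in members for v in range(n) if v not in members)
        # Sum of the degrees of the nodes inside the block.
        vol = sum(degree[u] for u in members)
        total += cut / vol
    return total

# Four scans: 0-1 and 2-3 overlap strongly, all other pairs only weakly.
W = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
good = ncut(W, [[0, 1], [2, 3]])
bad = ncut(W, [[0, 2], [1, 3]])
```

Grouping the strongly overlapping scans yields the smaller value, which is the behaviour the block construction relies on.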
3.3. Pairwise Coarse Registration by Optimization Algorithms
The pairwise coarse registration is commonly formulated as the maximum consensus set (MCS) problem [
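12]. Given two scans, P and Q, each point pi in P is paired with its nearest neighbour qj in Q, forming a point pair (pi, qj). The set of point pairs is represented as $H = \{(p_i, q_j)\}_{1}^{k}$, where k is the number of point pairs. If the distance between pi and qj is less than a threshold δ, the point pair is considered a true correspondence. The MCS problem [30] aims to find the transformation matrix corresponding to the maximum number of true correspondences, i.e.,
$$(R^{*}, T^{*}) = \arg\max_{R,\,T} \sum_{(p_i, q_j) \in H} \mathbb{1}\left(\lVert R\,p_i + T - q_j \rVert < \delta\right) \tag{9}$$
where R and T are the rotation matrix and translation vector, respectively, and $\mathbb{1}(\cdot)$ is the indicator function.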
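The quantity being maximized can be sketched as a function that counts true correspondences for one candidate transformation. This is an illustrative brute-force version with hypothetical names; for real scans, a spatial index such as a k-d tree would replace the inner nearest-neighbour loop.

```python
import math

def apply_transform(point, R, T):
    """Apply rotation matrix R (3x3 nested list) and translation T to a 3D point."""
    return tuple(sum(R[i][k] * point[k] for k in range(3)) + T[i] for i in range(3))

def consensus_size(P, Q, R, T, delta):
    """Number of points of P whose nearest neighbour in Q is closer than delta
    after P has been moved by (R, T)."""
    count = 0
    for p in P:
        moved = apply_transform(p, R, T)
        nearest = min(math.dist(moved, q) for q in Q)
        if nearest < delta:
            count += 1
    return count

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Q = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 5.0)]
# P is Q shifted by -1 m along x, so a +1 m translation aligns every point.
P = [(x - 1.0, y, z) for x, y, z in Q]
before = consensus_size(P, Q, identity, (0.0, 0.0, 0.0), delta=0.05)
after = consensus_size(P, Q, identity, (1.0, 0.0, 0.0), delta=0.05)
```

The count is piecewise constant in the transformation parameters, so gradient-based solvers do not apply directly, which is one reason population-based search is a natural fit.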
The feature-based coarse registration method is the most commonly used coarse registration method [
20]. It first extracts key points from two PCDs and computes features of the key points. Then, a subset of key points is randomly selected from one PCD, and their corresponding points are searched in the other PCD using feature similarity. After that, a transformation matrix can be obtained based on the matched point pairs, along with its set of true correspondences, which is called a consensus set. This process is usually iterated multiple times, and the transformation matrix associated with the largest consensus set, which is called the maximum consensus set, is output as the final transformation. However, existing feature extraction methods are susceptible to variations in point density and noise, which makes the above methods less robust [
20]. Considering that all the scans already have rough relative position relationships after the template matching step, performing pairwise coarse registration on the basis of this can significantly reduce the search space. We employ the particle swarm optimization (PSO) algorithm to search for the maximum consensus set.
The PSO algorithm is an iterative intelligent optimization algorithm that relies on collaboration and information sharing among particles to search for the optimal solution. During the searching process, each particle records its current position as well as its historical best solution, and the population also records its historical best solution. Based on the historical best solutions, the positions and velocities of the particles are updated, enabling the particle swarm to iteratively evolve and converge towards the global optimum. In this study, the particles denote the combinations of
R and T, and each particle is described by its velocity vij and position xij, which are updated by Equations (10) and (11):
$$v_{ij}(t+1) = w\,v_{ij}(t) + c_1 r_1 \left(p_{ij} - x_{ij}(t)\right) + c_2 r_2 \left(p_{gj} - x_{ij}(t)\right) \tag{10}$$
$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1) \tag{11}$$
where i is the particle index and j is the variable dimension; vij(t) and vij(t + 1) are the particle velocities at times t and t + 1, respectively; xij(t) and xij(t + 1) are the particle positions at times t and t + 1, respectively; pij is the historical optimal solution of the current particle; pgj is the historical optimal solution of the swarm; c1 and c2 are acceleration constants, both set to 2; w is the inertia weight, set to 0.8; and r1 and r2 are random numbers in the closed interval [0,1].
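The velocity and position updates can be sketched as a small, generic particle swarm maximizer using the stated settings (w = 0.8, c1 = c2 = 2). The clamping of velocities and positions to the search bounds, the fixed seed, the optional `start` particle (standing in for the rough pose from template matching), and the toy objective are all assumptions added to keep the example bounded and reproducible.

```python
import random

def pso_maximize(fitness, lower, upper, n_particles=20, n_iter=50,
                 w=0.8, c1=2.0, c2=2.0, seed=0, start=None):
    """Maximize `fitness` over the box [lower, upper] with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(lower)
    span = [u - l for l, u in zip(lower, upper)]
    xs = [[rng.uniform(lower[j], upper[j]) for j in range(dim)]
          for _ in range(n_particles)]
    if start is not None:
        xs[0] = list(start)  # seed one particle with a known rough solution
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]
    p_val = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus attraction to the personal and swarm bests.
                v = (w * vs[i][j]
                     + c1 * r1 * (p_best[i][j] - xs[i][j])
                     + c2 * r2 * (g_best[j] - xs[i][j]))
                # Clamping is an added safeguard, not part of the update rule itself.
                v = max(-span[j], min(span[j], v))
                vs[i][j] = v
                xs[i][j] = max(lower[j], min(upper[j], xs[i][j] + v))
            val = fitness(xs[i])
            if val > p_val[i]:
                p_best[i], p_val[i] = xs[i][:], val
                if val > g_val:
                    g_best, g_val = xs[i][:], val
    return g_best, g_val

# Toy objective with its maximum at (1, -2); a real run would score a pose by
# its number of true correspondences instead.
def toy_fitness(x):
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)

best_x, best_val = pso_maximize(toy_fitness, [-5.0, -5.0], [5.0, 5.0], start=[0.0, 0.0])
```

Because the swarm best is only ever replaced by a better solution, the result is never worse than the seeded starting pose, which mirrors the benefit of beginning from the template-matching estimate.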
For the initial pose shown in
Figure 7a, the PSO algorithm can obtain an accurate transformation, as shown in
Figure 7b. Although the PSO algorithm can help avoid becoming trapped in local optima, there may still be cases where two scans within the same scan-block cannot obtain the correct transformation matrix due to low overlap or larger scanning distances. To identify false matches, Wu et al. [
12] proposed a method based on the loop closure constraint, which proved to be effective. This method is utilized in this study to reject false scan-to-scan matches obtained by the PSO algorithm.
3.4. Fine Registration and Pose Optimization
To obtain the optimal merging order of scans within each scan-block, the minimum spanning tree (MST) is utilized to extract a cycle-free and well-pairwise-registered graph [
30]. The MST relies on edge weights to define the shortest path. In this study, the weight of each edge is defined as the number of true correspondences after the pairwise coarse registration, and the weights of the edges with false matches are set to 0. Along the edges of the MST, all the other nodes in a scan-block can be merged into the root node using the coarse registration matrix calculated by the PSO algorithm and the fine registration matrix computed by the ICP algorithm [
31]. However, the errors will accumulate along the edges from the root to the leaf node since the MST is a cycle-free graph [
13]. To address the issue of error accumulation, the Lu–Milios algorithm [
32] is utilized to further optimize the pose.
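The extraction of a cycle-free merging graph can be sketched with Kruskal's algorithm and a union-find structure. The sketch assumes that edges with more true correspondences should be preferred, which corresponds to a minimum spanning tree over a cost such as the negated correspondence count. This conversion is an assumption, as are the function name and the toy edge list. Edges flagged as false matches (weight 0) are skipped.

```python
def merging_tree(n_scans, edges):
    """Select a cycle-free set of reliable edges for merging scans.

    edges: list of (i, j, n_true_correspondences); a weight of 0 marks a
    rejected (false) pairwise match. Returns the chosen edges in pick order.
    """
    parent = list(range(n_scans))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    # Visit the most reliable pairwise registrations first.
    for i, j, weight in sorted(edges, key=lambda e: e[2], reverse=True):
        if weight <= 0:
            continue  # false match: never used for merging
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j, weight))
    return tree

# Four scans in one block; the match between scans 1 and 3 was rejected.
edges = [(0, 1, 500), (1, 2, 300), (0, 2, 450), (2, 3, 200), (1, 3, 0)]
tree = merging_tree(4, edges)
```

The scans are then chained to the root along the selected edges by composing the coarse and fine transformations of each edge.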
Treating each scan-block as a new scan, the registration between scan-blocks can be accomplished using the same methods described in
Section 3.1,
Section 3.2,
Section 3.3 and
Section 3.4. This process is repeated until all scans are merged into a single scan. To avoid excessive point growth, PCD down-sampling is performed after the scan merging step.
4. Experiments and Analysis
4.1. Datasets Description and Evaluation Criteria
We evaluated the performance of the proposed method using two bridge point cloud datasets, including a suspension bridge and a continuous rigid frame bridge. The suspension bridge named Cuntan Yangtze River Bridge is located in Chongqing with a total length of approximately 1.6 km and a width of about 38 m (
Figure 8a). It was scanned using Leica P40 with a ranging error of 1.2 mm + 10 ppm and an angular accuracy of 8″ [
33]. The continuous rigid frame bridge, named Huanghuayuan Jialing River Bridge, is also located in Chongqing, with a total length of approximately 1.2 km and a width of about 31 m. It was scanned using a Faro S350 with a ranging error of 1 mm between 10 m and 25 m and an angular accuracy of 19″ [
34]. More details about the two datasets are listed in
Table 1.
For the Huanghuayuan Jialing River Bridge, only the area below the bridge deck was scanned using TLS, while the bridge deck area was scanned using a mobile scanning system. Prior to inputting the data into the algorithm, background points were roughly removed. In addition, the ground truth result was obtained through manual registration.
The performance of the method is evaluated by the axis-angle rotation error er and translation error et of all transformations among the scans [35], which are defined as follows:
$$e_r = \arccos\left(\frac{\mathrm{tr}\left(R_g^{\mathsf{T}} R_e\right) - 1}{2}\right) \tag{12}$$
$$e_t = \lVert t_g - t_e \rVert \tag{13}$$
where Re and te represent the estimated rotation matrix and translation vector, respectively, and Rg and tg are those of the ground truth. In addition, the successful registration rate (SRR) is also utilized and defined by
$$\mathrm{SRR} = \frac{N_s}{N} \times 100\% \tag{14}$$
where N represents the total number of scans, and Ns the number of successfully aligned scans. A scan is considered successfully aligned when its rotation error and translation error are both less than the specified thresholds σr and σt, respectively.
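The evaluation criteria can be sketched as follows, assuming the usual definitions: the rotation error is the rotation angle of the relative rotation between the ground truth and the estimate, and the translation error is the Euclidean distance between the two translation vectors. Function names and the toy values are hypothetical; errors and thresholds must share units.

```python
import math

def rotation_error(R_e, R_g):
    """Angle (radians) of the relative rotation between estimate and ground truth."""
    # trace(R_g^T R_e) equals the element-wise product of the two matrices, summed.
    trace = sum(R_g[i][j] * R_e[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against round-off
    return math.acos(cos_angle)

def translation_error(t_e, t_g):
    """Euclidean distance between estimated and ground-truth translations."""
    return math.dist(t_e, t_g)

def successful_registration_rate(errors, sigma_r, sigma_t):
    """Percentage of scans whose (rotation, translation) errors are both below
    the thresholds."""
    ok = sum(1 for e_r, e_t in errors if e_r < sigma_r and e_t < sigma_t)
    return 100.0 * ok / len(errors)

# Toy check: the estimate is off by a 0.1 rad rotation about z and 2 units in z.
a = 0.1
R_est = [[math.cos(a), -math.sin(a), 0.0],
         [math.sin(a), math.cos(a), 0.0],
         [0.0, 0.0, 1.0]]
R_gt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
e_r = rotation_error(R_est, R_gt)
e_t = translation_error((1.0, 2.0, 3.0), (1.0, 2.0, 5.0))
```

Clamping the cosine before calling `acos` avoids domain errors when the two rotations are nearly identical and round-off pushes the value slightly above 1.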
4.2. Results of Template-Guided Initial Pose Estimation
Based on the bridge datasets, we evaluated the accuracy and efficiency of the template-guided initial pose estimation method. The proposed method is implemented in Python and run on an Intel Core i7-7700K CPU (Intel, Santa Clara, CA, USA). The initial pose estimation results of the Cuntan Yangtze River Bridge and Huanghuayuan Jialing River Bridge are shown in
Figure 9a and
Figure 9b, respectively. It can be seen that this step can roughly align the scans and obtain the relative scanning positions.
The average rotation error, average translation errors and running time for the two bridges are listed in
Table 2. For the Cuntan Yangtze River Bridge, the average rotation error is 19.1 mdeg, and the average translation errors along the three coordinate axes are 0.66 m, 3.62 m, and 1.88 m, respectively. For the Huanghuayuan Jialing River Bridge, the average rotation error is 19.7 mdeg, and the average translation errors along the three coordinate axes are 0.64 m, 5.16 m, and 2.35 m, respectively. Because the alignment in the y-axis direction is based solely on the centres of the PCD, translation errors are more pronounced in the y-coordinate.
The running times for the two bridges are 6.08 min and 4.43 min, respectively. Compared with extensive pairwise matching, template-guided initial pose estimation has a higher efficiency, primarily due to two key reasons. Firstly, in our method, only N template-matching operations are required to create a graph, where N represents the number of scans. In contrast, extensive pairwise matching necessitates N(N − 1)/2 matches. This reduction in the number of matches significantly improves the efficiency of our approach. Secondly, in template-guided initial pose estimation, the problem scale in each individual match is determined by the number of points in the scan, denoted as n. The fundamental operation is mapping each point to a 2D grid. As a result, the time complexity for processing a single scan in the initial pose estimation step is O(n), and the time complexity for processing all scans is O(Nn). This linear time complexity ensures that our method has high efficiency and is adaptable to larger datasets. In summary, the results show that our method achieves a relatively high alignment accuracy, provides good initial poses for subsequent steps, and improves efficiency by avoiding extensive pairwise matching.
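The difference in the number of matching operations can be illustrated with a short calculation. The scan count of 30 is a hypothetical value chosen for illustration, not a figure from the datasets.

```python
def template_matches(n_scans):
    """One template match per scan."""
    return n_scans

def pairwise_matches(n_scans):
    """One match for every unordered pair of scans."""
    return n_scans * (n_scans - 1) // 2

n = 30  # hypothetical number of scans
print(template_matches(n), pairwise_matches(n))  # prints "30 435"
```

The gap widens quickly as the number of scans grows, since the first count is linear in N and the second is quadratic.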
4.3. End-to-End Performance Evaluation
During the experiments, we set the average number of scans in a scan-block to 5, the down-sampling voxel size to 0.1 m, and σr and σt to 100 mdeg and 100 mm, respectively. The experimental setup utilized a hybrid programming approach, with the template-guided initial pose estimation step implemented in Python and the remaining parts in C++. Our method was tested on a laptop with 32 GB RAM and an Intel Core i7-7700K CPU.
The registration results for the two bridges are plotted in
Figure 10. The points are coloured according to the difference in height. It can be seen that the mismatching in
Figure 9 has been removed. This demonstrates that our method can handle the PCD of large bridges and achieve good accuracy.
The accuracy and efficiency of the proposed method were evaluated, and the rotation and translation errors with the root mean square error (RMSE), SRR, and running time are listed in
Table 3. The average rotation errors of our method are 0.96 mdeg and 0.74 mdeg, and the average translation errors are 28.04 mm and 43.25 mm, for the Cuntan Yangtze River Bridge and the Huanghuayuan Jialing River Bridge, respectively, with an SRR of 100% in both cases. The results show that our method achieves relatively small rotation errors, but slightly larger translation errors. Compared with the Cuntan Yangtze River Bridge, the Huanghuayuan Jialing River Bridge has a relatively simple geometric shape and lacks complex features and structures, resulting in a larger translation error. In conclusion, the registration results listed above show that our method performs well in registering the TLS scans of different bridges, and the accuracy can satisfy the requirements of component extraction and 3D reconstruction.
4.4. Discussion
Although our method has demonstrated good performance, it still has two limitations. Firstly, prior to registration, the manual removal of background points is required. This step can become cumbersome when dealing with a large number of scans. Therefore, future research will focus on developing automatic background-point removal techniques. Secondly, our method is currently only applicable to bridges, which limits its application. Future research will aim to expand this method to more diverse scenarios and environments.