1. Introduction
The remote six-degrees-of-freedom (6-DOF) pose, i.e., the position and orientation, of a cooperative target plays an important role in many industries. Its estimation helps to achieve accurate and efficient machine operations; therefore, 6-DOF pose measurement has been widely used in autonomous robotics, precision machining, on-orbit servicing, and spacecraft docking [
1]. A few of the most widely used sensors for remote 6-DOF pose measurement include GPS [
2], laser tracker [
3], Lidar [
4], and stereo-camera [
5]. However, as
Table 1 shows, all these sensors have their own disadvantages.
Monocular vision is one of the most commonly used methods for 6-DOF pose measurement of a cooperative target. It has low hardware complexity, high accuracy, and a wide measurement range. However, due to its inherent limitations, the measurement accuracy along the optical axis of the camera is generally considerably lower than that in the other two directions [
6]. This low accuracy is one of the main challenges in obtaining high-precision measurements with a monocular vision-based system.
Recently, significant work has been carried out on monocular vision-based algorithms. According to the target attributes, these works can be classified into two categories: cooperative and noncooperative target measurement. The former problem is also known as the perspective-n-point (
PnP) problem and consists of pose determination of a calibrated monocular camera from
n correspondences between 3D reference points and their 2D projections [
7]. In the past few decades, researchers have invested considerable effort in solving the
PnP problem [
8].
There are two types of strategies for solving the
PnP problem: iterative and noniterative. Most of the strategies of the latter type [
9,
10] obtain the 3D positions of points in the camera coordinate system using efficient linear operations [
11,
12]. However, their noise resistance is poor, especially when the number of 3D points
n ≤ 5. In 2011, Li et al. [
12] proposed a perspective similar triangle (
PST) method that reduced the number of unknown parameters using a geometric constraint. Kneip et al. [
13] proposed a novel closed-form solution to the
P3P problem. Fischler et al. [
14] divided the
n-point problem into 3-point subsets and eliminated the multiplicity of solutions by checking consistency. Zhi et al. [
15] proposed an easy-to-implement linear algorithm for the 4-point problem. Lepetit et al. [
11] proposed the first linear-complexity method that used four virtual control points with weights to represent the 3D reference points. Li et al. [
12] also proposed a noniterative
O(n) method, named the
RPnP method, which could robustly obtain the optimal solution by solving a seventh-order polynomial.
The iterative methods are more accurate than the noniterative methods. Among them, the classical methods formulated the
PnP problem as a nonlinear least-squares problem [
16] and then solved it using iterative optimization. However, the performance of these methods depends heavily on initialization, due to which they can easily become trapped in local minima, resulting in poor accuracy, especially in the absence of redundant points, i.e.,
n ≤ 6.
The fusion of a monocular camera with other sensors, e.g., Lidar, 2D laser range finders (LRFs), and 1D LRFs, has become a new research hotspot, because sensor fusion technology can integrate the advantages of different sensors. The majority of research in this area is on the fusion of 3D Lidar and monocular cameras. Meng et al. [
17] presented a navigation framework for mobile robots with a monocular camera and 3D Lidar. The V-LOAM [
18] simultaneously refined the motion estimation and point cloud registration by combining monocular visual odometry and subsequent Lidar scans matching. LIMO [
19] proposed a depth extraction algorithm for camera feature tracks from Lidar measurements and estimated motion using robust keyframe-based bundle adjustment.
A significant amount of research combines a camera and a 2D LRF. Frueh et al. [
20] developed a set of data processing algorithms for generating textured facade meshes of cities utilizing data from a series of laser scanners and digital cameras. Bok et al. [
21] presented a novel approach based on a hand-held LRF and camera fusion sensor system to reconstruct a massive-scale structure. Haris et al. [
22] proposed a fusion scheme combining the accuracy of laser sensors with the broad visual fields of cameras for extracting accurate scene structure information.
Recently, many systems and applications combining a camera and a 1D LRF for measurement have been proposed. Most of the literature on 1D LRF and camera fusion uses the 1D LRF to compensate for the scale information lost by the camera. Zhang et al. [
23] synthesized line segments obtained from 1D LRF and line features extracted from a monocular camera for SLAM. Riccardo et al. [
24] used a 1D LRF enhanced monocular system for scale correct pose estimation. Zhuang et al. [
25] presented a method for fusing a 1D LRF and a monocular camera to restore unknown 3D structures and 6-DOF camera poses. This method can overcome the known inability of monocular cameras to recover absolute scale. Ordonez [
26] proposed an approach that combines a camera and a 1D LRF to estimate the length of a line segment on an unknown plane.
In this paper, we focus on the compensation and correction of monocular camera 6-DOF pose measurement results using a high-precision 1D LRF. We propose a feasible and straightforward remote and precise 6-DOF pose measurement algorithm for cooperative targets based on the fusion of a 1D LRF and a monocular camera. The
SRPnP [
7] method is used to obtain the initial 6-DOF pose of the cooperative target in the camera coordinate system, and the accurate depth information obtained by the 1D LRF is used to correct the initial 6-DOF pose based on our proposed 1D LRF and monocular camera fusion strategy. Subsequently, an optimization model based on a nonlinear optimization method [
27] is proposed to optimize the 6-DOF pose and the parameters of the fusion system synchronously. The extrinsic parameters of each sensor are critical to the accuracy of multi-sensor data fusion. Therefore, we also propose an extrinsic parameter calibration pipeline for a camera and a 1D LRF whose beam is invisible.
The main contributions of this paper are as follows:
A novel remote and ultra-high-precision 6-DOF pose measurement method for a cooperative target based on monocular camera and 1D LRF fusion is proposed.
A complete calibration pipeline for the camera and 1D LRF fusion system is proposed. To the best of our knowledge, this is the first method to calibrate the extrinsic parameters between a camera and a 1D LRF whose laser spot is not visible to the camera.
A novel optimization model based on the nonlinear optimization method is presented to optimize measured 6-DOF pose results and system parameters synchronously.
The accuracy and usability of the proposed method are verified by numerical and real-world experiments. The experimental results show that the proposed method completely eliminates the large measurement error along the optical axis of the camera in monocular vision 6-DOF pose measurement. In addition, our experimental prototype achieves ultra-high accuracy. The maximum errors of translational and rotational DOF are less than 0.02 mm and 15″, respectively, at a distance of 10 m.
The rest of this paper is organized as follows:
Section 2 analyzes the 6-DOF pose measurement accuracy with monocular vision and proves that the accuracy in the sight direction is exceptionally low. The system overview of our approach and the definition of the coordinate systems are outlined in
Section 3.
Section 4 illustrates the proposed algorithm in detail. The experiments are presented in
Section 5, along with comprehensive comparisons and analysis. The conclusion is provided in
Section 6.
2. Accuracy Analysis of Pose Measurement with Monocular Vision
There are many factors that affect the accuracy of measuring cooperative targets via monocular vision. These factors include the measurement distance, the size of the cooperative target, the calibration accuracy of the camera, the extraction accuracy of the centroids of feature points, etc. We take the
P3P problem as an example to derive the theoretical position accuracy in each direction of monocular pose measurement, following the work of Zhao et al. [
28].
Figure 1 shows a calibrated camera and three known 3D points,
A,
B, and
C. These points are projected onto the normalized image plane as
a,
b, and
c [
8]. One way to solve the
P3P problem is to use simultaneous equations based on the law of cosines to first calculate the distances between three spatial points
A,
B, and
C and the camera’s optical center. This is then followed by the determination of the locations of spatial points in the camera coordinate system according to the coordinates of projection points
a,
b, and
c on the image plane. Without loss of generality, we choose points
A and
B for analysis.
Based on the work in [
28], we assume that the coordinate of point
A in the camera coordinate system is
. We further assume that the coordinate of point
a in the image plane has an error
In addition, we define the focal length of the camera as
f. According to the imaging theorem, the errors of point
A in the
X and
Y directions in the camera coordinate system are
We introduce another space point,
, to analyze the error in the
Z direction. When
A and
B are simultaneously moving along the
Z direction, and we assume the error of line segment
is
; then, according to [
28], we obtain
Based on Equations (1)–(3), we can intuitively conclude that the error along the Z-axis is significantly higher than the errors along the other two axes. The position error along the Z-axis is proportional to the square of the distance between the camera and the object, whereas the errors in the other two directions are proportional to the distance itself. Thus, when the measurement distance is significantly larger than the size of the cooperative object, the translational error along the optical axis is much higher than the errors along the other two axes.
We demonstrate the aforementioned difference in errors by means of an example: We consider a cooperative object of length
, the distance between the camera and the object is 10 m, and the extraction error of the centroids of the feature spots is
. The pixel size is 5.5 μm, the image size is 4096 × 3072 pixels, and the focal length of the camera is
f = 600 mm. Using these parameters in Equations (1)–(3), we can find that the translational measurement accuracy along the optical axis of the camera is approximately 50 times lower than that along the other two axes. The actual experimental analysis is presented in
Section 5.3.
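To make this ratio concrete, the short sketch below evaluates first-order error relations of the form ΔX ≈ ZΔu/f for the lateral directions and ΔZ ≈ Z²Δu/(fL) for the depth direction, which are the standard forms behind this kind of P3P error analysis [28]. The centroid error and the target size L used here are placeholders (L is chosen only to illustrate that the ratio equals Z/L), not values taken from the paper.

```python
# Hedged sketch: first-order monocular error propagation.
# Assumed forms: dX ~ Z*du/f (lateral), dZ ~ Z^2*du/(f*L) (depth), following
# the standard P3P error analysis; the exact Equations (1)-(3) are in [28].
Z = 10_000.0     # camera-to-object distance [mm]
f = 600.0        # focal length [mm]
pixel = 0.0055   # pixel size [mm] (5.5 um)
sigma_px = 0.1   # hypothetical centroid extraction error [pixel]
L = 200.0        # hypothetical cooperative-object size [mm]

du = sigma_px * pixel          # image-plane error [mm]
dX = Z * du / f                # lateral (X or Y) error [mm]
dZ = Z ** 2 * du / (f * L)     # depth (optical-axis) error [mm]

print(f"lateral error ~ {dX:.4f} mm")
print(f"depth error   ~ {dZ:.4f} mm")
print(f"ratio dZ/dX   ~ {dZ / dX:.0f} (= Z / L)")
```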
4. 6-DOF Pose Estimation
In this section, we will systematically present several key algorithms of our proposed method, including system calibration, data fusion, and system optimization.
4.1. Parameters Calibration
The proposed fusion system calibration includes two parts, as mentioned in
Section 3.3. Most of the existing calibration methods [
1,
29,
30] cannot be directly applied to the relationship between the
BCS and the
TCS, because the LRF uses invisible light. Thus, we introduce a laser tracker (Leica AT960) for the system parameter calibration. We assume that the accurate intrinsic matrix and radial distortion of the camera have been obtained using Zhang’s algorithm [
31].
4.1.1. Extrinsic Parameters between LCS and BCS
As illustrated in
Figure 3a, the laser beam of the 1D LRF can be represented as a half-line with the origin
in the
BCS, and we define the direction of the half-line by a unit norm vector
. To reduce the number of parameters, we can use the origin
and unit norm vector
to represent the relationship between the
LCS and the
BCS; therefore, the extrinsic parameters of the LRF laser beam are as follows:
Here, all vectors and points are expressed in the BCS.
We move the corner cube prism along the direction of the LRF laser beam several times. Each time, the position of the corner cube prism measured by the AT960 and the range value measured by the LRF are recorded. As the laser beam is invisible, the feedback signal of the LRF is used to ensure that the corner cube prism remains on the laser beam. The direction of the laser beam in the ACS can be calculated via straight-line fitting once the coordinates of all laser spots have been obtained. Subsequently, the origin in the ACS can be determined from the range values measured by the LRF at each position.
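A minimal sketch of this fitting step is given below, assuming the tracked corner-cube positions (expressed in the tracker coordinate system) and the corresponding LRF range readings are available as arrays; the function name and interface are ours, not part of the paper.

```python
import numpy as np

def fit_lrf_beam(points, ranges):
    """Estimate the LRF beam direction and origin.

    points : (N, 3) corner-cube positions measured by the laser tracker
    ranges : (N,)   LRF range readings recorded at the same positions
    The direction comes from a least-squares line fit (PCA); the origin is
    recovered as the average of p_i - d_i * v.
    """
    points = np.asarray(points, dtype=float)
    ranges = np.asarray(ranges, dtype=float)

    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    v = vt[0]                                   # principal direction of the points
    # Orient v so that it points away from the LRF (ranges increase along v).
    if np.dot(points[-1] - points[0], v) * (ranges[-1] - ranges[0]) < 0:
        v = -v
    # Each measurement implies an origin estimate p_i - d_i * v; average them.
    origin = (points - ranges[:, None] * v).mean(axis=0)
    return origin, v
```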
As
Figure 3a shows, the four calibrated control points on the camera housing can be tracked using the laser tracker to determine the conversion relations
. Thus, the extrinsic parameters of the
LCS to the
BCS are
4.1.2. Extrinsic Parameters between CCS and BCS
As illustrated in
Figure 3a, similar to the camera housing, there are also four control points on the cooperative target that can be used to determine the relationship
between the
ACS and the
TCS. The camera can determine the relationship
between the
CCS and the
TCS. Based on the relationship
relating the
ACS to the
BCS and the relationship
relating the
ACS to the
TCS, we can calculate the conversion relations
between the
BCS and the
TCS as the following expressions:
where
represents matrix inversion.
Now, using the relationship
between the
CCS and the
TCS, which is obtained by the camera, and the relationship
between the
BCS and the
TCS, we can calculate the relationship
between the
CCS and the
BCS as
Based on Equations (6) and (7), an initial value of the extrinsic parameters between the CCS and the BCS can be obtained experimentally. Next, we will use more data to iteratively optimize the initial parameter values.
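As an illustration of the chain behind Equations (6) and (7), the sketch below composes 4 × 4 homogeneous transforms. The convention that T_xy maps coordinates from system y into system x is our assumption; the paper's expressions may be written with the inverse convention.

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def chain_extrinsics(T_ab, T_at, T_ct):
    """Hedged sketch of Equations (6)-(7) under the assumed convention T_xy: y -> x.

    T_ab : ACS <- BCS (tracked control points on the camera housing)
    T_at : ACS <- TCS (tracked control points on the cooperative target)
    T_ct : CCS <- TCS (camera measurement of the target)
    """
    T_bt = np.linalg.inv(T_ab) @ T_at    # Eq. (6): BCS <- TCS
    T_cb = T_ct @ np.linalg.inv(T_bt)    # Eq. (7): CCS <- BCS
    return T_bt, T_cb
```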
We define the pose relationship matrix between the
CCS and the
TCS derived from
and
as
. According to the coordinate system conversion relations,
can be calculated by
According to the 6-DOF pose given in Equation (8), we can obtain six re-projection imaging points
on the image plane with the help of the imaging theorem and the calibrated parameters. Then, we can calculate the Euclidean distance between each re-projection imaging point and its corresponding actual imaging point
, namely the reprojection error. A total of
measurements are acquired by changing the azimuth and angle of the cooperative target and recording the results after each change. We utilize an iterative optimization scheme for all measurements by minimizing the reprojection errors of six LED points on the cooperative target. The following objective function is optimized:
where
represents the intrinsic matrix of the camera,
and
are the extrinsic parameters relating the
CCS to the
BCS,
represent the coordinates of six LED points in the
TCS, and
denotes the projection functions of the six LED points based on the pose relationship between the
CCS and the
TCS.
The rotation matrix
is parametrized using Cayley–Gibbs–Rodrigues (CGR) parameters [
32]. It can be expressed as a function of the CGR parameters
, as follows:
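For reference, the standard CGR form (our notation; it is expected to match Equation (10)) expresses the rotation matrix in terms of the parameter vector $\mathbf{s} = (s_{1}, s_{2}, s_{3})^{\top}$ as

$$
R(\mathbf{s}) = \frac{1}{1+\mathbf{s}^{\top}\mathbf{s}}\Big( (1-\mathbf{s}^{\top}\mathbf{s})\,I_{3} + 2\,[\mathbf{s}]_{\times} + 2\,\mathbf{s}\mathbf{s}^{\top} \Big),
\qquad
[\mathbf{s}]_{\times} =
\begin{pmatrix}
0 & -s_{3} & s_{2}\\
s_{3} & 0 & -s_{1}\\
-s_{2} & s_{1} & 0
\end{pmatrix}.
$$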
This transforms the problem into an unconstrained optimization problem. By solving Equation (9) with the Levenberg–Marquardt optimization method [
33,
34], we can obtain accurate calibration parameters of the intrinsic matrix of the camera
, extrinsic parameters relating the
CCS to the
BCS, and the coordinates of six LED points in the
TCS.
Figure 3b shows the results of applying the proposed calibration procedure to our setup comprising a camera and an LRF.
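To make the structure of Equation (9) concrete, the sketch below assembles the reprojection residuals over all recorded measurements and hands them to a Levenberg–Marquardt solver. The parameter packing, the distortion-free pinhole model, and all names are our assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def cgr_to_R(s):
    """Rotation matrix from CGR parameters (standard form, cf. Equation (10))."""
    s = np.asarray(s, dtype=float)
    sx = np.array([[0.0, -s[2], s[1]], [s[2], 0.0, -s[0]], [-s[1], s[0], 0.0]])
    return ((1.0 - s @ s) * np.eye(3) + 2.0 * sx + 2.0 * np.outer(s, s)) / (1.0 + s @ s)

def calib_residuals(x, T_bt_list, uv_list):
    """Reprojection residuals of Equation (9) over all measurements.

    x packs: fx, fy, cx, cy | CGR vector and translation of the CCS<-BCS
    extrinsics | the six LED coordinates in the TCS (18 values).
    T_bt_list : per-measurement 4x4 target pose in the BCS (laser tracker)
    uv_list   : per-measurement 6x2 detected LED centroids
    """
    fx, fy, cx, cy = x[:4]
    K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])
    R_cb, t_cb = cgr_to_R(x[4:7]), x[7:10]
    P = x[10:].reshape(6, 3)
    res = []
    for T_bt, uv_obs in zip(T_bt_list, uv_list):
        P_b = P @ T_bt[:3, :3].T + T_bt[:3, 3]   # TCS -> BCS (tracker measurement)
        P_c = P_b @ R_cb.T + t_cb                # BCS -> CCS (parameters being calibrated)
        uv = P_c @ K.T
        res.append((uv[:, :2] / uv[:, 2:3] - uv_obs).ravel())
    return np.concatenate(res)

# sol = least_squares(calib_residuals, x0, method="lm", args=(T_bt_list, uv_list))
```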
In this subsection, we obtained the conversion relations that transfer the initial 6-DOF pose measured by the camera to the BCS, and we calculated the origin of the LRF and the direction of the laser line in the BCS. In the following subsection, we design an algorithm to obtain the ultra-high-precision 6-DOF pose of the cooperative target.
4.2. Pose Acquisition and Fusion
The accurate depth
measured by the LRF can be directly used by the fusion algorithm. We use a series of image processing algorithms to detect the centroids of the imaging points
of six LEDs. The different steps of this algorithm include image filtering, circle detection, and centroid acquisition [
35]. The coordinates of six LEDs under the cooperative target coordinate system (
TCS) are denoted by
.
As
Figure 3a shows, we have six cooperative points in the
TCS and their corresponding image coordinates in the
ICS. The calculation of the relative 6-DOF pose between the camera and the cooperative target belongs to the
PnP problem mentioned in
Section 1.
The corresponding relationship between
and
is given by the following expression:
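In standard pinhole form (our notation), this relationship can be written as

$$
s_{i}\,\tilde{\mathbf{p}}_{i} = \mathbf{K}\left(\mathbf{R}\,\mathbf{P}_{i} + \mathbf{t}\right), \qquad i = 1,\dots,6,
$$

with $\tilde{\mathbf{p}}_{i}$ the homogeneous image point of the $i$-th LED and $(\mathbf{R}, \mathbf{t})$ the pose relating the TCS to the CCS,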
where
is the intrinsic matrix of the camera, and
is a depth scale factor. The rotation matrix and the translation vector relating the
CCS to the
TCS measured by the camera based on the
SRPnP method are denoted by
and
, respectively. Subsequently, we transfer them to the
BCS using the following expressions:
where
R_B satisfies the orthogonality constraint of a rotation matrix and has three degrees of freedom. Thus, we obtain the initial 6-DOF pose and an accurate depth value between the
BCS and the
TCS. Our goal is to use the depth value
to correct the
component of the 6-DOF pose.
However, as illustrated in
Figure 4, the laser beam has an unavoidable width; in other words, the laser spot is diffuse. This means that the corner cube prism does not necessarily lie on the straight line
defined by the center of the laser beam and, therefore, the depth value of the LRF cannot be used directly. However, we are certain that the corner cube prism lies within the laser spot.
Without loss of generality, we assume that the laser spot lies on a plane that passes through the point
, which is located in the center line of laser beam
. The line
can be represented by an origin
and a unit norm vector
. We can obtain
and
by calibrating the transformation parameters of the
BCS and the
LCS. Thus, the equation of line
can be expressed as
We can easily observe that the normal vector of the plane is parallel to the direction vector
of line
. In addition, the distance from the origin of the
LCS to the plane is the actual measurement value of the LRF. According to the above conditions, the equation of the plane can be solved by the following simultaneous equations:
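Writing the beam line as $\mathbf{x}(\lambda) = \mathbf{O}_{L} + \lambda\mathbf{v}$ with $\lambda \ge 0$ (our notation for the calibrated LRF origin and unit direction in the BCS), the two conditions above yield a plane of the form

$$
\Pi:\;\; \mathbf{v}^{\top}\!\left(\mathbf{x} - \mathbf{O}_{L}\right) - d = 0,
$$

where $d$ is the range value returned by the LRF.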
We know that the 6-DOF pose of the cooperative target consists of the rotational DOF as well as the translational DOF . They can define a line segment whose starting and ending coordinates are the origins of the BCS and TCS, respectively. As mentioned earlier, the other values in the 6-DOF pose are relatively accurate, except for the translational DOF along the -axis of the BCS.
We utilize the vector
to denote the initial translational DOF of the cooperative target. The measured initial origin and the precise origin of the cooperative target are represented as
and
, respectively. Thus, the translational DOF measurement error along the
-axis is as follows:
According to the previous discussion, we know that
Therefore, as shown in
Figure 4, within the allowable error range, we can assume that the precise origin of the cooperative target
(red point) lies on the line defined by the initial origin of the cooperative target
(black point) and the origin of the
BCS. This signifies that the starting coordinate and the direction of the line segment
are accurate. On the contrary, the endpoint coordinate has a large error in the
-axis. In other words, the precise origin of the cooperative target
has an uncertainty of
.
Let us represent the direction of the line
by a unit vector
. Thus, the equation of the line
is
Subsequently, we obtain the straight-line parameters by substituting into Equation (18).
Now, we know that the precise origin of the cooperative target
is not only on the straight-line
, but also on the plane of the laser spot
. Therefore, as shown in
Figure 4, our fusion strategy becomes a mathematical problem to find the intersection point (red point) of a spatial plane (blue curve) and a spatial line (green line) in the
BCS, expressed as follows:
By solving Equation (19), we can obtain the precise coordinates of the origin of the cooperative target. So far, we have corrected the initial 6-DOF pose using the accurate depth value of LRF.
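A minimal sketch of this intersection step is given below; the plane is the one written above, the sight line passes through the BCS origin toward the initial target origin, and all names are ours.

```python
import numpy as np

def fuse_depth(t_init, origin_L, v, d):
    """Correct the target origin via the line-plane intersection of Equation (19).

    t_init   : initial translation of the target origin in the BCS (from SRPnP)
    origin_L : LRF origin expressed in the BCS (from calibration)
    v        : unit direction of the LRF beam in the BCS (from calibration)
    d        : range value measured by the LRF
    """
    u = t_init / np.linalg.norm(t_init)     # direction of the sight line
    # Laser-spot plane: v . (x - origin_L) = d, with normal parallel to the beam.
    lam = (d + v @ origin_L) / (v @ u)      # intersection parameter along the sight line
    return lam * u                          # corrected target origin in the BCS
```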
4.3. Bundle Adjustment
We apply an optimization scheme to each correction result to further improve the accuracy. This scheme minimizes a combination of the error between the initial depth and the corrected depth and the reprojection errors of the six LED points of the cooperative target, as follows:
where
represents the coordinates of the imaging points of six LEDs in the image coordinate system, and
represents the projection functions of six LED points. A weight parameter is denoted by
, which should be adjusted to normalize the variances of the camera and LRF residual distributions. Generally, its value is fixed as
.
By minimizing Equation (20) as a nonlinear optimization problem with the Levenberg–Marquardt method, we obtain an accurate 6-DOF pose of the cooperative target in the BCS.
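A sketch of how Equation (20) can be assembled as a residual vector for the Levenberg–Marquardt solver is shown below. The rotation-vector parametrization, the specific depth residual (deviation of the optimized origin from the fused origin of Section 4.2), and all names are our choices and may differ from the authors' exact formulation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as Rot

def ba_residuals(x, K, R_cb, t_cb, P_tcs, uv_obs, t_corr, w=1.0):
    """Residual vector combining reprojection errors and a depth term (cf. Equation (20)).

    x          : [rotation vector (3), translation (3)] of the target pose in the BCS
    K          : 3x3 camera intrinsic matrix
    R_cb, t_cb : calibrated extrinsics mapping BCS coordinates into the CCS
    P_tcs      : (6, 3) LED coordinates in the TCS
    uv_obs     : (6, 2) detected LED centroids in the image
    t_corr     : depth-corrected target origin from the fusion step
    w          : weight balancing LRF and camera residuals
    """
    R_b, t_b = Rot.from_rotvec(x[:3]).as_matrix(), x[3:]
    # Reprojection residuals: TCS -> BCS -> CCS -> image plane.
    P_ccs = (P_tcs @ R_b.T + t_b) @ R_cb.T + t_cb
    uv = P_ccs @ K.T
    r_cam = (uv[:, :2] / uv[:, 2:3] - uv_obs).ravel()
    # Depth residual: keep the optimized origin close to the fused one.
    r_lrf = w * (t_b - t_corr)
    return np.concatenate([r_cam, r_lrf])

# Usage sketch, starting from the corrected pose of Section 4.2 as x0:
# sol = least_squares(ba_residuals, x0, method="lm",
#                     args=(K, R_cb, t_cb, P_tcs, uv_obs, t_corr))
```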
5. Experimental Results
A series of experiments are performed in this section to validate the sensor fusion method proposed in this paper and evaluate the performance of the 6-DOF pose measurement system. First, we introduce the experimental setup in
Section 5.1. Then, in
Section 5.2, we describe the fusion system calibration experiment. Then, in
Section 5.3, simulation experiments are described that evaluate the effects of image noise and the size of the cooperative target on the measurement accuracy. Last, in Section 5.4, actual experiments carried out to evaluate the 6-DOF pose measurement accuracy of our fusion system are described.
Table 2 shows the parameters of the fusion sensor that was built according to the proposed fusion method. We can observe that it has an impressive 6-DOF pose measurement precision.
5.1. Experimental Setup
Figure 5 shows the layout of our experiment and our proposed fusion sensor. The cooperative target and the fusion system are placed approximately ten meters apart, with the Leica Absolute Tracker (AT960) in the middle. The 6-DOF pose change of the cooperative target is controlled using the motion platform. The AT960 obtains the actual 6-DOF pose using the control points on the cooperative target and the camera housing. As the positional accuracy of the AT960 can reach 15 μm, its measurements are defined as true values.
Table 3 shows the relevant parameters of the camera and LRF of our proposed system.
5.2. System Calibration Experiment
In this subsection, we describe the experiments conducted on the calibration method described in
Section 4.1. First, the intrinsic parameters of the camera were calibrated using Zhang’s algorithm [
31].
Table 4 shows the intrinsic parameters and radial distortion of the camera.
Next, based on the laser beam calibration method mentioned in
Section 4.1.1, we moved the corner cube prism along the direction of the LRF's laser beam. Subsequently, the AT960 and the LRF were used to record the 3D coordinates and depth values of 10 points, and these coordinates were then transferred to the
BCS. Then, we used those 3D coordinates to fit a line based on the least-squares method. Subsequently, we calculated the origin of the LRF by combining the fitted line with the depth value of each point.
Table 5 shows the origin and direction of the LRF semi-line.
Last, during the calibration of the extrinsic parameters relating the CCS to the BCS, we changed the azimuth and angle of the cooperative target to obtain 120 poses. After moving the cooperative target each time, the monocular camera system captured the images, calculated the relative pose, and saved the data. Simultaneously, the laser tracker measured and saved the 6-DOF pose relating the BCS to the TCS. As mentioned earlier, we can consider the measurements of the AT960 to be the true values of the relative 6-DOF pose between the BCS and the TCS.
We could obtain an initial value of the extrinsic parameters between the
CCS and the
BCS using a set of data based on Equation (8). Then, we inputted other data into Equation (9) and solved the nonlinear optimization problem based on the Levenberg–Marquardt iterative optimization method. As shown in Equation (9), the intrinsic parameters of the camera, the extrinsic parameters between the
CCS and the
BCS, and the coordinates of the LED points in the
TCS were all optimized.
Table 6 shows the calibration results of the extrinsic parameters between the
CCS and the
BCS.
5.3. Simulation Experiments
It is known that the centroid extraction accuracy of the feature points on the cooperative target and the size of the target strongly influence the pose measurement accuracy. Therefore, we verified the influence of these two factors on our proposed 6-DOF pose measurement method through simulation experiments.
We used MATLAB R2019a software on Windows 10 to carry out the simulation experiments. To represent a realistic measuring environment, we fixed the relevant simulation system parameters equal to the calibration values of our system.
Table 7 shows the true pose between the
TCS and the
BCS used in the simulation experiments.
First, we validated the influence of centroid extraction error on 6-DOF pose measurement accuracy. Throughout the experiment, we compared the calculated 6-DOF pose values with the real pose values. The difference between the two is the measurement error of our proposed multi-sensor fusion-based 6-DOF pose measurement method.
We added
Gaussian noise to the centroid of the extracted LED image points, where
is the standard deviation of the Gaussian distribution. We increased
gradually from 0.01 to 0.36 pixels. For each noise level, we performed 100 independent experiments. The errors between the measured value and true poses are shown in
Figure 6. The red horizontal lines represent the median, and the borders of the boxes represent the first and third quartiles, respectively. The measurement results marked with a red "+" symbol are outliers caused by high noise in the simulation. It can be observed from
Figure 6 that the measurement errors in all directions increase with the increase in noise level. When the centroid extraction noise is 0.15 pixels, the translational and rotational DOF errors in all directions are approximately 0.02 mm and 15″, respectively.
We also analyzed the influence of the size of the cooperative target on the pose measurement accuracy. The size was changed gradually from 50 mm to 150 mm. We performed 100 independent experiments for each size of the cooperative target, and the results are shown in
Figure 7. We can see that, in our proposed multi-sensor fusion-based 6-DOF pose measurement method, the size of the cooperative target has only a slight impact on the translation vector but a significant impact on the rotation angles.
5.4. Real Experiments
We also evaluated the performance of the proposed fusion system through actual experiments, in which multiple static measurements were taken at a fixed pose and tracking measurements were taken at various poses.
First, the cooperative target was kept stationary, and we performed 30 6-DOF pose measurements using the proposed fusion system and the laser tracker (AT960) simultaneously. Then, we calculated the initial and final 6-DOF poses. The initial 6-DOF pose values were measured directly by the monocular camera, and the corresponding final 6-DOF pose values were obtained using our proposed fusion method. As mentioned earlier, we assume that the results measured by the laser tracker are the true values. We subtracted those true values from both the initial and final 6-DOF pose values. The corresponding differences are the measurement errors of the initial and final poses.
As shown in
Figure 8, compared with the initial pose, we can observe that the final pose is improved in both translational and rotational DOF. In particular, the accuracy of translational DOF in the
z-axis improves by a factor of approximately 50, consistent with the analysis in
Section 2. The maximum errors of
,
,
,
,
, and
are 0.0130 mm, 0.0114 mm, 0.0016 mm, 6.634″, 5.452″, and 11.113″, respectively. The reduction in the translational DOF error along the z-axis is mainly due to our fusion strategy, while the reduction in the errors of the other degrees of freedom is mainly due to the bundle adjustment optimization in
Section 4.3.
We also conducted tracking measurement experiments at different poses to check the accuracy of our proposed fusion system. We placed the cooperative target approximately 10 m away from our fusion system and acquired 30 frames of measurement data by changing the azimuth and angle of the cooperative target. The final 6-DOF poses over the 30 measurements were calculated and optimized based on the methods proposed in
Section 4.
We use (
,
,
,
,
, and
) to represent the 6-DOF pose of the cooperative target measured by the proposed fusion system, and (
,
,
,
,
, and
) to represent the true 6-DOF pose measured by the laser tracker AT960.
Figure 9 shows the measured value, true value, and error over 30 tracking measurements drawn in the form of line graphs. The blue curve represents the value measured by the proposed system, the black curve represents the true value measured by the laser tracker, and the red curve is the absolute value of the difference between these two values, namely the measurement error of our system.