Multi-Tag Fusion Localization Method Based on Geometric Constraints

Liu, Zhuojun; Qin, Wengang; Li, Zexing; Zhou, Guofeng

doi:10.3390/app14135480

Open AccessArticle

Multi-Tag Fusion Localization Method Based on Geometric Constraints

¹

College of Engineering, Shanghai Ocean University, Shanghai 201306, China

²

College of Mechanical and Power Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(13), 5480; https://doi.org/10.3390/app14135480

Submission received: 18 May 2024 / Revised: 21 June 2024 / Accepted: 21 June 2024 / Published: 24 June 2024

(This article belongs to the Special Issue Challenges in the Guidance, Navigation and Control of Autonomous and Transport Vehicles)

Download

Browse Figures

Versions Notes

Abstract

In environments where Global Navigation Satellite System (GNSS) signals are unavailable, our proposed multi-tag fusion localization method offers a robust solution for the precise positioning of vehicles or robots. During our research, we observed variations in the positioning information estimated from tags located at different positions within the same frame. Our goal was to extract reliable positioning information from this noisy data. By constructing geometric constraints, our method introduces an outlier factor to quantify the differences between tags. After effectively eliminating outliers, we enhanced the Kalman filter framework to accommodate the fusion of data from two or more tags, with the outlier factor dynamically adjusting the observation noise during the fusion process. The experimental results demonstrate that, even under the influence of motion and obstacles, our method maintains position errors within a 3 cm range and orientation errors within 3°. This indicates that our method possesses high positioning accuracy and stability.

Keywords:

localization; visual tag; data fusion; geometric constraints; AprilTag

1. Introduction

Localization is fundamental to vehicle planning, control, and subsequent tasks, essential for the integrity of vehicle functions [1,2,3]. The most common positioning technology is Global Navigation Satellite System (GNSS); however, not all scenarios offer ample and reliable signals, such as urban canyons, parking lots, tunnels, and indoor areas [4,5]. Visual tag positioning technology has become increasingly prevalent across a spectrum of applications, including robotics [6,7] and Unmanned Aerial Vehicles (UAVs) [8,9], offering innovative solutions for the precise positioning of intelligent transport systems. By employing tags with predefined coordinate information, this technology achieves centimeter-level accuracy in positioning, which is crucial for various mobile platforms, including but not limited to automotive applications.

Despite the low cost and capability for absolute positioning of visual tags [10,11], they still face issues of inaccuracy and instability. During the localization process utilizing tags, the detection quality is significantly influenced by the distance and rotation angle between the camera and the tag [12]. The precision of the system is also impacted by the number and position of the tags within the image. Furthermore, lighting conditions can affect the accuracy of tag detection, particularly for tags like AprilTag [13]. It is crucial that the tag size is substantial enough to offer adequate information while remaining within the camera’s field of view [14]. There are three main approaches to address these issues: improving the detection accuracy of individual tags, fusing individual tag data with other sensor information, and increasing the number of tag detections.

To enhance positioning accuracy, some researchers have focused on improving marker detection accuracy. Two processing techniques were proposed by [15], including a method that can filter out very inaccurate detections resulting from partial border occlusion, and a new highly accurate method for edge refinement. Based on observations, ref. [16] proposed improvements include trigonometric yaw angle correction for camera alignment, real-time tag center tracking via a custom yaw-axis gimbal, and a novel pose-indexed probabilistic sensor error model for AprilTag utilizing Gaussian Processes and validated by particle filter tracking. In the process of vehicle movement, ref. [17] found that the marker detection error ranged from 0.028 m to 0.13 m. To reduce pose jitter, they designed a strategy to constrain the inner and outer corner points of the markers, limiting the relative error between markers to within 0.05 m.

Meanwhile, fusing marker information with other data can also improve positioning accuracy. Ref. [13] described an algorithm that improves the pose estimation accuracy of square fiducial markers in difficult scenes by fusing information from RGB and depth sensors. Ref. [18] utilized particle filtering techniques to fuse data from ArUco markers and sonar sensors, achieving effective integration of multi-source sensor information. Ref. [19] introduced a variational Bayesian estimator for mobile robot localization with unknown noise covariance, utilizing tags as landmarks and an onboard camera. By treating the unknown measurement noise covariance as a random matrix following an inverse Wishart distribution, the method simultaneously estimates the robot pose. Ref. [20] proposed an on-manifold extended Kalman filter for tag-based visual–inertial localization, addressing the topological structure of rotation and pose groups and associated uncertainty propagations. Ref. [21] combined the robustness of makers to environmental uncertainty with motion data to minimize the required number of tags. An Iterative Extended Kalman Filter (IEKF) is employed to fuse visual and motion information, and an area detection algorithm is introduced to accelerate AprilTag processing.

While single-tag localization techniques have seen significant advancements, they encounter challenges when marker observation is obstructed or incomplete during vehicle movement [22]. This highlights the necessity for a method that can integrate information from multiple markers, especially in environments with partial occlusions or dynamics. Ref. [23] designed an integrated vision and LIDAR robot localization method using two AprilTag tags for pose estimation, proving the stability of the proposed method over single-tag approaches.

The third category of methods, which we focus on, involves detecting multiple tags simultaneously and fusing their data. For example, Ref. [24] averages observations from multiple markers and applies a constant velocity motion model for Kalman Filtering (KF) but does not adequately account for variability among different markers. Ref. [17] integrates markers into SLAM and applies the 3σ principle to filter outliers, which may be insufficient with fewer than ten markers as it defaults to ranking pose estimates and selecting the median, potentially not reflecting true poses if the observation distribution is skewed. Ref. [25] fuses IMU data with multiple markers using Extended Kalman Filtering (EKF), which can negatively impact localization accuracy when kinematic predictions significantly deviate from actual conditions. The optimization algorithm presented in [26], fusing multiple ArUco marker data based on the Mahalanobis distance, shares conceptual similarities with our approach but also reports trajectory discontinuities due to marker detection errors, especially when markers are distant or partially occluded. In the aforementioned filtering methods, observation noise is often determined by trial and error. However, ref. [27] discovered that the observation noise of pose from planar fiducial markers varies across the measurement range, a fact that must be considered during sensor fusion for a reliable estimate. They present experimental measurements of fiducial markers in real and simulation scenarios for 2D pose estimation and propose analytical functions that approximate the variances of pose estimates.

This paper proposes a multi-tag fusion localization method based on geometric constraints. The approach begins with detecting visual markers to acquire relative pose information between the target and the markers. Following coordinate transformations, the global pose of the target in the world coordinate system is computed. To ensure data accuracy and reliability, geometric constraints are constructed for all observations, effectively eliminating outliers. Subsequently, the filtered and reliable observations are incorporated into a multi-observation fusion framework utilizing a modified KF. This integration facilitates the effective fusion of multiple observations, ultimately leading to precise positioning information.

Our main contributions are as follows:

(1): A linear fusion method using only a camera: This method is simple, low-cost, and avoids complexities and potential inaccuracies associated with motion model-dependent predictions.
(2): Introducing an outlier factor based on geometric constraints: The outlier factor quantifies the variability among different tags, excluding outliers before fusion and dynamically adjusting observation noise during the fusion process, reducing the likelihood of initial noise settings being unreasonable.

The paper consists of several parts. Section 2 introduces common visual tags and provides a detailed description of the AprilTag’s detection process. Section 3 describes the outlier handling method before data fusion, focusing on the construction of geometric constraints. Section 4 outlines the fusion approach, utilizing a modified KF for integrating data from multiple tags. Section 5 summarizes the experimental setup and presents the results and analysis. The final section concludes this paper with a summary of the key findings.

2. Visual Tag Process

Visual tags serve as objects in the image that provide key reference points and measurement points, and they have a known size and shape. Common visual tags include ARTag [28], AprilTag [29], ArUco [30], and Stag [31], as shown in Figure 1. Ref. [32] conducted an experimental comparison of these four tags, evaluating them comprehensively in terms of accuracy, recognition rate, and computational cost. The experimental results showed that ARTag has a low computational cost but performs poorly in the detection rate of a single marker; AprilTag, Stag, and ArUco exhibit excellent performance in terms of direction and position; however, AprilTag has a high computational cost and is sensitive to motion blur; Stag and ArUco are sensitive to smaller marker sizes and larger distances; and the computational cost of ArUco increases with the number of markers. Given the diversity of visual tags and their differences in characteristics, it is necessary to weigh the selection of tags based on the needs and scenarios of the actual application.

2.1. AprilTag3 Detection

In the pursuit of a pose estimation method well-suited for our application, AprilTag3 [33] has been identified as the optimal choice, predominantly attributable to its outstanding capabilities in delivering precise orientation results and elevated detection rates. AprilTag3, a sophisticated iteration within the AprilTag family, stands out for its open-source accessibility, high real-time processing capabilities, and enhanced accuracy.

This advanced version has undergone significant optimization, resulting in a more robust algorithmic framework. Notably, these enhancements have led to substantial gains in computational efficiency and a reduction in memory usage, making AprilTag3 an attractive proposition for resource-intensive applications. Additionally, AprilTag3 supports an innovative format of two-dimensional codes, expanding its utility and applicability.

As depicted in Figure 2, the algorithmic process begins with an image filtering step that downscales the initial image to accelerate subsequent operations. This is followed by adaptive thresholding, which generates a binarized image that clearly delineates the tag boundaries against the background.

The subsequent edge detection phase employs the union-find algorithm to segment the image into meaningful components, effectively filtering out irrelevant sections and thus enhancing the precision of the detection process. Quadrilateral fitting then identifies potential tags from clustered line segments through a calculated sequence of operations.

To address perspective distortion, a perspective correction step is integrated, utilizing a transformation matrix to reconcile the disparity between the tag’s anticipated and actual poses. The final stage involves image sharpening to mitigate blurriness, enabling the clear decoding of the tag ID and the determination of the tag’s relative position to the camera.

2.2. Coordinate Transformation

To ascertain the global coordinates of a target within a scene, it is imperative to perform transformations across multiple coordinate systems. This process engages several defined coordinate systems: the world coordinate system (W), tag coordinate system (T), camera coordinate system (C), and the self or robot coordinate system (M), all adhering to the right-hand rule for orientation. As illustrated in Figure 3, the X-axis is represented in red, the Y-axis in green, and the Z-axis in blue.

Firstly, the relative pose between the tag and the camera, denoted as

P_{C}

, is transformed into the relative pose between the tag and the target, denoted as

P_{M}

:

P_{M} = R_{C}^{M} P_{C} + T_{C}^{M}

(1)

In the equation,

R_{C}^{M}

represents the rotation matrix from the self-coordinate system to the camera coordinate system, and

T_{C}^{M}

is the installation position of the camera. Subsequently, the coordinates of the target P in the tag coordinate system are obtained as

P_{T}

:

P_{T} = R_{C}^{T} P_{M} + T_{C}^{T}

(2)

R_{C}^{T}

is the rotation matrix between the tag coordinate system and the camera coordinate system. During the initial deployment of the tags, each tag is assigned a unique identifier, and the world coordinates of the center of the tag are represented by the matrix

T (i d)

:

T (i d) = {[x, y, z]}^{T}

(3)

Finally, the coordinates of the target P in the world coordinate system are calculated as

P_{W}

:

P_{W} = R_{T}^{W} P_{T} + T (id)

(4)

In this formula,

R_{T}^{W}

is the rotation matrix from the tag coordinate system to the world coordinate system. The aforementioned rotation matrices

R_{C}^{M}

,

R_{C}^{T}

,

R_{T}^{W}

are derived from the Equation (5):

R = R_{x} R_{y} R_{z}

(5)

R_{x}

,

R_{y}

, and

R_{z}

, are the rotation matrices calculated from the corresponding angles,

ϕ

,

θ

and

ψ

representing the rotation around the X-axis, Y-axis, and Z-axis, respectively.

R_{x} = [\begin{matrix} 1 & 0 & 0 \\ 0 & \cos ϕ & \sin ϕ \\ 0 & - \sin ϕ & \cos ϕ \end{matrix}]

(6)

R_{y} = [\begin{matrix} \cos θ & 0 & - \sin θ \\ 0 & 1 & 0 \\ \sin θ & 0 & \cos θ \end{matrix}]

(7)

R_{z} = [\begin{matrix} \cos ψ & \sin ψ & 0 \\ - \sin ψ & \cos ψ & 0 \\ 0 & 0 & 1 \end{matrix}]

(8)

An image containing N tags, once processed through tag detection, yields N distinct relative poses. After undergoing coordinate transformation, these relative poses result in N global poses of the target in the world coordinate system at the same moment. These global poses are then utilized as observation values for subsequent fusion steps. The observation values include the two-dimensional coordinates in the world coordinate system and the orientation angle.

3. Construction of Geometric Constraints

When multiple tags are detected in an image, the positioning data extracted from different tags possess varying degrees of uncertainty. This implies that some observed values may have significant errors, and these outliers can substantially affect the final positioning outcome. As depicted in Figure 4, at the same moment, observed values i, j, and k are obtained. Under ideal circumstances, observed values i, j, and k should coincide with the true position. However, due to factors such as location and lighting, deviations occur between the observed values and the actual position. In statistical terms, an outlier is defined as one or several values within a dataset that differ significantly from the rest, constituting abnormal data. This paper constructs geometric constraints to measure the variability among observed values, thereby identifying and eliminating outliers.

Consider any two observed values, i and j. The coordinates of observed value i in the world coordinate system are

[x_{i}, y_{i}, z_{i}]

, and the angle it makes with the X-axis is ∠a. The coordinates of observed value j in the world coordinate system are

[x_{j}, y_{j}, z_{j}]

, and the angle it makes with the X-axis is ∠b.

In the XOY plane, the difference in distance between any two observed values,

l_{i j}

, can be calculated as

l_{i j} = \sqrt{{(x_{i} - x_{j})}^{2} + {(y_{i} - y_{j})}^{2}}

(9)

To calculate the angular difference

d_{i j}

, first determine the angles ∠a and ∠b:

\{\begin{cases} a = \arctan (\frac{y_{i}}{x_{i}}) \\ b = \arctan (\frac{y_{j}}{x_{j}}) \end{cases}

(10)

Then, find the difference in angles:

d_{i j} = |a - b|

(11)

If there are a total of N observed values, by combining them in pairs to construct geometric constraints and calculating the corresponding distances

l_{i j}

and angular differences

d_{i j}

,

G (N)

combinations are needed in total. The number of combinations is given by the binomial coefficient:

G (N) = C_{N}^{2} = \frac{N!}{(N - 2)! 2}

(12)

Each observed value will participate in (N − 1) geometric constraint constructions, resulting in (N − 1) different

l_{i j}

and

d_{i j}

. The sum of the distance differences for each observed value with respect to the other observed values is

l_{i} = \sum_{j = 1}^{N - 1} l_{i j}

(13)

The sum of the angular differences for each observed value with respect to the other observed values is

d_{i} = \sum_{j = 1}^{N - 1} d_{i j}

(14)

An outlier factor,

S (i)

, is introduced and defined using a normalization method based on the maximum and minimum values to describe the variability between observed values from both distance and angular differences:

S (i) = k_{1} \frac{l_{i} - \min (l_{i})}{\max (l_{i}) - \min (l_{i})} + k_{2} \frac{d_{i} - \min (d_{i})}{\max (d_{i}) - \min (d_{i})}

(15)

In the equations, the weight coefficient

k_{1}

,

k_{2}

is used, which must satisfy

k_{1} + k_{2} = 1

(16)

\begin{array}{l} \max (l_{i}) = \max [l_{1}, \dots, l_{i}, \dots, l_{N - 1}] \\ \min (d_{i}) = \min [d_{1}, \dots, d_{i}, \dots, d_{N - 1}] \end{array}

(17)

4. Fusion of Multiple Observations

In the context of multi-tag localization, the identification and removal of aberrant observations is a fundamental step. Yet, the integration of the subsequent reliable observations from multiple tags is an equally significant challenge. Traditional KF has long been a cornerstone for data fusion, operating under the assumption that the motion of the target conforms to specific patterns, such as uniform velocity or constant acceleration. This assumption facilitates the formulation of kinematic equations that predict system states.

Within this established framework, observational data from detected tags are merged with these predicted states to enhance the accuracy of the localization process. However, the standard KF is limited in its capacity to integrate only two elements: the predicted motion and the actual observation. This limitation becomes particularly evident when attempting to incorporate the wealth of data from multiple reliable observations, a scenario that surpasses the filter’s original design.

To address this, a novel approach has been introduced. This approach diverges from the standard method by performing predictive updates based solely on the available observational data. This enhanced method is capable of handling an increased number of observations through an iterative process, which, in turn, refines the precision and stability of the localization system. The flow of this algorithm is illustrated in Figure 5.

At time t, N tags are observed, and after constructing

G (N)

geometric constraints, n reliable observations are as follows:

Z_{1}

,

Z_{2}

, …,

Z_{n}

. The system state at time t is

X_{t}

, and the system state at the k-th fusion is

X_{k}

, where

k \in [1, n - 1]

. A total of n − 1 fusions are required for each moment. The system state includes the two-dimensional coordinates in the world coordinate system and the orientation angle:

X_{t} = {[x, y, ψ]}^{T}

(18)

X_{k} = {[x, y, ψ]}^{T}

(19)

The relationship between

X_{t}

and

X_{k}

is given by:

X_{t} = X_{k} = X_{k - 1} + w_{k - 1}

(20)

In this equation,

w_{k}

represents the process noise in the fusion, which is normally distributed,

w_{k} ~ (0, Q)

, where

Q

is the covariance matrix of the process noise.

The relationship between the observation values and the system state is

Z_{i} = H X_{k} + v_{i}

(21)

Here,

H

is the observation matrix, and

v_{i}

represents the measurement noise of the i-th reliable observation, which is also normally distributed,

v_{i} ~ (0, R)

,

i \in [1, n]

, where

R

is the covariance matrix of the measurement noise.

At the k-th fusion, where

k \in [1, n - 1]

, the fused value from the (k − 1)-th fusion,

{\hat{X}}_{k - 1}

is used as the current moment’s predicted value,

\bar{X_{k}}

:

\bar{X_{k}} = {\hat{X}}_{k - 1}

(22)

The initial value,

\bar{X_{1}}

is given by:

\bar{X_{1}} = H^{-} Z_{1}

(23)

The prediction error, denoted as

{\bar{e}}_{k}

, is calculated as

\begin{array}{l} {\bar{e}}_{k} & = X_{k} - \bar{X_{k}} \\ = X_{k} - {\hat{X}}_{k - 1} \\ = X_{k - 1} + w_{k - 1} - {\hat{X}}_{k - 1} \\ = e_{k - 1} + w_{k - 1} \end{array}

(24)

The prediction error covariance matrix

{\bar{P}}_{k}

is computed as

\begin{array}{l} {\bar{P}}_{k} & = E [{\bar{e}}_{k} {\bar{e}}_{k}^{T}] \\ = P_{k - 1} + Q_{k - 1} \end{array}

(25)

where Q is the process noise covariance matrix.

The initial value,

{\bar{P}}_{1}

, is given by

{\bar{P}}_{1} = P_{t - 1}

(26)

The observation noise is adjusted based on the outlier factor of the observation values:

R_{k} = (S (k) - S (1)) R_{1}

(27)

Here,

S (1)

is the outlier factor of the first reliable observation, and

R_{1}

is the initial observation noise. Since the reliable observations are sorted in ascending order based on the outlier factor, the first reliable observation has the smallest outlier factor value. This implies that a larger outlier factor corresponds to greater observation noise.

The Kalman gain

K_{k}

is calculated as

K_{k} = \frac{{\bar{P}}_{k} H^{T}}{H {\bar{P}}_{k} H^{T} + R_{k}}

(28)

The new observation value

Z_{i + 1}

is introduced as the k-th observation

Z_{k} = Z_{i + 1}

(29)

The prediction is updated using the Kalman gain and the new observation value:

{\hat{X}}_{k} = {\bar{X}}_{k} + K_{k} (Z_{k} - H {\bar{X}}_{k})

(30)

Update the error:

\begin{array}{l} e_{k} & = X_{k} - {\hat{X}}_{k} \\ = X_{k} - ({\bar{X}}_{k} + K_{k} (Z_{k} - H {\bar{X}}_{k})) \\ = (I - K_{k} H) (X_{k} - {\bar{X}}_{k}) - K_{k} v_{k} \end{array}

(31)

where

I

is the identity matrix.

Update the error covariance matrix:

P_{k} = E [e_{k} e_{k}^{T}] = (I - K_{k} H) \bar{P_{k}}

(32)

The iteration continues until all reliable observations have been integrated. At the end of the iteration, the final fused value at time t,

{\hat{X}}_{t}

is obtained. At this point, the error covariance matrix will serve as the predicted error covariance matrix for the next moment t + 1:

{\hat{X}}_{t} = {\hat{X}}_{k = n - 1}

(33)

{\bar{P}}_{t + 1} = P_{t} = P_{k = n - 1}

(34)

5. Experiment

In this section, we detail the experimental approach undertaken to validate the efficacy of our multi-tag fusion localization method. Utilizing the Isaac Sim platform, we have conducted simulation-based experiments that offer a cost-effective and flexible framework for assessing our method. The decision to employ simulation was driven by the constraints of time and budget, while also allowing for a diverse range of scenario configurations through parameter adjustments. This approach enables us to systematically test the robustness and accuracy of our method under controlled conditions. Our primary objectives were twofold: to initially validate the geometric constraints by demonstrating their effectiveness in filtering out outliers and subsequently to assess the overall positioning accuracy and stability of the proposed localization method. These objectives guide the design of our experiments and the analysis of the results obtained.

5.1. Experimental Setup

The Isaac Sim camera model, depicted in Figure 6, generates high-fidelity images for reliable testing. The reliability of the simulation camera was validated through fixed-point tests, confirming its ability to provide trustworthy and high-quality images. Positioned at the rear of the vehicle at a height of 1.15 m, the camera captured 6 to 8 tags per frame at a resolution of 1280 × 720 pixels and a frame rate of 25 fps.

This study designed multiple simulation scenarios, as depicted in Figure 7.

Simulation Scenario 1: Visual tags are laid out on the ground. AprilTag’s tag36 h 10 series is used for the visual tags, with each tag measuring 0.04 m × 0.04 m and spaced 0.15 m apart. This initial setup establishes a baseline for evaluating the method’s accuracy in an unobstructed environment.

Simulation Scenario 2: Adds obstacles based on Scenario 1. The incorporation of obstacles is intended to simulate real-world conditions where line-of-sight to tags may be partially blocked, thereby testing the method’s resilience to occlusions.

Simulation Scenario 3: Similar to Scenario 1, but with tags measuring 0.03 m × 0.03 m and spaced 0.1 m apart. Reducing the tag size and adjusting the spacing allows us to examine the method’s sensitivity to changes in tag density and visibility range.

Simulation Scenario 4: Adds obstacles based on Scenario 3. Further introducing obstacles with the smaller tags assesses the method’s adaptability to challenging conditions with increased occlusion and reduced tag size.

This study conducted a series of six experiments designed to evaluate the effectiveness of geometric constraints and to assess the positioning accuracy and stability of the proposed method, as detailed in Table 1. The simulation software provided the ground truth for the experiments, establishing a reliable benchmark for comparison.

Due to the uniform scaling of the tag models, changes in tag size were accompanied by corresponding adjustments in the spacing between tags. As depicted in Figure 7, the simulation software offered realistic lighting conditions, resulting in varying illumination for tags at different locations. This feature of the simulation software introduced an additional layer of realism to the experiments, as it mimicked the diverse lighting conditions that tags might encounter in real-world scenarios.

The inclusion of these factors in the experimental design allowed for a more comprehensive assessment of the proposed method’s performance under conditions that closely resemble real-world applications. This attention to detail ensures that the results obtained from the simulation are not only indicative of the method’s efficacy but also reflective of its robustness in varying environmental conditions.

5.2. Geometric Constraint Effectiveness

A critical aspect of this research was to demonstrate the effectiveness of geometric constraints in enhancing the accuracy of the positioning system.

5.2.1. Single-Frame Validation

For a demonstrative purpose, we extracted a specific frame from the simulation dataset, capturing both the true and observed positioning values. Within this frame, eight tags were discerned, with the identifiers {1120, 1121, 1136, 1137, 1408, 1409, 1410, 1425}. Implementing geometric constraints, we assigned a consistent weight coefficient of 0.5 to all observations, as per Equation (15), to ascertain the outlier factor for each. The results of these calculations are presented in Table 2.

After sorting the observations by their outlier factors in ascending order, we designated the top 4 observations with the lowest outlier factors as reliable. These reliable observations were identified by the IDs {1121, 1410, 1409, 1137}, while the remaining observations were considered outliers: {1425, 1408, 1120, 1136}.

Figure 8 presents a scatter plot of the observed tag coordinates within the world coordinate system, where each distinct point corresponds to a single observation identified by its unique tag ID. The positioning of these points in proximity to the true values illustrates the precision of our method, demonstrating the successful application of our filtering process in discerning reliable observations from outliers.

Furthermore, Table 3 illustrates the variance of the observations before and after the removal of outliers. The variance in the X, Y, and yaw dimensions of the reliable observations is consistently lower than that of all observations combined. This reduction in variance demonstrates that the data become more concentrated and stable after the application of geometric constraints, confirming the reliability of the filtered observations.

5.2.2. Process Validation

In this section, we compare the method without geometric constraints and our proposed method over a sustained period, with detailed results presented in Table 4. This comparison allows us to evaluate the impact of geometric constraints on the system’s accuracy and stability in a continuous positioning process.

The Method Without Geometric Constraints: This method fused all raw observations. It treated each tag’s data equally, applying a uniform and constant level of observation noise throughout the fusion process.

Our Method: The proposed method first culled outliers based on outlier factors, then dynamically adjusted observation noise for each tag’s data during fusion, enhancing accuracy and stability.

The table provides a clear comparison of the error metrics between the method without geometric constraints and our proposed method, highlighting the impact of incorporating geometric constraints on positioning accuracy.

For positional error, the method without geometric constraints had an average error within 8 cm, but this error peaked at a maximum of 14 cm during the motion sequence. This variability suggests a potential lack of reliability in positioning accuracy.

In contrast, our method, which applies geometric constraints, showcased superior precision with a maximum average error of 2.611 cm and a maximum error capped at 2.968 cm. This represents a significant improvement in positional accuracy, demonstrating the advantage of using geometric constraints.

When it comes to orientation angle error, the method without geometric constraints exhibited a maximum error of 8.39° and a maximum average error of 3.38°. These figures indicate a higher likelihood of inaccuracy in orientation estimation.

However, our method achieved a maximum error of 2.90° and a maximum average error of just 1.99° in orientation angle estimation. This substantial enhancement in orientation accuracy further underscores the effectiveness of geometric constraints in refining the precision and stability of the positioning system.

We have selected a specific dataset to visually demonstrate the comparative performance between our proposed method and the method without geometric constraints. The visualization focuses on the X, Y, and YAW positioning data for both methods. As depicted in Figure 9, our method exhibits a smoother trajectory, indicative of more stable and precise localization. In contrast, the method lacking geometric constraints shows noticeable spikes and abrupt changes, revealing its susceptibility to inaccuracies.

5.3. Method Performance

In this section, we present graphical representations of our method’s performance, focusing on the trajectory and positional data from Groups 2 and 3 as outlined in the experimental setup table, where the positioning information obtained through our method is compared against the true positions provided by the simulation software.

Figure 10 (Group 2): This figure displays the raw observational data for the vehicle’s motion as a scatter plot, which clearly shows the substantial dispersion of points due to the presence of outliers. Our method adeptly addresses these irregularities, producing a trajectory that closely mirrors the true path with a notable reduction in fluctuations.

Figure 11 (Group 3): The motion trajectory of the vehicle is portrayed under more complex conditions, where movement is compounded by the presence of obstacles. The increased presence of outliers, particularly at corners where occlusions affect observation quality, is evident. Despite these challenges, our method demonstrates its robustness by producing a trajectory that is significantly smoother and more aligned with the true path compared to the raw data.

Based on the experimental setup outlined in Table 1 and the data presented in Table 4, we conclude the following:

Motion State: The motion state of the vehicle significantly affects positioning accuracy, with moving Groups 2 and 5 showing higher positioning errors compared to the stationary Groups 1 and 4. Our method, however, reduces these errors, bringing the positioning accuracy of moving states close to that of stationary states, indicating its effectiveness in handling motion challenges.

Tag Size: Comparisons across different tag sizes demonstrate that our method maintains low positioning errors regardless of tag size, whether larger (Groups 1 and 2) or smaller (Groups 4 and 5). This finding suggests that our method is compatible and consistent with various tag dimensions.

Obstacle Obstruction: The presence of obstacles in Groups 3 and 6, as opposed to the absence of obstructions in Groups 2 and 5, results in decreased positioning accuracy. However, our method’s identification and removal of outliers effectively lessen the impact of obstructions on positioning outcomes.

6. Conclusions

The present study offers a contribution to the field of multi-tag localization by proposing a method that incorporates geometric constraints to address the issues of instability and inaccuracy that may arise in such scenarios.

We have developed a method that calculates an outlier factor based on geometric constraints, which may provide a means to characterize the variability among observational data points from multiple tags. This approach could potentially enhance the accuracy of the positioning process by dynamically adjusting the observation noise for each tag after identifying and removing outliers. Our method also includes an extension to the traditional KF technique, designed to manage the fusion of data from multiple tags. This extension, which is based solely on observational data for prediction updates, may offer a pathway to bypass the limitations of conventional KF that typically integrate two data sources.

Preliminary results from high-fidelity simulations indicate that our method can achieve a reasonable level of positioning accuracy and stability, with errors maintained within a 3 cm range for position and 3° for orientation. These results are encouraging and suggest that the method may be applicable in environments where GNSS signals are not available or are subject to interference. In our future work, we aim to integrate Z-axis constraints into our localization method to achieve six-degree-of-freedom (6-DOF) positioning. This enhancement will be crucial for high-precision navigation in environments with varying terrain. Additionally, we plan to deploy our method on a real vehicle, where we will tackle and overcome the practical issues encountered in actual deployment scenarios.

Author Contributions

Conceptualization, Z.L. (Zhuojun Liu) and Z.L. (Zexing Li); Data curation, Z.L. (Zhuojun Liu); Formal analysis, Z.L. (Zhuojun Liu); Funding acquisition, W.Q. and G.Z.; Investigation, Z.L. (Zhuojun Liu) and W.Q.; Methodology, Z.L. (Zhuojun Liu); Project administration, W.Q.; Resources, W.Q., Z.L. (Zexing Li), and G.Z.; Software, Z.L. (Zhuojun Liu) and Z.L. (Zexing Li); Supervision, W.Q. and G.Z.; Validation, Z.L. (Zhuojun Liu); Visualization, Z.L. (Zhuojun Liu); Writing—original draft, Z.L. (Zhuojun Liu) and G.Z.; Writing—review and editing, Z.L. (Zhuojun Liu), W.Q. and G.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the first author.

Acknowledgments

The authors would like to acknowledge the generous support of Qin Wengang, Li Zexing, and Zhou Guofeng in funding aspects of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chalvatzaras, A.; Pratikakis, I.; Amanatiadis, A.A. A Survey on Map-Based Localization Techniques for Autonomous Vehicles. IEEE Trans. Intell. Veh. 2023, 8, 1574–1596. [Google Scholar] [CrossRef]
Li, Z.; Wang, Y.; Zhang, R.; Ding, F.; Wei, C.; Lu, J.-G. A LiDAR-OpenStreetMap Matching Method for Vehicle Global Position Initialization Based on Boundary Directional Feature Extraction. IEEE Trans. Intell. Veh. 2024, 1–13. [Google Scholar] [CrossRef]
Zhou, Z.; Wang, Y.; Zhou, G.; Nam, K.; Ji, Z.; Yin, C. A Twisted Gaussian Risk Model Considering Target Vehicle Longitudinal-Lateral Motion States for Host Vehicle Trajectory Planning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 13685–13697. [Google Scholar] [CrossRef]
Lu, Y.; Ma, H.; Smart, E.; Yu, H. Real-Time Performance-Focused Localization Techniques for Autonomous Vehicle: A Review. IEEE Trans. Intell. Transp. Syst. 2022, 23, 6082–6100. [Google Scholar] [CrossRef]
Springer, J.; Kyas, M. Autonomous Drone Landing with Fiducial Markers and a Gimbal-Mounted Camera for Active Tracking. In Proceedings of the 2022 Sixth IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 5–7 December 2022; pp. 243–247. [Google Scholar] [CrossRef]
Mantha, B.R.K.; Garcia de Soto, B. Investigating the Fiducial Marker Network Characteristics for Autonomous Mobile Indoor Robot Navigation Using ROS and Gazebo. J. Constr. Eng. Manag. 2022, 148, 04022115. [Google Scholar] [CrossRef]
Zhang, H.; Zhang, C.; Yang, W.; Chen, C.-Y. Localization and navigation using QR code for mobile robot in indoor environment. In Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China, 6–9 December 2015; pp. 2501–2506. [Google Scholar] [CrossRef]
Wang, S.; Hu, T. ROS-Gazebo Supported Platform for Tag-in-Loop Indoor Localization of Quadrocopter. In Intelligent Autonomous Systems 14; Chen, W., Hosoda, K., Menegatti, E., Shimizu, M., Wang, H., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 185–197. [Google Scholar] [CrossRef]
Ghasemi, A.; Parivash, F.; Ebrahimian, S. Autonomous landing of a quadrotor on a moving platform using vision-based FOFPID control. Robotica 2022, 40, 1431–1449. [Google Scholar] [CrossRef]
Zhang, Z.; Zhong, L.; Lin, M.; Lin, R.; Li, D. Triangle codes and tracer lights based absolute positioning method for terminal visual docking of autonomous underwater vehicles. Ind. Robot. Int. J. Robot. Res. Appl. 2024, 51, 269–286. [Google Scholar] [CrossRef]
Kulikov, V.; Spirin, E.; Pikalov, I.; Saramud, M. Methods for determining spatial position and production objects orientation based on the proposed fiducial markers for technical vision system. Int. J. Adv. Manuf. Technol. 2024, 132, 2549–2562. [Google Scholar] [CrossRef]
López-Cerón, A.; Cañas, J. Accuracy analysis of marker-based 3D visual localization. In Proceedings of the XXXVII Conference on Automation, Madrid, Spain, 7–9 September 2016; pp. 1124–1131. [Google Scholar] [CrossRef]
Jin, P.; Matikainen, P.; Srinivasa, S.S. Sensor fusion for fiducial tags: Highly robust pose estimation from single frame RGBD. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 5770–5776. [Google Scholar] [CrossRef]
Szentandrási, I.; Zachariáš, M.; Havel, J.; Herout, A.; Dubská, M.; Kajan, R. Uniform Marker Fields: Camera localization by orientable De Bruijn tori. In Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Atlanta, GA, USA, 5–8 November 2012; pp. 319–320. [Google Scholar] [CrossRef]
Kallwies, J.; Forkel, B.; Wuensche, H.-J. Determining and Improving the Localization Accuracy of AprilTag Detection. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 8288–8294. [Google Scholar] [CrossRef]
Abbas, S.M.; Aslam, S.; Berns, K.; Muhammad, A. Analysis and Improvements in AprilTag Based State Estimation. Sensors 2019, 19, 5480. [Google Scholar] [CrossRef] [PubMed]
Qin, H.; Shen, G.; Zhou, Y.; Huang, S.; Qin, X.; Xie, G.; Ding, R. Tag-Based Vehicle Visual SLAM in Sparse Feature Scenes. Automot. Eng. 2023, 45, 1543–1552. [Google Scholar] [CrossRef]
Martínez-Barberá, H.; Bernal-Polo, P.; Herrero-Pérez, D. Sensor Modeling for Underwater Localization Using a Particle Filter. Sensors 2021, 21, 1549. [Google Scholar] [CrossRef] [PubMed]
Zhang, S.; Shan, J.; Liu, Y. Variational Bayesian Estimator for Mobile Robot Localization With Unknown Noise Covariance. IEEE/ASME Trans. Mechatron. 2022, 27, 2185–2193. [Google Scholar] [CrossRef]
Kayhani, N.; Zhao, W.; McCabe, B.; Schoellig, A. Tag-based visual-inertial localization of unmanned aerial vehicles in indoor construction environments using an on-manifold extended Kalman filter. Autom. Constr. 2022, 135, 104112. [Google Scholar] [CrossRef]
Chen, J.; Gao, Y.; Li, S. Real-time Apriltag Inertial Fusion Localization for Large Indoor Navigation. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 6912–6916. [Google Scholar] [CrossRef]
Neunert, M.; Bloesch, M.; Buchli, J. An open source, fiducial based, visual-inertial motion capture system. In Proceedings of the 2016 19th International Conference on Information Fusion (FUSION), Heidelberg, Germany, 5–8 July 2016; pp. 1523–1530. Available online: https://ieeexplore.ieee.org/abstract/document/7528064 (accessed on 20 June 2024).
Huang, Y.-H.; Lin, C.-T. Indoor Localization Method for a Mobile Robot Using LiDAR and a Dual AprilTag. Electronics 2023, 12, 1023. [Google Scholar] [CrossRef]
Zhenglong, G.; Qiang, F.; Quan, Q. Pose Estimation for Multicopters Based on Monocular Vision and AprilTag. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 4717–4722. [Google Scholar] [CrossRef]
Kayhani, N.; Heins, A.; Zhao, W.; Nahangi, M.; McCabe, B.; Schoellig, A. Improved Tag-based Indoor Localization of UAVs Using Extended Kalman Filter. Presented at the 36th International Symposium on Automation and Robotics in Construction, Banff, AB, Canada, 21–24 May 2019. [Google Scholar] [CrossRef]
Xu, Z.; Haroutunian, M.; Murphy, A.J.; Neasham, J.; Norman, R. An Underwater Visual Navigation Method Based on Multiple ArUco Markers. J. Mar. Sci. Eng. 2021, 9, 1432. [Google Scholar] [CrossRef]
Adámek, R.; Brablc, M.; Vávra, P.; Dobossy, B.; Formánek, M.; Radil, F. Analytical Models for Pose Estimate Variance of Planar Fiducial Markers for Mobile Robot Localisation. Sensors 2023, 23, 5746. [Google Scholar] [CrossRef] [PubMed]
Fiala, M. ARTag, a fiducial marker system using digital techniques. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 2, pp. 590–596. [Google Scholar] [CrossRef]
Wang, J.; Olson, E. AprilTag 2: Efficient and robust fiducial detection. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 4193–4198. [Google Scholar] [CrossRef]
Garrido-Jurado, S.; Muñoz-Salinas, R.; Madrid-Cuevas, F.J.; Marín-Jiménez, M.J. Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognit. 2014, 47, 2280–2292. [Google Scholar] [CrossRef]
Benligiray, B.; Topal, C.; Akinlar, C. STag: A stable fiducial marker system. Image Vis. Comput. 2019, 89, 158–169. [Google Scholar] [CrossRef]
Kalaitzakis, M.; Cain, B.; Carroll, S.; Ambrosi, A.; Whitehead, C.; Vitzilaios, N. Fiducial Markers for Pose Estimation: Overview, Applications and Experimental Comparison of the ARTag, AprilTag, ArUco and STag Markers. J. Intell. Robot. Syst. 2021, 101, 71. [Google Scholar] [CrossRef]
Krogius, M.; Haggenmiller, A.; Olson, E. Flexible Layouts for Fiducial Tags. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1898–1903. [Google Scholar] [CrossRef]

Figure 1. Common visual labels: (a) ARTag; (b) ArUco; (c) AprilTag; (d) Stag.

Figure 2. AprilTag3 detection process.

Figure 3. Diagram of the coordinate system.

Figure 4. Schematic representation of the geometric constraints.

Figure 5. Algorithm flowchart.

Figure 6. (a) The real camera takes the image; (b) The simulation camera takes the image.

Figure 7. (a) Simulation Scenario 1; (b) Simulation Scenario 2; (c) Simulation Scenario 3; (d) Simulation Scenario 4.

Figure 8. Schematic representation of the observed values.

Figure 9. Method comparison with/without geometric constraints. (a) Trajectory; (b) X; (c) Y; (d) Yaw.

Figure 10. Experimental results of Group 2. (a) Trajectory; (b) X; (c) Y; (d) Yaw.

Figure 11. Experimental results of Group 3. (a) Trajectory; (b) X; (c) Y; (d) Yaw.

Table 1. Experimental setup.

Group	State of Motion	Tag Size	Tag Interval	Whether Obstructions Exist
1	still	0.04 m × 0.04 m	0.15 m	no
2	moving	0.04 m × 0.04 m	0.15 m	no
3	moving	0.04 m × 0.04 m	0.15 m	yes
4	still	0.03 m × 0.03 m	0.1 m	no
5	moving	0.03 m × 0.03 m	0.1 m	no
6	moving	0.03 m × 0.03 m	0.1 m	yes

Table 2. Outlier factor.

ID	Outlier Factor	Whether It Is a Reliable Observation
1121	0	yes
1410	0.04049	yes
1409	0.08078	yes
1137	0.25164	yes
1425	0.67949	no
1408	0.80974	no
1120	0.83909	no
1136	1	no

Table 3. Comparison of variance before and after excluding outliers.

Dimensionality	The Variance of All Observations	Variance of Reliable Observations
X	0.00715	0.00254
Y	0.00436	0.00184
Yaw	0.01124	0.00846

Table 4. Comparative error analysis.

Group		Method Without Geometric Constraints		Our Method
Group		Distance Error (m)	Yaw Error (rad)	Distance Error (m)	Yaw Error (rad)
1	Maximum Error	0.02246	0.03816	0.01627	0.01262
	Minimum Error	0.00386	0.00089	0.00273	0.00018
	Average Error	0.01113	0.00989	0.00824	0.00432
2	Maximum Error	0.16475	0.14654	0.02968	0.05063
	Minimum Error	0.00231	0.00014	0.00278	0.00004
	Average Error	0.05813	0.02442	0.01772	0.01298
3	Maximum Error	0.61676	0.10386	0.02858	0.04184
	Minimum Error	0.01283	0.00321	0.00355	0.00036
	Average Error	0.05587	0.05908	0.02294	0.02153
4	Maximum Error	0.05618	0.03842	0.02213	0.02451
	Minimum Error	0.00033	0.00025	0.00033	0.00006
	Average Error	0.01646	0.00837	0.00889	0.00550
5	Maximum Error	0.09544	0.10569	0.02826	0.03641
	Minimum Error	0.00502	0.01421	0.00772	0.00020
	Average Error	0.04035	0.03189	0.01871	0.03478
6	Maximum Error	0.14736	0.10569	0.02624	0.04256
	Minimum Error	0.00148	0.00113	0.00219	0.00027
	Average Error	0.07850	0.03613	0.02611	0.02389

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, Z.; Qin, W.; Li, Z.; Zhou, G. Multi-Tag Fusion Localization Method Based on Geometric Constraints. Appl. Sci. 2024, 14, 5480. https://doi.org/10.3390/app14135480

AMA Style

Liu Z, Qin W, Li Z, Zhou G. Multi-Tag Fusion Localization Method Based on Geometric Constraints. Applied Sciences. 2024; 14(13):5480. https://doi.org/10.3390/app14135480

Chicago/Turabian Style

Liu, Zhuojun, Wengang Qin, Zexing Li, and Guofeng Zhou. 2024. "Multi-Tag Fusion Localization Method Based on Geometric Constraints" Applied Sciences 14, no. 13: 5480. https://doi.org/10.3390/app14135480

APA Style

Liu, Z., Qin, W., Li, Z., & Zhou, G. (2024). Multi-Tag Fusion Localization Method Based on Geometric Constraints. Applied Sciences, 14(13), 5480. https://doi.org/10.3390/app14135480

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multi-Tag Fusion Localization Method Based on Geometric Constraints

Abstract

1. Introduction

2. Visual Tag Process

2.1. AprilTag3 Detection

2.2. Coordinate Transformation

3. Construction of Geometric Constraints

4. Fusion of Multiple Observations

5. Experiment

5.1. Experimental Setup

5.2. Geometric Constraint Effectiveness

5.2.1. Single-Frame Validation

5.2.2. Process Validation

5.3. Method Performance

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI