1. Introduction
The Pose Estimation of rigid objects has a wide range of applications in today’s life, such as robot grasping [
1], aerospace applications, autonomous driving [
2], and so on.
Throughout history, pose estimation has played a vital role, from Ceres’s position predicting to navigation on the sea ([
3], Introduction 1.1). One day,
Captain cook sailing in the sea has lost his bearings. The ship moves on a wavy and wind sea. With the reasonable assumption, He estimates the optimal localization of the ship from the noisy measurements (
Figure 1, top).
New challenges come to this field. The success of service robotics relies on well-perceiving an unfamiliar kitchen scenarios, such as bottles, dishes, and containers. Robot grasps under cluttered or limited lighting environments are becoming common in a rapidly growing warehouse automation industry [
4]. Consequently, robots need to know the full 6D pose of objects to manipulate objects, which is difficult due to the uncertainty caused by environmental factors and sensing technologies. The intuition is to deploy expensive high-precision sensors to avoid such uncertainty. However, all sensors have limited precision, and measurements derived from actual sensors have associated uncertainty ([
3], Introduction 1.2).
The CNN [
5] broke through in the ImageNet [
6] challenge, motivating more and more works to embrace deep learning. However, most prior works in this area failed to take the weak information [
7] and the 6D pose uncertainty into account, and they only provide a single best guess for each object pose [
8,
9,
10]. The accurate metric of the model cannot handle unprepared situations, such as dim lighting. The uncertainty will rise with the increase in accuracy over time.
The pose estimation of symmetric objects is currently the most challenging task [
11]. Existing multiple correct poses for the same visual appearance [
12]. This issue also results in lousy training performance since a network receives inconsistent loss signals. Xiang et al. [
9] design an MSE loss function based on quaternions and regress the 3D rotation using the neural network. It reports better performance for symmetric objects such as bowls. However, they do not take the underlying structure of the manifolds into consideration.
Deng et al. [
13] detect the object’s position by a bounding box using a neural network. They assume the Gaussian position and approximate the observation likelihood to obtain the conditioning orientation density by a Rao–Blackwellized particle filter. The result is considerably robust in rotational uncertainty. However, this approach is computationally expensive, and it is not clear which kind of distribution holds on the orientations. Furthermore, uncertain rotation axes need to be considered when underlying uncertainty is not axis aligned [
14].
Nonetheless, the neural network correctly assumes the underlying structure of the data is robust. Prokudin [
15] exhibits a good example dealing with periodic circular data. The same loss will emerge when the predicted angle is equal to
or
with the ground truth angle of
. He presents a loss based on the directional setting and assumes the angle obeys the von Mises distribution. As a result, the model works suitably for the data with the periodic nature.
The Gaussian assumption is commonly embedded, leading the neural network to fail to learn the underlying manifold structure of the 6D pose. Holding the robust assumption of uncertainty is the foremost for 6D pose estimation.
Inevitably, coupling or decoupling is the first consideration when parametrizing the 6D pose. It is also related to further modeling uncertainty. Daniilidis [
16] employs dual quaternion to parameterize the 6D pose for hand–eye calibration and establishes a linear homogeneous system, simultaneously solving rotation and translation. Goddard and Addi [
17] capture the correlation between rotation and translation for rigid motion tracking by dual quaternions.
Horn [
18] proposes the method that decouples the orientation from translation. Inspired by Horn, Srivatsan [
19] makes a linear assumption on the state of the
elements by adopting dual quaternions. Li [
20] explains the correlation between rotation and translation terms based on hyperspherical parallel transport and gives a Bingham distribution on the orientation decoupled from the translation. Manhardt [
12] illustrates the rotation uncertainty caused by occlusion in the 6D pose estimation by employing Bingham distribution. However, it is still unknown how the Bingham distribution extends to
space without variations.
The work’s most Bingham distribution of interest is motivated by Glover. He presents the BPA [
21] algorithm to recover the full poses from patches of local features of the 3D point cloud. This work extends from the
[
22].
Main Contribution
In this work, we aim to model the uncertain 6D pose by extending the probability density function on the
to the
. This is achieved through parameterizing the 6D pose by dual quaternions. The work simplifies a future generalization to the 6D pose estimation of objects (
Figure 1, bottom). The proposed distribution appears as a marginal distribution for the rotational part of the proposed model. We further discuss the relationship between orientation and position. To the best of the author’s knowledge, there is no work exactly claimed for this.
This paper is organized as follows. We will compare several existing 6D parameterization techniques and select the most promising one in
Section 2. In
Section 3, we propose a new probability distribution on the 6D pose. The normalization constant and parameter estimation techniques will be discussed. Later, in
Section 4, we use the proposed framework to represent the transformation of the 6D pose, and the algorithm that recovers the transformation from the samples of our proposed distribution will be presented. Lastly, the discussion and conclusions of this work are in
Section 5.
2. Preliminaries
There are several existing methods to parameterize
elements, including orthonormal rotation matrices, Euler angles, Rodrigues vectors, quaternions, etc. For orthonormal rotation matrices, the compounding of two elements is much more complex ([
23], p. 45). Although Euler angles are invariant under transforms and easy to comprehend, there exist singularities [
24]. Rodrigues vectors are complex to implement as a composition algorithm [
25]. For parameterization of the 3D rotation, the quaternion is the best from the analog computation view [
26]. Still, it is also limited to representing rotation in a full 6D pose, and the translation must be dealt with separately.
The dual quaternions this work introduced takes both rotation and translation into consideration. It provides a closed-form solution for the composition of 6D poses, which is the analogy to the transform matrix in homogeneous coordinates [
27]. Kenwright [
28] finds that the transforms by the dual quaternion multiplication 10 percent faster compared to matrix multiplication on average. Nonetheless, Kavan et al. [
29] present a practical example, and they utilize dual quaternions to solve the shortage of the linear blend skinning.
Conventions. In this work, lower-case letters represent scalars, and matrices are represented by capital letters, vectors, and quaternions in bold. Quaternions are distinguished from dual quaternions by a caret, e.g., denotes a quaternion and denotes a dual quaternion. Two quaternions and multiplication is denoted as , while the multiplication of the two dual quaternions is denoted as . The dot product and cross product of vectors and are denoted as and , respectively. Finally, we use represent the skew-field of quaternions. With this in mind, consider the quaternions . The pose is synonymous with the 6D pose or transformation; the rotation is synonymous with orientation; the translation is synonymous with the position.
2.1. Quaternions
Similar to complex numbers, the sum of a real number and three complex numbers represent quaternions. The quaternion
can be treated as a 4-tuple
. It can also be identified with the typical basis elements
i,
j,
k via the coefficients:
The multiplication of the two quaternions
,
is given by:
where
names the scalar part and
is the vector part; note that the product is not commutative.
Furthermore, linear operators [
30]
and associated with (
2) can be also defined by matrix-vector form as:
with
where
is the skew-symmetric matrix generated from the corresponding vector.
The canonical norm on of quaternions is defined by
The conjugate of the quaternion is obtained by changing the sign of each element in the imaginary part: .
Last but not the least, the quaternion addition is simply the 4-tuple addition of quaternion representations: .
2.2. Representation of 3D Rotation
Unit quaternions are quaternions with and commonly represent the rotations in 3D Euclid space. The inverse of a unit quaternion is obtained by its conjugate form .
The quaternion can represent the rotation around a 3D axis with a unit-length vector
and the rotation angle
:
A point
is denoted as the purely imaginary quaternion without real part
. From (
4), the rotated point
is obtained as [
25]:
In addition,
and
represent the same rotation due to the
antipodal property on the hypersphere
, so the set
of unit quaternion is a double coverage of the
of the 3D rotations.
2.3. Dual Quaternions
The dual theory is helpful when understanding the concept of dual quaternions. A dual number combines the non-dual part , and the dual part is represented as where . The is the dual unit; note that , . The multiplication of two dual numbers is given as
Dual quaternions (
) are quaternions equivalent to dual numbers, i.e., replacing real numbers
with quaternions
and
. Thus, a dual quaternion is given as follows:
The multiply operation of two dual quaternions is similar to dual numbers. As two dual quaternions
and
(
), the product of the two is:
Note that the product is non-commutative.
The dual quaternions apply the same strategy utilized by quaternions, linear operators:
and
can be defined for dual quaternions by:
with
Dual quaternions have three conjugates [
30]: (1)
, (2)
, and (3)
. A dual quaternion
is a unit if
.
The norm of a dual quaternion in (
5) is denoted as
, which expands to:
The unit dual quaternions have the norm that equals one. A dual quaternion satisfies the unit dual quaternion if and only if the non-dual part
equals one and the dot product of two parts
equals zero. This property helps to understand the relationship between the two parts of the dual quaternion. Especially, a unit quaternion is a unit dual quaternion when the dual part is zero.
2.4. Representation of 6D Pose
The 6D pose composes a 3D rotation part represented by a unit quaternion
in the form of (
4) and a 3D translation part
. A unit dual quaternion representing the 6D pose is defined by the following equation:
where
for translation
is defined by:
where
is represented by a purely imaginary translation quaternion with zero real parts.
We introduce the Hamilton operators
to replace the linear operators defined above. The term Hamilton operator, which is borrowed from Akyar [
31], is not commonly used, at least in the robotics literature, but it seems appropriate here:
so that
where
is a
skew-symmetric matrix generated from translation vector
.
A point
, we can be embedded in skew-field
by using a dual quaternion
in (
7). The transformation can be mathematically described as:
Consistent with the unit quaternion in rotations, the dual quaternions
and
represent the same pose.
3. Methods
3.1. Base Element
Consider the fact that the unit dual quaternions
and
in the form of (
5) represent the same pose. Furthermore, the underlying structure of the periodic nature of the 6D pose is no longer linear. Hence, a distribution that can characterize the non-linear structure antipodal symmetric property is required. Consequently, we can tackle multiple, conflicting hypotheses that naturally arise in ambiguous situations.
A Bingham distribution [
32] is exactly a distribution on a unit hypersphere
. The property of hte most interest is the antipodal symmetric, where
(i.e., opposite points on
have equal probability). It commonly represents uncertainty on 3D rotations, denoted as
mathematically, by the unit quaternion form [
33], but not on the 6D pose.
Definition 1. Let a random vector on the hyperspherebe the unit hypersphere in.
The probability density function(p.d.f)of a Bingham distribution is given by:whereis an orthogonal matrix (
),
describing the orientation,
with as the concentration matrix, and N as a corresponding normalization constant. For the representation of small uncertainty, Bingham is almost near the Gaussian distribution, and for the large uncertainty, the Gaussian is worse (or even quite poor) than the Bingham distribution [
34].
For those who are familiar with multivariate Gaussian, the parameter matrices
and
can be derived via the eigendecomposition of a symmetric matrix
in subsequent
Section 3.2 (which is denoted as the inverse covariance matrix
in multivariate Gaussian distribution).
3.2. A New Distribution Model
For a quite intuitive interpretation, we decompose a unit dual quaternion into in vector form. Thus, a joint probability density over the non-dual part, and the dual part can be denoted as the following Lemma 1.
Lemma 1. A random vector
is distributed to the proposed distribution, and the p.d.f is:where and ,
symmetric and positive definite ,
and a normalization constant .
It is always possible to break a joint density into the product of two factors. We can work out the details for the joint case by employing the
Schur complement. First, let
in Lemma 1 be denoted as:
with
; note that
, according to the settings [
22],
needs to be symmetric,
may be arbitrary, and
has to be symmetric negative definite to ensure the antipodal symmetry, which we will further discuss in
Section 3.3.
Lemma 2. The proposed probability density function can be rewritten as:where,
.
From Lemma 2, the rotation part of the dual quaternion evidently appears as a Bingham distribution, which can be derived by marginalizing out the corresponding conditional distribution of the dual part.
Since the dual part,
, combines the rotation and translation by a Hamilton product, a canonical way to describe dependencies between the position and the orientation of a dual quaternion is still unknown [
22].
From (
9), the dual part given the non-dual part
can be treated as a multivariate Gaussian distribution
. Thus, the joint density function from Lemma 2 can be rewritten as:
where ,
.
3.3. Normliazation Constant
The Bingham distribution is flexible for 3D rotation represented by the unit quaternion on the hypersphere
. Although it represents uncertainty, it is still not well propagated in the computer vision and robotics communities because the computation of the normalization constant is complex. The normalization constant,
F, in the Bingham distribution is not closed form, which means the normalization constant,
F, only has the numerical solution. However, several techniques can overcome this difficulty, such as caching techniques [
35] and saddlepoint approximations. It is still an area of active research [
14].
Fortunately, the computational burden is alleviated by the method adapted from [
22]. The normalization constant is written as follows according to the discussion:
where
.
Furthermore, this work calculates
by the hypergeometric function [
34] of a matrix argument:
For the case
, this reduces to:
A statistics library [
36] is used for the computation of the normalization constant,
F, in this work.
3.4. Parameter Estimation
We hope to obtain the parameters of the proposed distribution given a sample matrix , where each column vector is assumed to be generated i.i.d from our proposed distribution. This procedure can be divided into two stages. First, the first four entries of samples will recover the parameter of the Bingham distribution. Second, the eight entries recover the parameter in the Gaussian distribution.
From Lemma 2, we observed that the base element with
in
Section 3.1 appears a marginal distribution of our model. Thus, we can apply Definition 1:
where
is the first four entries from the proposed distribution.
The parameter can be obtained as the matrix of eigenvectors of the covariance with corresponding eigenvalues in the order , equivalent to the eigendecomposition of .
The probability density function of a 4D Bingham distribution projected on
by a unit vector is shown in
Figure 2, which appears in the marginal distribution of the first four entries
in our model.
As we have already discussed in
Section 3.2, the dual part
given the non-dual
is assumed to be a Gaussian distribution
. The parameters
and
can be obtained by maximum likelihood estimation of a multivariate linear regression ([
37], Theorem 8.2.1).
The mode of proposed distribution is related to the order of the column vector in matrix
. It turns to be the last column, according to diagonal entries when we enforce
in
. It is also possible to swap columns of
without changing the distribution [
38]. There exist two correct modes in the proposed distribution because of the antipodal symmetry. For one of the modes
, where
is the normalized eigenvector of
and
, the
is the another correct one.
4. Results
As expected, unit dual quaternions can be used to represent 3D rotation while the dual part is zero. Thus, (
4) can be extended to a dual form as:
where
.
This work utilizes dual quaternions to generate the form of the 3D translation. A vector
embedded in the skew-field
is represented by the following:
To combine rotations and translations, the product of two unit dual quaternions is used according to (
6). A rotation with a subsequent translation is given as follows:
A dual quaterinon in the form of (
14) is still a unit dual quaternion, and there is no restriction on translation that must be unit-length. According to the defined norm property of the unit dual quaternion, that is,
, this property provides ingredients to our algorithm for translation.
According to (
5), the first four entries are interpreted as the non-dual part, which describes the rotation, and its last four entries are interpreted as a dual part of
. That is:
Represented by the vector form, the unit dual quaternion can be written as , decomposing into the non-dual part and the dual part .
Furthermore, the Hamilton operator
is associated with (
8), and the translation part can be derived from (
14) as follows:
where
is the conjugate of
.
Algorithm 1 shows a robust method to generate the rotation angle, axis, and translation in 3D space by sampling from the proposed distribution. Furthermore, this work applies Algorithm 1 to several sets of the samples generated from the proposed distribution, and the results are shown in
Figure 3.
Algorithm 1: Recover the corresponding translation and rotation from sampled vector on the proposed distribution. |
Input: A sampled random vector | |
/*
Compute the rotation angle | */ |
| |
/*
Compute the rotational arbitrary vector | */ |
| |
| |
| |
/* Compute the translation | */ |
| |
| |
| |
Output: Rotation angle , rotational arbitrary vector and translation vector |
The 6D pose is obtained by the random vector sampled from the proposed distribution. Each sample from our distribution appears in the antipodal symmetry in rotations. Compared with the planar rigid motion case [
22], our distribution breaks the antipodal symmetry in the translation that
and
are denoted as positions that always have the same probability values.
5. Conclusions
This work proposes a distribution over the 6D pose in order to capture the uncertainty which is typically ignored by the pose estimation task. Our model takes the periodic nature of the underline structure into consideration, such as antipodal symmetry, which is suitable for high-noise regimes. The proposed distribution eliminates the drawbacks from initializing the , in which the position is ambiguous due to antipodal symmetry, and our distribution setting is closer to reality.
However, we assume that the Gaussian on the correlation part leads to a better, but still imperfect, answer. Many beautiful properties in the parameter matrix of the proposed distribution have not been explored. Still, a unified distribution exists to couple the rotation and translation.
In the future, we will verify the proposed distribution in practical applications, such point cloud registration, as well as the most important one, the 6D pose estimation task. Last but not least, the combination of the neural network with the proposed probability to regress the object pose is worth endeavoring.