1. Introduction
There is growing interest in autonomous aerial vehicles, which have a wide range of applications in mobile missions such as surveillance, exploration and reconnaissance in different environments. Motion information of the vehicle, generally separated into rotation and translation, is needed to realize autonomous operation of the system. The estimation of rotation has typically been well resolved using measurements from an onboard strap-down inertial navigation system [1]. The rotation information can be provided by a low-cost IMU [2], which is more accurate and less time consuming than pure visual algorithms. Therefore, in this paper we take the rotation information directly from the IMU and focus on the estimation of translation, including position and translational velocity.
Various kinds of sensors have been utilized in this field. The combination of GPS with gyroscopes, accelerometers, and magnetometers provided position and velocity information as described in [3]. However, GPS is susceptible to weather and terrain conditions, and it consumes significant power. Laser range and vision-based sensors were used in [4] to obtain accurate position information, and in [5], a laser range sensor combined with visual-inertial odometry was proposed to achieve accurate positioning. However, laser range sensors also have disadvantages, including their limited perception range and their excessive weight for UAVs. Doppler radars [6] and ultrasonic sensors [7] were chosen onboard to solve the localization problem, but they are likewise constrained by factors such as accuracy, cost, weight, and environmental limitations. Vision sensors, owing to their excellent performance in these respects, have therefore become a popular choice for obtaining motion information in such systems [8].
Previous research obtaining motion information with the assistance of visual input has been carried out in several ways. Dual cameras can reconstruct the captured environment and obtain the location of the system using stereo computer vision algorithms, as in [9,10]. However, in a micro aerial vehicle system, the onboard equipment should be as light as possible, given the payload and cruise-duration constraints of the vehicle. Therefore, monocular cameras and related algorithms have been researched and implemented in a wide range of applications.
The optical flow methods in [11,12,13] utilized pixel changes between sequential image frames in the time domain and the correlation between adjacent frames to establish the correspondence between previous and current frames, enabling the calculation of the object's translational velocity. By integrating this velocity, the position of the object could be obtained. The key advantage of this approach is that it performs motion estimation without any prior knowledge of the scene. However, the position result drifts over time due to the unbounded accumulation of integration error, and the required computation is substantial.
In [14,15,16], artificial landmarks with known properties were laid on the ground, and image processing techniques were used to detect and extract the feature points of these landmarks. On this basis, a coordinate transformation model was established using coordinate system transformations and the camera imaging model, thereby yielding the position of the UAV. However, these methods only work in certain fixed environments.
In [17,18], feature-based methods were designed to detect and match feature points between the current video frame and a reference frame. The feature matching method in [17] detects and matches features across multiple frames, whereas [18] reported a feature tracking approach that matches features only between adjacent frames. Once corresponding points are identified, they are used to solve the visual geometry equations for the relative rotation and translation. The semi-direct monocular visual odometry (SVO) proposed in [19] combines direct and feature-based methods for motion estimation. This method does not require feature points to be extracted from each frame; instead, it transfers feature points from the previous frame to the current frame using an optical flow approach. Feature extraction is only necessary when new key frames are inserted in the mapping thread, so improved robustness and real-time performance are achieved. Direct Sparse Odometry (DSO) [20] combines the direct method with sparse reconstruction, selecting the brightest pixel positions in the image sequence. By monitoring sparse pixel groups, it takes image formation parameters into account and adopts an indirect monitoring procedure. It should be pointed out that DSO only works well with photometrically calibrated cameras; with conventional cameras it does not provide high-precision results. Since the focus of this paper is the fusion of IMU data with motion estimates from a pure vision algorithm, the optimization of pure visual motion estimation algorithms is not addressed in this study.
Vision–inertial fusion methods have also been studied by many researchers to estimate the motion state. In [21], inertial data were used to establish motion estimation equations together with visual results; the rotation and the scale factor were also estimated from these equations, which differs from the method proposed in this paper.
Observing that the rotation information can also be provided by the IMU [2], such approaches involve redundant computation: more pairs of corresponding points are needed to solve for the extra rotation unknowns, which could otherwise be obtained directly from the IMU. Besides the extra computational complexity, another disadvantage is that incorrectly matched point pairs corrupt the correct ones when they are contained in the same equation. A monocular vision algorithm alone cannot recover the true scale but only the direction of translation; related solutions were discussed in [22].
To obtain accurate, frequently updated and reliable state estimates, the position observation from a vision algorithm is usually fused with inertial information, normally based on extended versions of the Kalman filter [11]. However, considering the multiple rates of the sensors and the delay of the visual observation caused by hardware, wireless transmission and the processing time of the vision algorithm, the classic Kalman filter model does not exactly match the model here and needs to be modified.
A single-rate Kalman filter with delayed measurement was studied in [23], where a solution was proposed by extrapolating the measurement. To reduce the computational burden, a steady-state Kalman gain was used instead of the true Kalman gain, causing non-optimality. In [24], the residual was calculated from the current measurement and the corresponding past estimate, and then fused using the normal Kalman filter update rules. However, due to the asynchrony between the estimate and the residual, this approach is not optimal either.
In this article, a motion estimation solution is introduced for an aerial vehicle with an onboard IMU and a downward-looking monocular camera. The environment below the camera is assumed to be approximately planar. The whole solution is separated into two steps.
First, we present a novel method to obtain the 3D local position from vision. Unlike previous pure vision algorithms, the attitude information provided by the IMU is integrated into the vision equations as known parameters rather than treated as unknowns. With the assistance of a height sensor, this algorithm can obtain a position measurement from a single pair of corresponding points, which makes the algorithm faster and more robust. Second, a multi-rate optimal filter is presented to fuse vision and inertial information while accounting for the delayed measurement. A modification is then made to limit the computational complexity so that the filter can be implemented on an onboard microcontroller.
The rest of the paper is organized as follows. Section 2 derives the state model from the dynamic model of the UAV. Section 3 presents the method for obtaining the location observation from the measurements of a monocular camera, an IMU, and a height sensor. Section 4 describes the framework of the state estimation of the UAV. In Section 5, a real-time filter accounting for multi-rate sensors and delayed vision observations is introduced. Experimental results that verify the feasibility and performance of the proposed method are shown in Section 6. Conclusions are presented in Section 7.
2. Dynamic Model
Our system configuration is exhibited in Figure 1. An IMU and a downward-facing monocular camera are the main onboard sensors. The IMU consists of an accelerometer, a magnetometer and a gyroscope.
Let $\{W\}$ represent the world frame, where the $x_w$, $y_w$, and $z_w$ axes, respectively, correspond to the east, the north, and the vertical direction. Let $\{B\}$ represent the body frame, whose $x_b$, $y_b$, and $z_b$ axes adhere to the vehicle body and whose origin coincides with the system centroid.
Thus, one obtains the dynamic equation of the system motion:
$$\dot{\mathbf{p}} = \mathbf{v}, \qquad \dot{\mathbf{v}} = \mathbf{a}, \tag{1}$$
where $\mathbf{p}$, $\mathbf{v}$ and $\mathbf{a}$, respectively, represent the vehicle position, velocity, and acceleration in the world frame.
Let $\mathbf{R}$ denote the rotation matrix from the body frame to the world frame. The vehicle acceleration in the world frame could be given by:
$$\mathbf{a} = \mathbf{R}\,\mathbf{a}_b + \mathbf{g}, \qquad \mathbf{a}_b = \mathbf{a}_m - \mathbf{n}_a - \mathbf{b}_a, \tag{2}$$
where $\mathbf{a}_b$ represents the acceleration in the body frame, $\mathbf{a}_m$ denotes the acceleration value provided by the accelerometer attached to the vehicle body, $\mathbf{n}_a$ and $\mathbf{b}_a$ denote the Gaussian noise and the bias of the accelerometer, respectively, and $\mathbf{g}$ represents the gravity vector.
$\mathbf{b}_a$ could be calibrated either off-line or on-line [19]. It is not considered in this paper, and the measured acceleration is regarded as unbiased. Note that because the body frame is a non-inertial frame, the complete relationship between $\mathbf{a}$ and $\mathbf{a}_b$ should take the inertial acceleration into account, as shown in [25]. However, since the motion of the vehicle is gentle during hovering flight, the inertial acceleration is usually omitted to simplify the model in many papers, such as [22,26,27]. This simplification is adopted here, yielding (2).
Substituting (2) into (1), one obtains the dynamic model of the system:
$$\dot{\mathbf{p}} = \mathbf{v}, \qquad \dot{\mathbf{v}} = \mathbf{R}\,\mathbf{a}_b + \mathbf{g},$$
which could be transformed into discrete form:
$$\mathbf{p}_{k+1} = \mathbf{p}_k + T\,\mathbf{v}_k + \frac{T^2}{2}\left(\mathbf{R}_k\,\mathbf{a}_{b,k} + \mathbf{g}\right), \qquad \mathbf{v}_{k+1} = \mathbf{v}_k + T\left(\mathbf{R}_k\,\mathbf{a}_{b,k} + \mathbf{g}\right),$$
where $T$ denotes the update cycle of the accelerometer. Let $\mathbf{x} = \left[\mathbf{p}^T \ \mathbf{v}^T\right]^T$ denote the state vector of our system; the state model of the system could then be presented as:
$$\mathbf{x}_{k+1} = A\,\mathbf{x}_k + B\,\mathbf{u}_k + \mathbf{w}_k, \qquad A = \begin{bmatrix} I & T I \\ 0 & I \end{bmatrix}, \qquad B = \begin{bmatrix} \frac{T^2}{2} I \\ T I \end{bmatrix},$$
with input $\mathbf{u}_k = \mathbf{R}_k\,\mathbf{a}_{m,k} + \mathbf{g}$ and process noise $\mathbf{w}_k = -B\,\mathbf{R}_k\,\mathbf{n}_{a,k}$.
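As a concrete illustration, the following is a minimal sketch of one step of this discrete propagation (Python/NumPy; the 10 ms cycle matches the paper, while the variable names and the sign conventions of the gravity and accelerometer terms are our own assumptions):

```python
import numpy as np

T = 0.01  # update cycle of the accelerometer: 10 ms
I3 = np.eye(3)

# Coefficient matrices of the discrete state model,
# with state x = [p; v] (world-frame position and velocity).
A = np.block([[I3, T * I3],
              [np.zeros((3, 3)), I3]])
B = np.vstack([T**2 / 2 * I3, T * I3])

g = np.array([0.0, 0.0, -9.81])  # gravity vector in the world frame

def propagate(x, R_wb, a_m):
    """One step x_{k+1} = A x_k + B u_k with input u_k = R a_m + g,
    where a_m is the (bias-free) accelerometer reading in the body frame."""
    u = R_wb @ a_m + g
    return A @ x + B @ u
```

During hover, $\mathbf{R}\,\mathbf{a}_m \approx -\mathbf{g}$, so the input is close to zero, as expected.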
4. Principle of State Estimation
A popular model to fuse information from multiple sensors is the Kalman filter model, which contains a state equation and an observation equation:
$$\mathbf{x}_{k+1} = A\,\mathbf{x}_k + B\,\mathbf{u}_k + \mathbf{w}_k, \qquad \mathbf{y}_k = C\,\mathbf{x}_k + \mathbf{v}_k,$$
where $\mathbf{x}_k$ denotes the state vector that needs to be estimated, $\mathbf{u}_k$ denotes the input vector, $\mathbf{y}_k$ denotes the measurement vector, and $\mathbf{w}_k$ and $\mathbf{v}_k$ denote the process noise and the measurement noise, respectively, which are usually considered white Gaussian noise.
With the state model presented in Section 2 and the observation model presented in Section 3, the detailed Kalman filter model of our system could be established as:
$$\mathbf{x}_{k+1} = A\,\mathbf{x}_k + B\,\mathbf{u}_k + \mathbf{w}_k, \qquad \mathbf{y}_k = \begin{bmatrix} I & 0 \end{bmatrix} \mathbf{x}_k + \mathbf{v}_k, \tag{12}$$
where the observation $\mathbf{y}_k$ is the position provided by the vision algorithm. The statistical properties of $\mathbf{n}_a$ are features of the accelerometer sensor, which can be found in the related hardware data sheet. $T$ in our system is set to 10 ms. As mentioned in the introduction, the rotation matrix $\mathbf{R}$ can be updated by the IMU independently. Observe that $\mathbf{R}$ is a direction cosine matrix ($\mathbf{R}\mathbf{R}^T = I$) and that the accelerometer noise is symmetrical in the 3 axes of the sensor; it can be proved that the covariance matrix of $\mathbf{R}\,\mathbf{n}_a$ is static, which means the covariance matrix of the process noise is static.
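Writing this out (a one-line check, with $\sigma_a^2$ denoting the assumed common per-axis variance of the accelerometer noise):
$$\operatorname{cov}(\mathbf{R}\,\mathbf{n}_a) = \mathbf{R}\,\operatorname{cov}(\mathbf{n}_a)\,\mathbf{R}^{T} = \mathbf{R}\left(\sigma_a^{2} I\right)\mathbf{R}^{T} = \sigma_a^{2}\,\mathbf{R}\,\mathbf{R}^{T} = \sigma_a^{2} I,$$
which is constant regardless of the vehicle attitude.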
In (12), some of the parameters are time invariant, while the input signal is time variant. Let $Q$ and $R_v$ denote the covariance matrices of $\mathbf{w}_k$ and $\mathbf{v}_k$, respectively, and let $I$ denote the identity matrix. The constant identity sub-blocks of the coefficient matrices will be substituted directly with $I$ in the equations below for simplification.
Now we have obtained a detailed Kalman filter model of the system. The entire framework of the state estimation solution in this paper is shown in Figure 5. Data from the sensors are collected, organized and processed to establish the final estimation model of the system. Note that the attitude estimation component is solved with the method suggested in [2]; the details of the attitude estimation algorithm are not presented in this paper.
The Kalman filter is an optimal linear filter. The estimation algorithm of the Kalman filter is a recursive update algorithm which, for the model in Equation (12), could be divided into two steps:
Step 1 of the classic Kalman filter (one-step optimal prediction): given the measurements $\{\mathbf{y}_1, \ldots, \mathbf{y}_k\}$, one could obtain:
$$\hat{\mathbf{x}}_{k+1|k} = A\,\hat{\mathbf{x}}_{k|k} + B\,\mathbf{u}_k, \qquad P_{k+1|k} = A\,P_{k|k}\,A^T + Q, \tag{14}$$
where $\hat{\mathbf{x}}_{k+1|k}$ denotes the optimal estimation of $\mathbf{x}_{k+1}$ given the measurements up to time $k$.
Step 2 of the classic Kalman filter (optimal filtering): when a new observation $\mathbf{y}_{k+1}$ arrives:
$$\hat{\mathbf{x}}_{k+1|k+1} = \hat{\mathbf{x}}_{k+1|k} + K_{k+1}\left(\mathbf{y}_{k+1} - C\,\hat{\mathbf{x}}_{k+1|k}\right), \qquad P_{k+1|k+1} = \left(I - K_{k+1} C\right) P_{k+1|k}, \tag{15}$$
where $K_{k+1} = P_{k+1|k}\,C^T \left(C\,P_{k+1|k}\,C^T + R_v\right)^{-1}$ is called the Kalman gain and could be calculated with $P_{k+1|k}$, $C$ and $R_v$.
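For reference, a minimal sketch of these two steps (Python/NumPy; matrix names follow the model above, with `Rv` standing for the measurement noise covariance; this is an illustration, not the flight code):

```python
import numpy as np

def kf_predict(x, P, u, A, B, Q):
    # Step 1: one-step optimal prediction, Equation (14).
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, C, Rv):
    # Step 2: optimal filtering, Equation (15).
    S = C @ P_pred @ C.T + Rv             # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x = x_pred + K @ (y - C @ x_pred)
    P = (np.eye(P_pred.shape[0]) - K @ C) @ P_pred
    return x, P
```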
5. Delay Compensation
5.1. Compensation Algorithm
The Kalman estimation model was established in Section 4. However, the real estimation model here differs from the classic Kalman model because the observation delay and the multiple update frequencies of the sensors need to be considered, as mentioned in Section 3.2. The signal sequence in the real estimation model is shown in Figure 6.
The update cycle of the inertial measurements, which equals the control cycle of the system, is 10 ms; it will be used as the unit of time in the discrete analysis. Record the update cycle of the observation as $N$ and the delay of the observation as $D$, both in units of the control cycle, which could be measured through experiments. $N$ is mainly caused by the software computation on the PC, while $D$ is caused by hardware and wireless transmission in addition to the software computation. Imagine that every time a computation cycle (of length $N$) on the PC finishes and an observation result is obtained, the algorithm picks up the newest image just transmitted from onboard for the next computation cycle; this newest image already carries a delay caused by hardware and wireless transmission. It follows that the delay of the observation is larger than the update cycle of the observation ($D > N$), and the two need to be considered separately.
Assume that the first observation is obtained at time $0$. Then observations are obtained only at times that are multiples of $N$, and the instant at which the observation arriving at time $kN$ is actually measured is $kN - D$, where $k = 0, 1, 2, \ldots$. The time index is allowed to be negative because it is merely a sequence mark. Record $Y_k = \{\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_k\}$ as the sequence of the observations obtained so far.
The two steps (14) and (15) of the Kalman filter, as well as a corollary of (14), will be used to obtain the optimal estimation of the state in the presence of the observation delay and the multiple update frequencies of the sensors.
Corollary 1 (multi-step optimal prediction).
$$\hat{\mathbf{x}}_{k+n|k} = A^n\,\hat{\mathbf{x}}_{k|k} + \sum_{i=0}^{n-1} A^{\,n-1-i} B\,\mathbf{u}_{k+i}, \qquad P_{k+n|k} = A^n\,P_{k|k}\,(A^n)^T + \sum_{i=0}^{n-1} A^{i} Q\,(A^{i})^T. \tag{16}$$
Equation (16) states that the multi-step optimal prediction from time $k$ to time $k+n$ could be obtained given the optimal estimation at time $k$ and the input information during the period from $k$ to $k+n-1$. The corollary has been derived in detail in [28]
with the orthogonality principle.
With (14), (15) and (16) presented above, the algorithm of state estimation with delayed observation can be presented. As shown in Figure 6, a new observation arrives at time $kN$, which is actually measured at time $kN - D$. The whole sequence of observations obtained during the interval $[kN, (k+1)N)$ does not change until a new observation arrives at time $(k+1)N$. The optimal linear estimation of the state at any time $t$ in $[kN, (k+1)N)$ could be expressed as:
$$\hat{\mathbf{x}}_{t|Y_k} = \mathrm{E}\left[\mathbf{x}_t \mid Y_k\right], \qquad P_{t|Y_k} = \operatorname{cov}\left(\mathbf{x}_t - \hat{\mathbf{x}}_{t|Y_k}\right),$$
where $\hat{\mathbf{x}}_{t|Y_k}$ denotes the optimal linear estimation of $\mathbf{x}_t$ given the observation sequence $Y_k$, and $P_{t|Y_k}$ denotes the covariance matrix of the estimation error.
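A direct implementation of the multi-step prediction in (16) simply iterates the one-step recursion over the buffered inputs; the following Python sketch is mathematically equivalent to (16):

```python
def kf_predict_n(x, P, inputs, A, B, Q):
    """n-step optimal prediction: one one-step prediction
    per buffered input u_k, ..., u_{k+n-1}."""
    for u in inputs:
        x = A @ x + B @ u
        P = A @ P @ A.T + Q
    return x, P
```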
Compensation algorithm:
Initial conditions: $\hat{\mathbf{x}}_{-D}$ and $P_{-D}$ are known.
Case 1: a new observation arrives at time $kN$ ($k = 0, 1, 2, \ldots$), which is actually measured at time $kN - D$.
Step 1: optimal filtering at time $kN - D$ with the new observation $\mathbf{y}_k$, using (15).
Step 2: multi-step optimal prediction from time $kN - D$ to time $(k+1)N - D$, using (16). This prediction result will be used as the initial condition the next time a new observation arrives.
Step 3: multi-step optimal prediction from time $kN - D$ to time $kN$, using (16). This prediction result will be used as the current optimal estimation result.
Case 2: no new observation arrives ($t \neq kN$).
Step 1: one-step optimal prediction from time $t-1$ to time $t$, using (14). This prediction result will be used as the current optimal state estimate.
When a new observation arrives at time $(k+1)N$, the estimation algorithm continues for the next cycle with the new initial conditions updated in Case 1 (Step 2) of the last cycle.
In every step of the update algorithm presented above, an optimal linear estimation result is given using all the observations obtained so far; the update algorithm is therefore an optimal estimation algorithm.
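To make the bookkeeping concrete, a schematic skeleton of the whole compensation loop is sketched below (Python, reusing the hypothetical helpers above; the exact buffer indexing depends on timing conventions not fully specified here):

```python
def estimation_loop(steps, get_input, get_observation,
                    x0, P0, A, B, C, Q, Rv, N, D):
    """Delay-compensated estimation loop. (x_meas, P_meas) is the
    estimate at the delayed measurement time kN - D; the loop yields
    the current optimal estimate every control cycle."""
    x_meas, P_meas = x0, P0   # initial conditions at time -D
    x_cur, P_cur = x0, P0
    inputs = []               # buffer of the last D inputs

    for t in range(steps):
        y = get_observation(t)  # None except when a result arrives at t = kN
        if y is not None and len(inputs) == D:
            # Case 1, Step 1: filter at measurement time t - D.
            x_f, P_f = kf_update(x_meas, P_meas, y, C, Rv)
            # Case 1, Step 2: N-step prediction -> initial condition
            # for the next observation arrival.
            x_meas, P_meas = kf_predict_n(x_f, P_f, inputs[:N], A, B, Q)
            # Case 1, Step 3: D-step prediction to the present.
            x_cur, P_cur = kf_predict_n(x_f, P_f, inputs, A, B, Q)
        elif inputs:
            # Case 2: one-step prediction with the latest input.
            x_cur, P_cur = kf_predict(x_cur, P_cur, inputs[-1], A, B, Q)
        inputs = (inputs + [get_input(t)])[-D:]  # keep the last D inputs
        yield x_cur, P_cur
```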
In case of an observation dropout, as mentioned in Section 3.2, the observation algorithm returns a failure signal. In that case, in Case 1 (Step 1) a one-step prediction, the same as in Case 2 (Step 1), is performed instead of the usual filtering. The effect of occasional dropouts can be reduced by the prediction, especially when the input signals are accurate. Estimation results with observation dropouts are shown in Section 6.
5.2. Optimization of Compensation Algorithm
Because the estimation algorithm is processed on the onboard microcontroller, its computational complexity needs to be controlled. In the update algorithm presented above, Case 1 (Step 1) and Case 2 (Step 1) just use Equations (15) and (14) of the classic Kalman filter, so the computational burden is acceptable. However, in Case 1 (Step 2) and Case 1 (Step 3), multi-step optimal prediction is processed. As shown in (16), there are two terms in the equation of the multi-step optimal prediction. The coefficient of the first term is static because $A$ is static, so it is easy to calculate. However, the second term is a sum over the time-variant input signals, which causes a heavy computational burden if it is accumulated within a single control cycle when needed, e.g., at time $kN$. In this paper, the computational burden is reduced with an iterative update of this sum. Let $\boldsymbol{\delta}_n(k)$ denote this term, which is defined as the "input compensation term":
$$\boldsymbol{\delta}_n(k) = \sum_{i=0}^{n-1} A^{\,n-1-i} B\,\mathbf{u}_{k+i}.$$
With a buffer of the input signals, $\boldsymbol{\delta}_n(k)$ could be updated in every control cycle and used whenever needed without complex computation. In Case 1 (Step 2) and Case 1 (Step 3), the input compensation terms are $\boldsymbol{\delta}_N$ and $\boldsymbol{\delta}_D$, respectively. The length of the input buffer is therefore set to $D$ (because $D > N$), which means all the input signals during the last $D$ control cycles are stored in a buffer used for updating the input compensation terms.
Optimization of the compensation algorithm: in the update algorithm presented in Section 5.1, two extra initial conditions need to be added:
$$\boldsymbol{\delta}_N(-D) = \mathbf{0}, \qquad \boldsymbol{\delta}_D(-D) = \mathbf{0}.$$
In every step of the algorithm, these two terms should be updated using (25):
$$\boldsymbol{\delta}_n(k+1) = A\,\boldsymbol{\delta}_n(k) + B\,\mathbf{u}_{k+n} - A^{n} B\,\mathbf{u}_{k}, \qquad n \in \{N, D\}. \tag{25}$$
These two input compensation terms are updated every step to minimize the computational complexity. They will be used every time a new observation arrives and the multi-step optimal prediction in (16) needs to be executed.
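The recursion in (25) can be realized with a fixed-length ring buffer, so each control cycle costs one matrix-vector update per compensation term; a Python/NumPy sketch under the reconstructed notation:

```python
import numpy as np
from collections import deque

class InputCompensation:
    """Sliding-window term delta_n(k) = sum_{i=0}^{n-1} A^{n-1-i} B u_{k+i},
    updated iteratively instead of being re-summed when needed."""
    def __init__(self, A, B, n):
        self.A, self.B = A, B
        self.A_n = np.linalg.matrix_power(A, n)     # constant, precomputed
        self.delta = np.zeros(A.shape[0])
        self.buf = deque([np.zeros(B.shape[1])] * n, maxlen=n)  # last n inputs

    def step(self, u_new):
        # Equation (25): add the newest input, drop the oldest one.
        u_old = self.buf[0]
        self.delta = (self.A @ self.delta + self.B @ u_new
                      - self.A_n @ (self.B @ u_old))
        self.buf.append(u_new)
        return self.delta
```

Two such objects, one with $n = N$ and one with $n = D$, correspond to the two initial conditions added above.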