Article

A Small UAV Based Multi-Temporal Image Registration for Dynamic Agricultural Terrace Monitoring

1 School of Information Science and Technology, Yunnan Normal University, Kunming 650500, China
2 Laboratory of Pattern Recognition and Artificial Intelligence, Yunnan Normal University, Kunming 650500, China
3 The Engineering Research Center of GIS Technology in Western China of Ministry of Education of China, Yunnan Normal University, Kunming 650500, China
4 Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576, Singapore
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2017, 9(9), 904; https://doi.org/10.3390/rs9090904
Submission received: 15 July 2017 / Revised: 21 August 2017 / Accepted: 21 August 2017 / Published: 31 August 2017

Abstract: Terraces are the major agricultural land-use type and support the main agricultural production in southeast and southwest China. However, due to smallholder farming, complex terrain, natural disasters and illegal land occupation, light-weight, low-cost dynamic monitoring of agricultural terraces has become a serious concern for smallholder production systems in these areas. In this work, we propose a small unmanned aerial vehicle (UAV) based multi-temporal image registration method that transforms multi-temporal images into one coordinate system and thereby determines the effectiveness of the subsequent change detection for dynamic agricultural terrace monitoring. The proposed method consists of four steps: (i) guided image filtering based agricultural terrace image preprocessing, (ii) texture and geometric structure feature extraction and combination, (iii) multi-feature guided point set registration, and (iv) feature point based image registration. We evaluated the performance of the proposed method on 20 pairs of aerial images captured over the Longji and Yunhe terraces, China, using a small UAV (a DJI Phantom 4 Pro), and compared it against four state-of-the-art methods; our method produces the best alignments in most cases.


1. Introduction

More than 70% of China's land consists of mountains and hills, and the main agricultural terraces are located in southwest and southeast China. Agricultural terraces have therefore become the most important agricultural land-use type and support the main agricultural production in these areas. Terracing also plays an important role in reducing flood runoff, moderating terrain slope, and conserving water, soil and fertilizer.
However, owing to Chinese agricultural policy and regional characteristics, most agricultural terraces in southwest and southeast China are farmed by smallholders and are characterized by small plot sizes, scattered distribution and complex terrain. These issues increase the difficulty of land-use and planting management for local governments, as well as of regular cropping and planting-area monitoring for smallholder farmers. Meanwhile, illegal land occupation, natural disasters, and water and wind erosion are causing a drastic decrease in agricultural terraces, which further aggravates soil erosion and endangers smallholder production. Consequently, light-weight, low-cost dynamic monitoring of agricultural terraces has become a serious concern for smallholder production systems in China.
Agricultural terrace monitoring includes two major aspects: (i) crop monitoring and (ii) planting area monitoring. In this work, we focus mainly on planting area monitoring, which generally depends on aerial remote sensing and image processing techniques. In parallel with technological advances in aerial remote sensing, achievements in image processing have promoted the development of sophisticated algorithms for aerial images, such as landform classification, which is one of the effective approaches to dynamic monitoring of land-use change. Traditional landform classification methods can be divided into non-supervised and supervised methods [1]. Basically, non-supervised methods consist of manual classification methods and automated classification methods [2]. Manual classification is relatively time-consuming, and its results depend on the subjective decisions of the interpreter and are therefore neither transparent nor reproducible [3,4]. Automated classification methods [5,6] benefit from the unsupervised nature and automation of the change-analysis process. However, they are hampered by difficulties in identifying and labeling change trajectories, and by the lack of ground calibration information [7]. The artificial neural network (ANN) constitutes a key component of supervised classification methods [8,9]. It is a non-parametric method capable of estimating the properties of data from training samples. However, ANNs suffer from long training times, sensitivity to the amount of training data used, and the limited availability of ANN functions in common image processing software [7].
Subsequent to the landform classification, the classified images can be adopted to monitor land-use changes. Over the last few years, different new technologies have been used to analyze agricultural terraces. Satellite data of high spatial resolution and advanced image processing techniques have opened up new insights for mapping landscape features such as terraces. This has provided the opportunity for quantitative assessment of farming practices as an indicator in water pollution risk assessment [10], soil erosion risk assessment [11] and landslide boundary monitoring [11,12,13,14]. In addition, airborne LiDAR (light detection and ranging), which has been developed to collect and subsequently characterize vertically distributed attributes [15], and the derived digital elevation model (DEM) [16,17,18,19,20] or digital terrain model (DTM) [12,14] are becoming standard practice in spatially related fields. Recently, the use of unmanned aerial vehicles (UAVs) for civil applications has emerged as an attractive and flexible option for monitoring various aspects of agriculture and the environment [22]. For example, Diaz-Varela et al. [22] proposed an automatic identification of agricultural terraces through object-oriented analysis of high resolution digital surface models and multi-spectral images obtained from UAVs. Deffontaines et al. [23] monitored the active inter-seismic shallow deformation of the Pingting terraces by combining UAV high resolution topographic data with InSAR time series. Yang et al. [21] proposed a multi-viewpoint remote sensing image registration method that provides an accurate mapping between images from different viewpoints for ground change detection.
Compared with satellite and other airborne remote sensing, using small UAVs for agricultural terrace monitoring offers strong mobility, high efficiency, low cost and other advantages. However, the following issues remain: (i) Owing to their limited payload capacity, small UAVs can usually carry only a light-weight visible-light camera, such as a CCD or CMOS camera, which limits the available image information and increases the difficulty of monitoring algorithms compared with multi-spectral imaging. (ii) When collecting multi-temporal images of the same location (e.g., a planting area in terraces), the imaging perspective of small UAVs is easily affected by wind speed and direction, complex terrain, battery capacity (e.g., flying distance), aircraft attitude (pitch, roll, yaw), flying height and other human factors. These factors cause the captured scenes (i.e., the same location in a pair of multi-temporal images) not to lie in the same coordinate system, while image geometric distortions, low image overlap, and brightness and color changes may also be produced in such multi-temporal images.
As a result, multi-temporal images of the same scene captured by small UAVs may not be directly usable for change detection in dynamic agricultural terrace monitoring; a reliable multi-temporal image registration, which transforms the images into one coordinate system, is necessary before the data obtained from the multi-temporal images can be compared or integrated.
In this work, we focus on planting areas of agricultural terraces and present a small UAV based multi-temporal image registration method for dynamic agricultural terrace monitoring. The major contributions of the proposed method include: (i) a guided image filtering based preprocessing designed to enhance terrace ridges in multi-temporal images; (ii) a multi-feature descriptor that combines the texture feature and the geometric structure feature of terrace images to improve the description of feature points and to reject outliers; (iii) a multi-feature guided model that provides accurate guidance for feature point set registration; and (iv) a feature point based image registration that finally registers the terrace images accurately.

2. Methodology

The proposed small UAV based multi-temporal image registration method has four major sequential processes: (i) image preprocessing; (ii) feature extraction and combination; (iii) feature point set registration; and (iv) image registration. In this section, we first introduce the proposed method, and then discuss the implementation details and analyze the computational complexity.

2.1. Guided Image Filtering Based Agricultural Terrace Image Preprocessing

Let $I$ denote a gray agricultural terrace image with $x \times y$ pixels, whose intensity is computed as $I = 0.3R + 0.59G + 0.11B$ from the color image captured by a small UAV. We first define a preprocessing method to strengthen the identifiability of terrace ridges in multi-temporal images, and then extract salient features of the terrace images along the enhanced ridges in the second step. The reason is that the accuracy and validity of an image registration are not only controlled by the performance of the feature point set registration, but are also determined by the number and distribution density of the feature points, because the image transformation is constructed from abundant feature points.
In this work, the preprocessing method improves the contrast ratio between terraces and their ridges using Guided Image Filtering (GIF) [24]. In order to extract many high-quality feature points from terrace ridges, we first adopt the GIF, which has edge-preserving smoothing and gradient-preserving properties, to preprocess the input images. The GIF employs a guidance image to construct a spatially variant kernel and is also related to the matting Laplacian matrix [25].
Firstly, a linear translation-variant guided filtering process in a square window $\pi_k$ centered at a pixel $k$ is defined by:

$$\Lambda_i = \alpha_k I_i + \beta_k, \quad \forall i \in \pi_k, \tag{1}$$

where $i$ is a pixel index, $\Lambda$ is the output image, and

$$\alpha_k = \frac{\frac{1}{|\pi|}\sum_{i \in \pi_k} I_i g_i - \gamma_k \bar{g}_k}{\delta_k^2 + \varepsilon}, \qquad \beta_k = \bar{g}_k - \alpha_k \gamma_k$$

are the two parameters of the minimal cost function $\arg\min_{\alpha_k, \beta_k} \sum_{i \in \pi_k} \left( (\alpha_k I_i + \beta_k - g_i)^2 + \varepsilon \alpha_k^2 \right)$. Here, $g$ is the input image, which is identical to the guidance image $I$; $\gamma_k$ and $\delta_k^2$ are the mean and variance of $I$ in $\pi_k$; $|\pi|$ is the number of pixels in $\pi_k$; $\varepsilon$ is a regularization parameter preventing $\alpha_k$ from being too large; and $\bar{g}_k = \frac{1}{|\pi|}\sum_{i \in \pi_k} g_i$ is the mean of $g$ in $\pi_k$.
Secondly, we apply the linear model to all local windows in the entire image:

$$\Lambda_i = \frac{1}{|\pi|} \sum_{k : i \in \pi_k} (\alpha_k I_i + \beta_k) = \bar{\alpha}_i I_i + \bar{\beta}_i, \tag{2}$$

where $\bar{\alpha}_i = \frac{1}{|\pi|}\sum_{k \in \pi_i} \alpha_k$ and $\bar{\beta}_i = \frac{1}{|\pi|}\sum_{k \in \pi_i} \beta_k$. An example of agricultural terrace image enhancement ($250 \times 150$ pixels) using GIF is given in Figure 1.
An agricultural terrace consists of cropland and terrace ridges. Ridges can be considered a salient feature that provides geometric contours and surface textures for feature based terrace image registration. However, the color of a terrace ridge is similar to that of the cropland, and detecting cropland and terrace ridges becomes difficult when crops are in the early growing stage. The feature points extracted along ridges are thus more helpful than points distributed over the cropland. Mathematically, an exponential function can change the density of a data distribution; we therefore expand the gray-value distribution of the common terrace via a natural exponential function:

$$I_i^{new} = \begin{cases} 1, & \text{if } \exp(\Lambda_i) \geq 0.7, \\ \exp(\Lambda_i), & \text{otherwise}. \end{cases} \tag{3}$$
Finally, we obtain the preprocessed gray agricultural terrace image $I^{new}$ with a high contrast ratio, smooth edges and prominent ridges. This preprocessing step makes it possible to extract high-quality features from the preprocessed terrace images.
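To make the preprocessing concrete, the following is a minimal Python sketch of the self-guided filtering of Equations (1) and (2) and the exponential expansion of Equation (3). It assumes a grayscale image normalized to [0, 1]; the window size and regularization value are illustrative choices, and the shift applied before the exponential is our assumption, since the scaling of $\Lambda$ against the 0.7 threshold is not stated.

```python
# A minimal sketch of Section 2.1: self-guided image filtering (Equations (1)
# and (2)) followed by the exponential expansion of Equation (3). Assumes a
# grayscale image in [0, 1]; `size` and `eps` are illustrative values.
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter_self(I, size=9, eps=0.01):
    """Guided filter with the input as its own guidance (g = I)."""
    mean_I = uniform_filter(I, size)                     # gamma_k: window mean of I
    var_I = uniform_filter(I * I, size) - mean_I ** 2    # delta_k^2: window variance
    a = var_I / (var_I + eps)                            # alpha_k (self-guided case)
    b = mean_I - a * mean_I                              # beta_k = g_bar - alpha * gamma
    a_bar = uniform_filter(a, size)                      # average over windows covering i
    b_bar = uniform_filter(b, size)
    return a_bar * I + b_bar                             # Equation (2)

def expand_ridges(lam, thresh=0.7):
    """Equation (3). We shift lam so its maximum maps to exp(0) = 1 (our
    assumption; the paper does not state the scaling of Lambda)."""
    out = np.exp(lam - lam.max())
    out[out >= thresh] = 1.0
    return out
```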

2.2. Features Extraction and Combination

Feature points are selected using the good-features-to-track criterion [26], which is similar to the Harris detector and based on the second moment matrix [27]. The selection specifically maximizes the quality of tracking, and is therefore optimized by construction, as opposed to more ad hoc measures of texturedness. The selected feature point set $P = \{p_t\}_{t=1}^{T}$ consists of the geometric coordinates of pixels in the input agricultural terrace image, where $\{p_t\}_{t=1}^{T} \in \mathbb{Z}^{+}$. An example of feature point extraction from a preprocessed agricultural terrace image ($500 \times 300$ pixels) is shown in Figure 2.
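This selection step can be reproduced with OpenCV's implementation of the good-features-to-track criterion [26]; in the short sketch below, the corner count, quality level and minimum distance are illustrative settings rather than values from the paper, and the file name is hypothetical.

```python
# Feature point selection (Section 2.2) via the good-features-to-track
# criterion [26], as implemented in OpenCV. Parameter values are illustrative.
import cv2

gray = cv2.imread('terrace_preprocessed.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file
pts = cv2.goodFeaturesToTrack(gray, maxCorners=1500, qualityLevel=0.01,
                              minDistance=5)
P = pts.reshape(-1, 2).astype(int)   # T x 2 pixel coordinates {p_t}
```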

2.2.1. Local Texture Feature Descriptor

A local texture (LT) feature descriptor is designed to describe the texture around each feature point according to the dominant rotated local binary patterns (DRLBP) proposed by Mehta and Egiazarian [28]. Given a gray image $I$ with $x \times y$ pixels, the DRLBP operates in a local circular region by taking the difference of the central pixel with respect to its neighbors. It is defined as:

$$DRLBP_{R,L}^{I}(x, y) = \sum_{l=0}^{L-1} m\big(i(x,y),\, i(a_l(x), b_l(y))\big) \cdot 2^{\operatorname{mod}(l - D,\, L)}, \tag{4}$$

where

$$m\big(i(x,y),\, i(a_l(x), b_l(y))\big) = \begin{cases} 1, & i(a_l(x), b_l(y)) \geq i(x,y), \\ 0, & i(a_l(x), b_l(y)) < i(x,y), \end{cases}$$

$i(x,y)$ and $i(a_l(x), b_l(y))$ are the gray values of the central pixel and its neighbor in image $I$, respectively, and $(x,y)$ and $(a_l(x), b_l(y))$ are the geometric coordinates of the central pixel and its $l$th neighbor. Note that $a_l(x) = x + R\cos(2\pi l / L)$ and $b_l(y) = y - R\sin(2\pi l / L)$, where $R$ is the radius of the circular neighborhood and $L$ is the number of neighbors. Here $\operatorname{mod}$ denotes the modulus operator, and $D = \arg\max_{l \in \{0, 1, \ldots, L-1\}} |i(a_l(x), b_l(y)) - i(x,y)|$. In this paper, the codes $\{DRLBP_{R,L}^{I}(x,y)\}_{x=0,y=0}^{X-1,Y-1}$ are held in binary form. The DRLBP descriptor of $I$ is denoted by the DRLBP histogram $H_{R,L}(DRLBP_{R,L}^{I}) \in \mathbb{R}^{2^L}$. Because the weight term $2^{\operatorname{mod}(l-D,L)}$ depends on $D$, the $\operatorname{mod}$ operator circularly shifts the weights with respect to the dominant direction. The DRLBP is therefore a rotation-invariant and computationally efficient texture descriptor.
Before computing the LT for each feature point, the image is weighted for each feature point based on its geometric coordinates. The weighting matrix for point $p_t$ is defined as:

$$\epsilon_{xy}^{t} = \exp\left( -\frac{\| I_{xy} - p_t \|^2}{2\tau^2} \right), \tag{5}$$

where $I_{xy}$ is the geometric coordinate of pixel $I(x,y)$, $p_t$ is the $t$th point of $P$, and $\tau$ is a parameter that controls the window size of the LT. $\epsilon^t$ is the $x \times y$ weighting matrix of the $t$th feature point, and the $t$th weighted gray image is obtained by:

$$I^{t}(x, y) = \epsilon_{xy}^{t} \times I(x, y), \tag{6}$$

where the weighted gray image $I^t$ has the same size as the source gray image. We set $I^t(x,y) = 0$ when $I^t(x,y) \leq 10^{-4}$. Figure 3 shows an example of a weighted gray image.
We define the LT of the $t$th feature point via the DRLBP of the weighted gray image:

$$\mathrm{LT}(p_t)_{R,L} = H_{R,L}\big(DRLBP_{R,L}^{I^t}\big), \tag{7}$$

where $\mathrm{LT}(P)_{R,L}$ is the LT feature set containing $T$ vectors, each of size $1 \times 2^L$.
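A compact (and deliberately unoptimized) sketch of the DRLBP code of Equations (4) and (5) is given below, assuming $R = 1$, $L = 8$ and nearest-pixel sampling on the circle; for the LT descriptor of Equation (7), the same histogram would be computed on each weighted image $I^t$.

```python
# Unoptimized sketch of the DRLBP histogram (Equations (4) and (5)).
# Assumes nearest-pixel sampling on the circular neighborhood.
import numpy as np

def drlbp_histogram(img, R=1, L=8):
    h, w = img.shape
    ang = 2 * np.pi * np.arange(L) / L
    dx = np.rint(R * np.cos(ang)).astype(int)    # a_l(x) = x + R cos(2*pi*l/L)
    dy = np.rint(-R * np.sin(ang)).astype(int)   # b_l(y) = y - R sin(2*pi*l/L)
    hist = np.zeros(2 ** L)
    for y in range(R, h - R):
        for x in range(R, w - R):
            c = float(img[y, x])
            diffs = np.array([float(img[y + dy[l], x + dx[l]]) - c
                              for l in range(L)])
            m = (diffs >= 0).astype(int)                      # Equation (5)
            D = int(np.argmax(np.abs(diffs)))                 # dominant direction
            code = int((m * 2 ** np.mod(np.arange(L) - D, L)).sum())  # Equation (4)
            hist[code] += 1
    return hist   # H_{R,L} in R^(2^L)
```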

2.2.2. Local Geometric Structure Feature Descriptor

A local geometric structure (LGS) feature is designed for each feature point in $P$ by a local vector weighting method defined by:

$$\mathrm{LGS}(p_t) = \sum_{k=1}^{K} \eta_{t_k} \left( p_t - p_{t_k} \right), \tag{8}$$

where $t_k$ is the index of the $k$th neighbor of $p_t$, and $K$ is the number of neighbors. The LGS descriptor employs the $K$ neighbors to describe the local geometric structure of each feature point $p_t$ while ignoring outlier points. Hence, the value of $K$ and the weight term $\eta_{t_k}$ play a crucial role in the performance of the LGS. We use an outlier score to define the weight term of $p_t - p_{t_k}$:

$$\eta_{t_k} = \frac{1}{\sqrt{2\pi \sigma_{\mathrm{LGS}}^2}} \exp\left( -\frac{\Delta_{t_k}^{\mathrm{LT}}}{2\pi \sigma_{\mathrm{LGS}}^2} \right), \tag{9}$$

where $\sigma_{\mathrm{LGS}}^2$ is the variance of $\{\Delta_t^{\mathrm{LT}}\}_{t=1}^{T}$, and the outlier score $\Delta_{t_k}^{\mathrm{LT}}$ is computed as the LT distance between $p_{t_k}$ and the point whose LT feature is most similar to that of $p_{t_k}$:

$$\Delta_{t_k}^{\mathrm{LT}} = \min_{t \neq t_k} \left\| \mathrm{LT}(p_{t_k})_{R,L} - \mathrm{LT}(p_t)_{R,L} \right\|^2. \tag{10}$$
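The following numpy sketch computes the LGS of Equations (8)-(10) under our reading of Equation (8) as a weighted sum of displacement vectors; the arrays `P` (T x 2 coordinates) and `LT` (T x 2^L normalized histograms) are assumed to be available from the previous steps.

```python
# Sketch of the LGS descriptor (Equations (8)-(10)), assuming P (T x 2) and
# LT (T x 2^L, normalized histograms) have been computed.
import numpy as np

def lgs(P, LT, K=5):
    T = P.shape[0]
    lt_d = ((LT[:, None, :] - LT[None, :, :]) ** 2).sum(-1)   # pairwise LT distances
    np.fill_diagonal(lt_d, np.inf)
    delta = lt_d.min(axis=1)               # outlier score Delta_t^LT (Equation (10))
    var = delta.var()                      # sigma_LGS^2: variance of the scores
    eta = np.exp(-delta / (2 * np.pi * var)) / np.sqrt(2 * np.pi * var)  # Equation (9)
    out = np.zeros((T, 2))
    for t in range(T):
        idx = np.argsort(((P - P[t]) ** 2).sum(-1))[1:K + 1]  # K nearest neighbors
        out[t] = (eta[idx, None] * (P[t] - P[idx])).sum(0)    # Equation (8)
    return out
```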

2.2.3. Multi-Feature Descriptor

Different types of feature descriptors have their own advantages and limitations. This motivates us to make the respective advantages of the LT and LGS descriptors complement each other. The multi-feature (MF) descriptor is designed to combine the local texture information and the local geometric structure information to improve the identifiability of each feature point. However, a fixed feature combination is not equally useful for guiding point registration throughout the iterations, so the contributions of the two features are annealed. The MF descriptor is defined as:

$$\mathrm{MF}(P) = P + T_1\, \mathrm{LGS}(P) + T_2\, \mathrm{LT}(P)_{R,L}, \tag{11}$$

where $T_1$ and $T_2$ are annealing parameters for the LGS and LT features, respectively. The instantiation of Equation (11) is given in the implementation details section.

2.3. Multi-Feature Guided Point Set Registration Model

For feature based image registration, two sets of feature points are extracted from a pair of multi-temporal agricultural terrace images (i.e., a sensed image and a reference image). The extracted feature points contain a large number of outliers, which limits the performance of current non-rigid point set registration algorithms [29,30,31]. To address this issue, a robust multi-feature guided model is designed. Given two point sets $A = \{a_n\}_{n=1}^{N}$ (the source point set) and $B = \{b_m\}_{m=1}^{M}$ (the target point set), extracted from the sensed image and the reference image, respectively, the proposed point set registration model first (i) estimates correspondences between $A$ and $B$ using the proposed MF descriptor at each iteration, and then (ii) updates the location of $A$ using a non-rigid transformation built from the recovered correspondences. Steps (i) and (ii) are iterated so that $A$ gradually and continuously approaches the target point set $B$ and finally matches the exact corresponding points in $B$.

2.3.1. Correspondence Estimation

In the first step, a Gaussian mixture model (GMM) is applied to estimate correspondences by measuring the similarity of the MFs between the two point sets, and the correspondence estimation problem is considered as a GMM probability density estimation problem. Let the MF of $a_n$ be the centroid of the $n$th Gaussian component, and the MF of $b_m$ be the $m$th datum. The GMM probability density function (PDF) is then:

$$s(\mathrm{MF}(b_m)) = (1 - \omega) \sum_{n=1}^{N} P_{mn}\, \phi\big(\mathrm{MF}(b_m) \mid \mathrm{MF}(a_n)\big) + \frac{\omega}{M}, \tag{12}$$

where $\phi\big(\mathrm{MF}(b_m) \mid \mathrm{MF}(a_n)\big) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{\| \mathrm{MF}(b_m) - \mathrm{MF}(a_n) \|^2}{2\sigma^2} \right)$ with equal isotropic covariances $\sigma^2$ of the MF, the priors of the GMM are the non-negative equal quantities $P_{mn} = \frac{1}{N}$ with $\sum_{n=1}^{N} P_{mn} = 1$, and $\frac{\omega}{M}$ is an additional uniform distribution with weighting parameter $\omega$, $0 < \omega < 1$, for handling outliers.
Once we have the MF-guided PDF of the GMM, we can estimate correspondences from the posterior probability of the GMM via Bayes' rule:

$$s_{nm}\big(\mathrm{MF}(a_n) \mid \mathrm{MF}(b_m)\big) = \frac{\exp\left( -\frac{\| \mathrm{MF}(b_m) - \mathrm{MF}(a_n) \|^2}{2\sigma^2} \right)}{\sum_{i=1}^{N} \exp\left( -\frac{\| \mathrm{MF}(b_m) - \mathrm{MF}(a_i) \|^2}{2\sigma^2} \right) + 2\pi\sigma^2 \frac{\omega N}{M(1 - \omega)}}, \tag{13}$$

by which we obtain a one-to-many fuzzy correspondence matrix $S \in \mathbb{R}^{N \times M}$ guided by the similarity of the MFs. Meanwhile, the corresponding target point set is obtained by:

$$\bar{B} = S \times B. \tag{14}$$
The proposed correspondence estimation imitates human practice, which measures the similarities of the geometric structure feature and the local texture feature. Generally, the process by which a human estimates the corresponding point of a source point $a$ in a terrace image consists of two parts: (i) searching for a region in the reference image that has a similar geometric location and structure to the region surrounding the source point $a$, and (ii) finding a point within this region that has color features (the LT in this paper) similar to those of the source point. An example is shown in Figure 4, where only 100 feature points are shown for visual convenience. The two feature points $a$ and $b$ have similar LT histogram patterns and similar geometric features.
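A numpy sketch of this E-step follows; it assumes the squared geometric term `G2` and the LT term `Psi` (both N x M) have already been assembled as in the instantiation of Section 2.5 (Equation (25)).

```python
# E-step sketch: fuzzy correspondences of Equation (25) and the corresponding
# target set of Equation (14). G2[n, m] = G_nm^2 and Psi[n, m] are assumed
# precomputed (see Section 2.5).
import numpy as np

def estimate_correspondence(G2, Psi, sigma2, omega):
    N, M = G2.shape
    num = np.exp(-(G2 + Psi) / (2.0 * sigma2))
    den = num.sum(axis=0, keepdims=True) \
          + 2.0 * np.pi * sigma2 * omega * N / (M * (1.0 - omega))
    return num / den                      # S: one-to-many fuzzy correspondences

# B_bar = S @ B    # Equation (14): virtual corresponding target points
```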

2.3.2. Transformation Estimation

We model the non-rigid displacement function $f$ by requiring it to lie within a specific functional space, namely a vector-valued reproducing kernel Hilbert space (RKHS) [32,33]. The Gaussian kernel, of the form $G(a_{n_1}, a_{n_2}) = \exp\left( -\frac{1}{2\alpha^2} \| a_{n_1} - a_{n_2} \|^2 \right)$ and of size $N \times N$, is chosen as the associated kernel of the RKHS, where $\alpha$ is a constant controlling the spatial smoothness. The function $f$ is defined by:

$$f(A) = A + GW. \tag{15}$$

Thus, the transformation estimation boils down to finding a finite parameter matrix $W$.
Before a direct parameter estimation, we first illustrate a rule by which a reliable transformation parameter is obtained in the estimation process.
  • Regularizing the transformation estimation process.
    The adopted Tikhonov regularization framework [34,35,36,37] is one of the most common forms of regularization. It minimizes an energy functional in an RKHS $\mathcal{H}$ to regularize a function $f$, and can be written as:

$$R(f) = \| f \|_{\mathcal{H}}^{2}. \tag{16}$$

    In this paper, the function $f$ is defined in Equation (15).
As shown in Figure 5a, the regularized function (denoted by black line) is more reasonable than its non-regularized counterpart (denoted by blue curve). The transformation of an iterative registration method is such a procedure that slowly displaces the source point set so that the correspondence estimation is easier and more reliable. In other words, regularizing the transformation is necessary to accomplish the iterative registration. Figure 5b,c indicate that the ill-posed problem will exist if the transformation is not regularized. Note that as the number of points increases, the increasing arbitrariness of the transformation will lead to more severe ill-posed problems.
The multi-feature guided fuzzy correspondence matrix $S$ contains $N \times M$ probabilities; hence, a reliable transformation will produce a larger expectation of these probabilities. Therefore, the transformation estimate is found by maximizing the likelihood function $\prod_{m=1}^{M} s(\mathrm{MF}(b_m))$, or equivalently by minimizing the negative log-likelihood function:

$$E(W, \sigma^2) = -\sum_{m=1}^{M} \log\left( (1 - \omega) \sum_{n=1}^{N} P_{mn}\, \phi\big(\mathrm{MF}(b_m) \mid \mathrm{MF}(a_n)\big) + \frac{\omega}{M} \right). \tag{17}$$
We use the expectation maximization (EM) algorithm [29] to estimate the transformation. The idea of the EM algorithm is first to guess the values of the parameters ("old" parameter values) by computing the posterior probability through Equation (13) (E-step), and then to find the "new" parameter values by minimizing the expectation of the complete negative log-likelihood function (M-step), which takes the form:

$$Q(W, \sigma^2) = \frac{1}{2\sigma^2} \sum_{m=1}^{M} \sum_{n=1}^{N} s_{nm} \left\| \mathrm{MF}(b_m) - \mathrm{MF}(f(a_n)) \right\|^2 + N_S \log \sigma^2 + \frac{\mu}{2} R(f), \tag{18}$$

where $N_S = \sum_{m=1}^{M} \sum_{n=1}^{N} s_{nm} \leq M$ (with $M = N_S$ only if $\omega = 0$), $R$ is the regularization of the transformation, and $\mu$ is a weighting parameter controlling the strength of the regularization. With an initialized deterministic annealing parameter $\sigma^2$, the parameter $W$ is obtained by $\arg\min_W Q$. The mathematical solution is detailed in Section 2.5.

2.4. Feature Points Based Image Registration

Let $I$ and $I^t$ be the sensed and reference images, from which the source point set $A$ and target point set $B$ are extracted, respectively, and let $x(\cdot) \times y(\cdot)$ denote the size of an image. Our goal is to obtain the transformed image $\hat{I}$.
After the transformed source point set $\hat{A}$ is obtained, a mapping function can be estimated from the corresponding set $C = \{A, \hat{A}\}$, and the image registration can then be realized. There are two types of mapping: (i) the forward approach, which directly transforms the sensed image $I$ using the mapping function, and (ii) the backward approach, which determines the transformed image $\hat{I}$ from $I$ using the grid of the reference image $I^t$ and the inverse of the mapping. Owing to discretization and rounding, (i) is complicated to implement, as it can produce holes and/or overlaps in the output image, so we use the backward approach for image transformation.
We employ the TPS (thin plate spline) transformation model, which is obtained by:

$$E_{TPS} = \begin{bmatrix} K & \Phi \\ \Phi^{T} & O \end{bmatrix}^{-1} \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix}, \tag{19}$$

where the TPS model $E_{TPS}$ is of size $(N+3) \times 3$, $O$ is a $3 \times 3$ matrix of zeros, $\Phi$ is the $N \times 3$ matrix whose $n$th row is $(1, \hat{a}_n)$, and the $N \times N$ TPS kernel is $K_{ij} = \| \hat{a}_i - \hat{a}_j \|^2 \log \| \hat{a}_i - \hat{a}_j \|$.
A regular grid $\Theta_{Z \times 2}^{t} = \{\theta_z^t\}_{z=1}^{Z}$ is obtained by a pixel-by-pixel indexing of the reference image $I^t$, where $Z = x(I^t) \times y(I^t)$. Taking the grid $\Theta^t$ as the source point set and $E_{TPS}$ as the TPS transformation model, the transformed grid is obtained by first computing

$$\hat{\Theta}_{Z \times 3}^{t} = \begin{bmatrix} \bar{K} & \bar{\Phi} \end{bmatrix} E_{TPS}, \tag{20}$$

and then restoring the dimension of the grid to 2 by $\hat{\Theta}^t \leftarrow \begin{bmatrix} \hat{\Theta}_{(\cdot,1)}^{t} & \hat{\Theta}_{(\cdot,2)}^{t} \end{bmatrix}$, where the $Z \times N$ kernel is $\bar{K}_{ij} = \| \theta_i^t - \hat{a}_j \|^2 \log \| \theta_i^t - \hat{a}_j \|$, $\bar{\Phi}$ is the $Z \times 3$ matrix whose $z$th row is $(1, \theta_z^t)$, and $\hat{\Theta}_{(\cdot,i)}^{t}$ denotes the $i$th column of $\hat{\Theta}^t$. Letting $\Theta$ be the grid obtained on $I$, we have

$$\hat{\Theta} = \hat{\Theta}^{t} \cap \Theta.$$
Finally, the transformed image $\hat{I}$ is obtained by resampling intensities from the sensed image $I$ based on $\hat{\Theta}$, with the remaining pixels set to black. Note that bicubic interpolation is used to improve the smoothness of $\hat{I}$; more precisely, the intensity of each pixel in $\hat{I}$ is determined by summing the weighted intensities of the neighboring pixels within a $4 \times 4$ window.
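The backward TPS warping can be sketched as follows: we fit the spline that maps the registered points $\hat{A}$ back to the sensed-image coordinates $A$, evaluate it on the reference grid as in Equations (19) and (20), and pull intensities through `scipy.ndimage.map_coordinates`, whose cubic-spline sampling stands in for the paper's bicubic scheme.

```python
# Sketch of the backward TPS image transformation (Section 2.4). A_hat are the
# registered source points (reference frame), A the original source points
# (sensed frame); cubic spline sampling approximates bicubic interpolation.
import numpy as np
from scipy.ndimage import map_coordinates

def tps_kernel(X, Y):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    with np.errstate(divide='ignore', invalid='ignore'):
        K = 0.5 * d2 * np.log(d2)                # r^2 log r, with U(0) := 0
    return np.nan_to_num(K)

def warp_backward(I_sensed, ref_shape, A_hat, A):
    N = A_hat.shape[0]
    L = np.zeros((N + 3, N + 3))
    L[:N, :N] = tps_kernel(A_hat, A_hat)
    L[:N, N:] = np.hstack([np.ones((N, 1)), A_hat])   # Phi rows (1, a_hat_n)
    L[N:, :N] = L[:N, N:].T
    rhs = np.zeros((N + 3, 2)); rhs[:N] = A           # targets in the sensed frame
    E = np.linalg.solve(L, rhs)                       # TPS coefficients
    ys, xs = np.mgrid[0:ref_shape[0], 0:ref_shape[1]]
    grid = np.stack([xs.ravel(), ys.ravel()], 1).astype(float)  # reference grid
    mapped = tps_kernel(grid, A_hat) @ E[:N] \
             + np.hstack([np.ones((len(grid), 1)), grid]) @ E[N:]
    out = map_coordinates(I_sensed.astype(float),
                          [mapped[:, 1], mapped[:, 0]],   # (row, col) order
                          order=3, cval=0.0)              # outside -> black
    return out.reshape(ref_shape)
```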

2.5. Implementation Details

The instantiation of Equation (11) is impeded by the LT descriptor $\mathrm{LT}$, which is a $2^L$-dimensional histogram. In fact, the goal of Equation (11) is to measure the distance between MFs, which means that instantiating

$$\Gamma_{nm} = \mathrm{MF}(a_n) - \mathrm{MF}(b_m) \tag{21}$$

is equivalent to instantiating Equation (11). Expanding, $\Gamma_{nm} = (a_n - b_m) + (\mathrm{LGS}(a_n) - \mathrm{LGS}(b_m)) + (\mathrm{LT}(a_n) - \mathrm{LT}(b_m))$ contains two two-dimensional terms and one $2^L$-dimensional term; hence, respectively instantiating

$$G_{nm}(B, A) = (b_m - a_n) + \big(\mathrm{LGS}(b_m) - \mathrm{LGS}(a_n)\big) \tag{22}$$

and

$$\Psi_{nm}(B, A) = \mathrm{LT}(b_m) - \mathrm{LT}(a_n) \tag{23}$$

is equivalent to instantiating Equation (11). The geometric distance and the LT similarity are denoted by Equations (22) and (23), respectively; the instantiation of $G$ can be realized by the common Euclidean distance computation. Furthermore, a human generally estimates the similarity of LTs by differentiating the patterns of the histograms, as discussed in Section 2.3.1. We therefore instantiate $\Psi$ by first normalizing each histogram to $[0, 1]$ and then computing the sum of the squared differences over the histogram dimensions:

$$\Psi_{nm} = \| \mathrm{LT}(a_n) - \mathrm{LT}(b_m) \| = \sum_{i=1}^{2^L} \big( \mathrm{LT}(a_n, i) - \mathrm{LT}(b_m, i) \big)^2, \tag{24}$$

where $\mathrm{LT}(a_n, i)$ denotes the $i$th bin of the histogram of $a_n$.
Once Equation (11) is instantiated, we can rewrite Equations (13) and (18), respectively, as:

$$s_{nm} = \frac{\exp\left( -\frac{G_{nm}(B, A)^2 + \Psi_{nm}(B, A)}{2\sigma^2} \right)}{\sum_{i=1}^{N} \exp\left( -\frac{G_{im}(B, A)^2 + \Psi_{im}(B, A)}{2\sigma^2} \right) + 2\pi\sigma^2 \frac{\omega N}{M(1 - \omega)}} \tag{25}$$

and

$$Q = \frac{1}{2\sigma^2} \sum_{m=1}^{M} \sum_{n=1}^{N} s_{nm} \left( G_{nm}(\bar{B}, f(A))^2 + \Psi_{nm}(B, A) \right) + N_S \log \sigma^2 + \frac{\mu}{2} R(f); \tag{26}$$
thus, we can complete the M-step of the EM algorithm for implementing the agricultural terrace image registration. To simplify the derivative, the matrix form of Equation (26) is written as:

$$\begin{aligned} Q = \frac{1}{2\sigma^2} \Big( & \mathrm{Tr}\big( G_{\bar{B}}^{T} S_B G_{\bar{B}} \big) - 2\,\mathrm{Tr}\big( G_{f(A)}^{T} S G_{\bar{B}} \big) + \mathrm{Tr}\big( G_{f(A)}^{T} S_A G_{f(A)} \big) \\ & - 2\,\mathrm{Tr}\big( W^{T} G S G_{\bar{B}} \big) + 2\,\mathrm{Tr}\big( W^{T} G S_A G_{f(A)} \big) + \mathrm{Tr}\big( W^{T} G S_A G W \big) \\ & + \mathrm{Tr}\big( \Psi^{T}(B, A)\, S\, \Psi(B, A) \big) \Big) + N_S \log \sigma^2 + \frac{\mu}{2} \mathrm{Tr}\big( W^{T} G W \big), \end{aligned} \tag{27}$$

where $\mathrm{Tr}$ denotes the trace operator, $S_A = \mathrm{diag}(S \mathbf{1})$, $S_B = \mathrm{diag}(S^{T} \mathbf{1})$, and $\mathbf{1}$ is a column vector of all ones. $G_{\mathcal{P}} = \mathcal{P} + U^{T}(\mathcal{P})\, \mathcal{P}$, where the operator $U(\mathcal{P})$ is defined by

$$U_{ij}(\mathcal{P}) = \begin{cases} \eta_{i_k} K, & p_i - p_j \in \{ p_i - p_{i_k} \}_{k=1}^{K}, \\ -K, & p_i - q_j \in \{ p_i - q_{i_k} \}_{k=1}^{K}, \end{cases}$$

with $\mathcal{P} = \{p_i\}_{i=1}^{I}$ denoting a non-representational point set containing $I$ points, $i_k$ the index of the $k$th neighbor of the $i$th point, $K$ the number of neighbors, and $\eta_{i_k}$ the LGS weight defined by Equation (9). The partial derivative of Equation (27) with respect to the parameter $W$ is:

$$\frac{\partial Q}{\partial W} = -\frac{G S G_{\bar{B}}}{\sigma^2} + \frac{G S_A G_A}{\sigma^2} + \frac{G S_A G W}{\sigma^2} + \mu G W. \tag{28}$$

Setting Equation (28) to zero, the parameter $W$ is obtained as:

$$W = \left( G S_A G + \mu \sigma^2 G \right)^{-1} \left( G S G_{\bar{B}} - G S_A G_A \right). \tag{29}$$
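In code, the M-step update of Equation (29) is a single linear solve; the sketch below assumes the Gaussian kernel `Gk` (N x N), the correspondence matrix `S` (N x M) from Equation (25), and the feature matrices `G_Bbar` and `G_A` from Equation (22) are available.

```python
# M-step sketch: closed-form solution of Equation (29). Gk is the N x N
# Gaussian kernel, S the N x M fuzzy correspondences, G_Bbar (M x 2) and
# G_A (N x 2) the geometric feature matrices of Equation (22).
import numpy as np

def update_W(Gk, S, G_Bbar, G_A, mu, sigma2):
    S_A = np.diag(S.sum(axis=1))                   # S_A = diag(S 1)
    lhs = Gk @ S_A @ Gk + mu * sigma2 * Gk         # G S_A G + mu sigma^2 G
    rhs = Gk @ (S @ G_Bbar) - Gk @ (S_A @ G_A)     # G S G_Bbar - G S_A G_A
    return np.linalg.solve(lhs, rhs)               # W (N x 2)
```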

Parameter Setting

For evaluating image features, four groups of parameters are used: (i) $R$, the radius of the circular neighborhood in the DRLBP; (ii) $L$, the number of neighbors in the DRLBP; (iii) $\tau$, the window-size parameter of the LT; and (iv) $T_1$ and $T_2$, the two annealing parameters for the LGS and LT features. We set $R = 1$, $L = 8$, $\tau = 10$, $T_1 = \exp(iter/10)$ and $T_2 = \exp(iter/50)$.
For registering the extracted feature points, six parameters are used: (i) $\omega$, the outlier weighting parameter; (ii) $\alpha$, a constant controlling the spatial smoothness; (iii) $\sigma^2$, the equal isotropic covariance of the MF; (iv) $W$, the parameter of the point set transformation; (v) $\mu$, the weighting parameter of the regularization; and (vi) $iter_{max}$, the maximum number of iterations. We set $\omega = 0.7$, $\alpha = 2$ and $iter_{max} = 50$, and initialize $W$ as a matrix of all zeros. Moreover, $\sigma^2$ and $\mu$ are first initialized as

$$\sigma^2 = \frac{N\, \mathrm{Tr}(A^{T} A) - 2 (\mathbf{1}^{T} A)(\mathbf{1}^{T} B)^{T} + M\, \mathrm{Tr}(B^{T} B)}{2NM}$$

and $\mu = 8$, and then annealed as

$$\sigma^2 \leftarrow \frac{\left| \mathrm{Tr}(A^{T} S_A A) - 2\, \mathrm{Tr}\big( f(A)^{T} S \bar{B} \big) + \mathrm{Tr}\big( \bar{B}^{T} S_B \bar{B} \big) \right|}{2 N_S}$$

and

$$\mu \leftarrow \frac{\left( iter_{max}^4 - iter^4 + 1 \right)^{\frac{1}{4}}}{iter_{max}} \times \mu,$$

respectively.
The pseudo-code of our feature points based agricultural terrace image registration is outlined in Algorithm 1.
Algorithm 1: Local texture feature and geometric structure feature guided agricultural terrace image registration

2.6. Computational Complexity

The computational complexity of each part of our feature point based agricultural terrace image registration is as follows:
  • Image preprocessing: time complexity $O(X_1 Y_1 + X_2 Y_2)$; space complexity $O(X_1 Y_1 + X_2 Y_2)$.
  • Feature extraction: feature point selection has time complexity $O(X_1 Y_1 + X_2 Y_2)$ and space complexity $O(N + M)$; computing the LT has time complexity $O(N X_1^2 Y_1^2 + M X_2^2 Y_2^2)$ and space complexity $O(N X_1 Y_1 + M X_2 Y_2)$; computing the LGS has time complexity $O(N^2 + M^2)$ and space complexity $O(N + M)$.
  • Point set registration: correspondence estimation has time complexity $O(NM)$ and space complexity $O(NM)$; transformation estimation has time complexity $O(N^3 + M^3)$ and space complexity $O(N^2 + M^2 + NM)$.
  • Image registration: time complexity $O(N^2 + X_1 Y_1 N + X_1 Y_1 + M^2 + X_2 Y_2 M + X_2 Y_2)$; space complexity $O(N^2 + X_1 Y_1 N + X_1 Y_1 + M^2 + X_2 Y_2 M + X_2 Y_2)$.
Here $X_1, Y_1$ and $X_2, Y_2$ are the widths and heights of the sensed (source) and reference images, respectively, and $N, M$ are the numbers of feature points extracted from them. Overall, the time complexity of the proposed method is $O(N X_1^2 Y_1^2 + M X_2^2 Y_2^2)$, and the space complexity is $O(N X_1 Y_1 + M X_2 Y_2)$.

3. Experiments and Results

3.1. Experiments Design

Four state-of-the-art methods, namely CPD (coherent point drift) [29], GLMDTPS (global and local mixture distance with thin plate spline transformation) [30], SIFT (scale invariant feature transform) [38] and SURF (speeded-up robust features) [39], are compared against our method in the following experiments. SIFT used the open-source VLFeat toolbox with a threshold of 1, and SURF used the open-source Matlab OpenSURF function with its default settings. We designed two series of experiments: (i) because the same feature point sets are employed, a quantitative comparison on feature point matching is carried out for CPD, GLMDTPS and our method using the precision ratio (PR) [40]; (ii) a quantitative comparison and qualitative demonstration on image registration are carried out for all methods using the root mean square error (RMSE), mean absolute error (MAE) and standard deviation (SD). The experimental dataset includes 20 pairs of multi-temporal (4-5 month interval) and multi-viewpoint agricultural terrace images ($500 \times 300$ pixels) captured over the Longji and Yunhe terraces, China (see Figure 6). All agricultural terrace images were obtained by a small UAV (a DJI Phantom 4 Pro (SZ, China); store homepage: [41]) with a CMOS camera. The small UAV basically maintained the same flight height (around 50-70 m) when collecting multi-temporal images of the same locations, but the imaging perspective was varied deliberately to generate different geometric distortions and different degrees of overlap between image pairs. All experiments were run on a PC with a 2.60 GHz Intel CPU and 16 GB of memory.

3.2. Evaluation Criterion

The PR is usually employed to estimate the accuracy of feature point matching and is defined as

$$PR = \frac{TP}{TP + FP},$$

where $TP$ and $FP$ denote the numbers of true positives and false positives, respectively. Positives indicate inliers and negatives indicate outliers.
The RMSE, MAE and SD are usually used to quantify image registration accuracy. We manually determined 20 pairs of landmarks between the sensed image and the reference image as ground truth; all landmarks are well distributed and were selected at easily identified places around the agricultural terraces. The related formulations are:

$$RMSE = \sqrt{ \frac{1}{N_l} \sum_{n=1}^{N_l} \left\| a_n^l - b_n^l \right\|^2 },$$

$$MAE = \frac{ \sum_{n=1}^{N_l} \left\| a_n^l - b_n^l \right\| }{N_l},$$

$$SD = \sqrt{ \frac{1}{N_l} \sum_{n=1}^{N_l} \left( d(a_n^l, b_n^l) - RMSE \right)^2 },$$

where $a_n^l$ and $b_n^l$ are the $n$th pair of corresponding landmarks in the sensed image and the reference image, respectively, $N_l$ is the total number of selected landmarks, and the operator $d(\cdot, \cdot)$ denotes the distance between two landmarks.
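These three landmark-based error measures can be computed directly; the sketch below assumes `a` and `b` are $N_l \times 2$ arrays of corresponding landmark coordinates.

```python
# Sketch of the evaluation metrics of Section 3.2 over N_l landmark pairs.
import numpy as np

def registration_errors(a, b):
    d = np.linalg.norm(a - b, axis=1)       # landmark distances d(a_n^l, b_n^l)
    rmse = np.sqrt((d ** 2).mean())
    mae = d.mean()
    sd = np.sqrt(((d - rmse) ** 2).mean())
    return rmse, mae, sd
```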

3.3. Results of Feature Matching

All agricultural terrace image pairs contain viewpoint changes and were captured with a 4-5 month time interval. With the proposed preprocessing method, each image pair yields 624 to 1513 feature points. For the quantitative comparison and visual demonstration, we evenly selected 100 feature points with an outlier (negative) rate of more than 50% for calculating the PR and visualizing the matching results, as shown in Table 1 and Figure 7. CPD and GLMDTPS gave poor performances since they estimate correspondences using only the Euclidean distance and the mixed geometric features, respectively, even though CPD applies a motion-coherence based geometric constraint to regularize the displacement field and GLMDTPS applies an annealing scheme to gradually change the transformation from rigid to non-rigid during registration. Our method gave the best matching performance on all image pairs since the local texture feature and the local geometric structure feature are combined and highly complementary.

3.4. Results of Image Registration

For CPD, GLMDTPS and our method, all extracted feature points were used for image registration. Feature points for SIFT and SURF were extracted with their default settings. The quantitative comparison using the mean RMSE, MAE and SD is shown in Table 2. The transformed images and checkerboards for four typical image registration examples are shown in Figure 8. Registering images using sparse correspondences has the advantage of being potentially faster, and a high feature point matching ratio is easily maintained. However, the image registration accuracy might not be high, since a desired image transformation with acceptable accuracy can only be established from denser correspondences (i.e., more feature points). This is because the image transformation is essentially interpolated from the correspondences of the feature points, so an adequate number of correspondences plays a crucial role in yielding a detailed transformation.
In this experiment, SIFT and SURF, which extract a relatively small number of feature points, failed in all 20 and in 17 of the 20 registrations, respectively, although SURF performed well in its other three registrations (see Table 2). The reason is that image registrations with a small number of feature points are sensitive to mismatching. Although CPD and GLMDTPS employed the same number of feature points as our method, the geometric features used in CPD and GLMDTPS are sensitive to outliers and to similar neighborhood structures in multi-temporal images. Therefore, they also gave relatively poor registration accuracies (e.g., GLMDTPS failed five registrations). In our method, the local texture feature and the local geometric structure feature of the terrace images are combined to improve the feature description of the points while helping to reject outliers. The outliers were rejected in the point matching step but were still used to yield a detailed transformation in the image registration step. Therefore, our method gave the best registration performance.

4. Conclusions

Owing to soil erosion, illegal land occupation, agricultural land management practices and low purchasing power in China's southeast and southwest mountain areas, smallholder farmers as well as local governments require a light-weight, low-cost technology, such as DJI Phantom UAVs, to monitor their planting areas in terraces. However, multi-temporal images of the same planting area captured by small UAVs contain only visible-light information and are always accompanied by viewpoint changes, image geometric distortions, low image overlap, and brightness and color changes, such that the images may not be directly usable for dynamic agricultural terrace monitoring. Transforming multi-temporal images into one coordinate system is therefore necessary before planting area information can be compared or integrated for dynamic agricultural terrace monitoring.
In this work, we have presented a small UAV based multi-temporal image registration method. The proposed method first applies a guided image filtering designed to enhance the terrace ridges in multi-temporal images, and then a multi-feature descriptor that combines the texture feature and the geometric structure feature of the terrace images to improve the description of feature points and reject outliers. The multi-feature guided model then provides accurate guidance for feature point set registration, and the feature point based image registration finally yields an accurate image registration. Experiments on 20 pairs of multi-temporal terrace images captured by a DJI Phantom 4 Pro demonstrate that our method gives the best registration performance, outperforming four state-of-the-art methods. To fully realize dynamic agricultural terrace monitoring, future work will focus on the core decision rules of automatic change detection algorithms between the registered images. For registering other landscape elements, the image enhancement preprocessing and the multi-feature extraction and combination steps (as described in Section 2.1 and Section 2.2) should focus on object-specific features that are invariant to varying imaging perspectives, color and brightness changes.

Acknowledgments

The authors wish to thank David G. Lowe, Herbert Bay, Andriy Myronenko, Zhaoxia Liu and Yang Yang for providing their implementation source codes and test data sets, which greatly facilitated the comparison experiments. This work was supported by (i) the National Natural Science Foundation of China (41661080); (ii) the Scientific Research Foundation of the Yunnan Provincial Department of Education (2017TYS045); (iii) the Doctoral Scientific Research Foundation of Yunnan Normal University (01000205020503065); and (iv) the National Undergraduate Training Program for Innovation and Entrepreneurship (201610681002).

Author Contributions

Yang Yang, Ziquan Wei and Kun Yang developed the method; Kun Yang and Yi Luo designed the data acquisition of agricultural terrace images by a small UAV; Mengya Li and Yifeng Han conceived and designed the experiments; Yang Yang, Ziquan Wei, Mengya Li and Yifeng Han performed the experiments and analyzed the data; Sim-Heng Ong and Ziquan Wei helped technology implementation of the method; Yang Yang and Ziquan Wei wrote the paper. All the authors reviewed and provided valuable comments for the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guo, Y.; Wu, Y.; Ju, Z.; Wang, J.; Zhao, L. Remote sensing image classification by the Chaos Genetic Algorithm in monitoring land use changes. Math. Comput. Model. 2010, 51, 1408–1416. [Google Scholar]
  2. Wieczorek, M.; Migoń, P. Automatic relief classification versus expert and field based landform classification for the medium-altitude mountain range, the Sudetes, SW Poland. Geomorphology 2014, 206, 133–146. [Google Scholar]
  3. Doxani, G.; Karantzalos, K.; Tsakiri-Strati, M. Monitoring urban changes based on scale-space filtering and object-oriented classification. Int. J. Appl. Earth Obs. Geoinform. 2012, 15, 38–48. [Google Scholar] [CrossRef]
  4. Müllerová, J.; Pergl, J.; Pyšek, P. Remote sensing as a tool for monitoring plant invasions: Testing the effects of data resolution and image classification approach on the detection of a model plant species Heracleum mantegazzianum (giant hogweed). Int. J. Appl. Earth Obs. Geoinform. 2013, 25, 55–65. [Google Scholar] [CrossRef]
  5. Drǎguţ, L.; Blaschke, T. Automated classification of landform elements using object-based image analysis. Geomorphology 2006, 81, 330–344. [Google Scholar] [CrossRef]
  6. Ho, L.; Yamaguchi, Y.; Umitsu, M. Automated micro-landform classification by combination of satellite images and SRTM DEM. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, Vancouver, BC, Canada, 24–29 July 2011; pp. 3058–3061. [Google Scholar]
  7. Lu, D.; Mausel, P.; Brondízio, E.; Moran, E. Change detection techniques. Int. J. Remote Sens. 2004, 25, 2365–2401. [Google Scholar] [CrossRef]
  8. Prima, O.D.A.; Echigo, A.; Yokoyama, R.; Yoshida, T. Supervised landform classification of Northeast Honshu from DEM-derived thematic maps. Geomorphology 2006, 78, 373–386. [Google Scholar] [CrossRef]
  9. Camargo, F.; Almeida, C.; Florenzano, T.; Heipke, C.; Feitosa, R.; Costa, G. ASTER/Terra Imagery and a Multilevel Semantic Network for Semi-automated Classification of Landforms in a Subtropical Area. Photogramm. Eng. Remote Sens. 2011, 11, 619–629. [Google Scholar] [CrossRef]
  10. Karydas, C.G.; Sekuloska, T.; Sarakiotis, I. Fine scale mapping of agricultural landscape features to be used in environmental risk assessment in an olive cultivation area. IASME Trans. 2005, 4, 582–589. [Google Scholar]
  11. Pradhan, B.; Chaudhari, A.; Adinarayana, J.; Buchroithner, M.F. Soil erosion assessment and its correlation with landslide events using remote sensing data and GIS: A case study at Penang Island, Malaysia. Environ. Monit. Assess. 2012, 184, 715–727. [Google Scholar] [CrossRef] [PubMed]
  12. Ventura, G.; Vilardo, G.; Terranova, C.; Sessa, E.B. Tracking and evolution of complex active landslides by multi-temporal airborne LiDAR data: The Montaguto landslide (Southern Italy). Remote Sens. Environ. 2011, 115, 3237–3248. [Google Scholar] [CrossRef]
  13. Martha, T.R.; Kerle, N.; van Westen, C.J.; Jetten, V.; Kumar, K.V. Object-oriented analysis of multi-temporal panchromatic images for creation of historical landslide inventories. ISPRS J. Photogramm. Remote Sens. 2012, 67, 105–119. [Google Scholar] [CrossRef]
  14. Bailly, J.S.; Levavasseur, F. Potential of linear features detection in a Mediterranean landscape from 3D VHR optical data: Application to terrace walls. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Munich, Germany, 22–27 July 2012; pp. 7110–7113. [Google Scholar]
  15. Wulder, M.A.; White, J.C.; Nelson, R.F.; Næsset, E.; Ørka, H.O.; Coops, N.C.; Hilker, T.; Bater, C.W.; Gobakken, T. Lidar sampling for large-area forest characterization: A review. Remote Sens. Environ. 2012, 121, 196–209. [Google Scholar] [CrossRef]
  16. Demoulin, A.; Bovy, B.; Rixhon, G.; Cornet, Y. An automated method to extract fluvial terraces from digital elevation models: The Vesdre valley, a case study in eastern Belgium. Geomorphology 2007, 91, 51–64. [Google Scholar] [CrossRef]
  17. Martínez-Casasnovas, J.A.; Ramos, M.C.; Cots-Folch, R. Influence of the EU CAP on terrain morphology and vineyard cultivation in the Priorat region of NE Spain. Land Use Policy 2010, 27, 11–21. [Google Scholar] [CrossRef]
  18. Del Val, M.; Iriarte, E.; Arriolabengoa, M.; Aranburu, A. An automated method to extract fluvial terraces from LiDAR based high resolution digital elevation models: The Oiartzun Valley, a case study in the Cantabrian margin. Quat. Int. 2015, 364, 35–43. [Google Scholar] [CrossRef]
  19. Li, Y.; Gong, J.; Wang, D.; An, L.; Li, R. Sloping farmland identification using hierarchical classification in the Xi-He region of China. Int. J. Remote Sens. 2013, 34, 545–562. [Google Scholar] [CrossRef]
  20. Gioia, D.; Bavusi, M.; Di Leo, P.; Giammatteo, T.; Schiattarella, M. A geoarchaeological study of the metaponto coastal belt, southern Italy, based on geomorphological mapping and gis-supported classification of landforms. Geogr. Fis. Din. Quat. 2016, 39, 137–148. [Google Scholar]
  21. Yang, K.; Pan, A.; Yang, Y.; Zhang, S.; Ong, S.H.; Tang, H. Remote Sensing Image Registration Using Multiple Image Features. Remote Sens. 2017, 9, 581. [Google Scholar] [CrossRef]
  22. Diaz-Varela, R.; Zarco-Tejada, P.; Angileri, V.; Loudjani, P. Automatic identification of agricultural terraces through object-oriented analysis of very high resolution DSMs and multispectral imagery obtained from an unmanned aerial vehicle. J. Environ. Manag. 2014, 134, 117–126. [Google Scholar] [CrossRef] [PubMed]
  23. Deffontaines, B.; Chang, K.J.; Champenois, J.; Fruneau, B.; Pathier, E.; Hu, J.C.; Lu, S.T.; Liu, Y.C. Active interseismic shallow deformation of the Pingting terraces (Longitudinal Valley-Eastern Taiwan) from UAV high-resolution topographic data combined with InSAR time series. Geomat. Nat. Hazards Risk 2016, 8, 120–136. [Google Scholar] [CrossRef]
  24. He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1397–1409. [Google Scholar] [CrossRef] [PubMed]
  25. Levin, A.; Lischinski, D.; Weiss, Y. A closed-form solution to natural image matting. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 228–242. [Google Scholar] [CrossRef] [PubMed]
  26. Shi, J.; Tomasi, C. Good features to track. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’94), Seattle, WA, USA, 21–23 June 1994; pp. 593–600. [Google Scholar]
  27. Serby, D.; Meier, E.; Van Gool, L. Probabilistic object tracking using multiple features. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 26 August 2004; Volume 2, pp. 184–187. [Google Scholar]
  28. Mehta, R.; Egiazarian, K. Dominant rotated local binary patterns (DRLBP) for texture classification. Pattern Recognit. Lett. 2016, 71, 16–22. [Google Scholar] [CrossRef]
  29. Myronenko, A.; Song, X. Point set registration: Coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 2262–2275. [Google Scholar] [CrossRef] [PubMed]
  30. Yang, Y.; Ong, S.H.; Foong, K.W.C. A robust global and local mixture distance based non-rigid point set registration. Pattern Recognit. 2015, 48, 156–173. [Google Scholar] [CrossRef]
  31. Ma, J.; Qiu, W.; Zhao, J.; Ma, Y.; Yuille, A.L.; Tu, Z. Robust L2E Estimation of Transformation for Non-Rigid Registration. IEEE Trans. Signal Process. 2015, 63, 1115–1129. [Google Scholar] [CrossRef]
  32. Ma, J.; Zhao, J.; Ma, Y.; Tian, J. Non-rigid visible and infrared face registration via regularized Gaussian fields criterion. Pattern Recognit. 2015, 48, 772–784. [Google Scholar] [CrossRef]
  33. Ma, J.; Zhao, J.; Tian, J.; Yuille, A.L.; Tu, Z. Robust point matching via vector field consensus. IEEE Trans. Image Process. 2014, 23, 1706–1721. [Google Scholar]
  34. Tikhonov, A.N.; Arsenin, V.Y. Solutions of Ill-Posed Problems; VH Winston & Sons: Washington, DC, USA, 1977. [Google Scholar]
  35. Chen, Z.; Haykin, S. On different facets of regularization theory. Neural Comput. 2002, 14, 2791–2846. [Google Scholar] [CrossRef] [PubMed]
  36. Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
  37. Ma, J.; Zhao, J.; Tian, J.; Bai, X.; Tu, Z. Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit. 2013, 46, 3519–3532. [Google Scholar] [CrossRef]
  38. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157. [Google Scholar]
  39. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  40. Jian, B.; Vemuri, B.C. Robust Point Set Registration Using Gaussian Mixture Models. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1633–1645. [Google Scholar] [CrossRef] [PubMed]
  41. The Store Homepage of the DJI Phantom 4 Pro. Available online: http://store.dji.com/product/phantom-4-pro/ (accessed on 30 August 2017).
Figure 1. An example of guided image filtering. The images before and after filtering are shown in the first row, with the enhanced ridges marked by red windows. The image gradients before and after filtering are shown in the last row, exhibiting the gradient-preserving property of the GIF.
Figure 2. An example of feature point extraction. There are 1270 feature points extracted, denoted by blue circles.
Figure 3. An example of a weighted gray image based on a feature point $p_t$. The left image is the original terrace gray image, and the right is the weighted gray image. The effect of the weighting operator is shown in a red window, whose size is controlled by the parameter $\tau$.
Figure 4. Example of estimating correspondence between two terrace images. In the first row, the histograms of the points a and b are provided, respectively. In the second row, the points a and b are classified as the corresponding pair, connected by the red line.
Figure 5. Two examples demonstrating the importance of regularization. (a) two different results for a function estimation problem. The black line and blue curve denote the estimated functions with and without regularization, respectively, and the red asterisks denote 11 data points; (b) point set transformation and its velocity field in the regularized scenario; (c) point set transformation and its velocity field in a non-regularized scenario. In (b,c), the blue and red points denote the source and target points, respectively.
Figure 6. Location of the Longji and Yunhe terraces. The Longji terrace is located in Guangxi province, China (longitude range: 110.0865°E to 110.1415°E; latitude range: 25.7251°N to 25.7677°N). The Yunhe terrace is located in Zhejiang province, China (longitude range: 119.4273°E to 119.5111°E; latitude range: 28.0058°N to 28.1001°N).
Figure 7. Feature matching demonstrations on four typical agricultural terrace image pairs. In (a,b), the first to the fourth rows are: the image pairs, and the feature matching results of CPD (coherent point drift), GLMDTPS (global and local mixture distance with thin plate spline transformation), and ours, respectively. Red circles denote feature points extracted by our method. Blue lines indicate the false positives and false negatives, yellow lines indicate the true positives, and red crosses indicate the true negatives.
Figure 8. Agricultural terrace image registration examples on four typical image pairs. In (a,b), the first to the fourth rows are: the image pairs, and the transformed images and checkerboards built by CPD (coherent point drift), GLMDTPS (global and local mixture distance with thin plate spline transformation), and ours, respectively. Yellow crosses denote the 20 pairs of landmarks. In (c,d), the first rows show the image pairs, and the second rows show the transformed images and checkerboards built by SIFT (scale invariant feature transform) and SURF (speeded-up robust features), respectively. Red circles denote the extracted feature points.
Table 1. Experimental results on series (i). Quantitative comparisons on the mean PR (precision ratio). Bold fonts indicate the best results. All units are percentages. CPD (coherent point drift) denotes the method of [29] and GLMDTPS (global and local mixture distance with thin plate spline transformation) denotes the method of [30].

Method    CPD      GLMDTPS    Ours
PR        1.45%    10.43%     63.1%
Table 2. Experimental results on series (ii). Quantitative comparisons on image registration measured using the mean RMSE (root mean square error), MAE (mean absolute error) and SD (standard deviation). θ (0 ≤ θ ≤ 20) denotes the number of failed registrations (landmarks could not be manually identified). Bold fonts indicate the best results; all units are pixels. CPD (coherent point drift) denotes the method of [29], GLMDTPS (global and local mixture distance with thin plate spline transformation) the method of [30], SIFT (scale invariant feature transform) the method of [38] and SURF (speeded-up robust features) the method of [39].

Method      θ     RMSE     MAE      SD
CPD         0     41.78    25.18    50.54
GLMDTPS     5     21.35    6.95     24.51
SIFT        20    -        -        -
SURF        17    11.51    3.51     14.30
Ours        0     29.95    10.70    37.89
