An Empirical Study of Exhaustive Matching for Improving Motion Field Estimation

Vanel Lazcano

doi:10.3390/info9120320

Abstract

Optical flow is defined as the motion field of pixels between two consecutive images. Traditionally, in order to estimate pixel motion field (or optical flow), an energy model is proposed. This energy model is composed of (i) a data term and (ii) a regularization term. The data term is an optical flow error estimation and the regularization term imposes spatial smoothness. Traditional variational models use a linearization in the data term. This linearized version of data term fails when the displacement of the object is larger than its own size. Recently, the precision of the optical flow method has been increased due to the use of additional information, obtained from correspondences computed between two images obtained by different methods such as SIFT, deep-matching, and exhaustive search. This work presents an empirical study in order to evaluate different strategies for locating exhaustive correspondences improving flow estimation. We considered a different location for matching random locations, uniform locations, and locations on maximum gradient magnitude. Additionally, we tested the combination of large and medium gradients with uniform locations. We evaluated our methodology in the MPI-Sintel database, which represents the state-of-the-art evaluation databases. Our results in MPI-Sintel show that our proposal outperforms classical methods such as Horn-Schunk, TV-L1, and LDOF, and our method performs similar to MDP-Flow.

Keywords:

motion estimation; large displacement; color and gradient constancy constraint

1. Introduction

The apparent movement of pixels in a sequence of images is called the optical flow. Optical flow estimation is one of the most challenging problems in computer vision, especially in real scenarios where large displacements and illumination changes occur. Optical flow has many applications, including autonomous flight of vehicles, insertion of objects on video, slow camera motion generation, and video compression.

In particular, in video compression, the optical flow helps to remove temporal data redundancy and therefore to attain high compression ratios. In video processing, optical flow estimation is used, e.g., for deblurring and noise suppression.

In order to estimate the flow field that represents the motion of pixels in two consecutive frames of a video sequence, most of the optical flow methods are grounded in the optical flow constraint. This constraint is based on the brightness constancy assumption, which states that the brightness or intensity of objects remains constant from frame to frame along the movement of objects.

Let us consider two consecutive image frames,

I_{0}

(reference) and

I_{1}

(target), of a video sequence, where

I_{0}, I_{1} : Ω \to R

where

Ω \subset R^{2}

is the image domain (which is assumed to be a rectangle in

R^{2}

) and

u : Ω \to R^{2}

, and

u (x) = (u_{1} (x), u_{2} (x))

i.e.,

u_{1}, u_{2} : Ω \to R

. The aim is to estimate a 2D motion field

u (x)

the optical flow, such that the image points

I_{0} (x)

and

I_{1} (x + u (x))

are observations of the same physical scene point. In other words, the brightness constancy assumption writes

I_{0} (x) = I_{1} (x + u (x))

(1)

for x ∈

Ω

. Let us assume that the displacement

u (x)

is small enough to be valid the following linearized version of the brightness constancy assumption:

I_{0} (x) = I_{1} (x) + < \nabla I_{1} (x) \cdot u (x) >

(2)

where

<, >

denotes scalar product, and

\nabla I_{1}

denotes the gradient of

I_{1}

with respect to the space coordinates x and y, respectively. This equation can be rewritten as

\partial_{t} I_{1} + < \nabla I_{1} (x), u (x) > = 0

(3)

where

\partial_{t} I_{1}

denotes

I_{1} (x) - I_{0} (x)

. Equation (3) is usually called the optical flow equation or optical flow constraint.

The optical flow constraint expressed by Equation (3) is only suitable when the partial derivatives can be correctly approximated. This is the case when the motion field is small enough or images are very smooth. However, in the presence of large displacements, these conditions are not typically preserved, and it is common to replace it with a nonlinear formulation [1].

I_{0} (x) - I_{1} (x + u (x)) = 0

(4)

for

x \in Ω

.

Neither Equation (3) nor Equation (4) can be solved pointwise since the number of parameters (the two components

u_{1} (x), u_{2} (x)

of

u (x)

) to be estimated is larger than the number of equations.

To challenge these problems, a variational approach could be used and compute the optical flow

u

by minimizing the following energy or error measure,

J (u) = \int_{Ω} {|I_{0} (x) - I_{1} (x + u (x))|}^{2} d x .

(5)

However, Equation (5) is an ill-posed problem, which is usually challenged by adding a regularity prior. Then, the regularization term added to the energy model allows for a definition of the structure of the motion field and ensures that the optical flow computation is well posed. Ref. [2] proposed adding a quadratic regularization term. This work was the first that introduced variational methods to compute dense optical flow. An optical flow

u

is estimated as the minimizer of the following energy functional

J (u (x)) = \int_{Ω} {|I_{0} (x) - I_{1} (x + u (x))|}^{2} d x + \int_{Ω} {|\nabla u (x)|}^{2} d x

(6)

where

\nabla u (x) = (\nabla u_{1} (x), \nabla u_{2} (x))

denotes the gradient of the optical flow. This

L^{2}

regularity prior does not cope well with motion discontinuities and other regularization terms have been proposed [3,4,5]. Although the original work of Horn and Schunk reveals many limitations (e.g., the computed optical flow is very smooth and sensitive to the presence of noise), it has inspired many proposals. In order to cope with large displacements, optimization typically proceeds in a coarse-to-fine manner (also called a multi-scale strategy).

Let us now focus on the data term in (6),

J_{D} (u (x)) = \int_{Ω} {|I_{0} (x) - I_{1} (x + u (x))|}^{2} d x

. The brightness constancy constraint assumes that the illumination of the scene is constant over time and that the image brightness of a point remains constant along its motion trajectory. Therefore, changes in brightness are only due to different objects and different movements.

Let us briefly remark that a problem is called ill-posed if its solution either does not exist or it is not unique. We have observed that the optical flow Equation (3) (or (4)) is an ill-posed problem: it cannot be solved pointwise as there is a unique equation as well as two unknowns

u_{1} (x)

and

u_{2} (x)

. The optical flow expressed by Equation (3) can be rewritten as

< \nabla I, u (x) > = - \partial_{t} I (x)

(7)

where

\partial_{t} I

is the temporal derivative of I, by considering a local orthonormal basis

{e_{1}, e_{2}}

of

R^{2}

on directions of

\nabla I (x)

and

\nabla I {(x)}^{⊥}

. The optical flow

u (x)

can be expressed in this basis,

u = u_{p} (x) e_{1} + u_{⊥} (x) e_{2}

, where

u_{p} (x) e_{1}

is the projection of

u (x)

in the gradient direction and

u_{⊥} (x) e_{2}

is the projection of the optical flow

u (x)

in the direction perpendicular to the gradient. Thus, Equation (7) can be formally written as

< \nabla I (x), u_{p} (x) e_{1} + u_{⊥} (x) e_{2} > = ∥ \nabla I (x) ∥ u_{p} (x) = - \partial_{t} I (x) .

(8)

This equation can be solved for

u_{p} (x)

,

u_{p} (x) = - \frac{\partial_{t} I (x)}{∥ \nabla I (x) ∥}

, if

∥ \nabla I (x) ∥ \neq 0

. In other words, the only component of the optical flow that can be determined from Equation (3) is the component parallel to the gradient direction. This indeterminacy is called the aperture problem.

1.1. Optical Flow Estimation

The most accurate techniques that address the motion estimation problem are based on the formulation of the optical flow estimation in a variational setting. Energy-based methods are called global methods since they find correspondences by minimizing an energy defined on the whole image (such as inthe minimizing problem of Equation (6)). They provide a dense solution with subpixel accuracy and are usually called dense optical flow methods.

The authors in [2] estimate a dense optical flow field based on two assumptions: the brightness constancy assumption and a smooth spatial variation of the optical flow. They proposed the following functional:

J (u) = \int_{Ω} {|\partial_{x} I u_{1} (x) + \partial_{y} I u_{2} (x) + \partial_{t} I (x)|}^{2} d x + α^{2} \int_{Ω} {|\nabla u_{1} (x)|}^{2} + {|\nabla u_{2} (x)|}^{2} d x

(9)

where

α

is a real parameter that controls the influence of the smooth term. This functional is convex and has a unique minimizer. However, the computed optical flow is very smooth and does not preserve discontinuities of the optical flow. This is also the case for Equation (6), which can be considered a variant of Equation (9).

After the initial work in [2], many approaches that focus on accuracy have been developed. These works focus on the use of robust estimators, either in the data or smoothness terms, to be able to deal with motion discontinuities generated by the movement of different objects or by occlusions (e.g., [5,6]). For the data term,

L^{2}

or

L^{1}

dissimilarity measures have been used as well as more advanced data terms [6]. For the smoothness term, isotropic diffusion, image-adaptive, isotropic diffusion with non-quadratic regularizers, anisotropic diffusion (image or flow-adaptive), and the recent non-local regularizers have been proposed [3,4,5,6].

However, these methods may fail in occlusion areas due to forced, but unreliable, intensity matching. The problem can be further accentuated if the optical flow is smoothed across an object boundaries adjacent to occlusion areas.

1.2. Robust Motion Estimation

Normally, assumptions as brightness constancy and smooth space variation of optical flow are violated in real images. Advanced robust optical flow methods are developed with the goal to perform well even when violations of the optical flow assumptions are present.

The model expressed by Equation (9) to estimate the optical flow penalizes high gradients of

u (x)

and therefore does not allow discontinuities of

u (x)

. This model is highly sensitive to noise in the images and to outliers. The functional can be modified in order to allow discontinuities of the flow field by changing the quadratic data term to an

L^{1}

term and by changing the

L^{2}

regularization term. In [5], the authors present a novel approach to estimate the optical flow that preserves discontinuities, and it is robust to noise. In order to compute the optical flow

u = (u_{1}, u_{2}) : Ω \to R^{2}

between

I_{0}

and

I_{1}

, the authors propose minimizing the energy:

J (u (x)) = \int_{Ω} (λ | I_{0} (x) - I_{1} (x + u (x)) | + | \nabla u_{1} (x) | + | \nabla u_{2} (x) |) d x,

(10)

including robust data attachment and regularization terms (namely, the total variation of

u (x)

) with a relative weight given by the parameter

λ > 0

. This variational model is usually called the TV-L1 formulation. The use of

L^{1}

type-norm measures has shown a good performance in front of

L^{2}

norms to preserve discontinuities in the flow field and offers increased robustness against noise and illumination changes.

1.3. Related Works

As we mentioned above, motion estimation methods use an approximation of the data term. Energies that consider this approximation fail to estimate the optical flow when the displacements are very fast or larger than the size of the objects in the scene. Recently, a new term that considers correspondences between two images have been added to these kinds of models [3,6,7,8]. The inclusion of this additional information (as a prior) has improved the performance of the optical flow estimation, giving the model the capability of handle large displacements. The inclusion of this new term allows the methodology to be systematic and not depending on heuristic and complex rules that depend on the designer of the model.

In [3], the authors propose a model to handle large displacement. This model stated an energy model that considers (i) a data term, (ii) a regularization term, and (iii) a term for additional information. The additional information term proposed in this energy model comes from correspondences obtained either by SIFT or by patch match. Considering these computed correspondences, a constant candidate flow is proposed as a possible motion present in the image. Authors select the optimal flow among the constant candidates for each pixel in the image sequence as a labeling problem. The authors solve this problem using discrete optimization QPBO (quadratic pseudo-Boolean optimization).

In [9], a model is presented for estimating the optical flow that utilizes a robust data term, a regularization term, and a term that considers additional matching. Additional matching is obtained using HoGs (Histograms of Gradients). The incorporation of additional matching is weighted by a confidence value that is not simple to compute. This weight depends on the distance to the second best candidate of the matching and on the ratio between the error of current optical flow estimation and the error of the new correspondence. The location of the matching depends on the minimum eigenvalue of the structure tensor of the image and on the error of the optical flow estimation.

Deep-flow presented in [10] is a motion field estimation method inspired by (i) deep convolution neural networks and (ii) the work of [6]. Deep flow is an optical flow estimation method to handle large displacements, which obtains dense correspondences between two consecutive images. These correspondences are obtained using small patches (of

4 \times 4

pixels). A patch of

8 \times 8

is interpreted as composed by 4 patches of

4 \times 4

pixels, each small patch is called a quadrant. The matching score of an

8 \times 8

patch is formed by averaging the max-pooled scores of the quadrants [10]. This process is repeated recursively for

16 \times 16

,

32 \times 32

, and

64 \times 64

pixels becoming a more discriminant virtual patch. Computed matching is considered a prior term in an energy model. Deep flow uses a uniform grid to locate the correspondences. A study in depth is necessary for this proposed method. Which patch size is necessary to improve the results and which locations are the best to improve optical flow estimation are questions that must be answered.

Recently, new models have been proposed in order to tackle the problem of large displacements in [4,11,12,13]. These models consider sparse or dense matching using a deep matching algorithm [10], motion candidates [13], or SIFT. The principle idea is to give some “hint” to the variational optical flow approach by using such sparse matching [10]. In [12,13], an occlusion layer is also estimated.

In [11], the authors propose a method called SparseFlow that finds sparse pixel correspondences by mean of a matching algorithm, and these correspondences are used to guide a variational approach to obtain a refined optical flow. The SparseFlow matching algorithm uses an efficient sparse decomposition of pixels surrounding a patch as a linear sum of those found around candidate corresponding pixels [11]. The matching pixel dominating the decomposition is chosen. The pixel pair matching in both directions (forward–backward) are used to refine the optical flow estimation.

In [12], a successful method to compute optical flow is proposed which includes occlusion handling and additional temporal information. Images are divided into discrete triangles, and this allowed the authors to naturally estimate the occlusions that are then incorporated into the optimization algorithm to estimate an optical flow. Combining the “Inertial Estimate” of the flow and classifiers to fusion optical flow, they made some improvements in the final results.

The authors in [13] propose a method to compute motion field, and this method aims to tackle large displacements, motion detail, and occlusion estimation. The method consists of two stages: (i) They supply dense local motion candidates. (ii) They estimate affine motion models over a set of size-varying patches combined with a patch-based pairing.

In [14], an optical flow method that includes this new term for additional information coming from exhaustive matching is proposed. This model also considers an occlusion layer simultaneously estimated with optical flow. The estimation of the occlusion layer helps to improve the motion estimation of regions that are visible in the current frame but not visible in the next frame.

In this work, we present an empirical study to evaluate different strategies to locate additional exhaustive matching (correspondences coming from exhaustive search) in order to improve the precision of the motion field or optical flow estimation. We think that none of the above-revised works has answered precisely the question “What is the best location for additional correspondences in order to improve the precision of a motion estimation method?” We considered three possible locations of matching: (i) uniform locations, (ii) random locations, and (iii) locations in the maximum gradient of the reference frame. We also evaluated combined strategies: uniform locations in large gradients magnitude and uniform locations located in medium gradients magnitude. We performed two evaluations using different features. In the first one we used intensities, and in the second, we used gradients.

We present our complete evaluation strategy in Section 5.

1.4. Contribution of This Version

In this new version of the paper presented in [15], we have implemented the following modifications:

(a): We determine the parameters of the optical flow estimation model using particle swarm optimization (PSO).
(b): Our proposal was evaluated in the large database MIP-Sintel in both of its sets: training and test.
(c): We estimated the occluded pixels in two consecutive images based on the largest values of the optical flow error. We avoided computing exhaustive matching in these occluded pixels because the matching will be unreliable.
(d): We divided the gradient of the image into three sets: (i) small gradients, (ii) medium gradients, and (iii) large gradients. In sets (ii) and (iii), we have located additional matching in uniform locations and we have evaluated the performance in MPI-Sintel.
(e): We extended the Horn-Schunck optical flow to handle additional information coming from exhaustive search. This extended method was evaluated in the MPI-Sintel Database, and the obtained results were compared with the results obtained by our proposal.
(f): We performed exhaustive matching using colors and using gradients in order to make this experimental study more complete.

In Section 2, we present our proposal to estimate the motion field. In Section 3, the implementation of the methods is presented and we present the pseudo-code of the algorithm. In Section 4.3, we present the experiments and the database used, and in Section 5 the obtained results are presented. We present comments and discussions about the obtained results in Section 6.

2. Materials and Methods

We present in Section 2.1 our proposed model to handle large displacements and be robust to illumination changes. In Section 2.3 we extended the classical Horn-Shunck method to handle large displacement. Our aim is to compare the effect of using additional term in our proposed model and in the Horn-Shunck model.

2.1. Proposed Model

Models presented in [3,5,7,16] propose a variational model to estimate the motion field. Those cited models use the

L^{1}

norm (a data term which is robust to illumination changes and tolerate discontinuities) and an additional term that incorporates information coming from a correspondences estimation method.

Let

I_{0}

and

I_{1}

be two consecutive images and let

u = (u_{1}, u_{2})

be the motion field between images such that

J (u (x)) = J_{d} (u (x)) + J_{r} (u (x))

where

Ω \subset R^{2}

, and

J_{d} (u (x))

is the data term and is given by

J_{d} (u (x)) = \int_{Ω} λ |I_{0} (x) - I_{1} (x + u)| d x .

(11)

where

λ

is a real constant, and the regularization term

J_{r} (u (x))

is given by

J_{r} (u (x)) = \int_{Ω} \sqrt{{|\nabla u_{1} (x)|}^{2} + {|\nabla u_{2} (x)|}^{2}} d x .

2.2. Linearization

The model presented in [5] considers a linearization of the data term in Equation (11).

I_{1} (x + u (x))

is linearized around a known point

u_{0} (x)

as

I_{1} (x + u (x)) = I_{1} (x + u_{0} (x)) + < u (x) - u_{0}, \nabla I_{1} (x + u_{0} (x)) >

where

u_{0} (x)

is a known optical flow,

\nabla I_{1} (x + u_{0} (x))

is the gradient of the warped image

I_{1} (x + u_{0} (x))

, and the notation

<, >

represents the internal product. Considering this linearization, we obtain a new data term that can be written as

{\tilde{J}}_{d} (u (x)) = \int_{Ω} λ | I_{1} (x + u_{0} (x)) + < u (x) - u_{0} (x), \nabla I_{1} (x + u_{0} (x)) > - I_{0} (x) | d x .

(12)

The data term in Equation (12) is based on the brightness constancy assumption, which states that the intensity of the pixels in the image remains constant along the sequence. In most cases, this assumption does not hold due to shadows, reflections, or illumination changes. The presence of shadows and other intensity changes cause the brightness constancy assumption to fail. The gradient constancy assumption appears as an alternative that can handle pixel intensity changes.

We define a weight map

α : R^{2} \to [0, 1]

to switch between the brightness and gradient constancy assumption as in [3]. We construct a new data term:

J_{d} (u (x)) = λ \int_{Ω} α (x) D_{I} (u (x), x) d x + λ \int_{Ω} τ_{d} (1 - α (x)) D_{\nabla I} (u (x), x) d x

(13)

where

D_{I} (u (x), x) = | I_{1} (x + u_{0} (x)) + < u (x) - u_{0} (x), \nabla I_{1} (x + u_{0} (x)) > - I_{0} (x) | .

(14)

Equation (14) represents a linearized version of the brightness constancy assumption, and

\begin{matrix} D_{\nabla I} (u (x), x) = | \partial_{x} I_{1} (x + u_{0} (x)) + < u (x) - u_{0} (x), \nabla \partial_{x} I_{1} (x + u_{0} (x)) > - \partial_{x} I_{0} (x) | \\ + | \partial_{y} I_{1} (x + u_{0} (x)) + < u (x) - u_{0} (x), \nabla \partial_{y} I_{1} (x + u_{0} (x)) > & - \partial_{y} I_{0} (x) | \end{matrix}

(15)

represents the gradient constancy assumption, where

τ_{d}

is a positive constant. In Equation (15), the terms

\partial_{x} I_{0}

,

\partial_{x} I_{1}

, and

\partial_{y} I_{0}

,

\partial_{y} I_{1}

are the partial derivatives w.r.t. x and y of

I_{0}

and

I_{1}

, respectively. Considering Equations (14) and (15), we follow [3] and state the adaptive weight map

α (x)

:

α (x) = \frac{1}{1 + e^{β (D_{I} (u (x), x) - τ_{d} D_{\nabla I} (u (x), x))}}

where

β

is a positive constant real value.

Computing

α (x)

depends on the difference of two terms:

D_{I} (u (x), x)

and

D_{\nabla} (u (x), x)

. On the one hand, if

D_{I} (u (x), x)

is larger than

D_{\nabla} (u (x), x)

, the data term will be more confident w.r.t. the gradient constancy constraint. On the other hand, if

D_{I} (u (x), x)

is less than

D_{\nabla} (u (x), x)

, the data term will be more confident w.r.t. the color constancy constraint [3,8].

2.2.1. Decoupling Variable

In order to minimize the proposed functional in Equation (11), we propose using three decoupling variables (

v_{1} (x)

,

v_{2} (x)

,

v_{3} (x)

), and we penalize its deviation from

u (x)

, then the functional becomes

\begin{matrix} J (u (x), v_{1} (x), v_{2} (x), v_{3} (x)) & = J_{d} (u (x), v_{1} (x), v_{2} (x), v_{3} (x)) + J_{r} (u (x)) \\ + & \frac{1}{2 θ} \int_{Ω} \sum_{i = 1}^{3} {\bar{α}}_{i} {(u (x) - v_{i} (x))}^{2} d x \end{matrix}

(16)

where

\bar{α}

is a vector in

R^{3}

and is defined as

\bar{α} (x) = {(α (x), 1 - α (x), and 1 - α (x))}^{T}

.

2.2.2. Color Model

Considering color images in RGB space (

I = {I_{1}, I_{2}, I_{3}}

, which correspond to red, blue, and green components, respectively), we define five decoupling variables

v_{1} (x)

,

v_{2} (x)

, and

v_{3} (x)

for color components and

v_{4} (x)

and

v_{5} (x)

for gradients; thus, the functional becomes

J (u (x), \bar{v} (x)) = J_{d} (u (x), \bar{v} (x)) + J_{r} (u (x)) + \frac{1}{2 θ} \int_{Ω} \sum_{i = 1}^{5} {\bar{α}}_{i} (x) {(u (x) - v_{i} (x))}^{2} d x

(17)

where we have defined

\bar{v} (x) = {v_{1} (x), v_{2} (x), v_{3} (x), v_{4} (x), and v_{5} (x)}

, and

\bar{α} \in R^{5}

is defined as

\bar{α} (x) = {(α (x), α (x), α (x), 1 - α (x), a n d 1 - α (x))}^{T}

.

2.2.3. Optical Flow to Handle Large Displacements

In order to cope with large displacements, we use additional information coming from exhaustive matching computed between images of a video sequence used as a precomputed sparse vector field (a priori). The main idea is that this sparse vector field guides the optical flow estimation in regions where the approximated linearized model fails [7] due to fast movements or large displacements.

Let

u_{e} (x)

be this sparse vector field. We add to our model a term to enforce the solution

u (x)

to be similar to the sparse flow

u_{e} (x)

as in [7], and our model becomes

\begin{matrix} J (u (x), \bar{v} (x), u_{e} (x)) = & {\tilde{J}}_{d} (u (x), \bar{v} (x)) + J_{r} (u (x)) + \frac{1}{2 θ} \int_{Ω} \sum_{i = 1}^{5} {\bar{α}}_{i} {(u (x) - v_{i} (x))}^{2} d x \\ + \int_{Ω} κ χ (x) {(u (x) - u_{e} (x))}^{2} d x \end{matrix}

where

χ (x)

is a binary mask indicating where the matching was computed.

κ

is a decreasing weight for each scale. Following [7], the

κ

is updated for each iteration

κ = κ_{0} {(0.5)}^{n}

, where

κ_{0} = 300

, and n is the iteration number.

We use this

χ (x)

binary mask to test different locations of

u_{e} (x)

in order to evaluate its influence in the optical flow estimation performance.

2.2.4. Occlusion Estimation

The proposed model cannot handle occluded and dis-occluded pixel. There is no correspondence for pixels that are visible in the current frame and not visible in the target frame. Optical flow computed in those pixels presents a large error due to the forced matching. We used this fact to detect an occlusion and once the occlusion is detected we do not perform exhaustive matching in those points. That is to say, we define regions where the exhaustive matching must not be performed. As a proof of concept, we show in Figure 1 two consecutive frames of the sequence ambush_5 where the girl fights with a man. The sequence presents large regions where pixels are occluded.

Figure 1. Occlusion estimation comparing optical flow error with a threshold

θ_{o c c}

. (a) Frame_0049 of sequence ambush_5; (b) Frame_0050; (c) occlusion estimation.

We present in Figure 1 two consecutive images in (a) and (b), and in (c) we present our occlusion estimation. In the sequence, the hair of the girl moves to the left of the image and the occlusion is correctly estimated. Additionally, dis-occlusion is estimated as we see on the right side of the girl’s head where hair dis-occluded the hand and the lance.

We compare the magnitude of the data term with a threshold (

θ_{o c c}

) as in [17]. On the one hand, if the magnitude is larger than the threshold, then a binary mask

o (x)

is set to 1. On the other hand, if the magnitude is smaller than the threshold,

o (x)

is set to 0.

2.2.5. Solving the Model

For a fixed

u_{0} (x)

and following the notation in [5] we define

ρ_{i} (u (x))

. Let

ρ_{i} (u (x)) = I_{i} (x + u_{0} (x)) + ⟨u (x) - u_{0} (x), \nabla I_{i}⟩ - I_{0}

with

i = 1, \dots, 3

.

ρ_{4} (u (x)) = \partial_{x} I_{1} (x + u_{0} (x)) + ⟨u (x) - u_{0} (x), \nabla \partial_{x} I_{1}⟩ - \partial_{x} I_{0}

and

ρ_{5} (u (x)) = \partial_{y} I_{1} (x + u_{0} (x)) + ⟨u (x) - u_{0} (x), \nabla \partial_{y} I_{1}⟩ - \partial_{y} I_{0}

. If

θ

is a small constant, then

v_{1} (x)

,

v_{2} (x)

,

v_{3} (x)

,

v_{4} (x)

, and

v_{5} (x)

are close to

u (x)

. This convex problem can be minimized by alternating steps as in [5]:

(1): Solve exhaustively

$J_{d} (u_{e} (x)) = \min_{u_{e} (x)} \int_{Ω} |I_{0} (x) - I_{1} (x + u_{e} (x))| d x .$

(18)
(2): Let us fix $v_{1} (x)$ , $v_{2} (x)$ , $v_{3} (x)$ , $v_{4} (x)$ , and $v_{5} (x)$ and then solve for $u (x)$ :

$\min_{u (x)} \{\int_{Ω} \sum_{i = 1}^{5} \frac{{\bar{α}}_{i} (x) {(u (x) - v_{i} (x))}^{2}}{2 θ} d x + \int_{Ω} κ χ (x) (1 - o (x)) (u (x) - u_{e} (x)) d x + \int_{Ω} ∥ \nabla u ∥ d x .\}$

(19)
(3): Let us fix $u (x)$ and solve the problem for $v_{1} (x)$ , $v_{2} (x)$ , $v_{3} (x)$ , $v_{4} (x)$ , and $v_{5} (x)$ :

$\min_{v_{i} (x)} \{\int_{Ω} \sum_{i = 1}^{5} {(u (x) - v_{i} (x))}^{2} d x + \int_{Ω} |ρ_{i} (v_{i} (x))| d x .\} .$

(20)

This minimization problem for

v_{1} (x)

v_{2} (x)

,

v_{3} (x)

,

v_{4} (x)

, and

v_{5} (x)

can be solved point-wise. Since the propositions in [5] are fundamentals for our work, we adapted them to our model.

Proposition 1.

The solution of Equation (9) is given by

u (x) = \frac{α (x) (\frac{v_{1} (x) + v_{2} (x) + v 3 (x)}{3}) + (1 - α (x)) (\frac{v_{4} (x) + v_{5} (x)}{2}) + θ d i v (g (x) ξ (x)) + κ χ (x) (1 - o (x)) u_{e} (x)}{1 + κ χ (x) (1 - o (x))}

(21)

The dual variable

ξ (x) = {(ξ_{1} (x), ξ_{2} (x))}^{T}

is defined as

ξ_{1} (x) = \frac{ξ_{1} (x) + \frac{τ}{θ} \nabla u_{1} (x)}{1 + \frac{τ}{θ} \sqrt{| \nabla u_{1} {(x) |}^{2} + {| \nabla u_{2} (x) |}^{2}}}

(22)

ξ_{2} (x) = \frac{ξ_{2} (x) + \frac{τ}{θ} \nabla u_{2} (x)}{1 + \frac{τ}{θ} \sqrt{| \nabla u_{1} {(x) |}^{2} + {| \nabla u_{2} (x) |}^{2}}}

(23)

where

τ < \frac{1}{4}

.

Proposition 2.

The solution of the minimizing problem in Equation (20) is given by

v_{i} (x) = u (x) + \{\begin{matrix} 3 λ θ \nabla I_{i} (x + u_{0} (x)) if ρ_{i} (u (x)) < - 3 λ θ {| \nabla I_{i} (x + u_{0} (x)) |}^{2} \\ - \frac{ρ_{i} (u (x))}{| \nabla I_{i} |} \nabla I_{i} (x + u_{0} (x)) if | ρ_{i} (u (x)) | < 3 λ θ | \nabla I_{i} (x + u_{0} (x)) |^{2} \\ - 3 λ θ \nabla I_{i} (x + u_{0} (x)) if ρ_{i} (u (x)) > 3 λ θ {| \nabla I_{i} (x + u_{0} (x)) |}^{2} \end{matrix}

(24)

v_{i} (x) = u (x) + \{\begin{matrix} 2 λ θ τ_{d} \nabla \partial_{i} I_{1} (x + u_{0} (x)) if ρ_{i} (u (x)) < - 2 λ θ τ_{d} {| \nabla \partial_{i} I_{1} (x + u_{0} (x)) |}^{2} \\ - \frac{ρ_{i} (u (x))}{| \nabla \partial_{i} I_{1} |} \nabla \partial_{i} I_{1} (x + u_{0} (x)) if | ρ_{i} (u (x)) | < 2 λ θ τ_{d} {| \nabla \partial_{i} I_{1} (x + u_{0} (x)) |}^{2} \\ - 2 λ θ τ_{d} \nabla \partial_{i} I_{1} (x + u_{0} (x)) if ρ_{i} (u (x)) > 2 λ θ τ_{d} {| \nabla \partial_{i} I_{1} (x + u_{0} (x)) |}^{2} \end{matrix} .

(25)

for

i = 4, 5

.

\partial_{4} = \partial_{x}

, and

\partial_{5} = \partial_{y}

.

2.3. An Extended Version of Horn-Schunck’s Optical Flow

In [18], the implementation of the classical Horn-Schunk’s optical flow considering a multi-scale pyramid is presented. The model proposed in Equation (9) was extended and solved for the flow

u (x)

with a fixed-point iteration:

u_{1}^{n + 1} (x) = (1 - w) u_{1}^{n} (x) + w \frac{(I_{t} (x) + \partial_{x} I_{1} (x) u_{1}^{n} (x) - \partial_{y} I_{1} (x) (u_{2}^{n} (x) - u_{2}^{0} (x))) \partial_{x} I_{1} (x) + A_{1} (x)}{\partial_{x} I_{1}^{2} (x) + α^{2}}

(26)

and

u_{2}^{n + 1} (x) = (1 - w) {u_{2}}^{n} (x) + w \frac{(I_{t} (x) - \partial_{y} I_{1} (x) (u_{1}^{t} (x) - u_{1}^{0} (x)) + \partial_{y} I_{1} (x) u_{2}^{n} (x)) \partial_{y} I_{1} (x) + A_{2} (x)}{\partial_{x} I_{1}^{2} (x) + α^{2}}

(27)

where w is a constant real value parameter, n is the iteration number, and

u_{1}^{0} (x)

and

u_{2}^{0} (x)

are the vertical and horizontal optical flow estimation components, respectively. In the previous scale,

A_{1} (x) = α^{2} G * u_{1}^{n} (x)

,

A_{2} (x) = α^{2} G * u_{2}^{n} (x)

, ∗ is the convolution operator, and G is given by

G = (\begin{matrix} \frac{1}{12} & \frac{1}{6} & \frac{1}{12} \\ \frac{1}{6} & 0 & \frac{1}{6} \\ \frac{1}{12} & \frac{1}{6} & \frac{1}{12} \end{matrix}) .

An Extended Version of the Horn-Schunck’s Optical Flow

We extended the Horn-Schunk’s optical flow to handle large displacements. We modified the solution to the Horn-Schunck scheme by adding terms that consider new information coming from an exhaustive search,

J (u (x)) = J_{d} (u (x)) + J_{r} (u (x))

(28)

where

J_{d} (u (x))

is given by

J (u (x)) = \int_{Ω} {|\partial_{x} I u_{1} (x) + \partial_{y} I u_{2} (x) + \partial_{t} I (x)|}^{2} d x

and

J_{r} (u (x)) = α^{2} ({|\nabla u_{1} (x)|}^{2} + {|\nabla u_{2} (x)|}^{2}) d x .

We minimized the model in Equation (28) for

u (x)

, obtaining

u_{1}^{n + 1} (x) = (1 - w) u_{1}^{n} (x) + w \frac{(I_{t} (x) + \partial_{x} I_{1} (x) u_{1}^{n} (x) - \partial_{y} I_{1} (x) (u_{2}^{n} (x) - u_{2}^{0} (x))) \partial_{x} I_{1} (x) + B_{1} (x)}{\partial_{x} I_{1}^{2} (x) + α^{2} + κ χ (x) (1 - o (x))}

(29)

and

u_{2}^{n + 1} (x) = (1 - w) u_{2}^{n} (x) + w \frac{(I_{t} (x) - \partial_{y} I_{1} (x) (u_{1}^{t} (x) - u_{1}^{0} (x)) + \partial_{y} I_{1} (x) u_{2}^{n} (x)) \partial_{y} I_{1} (x) + B_{2} (x)}{\partial_{x} I_{1}^{2} (x) + α^{2} + κ χ (x) (1 - o (x))}

(30)

where

u_{e} (x) = (u_{1 e} (x), u_{2 e} (x))

,

B_{1} (x) = α^{2} G * u_{1}^{n} (x) + κ χ (x) (1 - o (x)) u_{1 e} (x)

, and

B_{2} (x) = α^{2} G * u_{2}^{n} (x) + κ χ (x) (1 - o (x)) u_{2 e} (x)

. Let us explain with more detail the term

B_{1} (x)

and

B_{2} (x)

. These terms consider two binary masks

χ (x)

and

o (x)

. The binary mask

χ (x)

indicates where the matching should be computed and the binary mask

o (x)

indicates occluded and visible pixels in the target frame. The way

χ (x)

mask is constructed is explained in Section 3.3.

3. Implementation and Pseudo-Code

The model was solved by a sequence of optimization steps. First, we performed the exhaustive search in specific locations obtaining

u_{e} (x)

flow, and we then fixed

u (x)

to estimate

\bar{v} (x)

. Finally, we estimated the

u (x)

and the dual variable

ξ (x)

. These steps were performed iteratively. Our implementation is based on [1], in which a coarse-to-fine multi-scale approach is employed.

3.1. Exhaustive Search

The parameter P defines the size of a neighborhood in the reference image, i.e., a neighborhood of

(2 P + 1) \times (2 P + 1)

pixels. For each patch in (

I_{0}

), we compute

J_{d} (u_{e}) = \int_{Ω} |I_{0} (x) - I_{1} (x + u_{e} (x))| d x .

(31)

We search for the argument

u_{e} (x)

that minimizes this cost value. The functional in Equation (31) always has a solution that depends on

x

, but in many cases there is not only one solution. This issue occurs when the target image contains auto-similarity. In the presence of high auto-similarity, the minimum of the functional presents many local minima. In order to cope with this problem, we use a matching confidence value. This matching confidence value

c (x)

will be zero where high auto-similarity is present, i.e., in cases where there are many solutions where matching is not incorporated in the proposed model.

3.2. Matching Confidence Value

Let

c (x)

be the confidence measure given to each exhaustive matching. The proposed model for

c (x)

is based on the error of the first candidate (

d_{1}

) and the error of the second candidate (

d_{2}

). We have ordered the matching errors of pixels of the target image. We ordered the matching errors from minimum values to maximum values. After that, we computed the confidence value.

c (x) = (\frac{d_{1} - d_{2}}{d_{1}}) .

(32)

3.3. Construction of $χ (x)$

Here we explain with more detail the different location strategies or seeding process we performed.

3.3.1. Uniform Location

The uniform location depends on a D parameter. The exhaustive search is performed every D-other pixels with coordinates x and y. Let

g_{D}

be this uniform grid where exhaustive matching is performed. In Figure 2, the initial point of the arrow shows the coordinates where a patch is taken in the reference images. The final part of the arrow shows the coordinates where the corresponding patch is located in the target image.

Figure 2. Correspondences of patches located uniformly in the reference image.

We compute the exhaustive matching in

N_{D}

positions given by

N_{D} = [\frac{N_{x}}{D}] \times [\frac{N_{y}}{D}],

where

N_{x}

and

N_{y}

represent the height and width of the image, respectively, and [ ] represents an integer division.

3.3.2. Random Location

The random location strategy defines a grid

g_{R}

where the matching is located. In Figure 3, the matching and its correspondences are represented with white arrows. The initial point of the arrow shows the coordinates where a patch is taken in the reference images. The final part of the arrow shows the coordinates where the corresponding patch is located in the target image. In every realization of the strategy, the grid

g_{R}

is changed.

Figure 3. Correspondences of patches located randomly in the reference image.

3.3.3. Location in Maximum Values of the Gradient Magnitude

We computed the gradient magnitude of the reference image. We ordered that gradient from the smallest magnitude to the maximum magnitude. We considered the

N_{D}

as the largest magnitude of the gradient. The grid

g_{g}

is defined by the

N_{D}

maximum magnitude positions of the gradient in the reference image. In Figure 4, we show the correspondences given the location in the maximum magnitude of the gradient.

Figure 4. Correspondence of patches located in the largest magnitude of the gradient. (a) Gradient magnitude of the reference image. (b) Ordered gradient from the minimum magnitude to the maximum magnitude. (c) Correspondences of the patches located in the maximum gradient magnitudes (white arrows).

In Figure 4a, we show the computed magnitude of the gradient. We represent the magnitude of the gradient using intensities: white means the maximum magnitude of the gradient, and black means the minimum magnitude. We ordered the gradient magnitude and plotted it. We took the last

N_{D}

magnitude of the gradient. These gradients correspond to locations in the image. In those locations, we performed exhaustive matching. We observe in Figure 4c that the matching is located on the edges of the girl’s cloth, on the dragon face, and on rock edges.

3.3.4. Location in the Large Magnitudes of the Gradient in a Uniform Location

The main idea is to compute matching in a location, where large gradients are present using a uniform grid (a large gradient uniform grid). We divided the gradient of the image into three subsets: a large gradient set, a medium gradient set, and a small gradient sets. In Figure 5c, we show the position of matching in the large gradient uniform grid. We show in (a) the magnitude of the gradient of the image. In (b), we show a binary image where 1 indicates the location of large gradients that coincides with regions that present edges and highly textured surfaces.

Figure 5. (a) Gradient of the reference image. (b) Large gradient set. (c) Matching result of patches located in the large gradient uniform grid.

In (c), we observe that the matching is located in a uniform location but only in places where the high gradient is present.

3.3.5. Location in the Medium Magnitudes of the Gradient in a Uniform Location

We show in Figure 6a the magnitude of the gradient of the image and the located matching In (b), we show a binary image where 1 indicates the location of medium gradients.

Figure 6. (a) Gradient of the reference image. (b) Medium gradient set. (c) Matching result of the patches located in the medium gradient uniform grid.

If we compare Figure 6c with Figure 5c, we observe that they are complementary; i.e., locations that are considered in one strategy are not considered in the other.

3.3.6. $χ (x)$ Value

The value of the

χ (x)

is given by the confidence value

c (x)

, i.e.,

χ (x) = c (x) .

(33)

The

c (x)

is the weight of matching. In the presence of auto-similarity of the target image,

c (x)

will be zero.

3.4. Pseudo Code

The model was implemented based on the available code in IPOL [1] for the TV-L1 model. The model was programmed in C in a laptop with i7 processor and 16 GB RAM. The pseudo code is presented in Algorithm 1.

The algorithm to compute the optical flow using additional information:

Algorithm 1: Integration of additional information coming from exhaustive matching

I n p u t

: Two consecutive color images

I_{0} (x)

,

I_{1} (x)

.

P a r a m e t e r s

α, λ, P,

θ_{0}

,

τ_{d}

, β,

M a x I t e r

, β, κ,

N u m b e r_{s c a l e s}

,

N u m b e r_{w a r p i n g s}

,

θ_{o c c}

.

O u t p u t

: optical flow

u (x) = (u_{1} (x), u_{2} (x))

Down-scale I_{0} (x), I_{1} (x)

Initialization

u (x) = v (x); w = 0

f o r

s c a l e s \leftarrow N u m b e r_{s c a l e s}

to 1

Construct ψ

In specific locations defined by the strategy compute

u_{e} (x)

using Equation (31)

Using

u_{e} (x)

compute

c (x)

and update

χ (x)

.

f o r

w \leftarrow 1

to

N u m b e r_{w a r p i n g s}

Compute

α (x)

and occlusion

o (x)

.

Compute

v_{i} (x)

, Equations (24) and (25).

Compute

u

, Equation (21).

Compute ξ Equations (22) and (23).

update

κ_{n} = κ {(0.5)}^{s c a l e s}

e n d f o r

up-sample

u (x)

e n d f o r

Out

u (x)

.

4. Experiments and Database

4.1. Middlebury Database

The proposed model has been tested in the available Middlebury database training set [19]. This database contains eight image sequences, and its ground truth is also available. We divided the Middlebury sequences into two groups: (i) a training set (two sequences) and (ii) an evaluation set containing the rest of the sequences. These images present illumination changes and shadows (Figure 7e–h).

Figure 7. Images of the Middlebury database containing small displacements. (a) and (b) frame10 and frame11 of sequence Grove2, respectively. (c) and (d) frame10 and frame11 of sequence Grove3, respectively. (e) and (f) frame10 and frame11 of sequence RubberWhale, respectively. (g) and (h) frame10 and frame11 of sequence Hydrangea, respectively. (i) and (j) frame10 and frame11 of sequence Urban2, respectively and finally (k) and (l) of sequence Urban2, respectively.

Figure 7 shows two consecutive frames, namely, Frame 10 and Frame 11. In our model,

I_{0}

corresponds to Frame 10,

I_{1}

to Frame 11. Figure 7a,b correspond to the Grove2 sequence, (c) and (d) to he Grove3 sequence, (e) and (f) to RubberWhale, (g) and (h) to the Hydra sequence, (i) and (j) to the Urban2 sequence, and (k) and (l) to the Urban3 sequence.

4.2. The MPI-Sintel Database

The MPI-Sintel database [20] presents long synthetic video sequences containing large displacements and several image degradations as blur or reflections as well as different effects such as fog, shadows, reflections, and blur.

Moreover, there are two versions of the MPI-Sintel database: clean and final. The final version is claimed to be more challenging and includes all effects previously mentioned. For our evaluation, we take the final version of MPI-Sintel. In Figure 8 we compare the same frame extracted from clean version and final version.

Figure 8. Image extracted from MPI-Sintel clean version and MPI-Sintel final version. We extracted frame_0014 from sequence ambush_2. (a) frame_0014 clean version. (b) frame_0014 final version.

In Figure 8a we show the clean version of frame_0014 of sequence ambush_2. The Figure 8b includes fog and blur effects making this MPI-Stintel version more challenging.

The optical flow (ground truth) of this database is publicly available in order to compare different algorithms. The optical flow was color-coded using the color code presented in Figure 9.

Figure 9. Color code used for the optical flow.

Figure 10 displays some examples of the MPI-Sintel database. There are images with large displacements: around 170 pixels for the cave_4 sequence and around 300 pixels for temple_3. In the cave_4 sequence, a girl fights with a dragon inside a cave, shown in Figure 10a,b,d,e. In (a) and (b), the girl moves her lance to attack the dragon. In (d) and (e), the dragon moves its jaws very fast. In (g) and (h), we show the girl trying to catch the small dragon, but a claw appears and takes the small dragon away. In (j) and (k), the girl falls in the snow. We observe large displacement and deformation with respect to her hands. We also display the optical flow of the video sequence.

Figure 10. Examples of images of the MPI-Sintel database video sequence. (a,b) frame_0010 and frame_0011, (c) color-coded ground truth optical flow of the cave_4 sequence, (d) ground truth represented with arrows. (e,f) frame_0045 and frame_0046, (g) color-coded ground truth, and (h) arrow representation of optical flow of the cave_4 sequence, (i,j) frame_0030 and frame_0031, (k) color-coded ground truth optical flow, (l) arrow representation (in blue) of the temple3 sequence. (m,n) frame_0006 and frame_0007, (o) color-coded ground truth and (p) arrow representation (in green) ground truth optical flow of the ambush_4 sequence.

We present in Table 1 the number of images and the names of each sequence of the final MPI-Sintel.

Table 1. Number of images in each image sequence.

4.3. Experiments with the Middlebury Database

In this section, we present quantitative results obtained by the optical flow estimation model presented above. We have divided the database in the training set and the evaluation set. We used the sequences Grove3 and RubberWhale as the training set (TRAINING_SET). The other sequences are used to validate the method.

4.4. Parameter Estimation

Initially, we estimated the best parameters in our training set considering

κ = 0.0

, i.e., the model cannot handle large displacements. We scan the parameter θ and λ, setting

τ = 1.0

and

β = 0.0

and estimating the

E P E

(End Point Error) and

A A E

(Average Angular Error) error.

We obtained the minimum value for both

E P E

and

A A E

error in

θ = 0.1

and

λ = 0.1

. We selected these parameter values. In Figure 11, we show the obtained optical flow, setting θ and λ parameters.

Figure 11. Color-coded optical flow. (a) Color-coded optical flow for Grove2. (b) Color-coded optical flow for RubberWhale. (c) Ground truth for the Grove2 sequence. (d) Ground truth for the RubberWhale sequence.

With these parameters we obtained an average

E P E = 0.16

and

A A E = 3.78

.

We fixed the value of θ and λ and then varied

τ_{d}

from

0.1

to

2.3

and β from 1 to 8. We obtained a minimum in

τ_{d} = 1.9

and

β = 2.0

. With these parameters we obtained and average

E P E = 0.1369

and

A A E = 3.1186

. In Figure 11c,d, we present the ground truth for the Grove2 sequence and the RubberWhale sequence, respectively, in order to facilitate the visual evaluation of our results.

In Figure 12, we show the optical flow and the weight map

α (x)

.

Figure 12. Color-coded optical flow. (a) Color-coded coded optical flow for Grove2. (b) Color-coded optical flow for RubberWhale. (c) Ground truth for the Grove2 sequence. (d) Ground truth for the RubberWhale sequence. (e) Weight map

α (x)

for Grove2. (f) Weight map

α (x)

for RubberWhale.

We can observe in Figure 12a,b the obtained optical flow using weight map

α (x)

. There is an improvement in both estimated optical flows. In (c) and (d), we present the

α (x)

function for both sequences. Low gray levels represent low values of

α (x)

; i.e., the model uses the gradient constancy assumption. Higher gray level means that the model uses the brightness constancy assumption. As we can see in (d), the model uses gradients on shadows in the right side of the whale and the left side of the wheel. In (c), in most locations,

α (x) = 0.5

.

4.5. Experiments with MPI-Sintel

We performed two evaluations:

A

and

B

. Evaluation

A

considers the proposed model, evaluating different strategies to locate matching. These strategies are (i) uniform locations, (ii) random locations, (iii) locations in maximum gradients of the reference image, (iv) uniform locations in the largest gradients of the image, and (v) uniform locations in the medium gradients of the reference image. In these five evaluations, the matching was computed comparing colors between pixels. Additionally, we performed the previous five evaluations using gradient patches instead of color patches.

In Evaluation

B

, considering the extended Horn-Schunck model, we evaluated its performance in different location strategies. These strategies are the same in Evaluation

A

, i.e., (i) uniform, (ii) random, (iii) maximum gradients, (iv) uniform locations in maximum gradients, and (v) uniform locations in medium gradients.

4.6. Parameter Estimation of the Proposed Model with MPI-Sintel

4.6.1. Parameter Model Estimation

Parameters λ, θ, and β y

τ_{b}

were estimated by the PSO algorithm [21]. The PSO algorithm is an optimization method that does not compute derivatives of the functional to minimize. It minimizes a functional by iteratively trying to improve many candidate solutions. Each iteration is performed taking into account the functional, and it is minimized considering these candidate solutions and updating them in the space-domain according to dynamics behavior for the position and velocity of solution candidates. Let

f : R^{n} : \to R

be a functional to optimize (in this case, we considered the total

E P E

as the functional to minimize), where n is the number of parameters to estimate. Let

μ_{i}

be a candidate solution that minimizes the optical flow estimation error. In each iteration, a global best candidate parameter is stored (

μ_{g}

) and jointly is stored the parameter candidate that presents the best performance in the iteration (

μ_{b}

). Each parameter vector

μ_{i}

is updated by

ν_{i}^{t + 1} = ω ν_{i}^{t} + φ_{g} (μ_{i}^{t} - μ_{g}) + φ_{b} (μ_{i}^{t} - μ_{b})

μ_{i}^{t + 1} = μ_{i}^{t} + ν_{i}^{t}

where ω is the evolution parameter for each solution

μ_{i}

, and

φ_{g}

and

φ_{b}

are positive weight parameters. A saturation for

ν_{i}

is usually considered. In our case, we used

ν_{m a x} = 2

and

ν_{m i n} = - 2

. In Table 2, we show the parameters of our PSO algorithm.

Table 2. PSO parameters.

In each estimated optical flow of the sequence, we computed the

E P E

and

A A E

for each sequence, and each PSO iteration we minimized the following functional:

J (λ, θ, β, τ_{d}) = \sum_{i = 1}^{8} (E P E_{i} + A A E_{i})

(34)

where

E P E_{i}

and

A A E_{i}

is the end point error and the average angular error of each considered sequence, respectively, defined as

\begin{matrix} c E P E = \frac{1}{n} \sum_{i = 1}^{n} \sqrt{{(g_{1 i} - u_{1 i})}^{2} + {(g_{2 i} - u_{2 i})}^{2}} \\ A A E = \frac{1}{n} \sum_{i = 1}^{n} c o s^{- 1} (\frac{1 + g_{1 i} u_{1 i} + g_{2 i} u_{2 i}}{\sqrt{1 + g_{1 i}^{2} + g_{2 i}^{2}} \sqrt{1 + u_{1 i}^{2} + u_{2 i}^{2}}}) \end{matrix}

(35)

where

g (x) = (g_{1} (x), g_{2} (x))

is the ground truth optical flow, and

u (x) = (u_{1} (x), u_{2} (x))

is the estimated optical flow.

In order to estimate the parameters of the model, we selected eight pairs of consecutive images video sequences of the MPI-Sintel database. We selected the first two images from sequences: alley_1, ambush_2, bamboo_2, cave_4, market_5, bandage_1, mountain_1, and temple_3.

In Figure 13, we show image sequences used to estimate the parameter, the ground truth optical flow, and our results.

Figure 13. Image sequences used to estimate model parameters. In the parameter estimation, we used the first two frames of each considered sequence: (a–d) the frames of the alley_1 sequence, the ground truth, and our results, respectively. (e,f) the ambush_2 sequence, (g) the ground truth, and (h) our result. (i,j) frames of the bamboo_2 sequence, (k) the ground truth, and (l) our result. (m,n) the frame of bandage_1 sequence (o) the ground truth, and (p) our results.

We observe in Figure 13a,b a girl that moves a fruit upward. Our method estimates correctly the direction of the motion.

E P E = 0.22

. Figure 13e,f show the lance as well as the girl and the man fighting. The method cannot estimate correctly the motion due to large displacements.

E P E = 10.21

. Figure 13l shows that the large displacements of the butterfly cannot be correctly estimated. In Figure 13p, we observe that the displacements are correctly estimated.

E P E = 0.78

.

In Figure 14b, we show the performance of the PSO algorithm minimizing the functional in Equation (34).

Figure 14. Performance of the PSO algorithm. (a) Performance of each individual. (b) Performance of the best individual in each generation.

Figure 14a shows the performance of each individual for 20 generations. It is observed that the variation of the performance decreases along the iterations. In Figure 14, the performance of the best individual for each iteration decreases its functional value; the final value is

17.29

.

We present in Table 3 the final

E P E

and

A A E

value for our selected training set.

Table 3. Results obtained by PSO in the MPI-Sintel selected training set.

As is shown in Table 3, the best performance was for the sequence alley_1 that presents small displacements, and the worst performance was for the sequence market_5, which presents large displacements. Final parameters for the proposed model are

λ = 0.2894

,

θ = 0.1422

,

β = 1.2303

, and

τ_{d} = 0.5687

.

4.6.2. Exhaustive Search Parameter Estimation

Parameters P and D are estimated once the model parameter is estimated. Because these parameters are discrete, we estimate them, scanning P and D values in an exhaustive way. We varied P from 2 to 20 pixels with a step of 2 pixels, and D varied from 32 to 64 with a step of 4 pixels. In Figure 15, we show the obtained performance by the functional in Section 4.6.1 for the MPI-Sintel selected video sequences.

Figure 15. Performance of the proposed model considering variation in P and D parameters.

We show in Figure 15 the performance

E P E + A A E

for a different grid size D and for a different patch size P. It is observed that the minimum of the surface is located at

D = 64

and

P = 12

, reaching a performance of

16.33

. We prefer the second minimum located at

D = 36

and

P = 8

, reaching a performance of

16.53

. This second minimum implies a more dense grid that would perform better in terms of handling small objects. The model using patches outperforms the original model, which does not consider exhaustive search correspondences.

4.7. Parameter Estimation of the Extended Horn-Schunk

4.7.1. $α_{h s}$ Parameter

The multi-scale model has a principal parameter α. This parameter represents the balance between the data term and regularization. We adjusted this parameter using the selected training set which contains eight video sequences. In Figure 16, we show the obtained

E P E + A A E

values for different α values:

Figure 16. Performance of the Horn-Schunck method for different

α_{h s}

values.

Figure 16 shows that the minimum value of the error is reached for

α_{h s} = 32

. We presented in Table 4 the performance obtained by Horn-Schunck for each sequence.

Table 4. Results obtained by PSO in the MPI-Sintel selected training set.

Total average

E P E

and

A A E

is

7.53

and

23.19

, respectively. If we compare this with Table 3, where the results of our proposed model are presented, it is evident that the proposed model outperforms the extended Horn-Schunck model.

4.7.2. Exhaustive Search Parameter Estimation

We varied the P and D with discrete steps in order to find the combination to improve the performance of the optical flow estimation. The parameter P varied from 2 to 20 with a step of 2 pixels, and D varied from 32 to 64 with a step of four pixels. We evaluated the performance of the extended Horn-Schunck method in the selected training set. In Figure 17, we present the obtained results of this experiment.

Figure 17. Performance of the Horn-Schunck model for different P and D parameters.

For these parameters, the obtained

E P E

and

A A E

are presented in Table 5.

Table 5. Results obtained by the PSO in the MPI-Sintel selected training set.

Comparing this table with Table 4, we observe that in sequence alley_1, bandage_1, market_5, and temple_3, there is an improvement in performance. Sequences ambush_2, bamboo_2, and mountain_1 approximately maintain the performance. Sequence cave_4 present an increment in the

E P E

from

5.47

to

6.42

. On average, the total performance is

E P E = 7.41

and

A A E = 22.11

, which represent an improvement compared with the results obtained in Table 4.

5. Results

5.1. Reported Results in Middlebury

We show in Table 6 the reported values for TV-L1 in [1] using gray level images of the Middlebury database.

Table 6. The reported performance of TV-L1 in Middlebury [1].

In Table 7, we present our results obtained using the set of parameters determined above.

Table 7. Obtained results of our model in Middlebury with

κ = 0.0

.

In Table 7, we observe that, in most cases, the obtained result is similar to that reported in [1]. For real sequences Dime and Hydra, our model outperforms the model in [1]. In the sequence Urban3, our method presents a larger error (

A A E

) than that given in Table 6.

5.2. Specific Location of Matching $κ \neq 0$

Considering the integration of exhaustive matching to our variational model, we perform an empirical evaluation to determine the best location of exhaustive matching in order to improve the optical flow estimation. We compare three strategies to locate exhaustive matching: (i) uniform location in a grid

G_{D} \subset Ω

, (ii) random location of matching, and (iii) matching located in the maximum of the magnitude of the gradient.

5.2.1. Uniform Locations

This is the most simple strategy to locate exhaustive matching. Location depends on the size of the grid

G_{D}

.

In Figure 18, we present exhaustive correspondences computed in a uniform grid. We present a zoomed area for Grove2 and Rubberwhale sequences.

Figure 18. Exhaustive matching represented with white arrows. (a) Exhaustive matching using

D = 18

and

P = 10

for Grove2. (b) Exhaustive matching using

D = 6

and

P = 8

for Rubberwhale.

We vary these parameters P and D from 2 to 10 and from 2 to 18, respectively. The minimum of the error is obtained for large values of D since the exhaustive search frequently yields some false matching, as can be seen in Figure 18b. Increasing the D parameter produces more confident matching. Thus, we select

P = 4

and

D = 16

. With these parameters, we include in the model around 900 matches. Considering these selected parameters, we obtained an average of

E P E = 0.1332

and

A A E = 3.0080

in TRAINING_SET. In VALIDATION_SET, we obtained the results showed in Table 8.

Table 8. Results obtained by the uniform matching location strategy in VALIDATION_SET.

For the Urban3 sequence, we used

P = 24

and

α (x) = 1.0

. This image is produce by CGI and presents auto-similarities.

5.2.2. Random Locations

We performed experiments locating the matching and using a grid created randomly. We computed around 900 matches in each experiment. We performed each experiment three times, obtaining results shown in Table 9.

Table 9. Average

E P E

and

A A E

obtained by Random Location Strategy in VALIDATION_SET set.

5.2.3. Locations for Maximum Magnitudes of the Gradient

We present in Table 10 our results using the maximum gradient.

Table 10. Average

E P E

and

A A E

obtained by the maximum gradient location strategy in VALIDATION_SET.

Comparing the average

E P E

and

A A E

obtained for each strategy, we observe that the best performance was obtained by the uniform location strategy, the second best performance by the random location strategy, the third best by the maximum gradient strategy.

5.3. Evaluation in MPI-Sintel

5.3.1. Uniform Location

As a first experiment, we computed the optical flow for the whole MPI-Sintel training database. We did not compute the optical flow in the images used to estimate the parameters of the algorithm. We computed 1033 optical flow and evaluated its

E P E

and

A A E

for each sequence. In Table 11, we present our results for MPI-Sintel:

Table 11. Summary of results. The end point error and average angular error obtained by uniform locations in the MPI-Sintel training set.

In Table 11, we show our obtained results by our proposed model using uniform location. We obtained

5.58

as the

E P E

error. We observe that the best performance was obtained for the sleeping sequence. This sequence presents the dragon and the sleeping girl and contains small displacements. The worst performance was obtained for the Ambush sequence, where the girl fights with a man in the snow. This sequence presents extremely large displacements that are very hard to handle.

5.3.2. Random Location

In the second experiment, we computed the optical flow for the MPI-Sintel training set. We performed two realizations of the optical flow estimation and computed the

E P E

and

A A E

average value on these realizations. In Table 12, we present our results:

Table 12. Summary of results. The end point error and average angular error obtained by random locations in the MPI-Sintel training set.

5.3.3. Location on Maximum Gradients

In the third experiment, we computed the optical flow for the MPI-Sintel training set. We performed matching with patches located in the maximum gradient magnitude of the current frame. Our obtained results are presented in Table 13.

Table 13. Summary of results. The end point error and average angular error obtained by locations on maximum gradients in the MPI-Sintel training set.

5.4. Summary

We summarize our results of these three sections in Table 14.

Table 14. Summary of the obtained results for different strategies.

The best performance of these three strategies was obtained by uniform locations. We processed the MIP-Sintel test database and uploaded our results to the website http://www.sintel.is.tue.mpg.de/results. The test database consists of eight sequences of 500 images in two versions: clean and final. The final MPI-Sintel version considers effects such as blur, shadows, illumination changes, and large displacements. This database is very challenging and represents the state-of-the-art evaluation dataset.

In Figure 19, we show the position reached by our method:

Figure 19. Obtained results by our method in the MPI-Sintel test database (at 18 October 2018).

In Figure 19, we can observe that our method (TV-L1+EM) obtained

E P E = 8.916

and is located in Position 124 of the 155 methods. Our proposal outperforms TV-L1 and LDOF. Our performance is similar to that obtained by MDP-Flow2 [3].

5.4.1. Combined Uniform–Large Gradient Locations

We obtained the best performance using uniform locations, so we decided to mix two strategies: uniform locations with large gradient locations. In order to implement this new strategy, we took the magnitude of the gradient of the reference frame and ordered them from large to small values. We then considered the first third of this ordered data as a large gradient and the second third of the data as the medium gradient. Pixels in the reference frame can belong to a large gradient set, a medium gradient set, or a small gradient set. In uniform locations where the gradient of pixels belongs to the large gradient, we set

χ (x) = 1

; else, we set

χ (x) = 0

.

In Table 15, we show our results obtained by this strategy using the MPI-Sintel database.

Table 15. Summary of results. The end point error and average angular error obtained by the combination of uniform and large gradients in the MPI-Sintel training set.

We observe in Table 15 that the performance of the strategy is outperformed by the uniform locations strategy. It seems that those two conditions, uniform location and large gradients, eliminate good correspondences located in medium or small gradients magnitude.

5.4.2. Combined Uniform–Medium Gradient Locations

We combined the location between uniform and medium gradient locations.

In Table 16, we show our results obtained by this combined strategy in the MPI-Sintel database.

Table 16. Summary of results. The end point error and average angular error obtained by the combination of uniform and medium gradients in the MPI-Sintel training set.

We observe in Table 16 that the performance of the strategy is outperformed by the uniform locations strategy. It seems that those two conditions, uniform location, and medium gradients, eliminate good correspondences located in large or small gradients magnitude.

5.4.3. Results Obtained by Horn-Schunck

As a starting point, we evaluated the implementation of Horn-Schunck in [18] in MPI-Sintel. The obtained results are shown in Table 17.

Table 17. Summary of results. The end point error and average angular error obtained by Horn-Schunck in the MPI-Sintel training set.

In Table 17, we show the performance obtained by the Horn-Schunck method using

α = 32

. The performance

E P E + A A E = 25.50

is almost double the performance of our proposed model.

5.4.4. Results Obtained by Horn-Schunck Using Uniform Locations

We performed the evaluation of MPI-Sintel using the extended version of Horn-Schunck to handle large displacements. In Table 18, we show the obtained results:

Table 18. Summary of results. The end point error and average angular error obtained by Horn-Schunck using uniform locations in the MPI-Sintel training set.

In Table 18, we show that the performance obtained by the Horn-Schunck using patches in uniform locations is outperformed by the results in Table 17. This extended version of Horn-Schunck presents

E P E + A A E = 30.76

. We explain this fact considering correct and false correspondences (outliers) given by the exhaustive search. The Horn-Schunck model is not robust to outliers because the method considers in its formulation an

L^{2}

data term and an

L^{2}

regularization term. On the other hand,

T V - L 1

is robust to outliers and performs better using additional information.

6. Discussion

We present here a method to estimate optical flow for realistic scenarios using two consecutive color images. In realistic scenarios, occlusions, illumination changes, and large displacements occur. Classical optical flow models may fail in realistic scenarios. The presented model incorporates the occlusion information on its energy based on the maximum error of the optical flow estimation.

This model handles illumination changes using a balance-term between gradients and intensities. The inclusion of this balance term allows the model to improve the performance of the optical flow estimation in scenarios with illumination changes, reflections, and shadows.

Thanks to the use of supplementary matching (given by exhaustive searches in specific locations), this model is able to handle large displacements.

We tested five strategies of specific matching locations: uniform locations, random locations, maximum gradient locations, uniform-maximum gradient locations, and uniform-medium gradient locations. The obtained results show that the best performance was obtained by the uniform location strategy and the second best performance was achieved by the random location strategy. Empirically, we demonstrated that the best criteria to localize exhaustive matching does not depend on image gradients but on a uniform spatial location. Additionally, we have used gradient patches instead of color patches to compute correspondences and have used this information in order to estimate optical flow. The obtained results show that the use of gradient patches outperform the model based on color.

We also tested the classical Horn-Schunk model and extended it to handle additional information coming from an exhaustive search. The obtained results show that this extended Horn-Schunck model cannot handle large displacements in this way due to the presence of many outliers given by the exhaustive search. The Horn-Schunck model uses an

L^{2}

norm in its formulation, which is not robust to outliers. On the other hand, the proposed model that uses the

T V - L 1

norm is robust to outliers and performs better using additional information. The use of an occlusion estimation that depends on optical flow error lets us eliminate simultaneously false exhaustive correspondences and good correspondences. Some correspondences (i.e., good correspondences) present large errors due to illumination changes, blur, reflections, shadows, or fog that may be present in the MPI-Sintel sequence. These correspondences are eliminated because they present large optical flow errors.

We estimated parameters of the model using the PSO algorithm, which helped us to improve the global performance based on the

E P E

and

A A E

error computed in a training subset of MPI-Sintel. The final result obtained in the whole database was similar to the one obtained in this small dataset. That means that the small dataset was representative of the database.

The proposed model needs a methodology to validate matching processes. This new methodology should assign to each match a reliable value in order to eliminate false matching.

The number of good matching is necessary to be increased in order to improve optical flow estimations. One way to do that is by the use of deformable patches. Performing an exhaustive search using deformable patches is necessary to have a good correspondence between two patches of the reference frame and the target frame. Once a good correspondence in the target frame is found, a patch in the reference frame can be divided into smaller patches. For each smaller patch, a local search on the target image can be performed in order to increase the number of matching.

Funding

This research received no external funding.

Acknowledgments

We thank Francisco Rivera for correcting grammatical and spelling of this manuscript.

Conflicts of Interest

The author declare no conflict of interest.

References

Sánchez, J.; Meinhardt-Llopis, E.; Facciolo, G. TV-L1 Optical Flow Estimation. Image Process. Line 2013, 2013, 137–150. [Google Scholar] [CrossRef]
Horn, B.K.P.; Schunck, B.G. Determining Optical Flow. Artif. Intell. 1981, 17, 185–204. [Google Scholar] [CrossRef]
Xu, L.; Jia, J.; Matsushita, Y. Motion Detail Preserving Optical Flow. In Proceedings of the IEEE CVPR, San Francisco, CA, USA, 13–18 June 2010. [Google Scholar]
Palomares, R.P.; Haro, G.; Ballester, C. A Rotation-Invariant Regularization Term for Optical Flow Related Problems. In Lectures Notes in Computer Science, Proceedings of the ACCV’14, Singapore, 1–5 November 2014; Springer: Cham, Switzerland, 2014; Volume 9007, pp. 304–319. [Google Scholar]
Wedel, A.; Pock, T.; Zach, C.; Bischof, H.; Cremers, D. An Improved Algorithm for TV-L1 Optical Flow, Statistical and Geometrical Approaches to Visual Motion Analysis. In Proceedings of the International Dagstuhl Seminar, Dagstuhl Castle, Germany, 13–18 July 2008; Volume 5604. [Google Scholar]
Brox, T.; Bregler, C.; Malik, J. Large Displacemnet Optical Flow. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, USA, 20–25 June 2009. [Google Scholar]
Stoll, M.; Volz, S.; Bruhn, A. Adaptive Integration of Features Matches into Variational Optical Flow Methods. In Proceedings of the 11th Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012. [Google Scholar]
Lazcano, V. Some Problems in Depth Enhanced Video Processing. Ph.D. Thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2016. Available online: http://www.tdx.cat/handle/10803/373917 (accessed on 17 October 2018).
Bruhn, A.; Weickert, J.; Feddern, C.; Kohlberger, T.; Schnoerr, C. Real-time Optical Flow Computation with Variational Methods. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Groningen, The Netherlands, 25–27 August 2003; pp. 222–229. [Google Scholar]
Weinzaepfel, P.; Revaud, J.; Harchaoui, Z.; Schmid, C. DeepFlow: Large displacement optical flow with deep matching. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013. [Google Scholar]
Timofte, R.; van Gool, L. Sparseflow: Sparse matching for small to large displacement optical flow. In Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2015. [Google Scholar]
Kennedy, R.; Taylor, C.J. Optical flow with geometric occlusion estimation and fusion of multiple frames. In Energy Minimization Methods in Computer Vision and Pattern Recognition, Proceedings of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Hong Kong, China, 13–16 January 2015; Springer: Cham, Switzerland, 2015; Volume 8932, pp. 364–377. [Google Scholar]
Fortun, D.; Bouthemy, P.; Kervrann, C. Aggregation of local parametric candidates with exemplar-based occlusion handling for optical flow. Comput. Vis. Image Underst. 2016, 145, 81–94. [Google Scholar] [CrossRef]
Lazcano, V.; Garrido, L.; Ballester, C. Jointly Optical Flow and Occlusion Estimation for Images with Large Displacements. In Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Madeira, Portugal, 27–29 January 2018; Volume 5, pp. 588–595. [Google Scholar] [CrossRef]
Lazcano, V. Study of Specific Location of Exhaustive Matching in Order to Improve the Optical Flow Estimation. Information Technology—New Generations. In Proceedings of the 15th International Conference on Information Technology, Las Vegas, NV, USA, 16–18 April 2018; pp. 603–661. [Google Scholar]
Steinbruecker, F.; Pock, T.; Cremers, D. Large Displacement Optical Flow Computation without Warping. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 185–203. [Google Scholar]
Xiao, J.; Cheng, H.; Sawhney, H.; Rao, C.; Isnardi, M. Bilateral Filetring-based Flow Estimation with Occlusion Detection. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 221–224. [Google Scholar]
Meinhardt-Llopis, E.; Sanchez, J.; Kondermann, D. Horn-Schunck Optical Flow with a Multi-Scale Strategy. Image Processing Line 2013, 2013, 151–172. [Google Scholar] [CrossRef]
Baker, S.; Scharstein, D.; Lewis, J.; Roth, S.; Black, M.; Szelinsky, R. A Database and Evaluation Methodology for Optical Flow. Int. J. Comput. Vis. 2011, 92, 1–31. [Google Scholar] [CrossRef]
Butler, D.J.; Wulff, J.; Stanley, G.B.; Black, M.J. A naturalistic open source movie for optical flow evaluation. In Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy, 7–13 October 2012; Part IV, LNCS 7577. Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 611–625. [Google Scholar]
Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the IEEE International Conference on Neural Networks. IV, Perth, Australia, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]

Figure 1. Occlusion estimation comparing optical flow error with a threshold

θ_{o c c}

. (a) Frame_0049 of sequence ambush_5; (b) Frame_0050; (c) occlusion estimation.

Figure 1. Occlusion estimation comparing optical flow error with a threshold

θ_{o c c}

. (a) Frame_0049 of sequence ambush_5; (b) Frame_0050; (c) occlusion estimation.

Figure 2. Correspondences of patches located uniformly in the reference image.

Figure 3. Correspondences of patches located randomly in the reference image.

Figure 4. Correspondence of patches located in the largest magnitude of the gradient. (a) Gradient magnitude of the reference image. (b) Ordered gradient from the minimum magnitude to the maximum magnitude. (c) Correspondences of the patches located in the maximum gradient magnitudes (white arrows).

Figure 5. (a) Gradient of the reference image. (b) Large gradient set. (c) Matching result of patches located in the large gradient uniform grid.

Figure 6. (a) Gradient of the reference image. (b) Medium gradient set. (c) Matching result of the patches located in the medium gradient uniform grid.

Figure 7. Images of the Middlebury database containing small displacements. (a) and (b) frame10 and frame11 of sequence Grove2, respectively. (c) and (d) frame10 and frame11 of sequence Grove3, respectively. (e) and (f) frame10 and frame11 of sequence RubberWhale, respectively. (g) and (h) frame10 and frame11 of sequence Hydrangea, respectively. (i) and (j) frame10 and frame11 of sequence Urban2, respectively and finally (k) and (l) of sequence Urban2, respectively.

Figure 8. Image extracted from MPI-Sintel clean version and MPI-Sintel final version. We extracted frame_0014 from sequence ambush_2. (a) frame_0014 clean version. (b) frame_0014 final version.

Figure 9. Color code used for the optical flow.

Figure 10. Examples of images of the MPI-Sintel database video sequence. (a,b) frame_0010 and frame_0011, (c) color-coded ground truth optical flow of the cave_4 sequence, (d) ground truth represented with arrows. (e,f) frame_0045 and frame_0046, (g) color-coded ground truth, and (h) arrow representation of optical flow of the cave_4 sequence, (i,j) frame_0030 and frame_0031, (k) color-coded ground truth optical flow, (l) arrow representation (in blue) of the temple3 sequence. (m,n) frame_0006 and frame_0007, (o) color-coded ground truth and (p) arrow representation (in green) ground truth optical flow of the ambush_4 sequence.

Figure 11. Color-coded optical flow. (a) Color-coded optical flow for Grove2. (b) Color-coded optical flow for RubberWhale. (c) Ground truth for the Grove2 sequence. (d) Ground truth for the RubberWhale sequence.

Figure 12. Color-coded optical flow. (a) Color-coded coded optical flow for Grove2. (b) Color-coded optical flow for RubberWhale. (c) Ground truth for the Grove2 sequence. (d) Ground truth for the RubberWhale sequence. (e) Weight map

α (x)

for Grove2. (f) Weight map

α (x)

for RubberWhale.

Figure 12. Color-coded optical flow. (a) Color-coded coded optical flow for Grove2. (b) Color-coded optical flow for RubberWhale. (c) Ground truth for the Grove2 sequence. (d) Ground truth for the RubberWhale sequence. (e) Weight map

α (x)

for Grove2. (f) Weight map

α (x)

for RubberWhale.

Figure 13. Image sequences used to estimate model parameters. In the parameter estimation, we used the first two frames of each considered sequence: (a–d) the frames of the alley_1 sequence, the ground truth, and our results, respectively. (e,f) the ambush_2 sequence, (g) the ground truth, and (h) our result. (i,j) frames of the bamboo_2 sequence, (k) the ground truth, and (l) our result. (m,n) the frame of bandage_1 sequence (o) the ground truth, and (p) our results.

Figure 14. Performance of the PSO algorithm. (a) Performance of each individual. (b) Performance of the best individual in each generation.

Figure 15. Performance of the proposed model considering variation in P and D parameters.

Figure 16. Performance of the Horn-Schunck method for different

α_{h s}

values.

Figure 16. Performance of the Horn-Schunck method for different

α_{h s}

values.

Figure 17. Performance of the Horn-Schunck model for different P and D parameters.

Figure 18. Exhaustive matching represented with white arrows. (a) Exhaustive matching using

D = 18

and

P = 10

for Grove2. (b) Exhaustive matching using

D = 6

and

P = 8

for Rubberwhale.

Figure 18. Exhaustive matching represented with white arrows. (a) Exhaustive matching using

D = 18

and

P = 10

for Grove2. (b) Exhaustive matching using

D = 6

and

P = 8

for Rubberwhale.

Figure 19. Obtained results by our method in the MPI-Sintel test database (at 18 October 2018).

Table 1. Number of images in each image sequence.

Sequence Name	Number of Images
alley	100
ambush	174
bamboo	100
bandage	100
cave	100
market	140
mountain	50
shaman	100
sleeping	100
temple	100
Total images	1064

Table 2. PSO parameters.

Parameter	Value
NIndividuals	3
NGeneration	20
NSequences	8
$ω$	0.5
$φ_{g}$	0.5
$φ_{b}$	1.0

Table 3. Results obtained by PSO in the MPI-Sintel selected training set.

	alley_1	ambush_4	bamboo_2	bandage_1	cave_4	market_5	mountain_1	temple_3
$E P E$	$0.22$	$10.21$	$0.23$	$0.78$	$3.49$	$18.37$	$9.71$	$6.50$
$A A E$	$2.30$	$6.68$	$4.32$	$7.25$	$11.91$	$25.53$	$0.74$	$30.07$

Table 4. Results obtained by PSO in the MPI-Sintel selected training set.

	alley_1	ambush_4	bamboo_2	bandage_1	cave_4	market_5	mountain_1	temple_3
$E P E$	$0.48$	$16.72$	$0.34$	$1.81$	$5.47$	$24.82$	$1.14$	$9.47$
$A A E$	$4.81$	$10.79$	$6.27$	$26.57$	$22.69$	$37.87$	$14.96$	$61.54$

Table 5. Results obtained by the PSO in the MPI-Sintel selected training set.

	alley_1	ambush_4	bamboo_2	bandage_1	cave_4	market_5	mountain_1	temple_3
$E P E$	$0.48$	$16.87$	$0.34$	$1.80$	$6.42$	$23.30$	$1.15$	$8.92$
$A A E$	$4.77$	$10.89$	$6.22$	$26.32$	$22.94$	$34.87$	$14.97$	$55.27$

Table 6. The reported performance of TV-L1 in Middlebury [1].

Error	Dime	Grove3	Hydra	Urban3	Venus	Average
$E P E$	0.162	0.721	0.258	0.711	0.394	0.4492
$A A E$	2.888	6.590	2.814	6.631	6.831	5.1508

Table 7. Obtained results of our model in Middlebury with

κ = 0.0

.

Table 7. Obtained results of our model in Middlebury with

κ = 0.0

.

Error	Dime	Grove3	Hydra	Urban3	Venus	Average
$E P E$	0.0925	0.7090	0.1729	0.7078	0.3492	0.4063
$A A E$	1.8248	6.5913	2.0626	6.9080	6.0818	4.6937

Table 8. Results obtained by the uniform matching location strategy in VALIDATION_SET.

Error	Dime	Grove3	Hydra	Urban3	Venus	Average
$E P E$	0.0975	0.6924	0.1672	0.4811	0.3034	0.3485
$A A E$	1.8739	6.4759	2.0160	4.334	4.2259	3.7872

Table 9. Average

E P E

and

A A E

obtained by Random Location Strategy in VALIDATION_SET set.

Table 9. Average

E P E

and

A A E

obtained by Random Location Strategy in VALIDATION_SET set.

Error	Dime	Grove3	Hydra	Urban3	Venus	Average
$E P E$	0.0967	0.6798	0.1666	0.5446	0.2988	0.3573
$A A E$	1.8876	6.3781	2.0214	4.8060	4.1977	3.8581

Table 10. Average

E P E

and

A A E

obtained by the maximum gradient location strategy in VALIDATION_SET.

Table 10. Average

E P E

and

A A E

obtained by the maximum gradient location strategy in VALIDATION_SET.

Error	Dime	Grove3	Hydra	Urban3	Venus	Average
$E P E$	0.0938	0.7147	0.1772	0.7338	0.3014	0.4042
$A A E$	1.8596	6.5793	2.1159	5.9172	4.3285	4.1601

Table 11. Summary of results. The end point error and average angular error obtained by uniform locations in the MPI-Sintel training set.

Sequence Name	$EPE$	$AAE$
alley	0.36	3.05
ambush	19.72	26.07
bamboo	1.68	13.66
bandage	0.64	7.01
cave	9.26	13.13
market	8.79	13.61
mountain	1.02	10.29
shaman	0.38	7.18
sleeping	0.11	1.79
temple	9.94	13.71
Total $E P E$	5.58	10.87
$E P E + A A E$	16.45

Table 12. Summary of results. The end point error and average angular error obtained by random locations in the MPI-Sintel training set.

Sequence Name	$EPE$	$AAE$
alley	0.36	3.04
ambush	19.59	25.70
bamboo	1.69	13.69
bandage	0.64	7.02
cave	9.54	13.52
market	8.78	13.72
mountain	1.02	10.31
shaman	0.38	7.19
sleeping	0.11	1.79
temple	10.16	14.03
Total $E P E$	5.62	10.91
$E P E + A A E$	16.53

Table 13. Summary of results. The end point error and average angular error obtained by locations on maximum gradients in the MPI-Sintel training set.

Sequence Name	$EPE$	$AAE$
alley	0.37	3.05
ambush	19.83	26.50
bamboo	1.66	13.56
bandage	0.65	7.04
cave	9.72	14.18
market	8.88	13.78
mountain	1.00	10.22
shaman	0.37	7.12
sleeping	0.11	1.79
temple	9.90	14.46
Total $E P E$	5.64	11.10
$E P E + A A E$	16.73

Table 14. Summary of the obtained results for different strategies.

Strategy	$EPE$	$AAE$
Uniform	5.58	10.87
Random	5.62	10.91
Maximum Gradient	5.64	11.10

Table 15. Summary of results. The end point error and average angular error obtained by the combination of uniform and large gradients in the MPI-Sintel training set.

Sequence Name	$EPE$	$AAE$
alley	0.36	3.03
ambush	19.85	26.43
bamboo	1.70	13.60
bandage	0.65	7.03
cave	9.84	14.39
market	9.10	14.07
mountain	0.98	10.24
shaman	0.38	7.16
sleeping	0.11	1.78
temple	10.03	14.31
Total $E P E$	5.70	11.14
$E P E + A A E$	16.83

Table 16. Summary of results. The end point error and average angular error obtained by the combination of uniform and medium gradients in the MPI-Sintel training set.

Sequence Name	$EPE$	$AAE$
alley	0.63	5.99
ambush	26.42	34.81
bamboo	3.16	27.47
bandage	1.19	18.21
cave	12.49	19.18
market	10.76	19.90
mountain	1.43	13.57
shaman	0.67	14.39
sleeping	0.14	2.44
temple	16.21	26.23
Total $E P E$	5.67	11.10
$E P E + A A E$	16.77

Table 17. Summary of results. The end point error and average angular error obtained by Horn-Schunck in the MPI-Sintel training set.

Sequence Name	$EPE$	$AAE$
alley	0.36	3.02
ambush	19.91	26.46
bamboo	1.67	13.59
bandage	0.65	7.03
cave	9.77	14.32
market	9.10	14.07
mountain	0.99	10.24
shaman	0.38	7.16
sleeping	0.11	1.79
temple	9.80	13.90
Total $E P E$	7.76	17.74
$E P E + A A E$	25.50

Table 18. Summary of results. The end point error and average angular error obtained by Horn-Schunck using uniform locations in the MPI-Sintel training set.

Sequence Name	$EPE$	$AAE$
alley	0.64	6.07
ambush	26.01	34.88
bamboo	3.52	28.04
bandage	1.19	18.14
cave	34.44	45.41
market	11.74	20.60
mountain	1.59	14.33
shaman	0.66	11.85
sleeping	0.29	2.95
temple	17.33	27.95
Total $E P E$	10.16	20.95
$E P E + A A E$	30.76

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.

An Empirical Study of Exhaustive Matching for Improving Motion Field Estimation

Abstract

1. Introduction

1.1. Optical Flow Estimation

1.2. Robust Motion Estimation

1.3. Related Works

1.4. Contribution of This Version

2. Materials and Methods

2.1. Proposed Model

2.2. Linearization

2.2.1. Decoupling Variable

2.2.2. Color Model

2.2.3. Optical Flow to Handle Large Displacements

2.2.4. Occlusion Estimation

2.2.5. Solving the Model

2.3. An Extended Version of Horn-Schunck’s Optical Flow

An Extended Version of the Horn-Schunck’s Optical Flow

3. Implementation and Pseudo-Code

3.1. Exhaustive Search

3.2. Matching Confidence Value

3.3. Construction of χ ( x )

3.3.1. Uniform Location

3.3.2. Random Location

3.3.3. Location in Maximum Values of the Gradient Magnitude

3.3.4. Location in the Large Magnitudes of the Gradient in a Uniform Location

3.3.5. Location in the Medium Magnitudes of the Gradient in a Uniform Location

3.3.6. χ ( x ) Value

3.4. Pseudo Code

4. Experiments and Database

4.1. Middlebury Database

4.2. The MPI-Sintel Database

4.3. Experiments with the Middlebury Database

4.4. Parameter Estimation

4.5. Experiments with MPI-Sintel

4.6. Parameter Estimation of the Proposed Model with MPI-Sintel

4.6.1. Parameter Model Estimation

4.6.2. Exhaustive Search Parameter Estimation

4.7. Parameter Estimation of the Extended Horn-Schunk

4.7.1. α h s Parameter

4.7.2. Exhaustive Search Parameter Estimation

5. Results

5.1. Reported Results in Middlebury

5.2. Specific Location of Matching κ ≠ 0

5.2.1. Uniform Locations

5.2.2. Random Locations

5.2.3. Locations for Maximum Magnitudes of the Gradient

5.3. Evaluation in MPI-Sintel

5.3.1. Uniform Location

5.3.2. Random Location

5.3.3. Location on Maximum Gradients

5.4. Summary

5.4.1. Combined Uniform–Large Gradient Locations

5.4.2. Combined Uniform–Medium Gradient Locations

5.4.3. Results Obtained by Horn-Schunck

5.4.4. Results Obtained by Horn-Schunck Using Uniform Locations

6. Discussion

Funding

Acknowledgments

Conflicts of Interest

References

Article Metrics

Article Access Statistics

3.3. Construction of $χ (x)$

3.3.6. $χ (x)$ Value

4.7.1. $α_{h s}$ Parameter

5.2. Specific Location of Matching $κ \neq 0$