Review

Trends in Correlation-Based Pattern Recognition and Tracking in Forward-Looking Infrared Imagery

1 Department of Electrical and Computer Engineering, University of South Alabama, Mobile, AL 36688-0002, USA
2 Department of Electrical Engineering, Tuskegee University, Tuskegee, AL 36088, USA
* Author to whom correspondence should be addressed.
Sensors 2014, 14(8), 13437-13475; https://doi.org/10.3390/s140813437
Submission received: 18 June 2014 / Revised: 16 July 2014 / Accepted: 16 July 2014 / Published: 24 July 2014

Abstract

In this paper, we review recent trends and advancements in correlation-based pattern recognition and tracking in forward-looking infrared (FLIR) imagery. In particular, we discuss matched filter-based correlation techniques for target detection and tracking, which are widely used in various real-time applications. We analyze and present test results involving recently reported matched filters such as the maximum average correlation height (MACH) filter and its variants, and the distance classifier correlation filter (DCCF) and its variants. Test results are presented for both single- and multiple-target detection and tracking using various real-life FLIR image sequences.

1. Introduction

Pattern recognition deals with the detection and identification of a desired pattern or target in an unknown input scene, which may or may not contain the target, and with the determination of the spatial location of any target present. In pattern recognition or classification, the input is an image while the output is a decision signal based on some characteristic features of the input image. The number of features is usually fewer than the total necessary to describe the complete target of interest, and this leads to a loss of information. Because of the crucial role of decision making required in pattern recognition, it is fundamentally an information reduction process, whereby it is not possible to reconstruct the pattern but it is possible to give a precise decision [1–3].

Although a great deal of effort has been expended on detecting objects in visual images, only a limited amount of work has been reported on the detection and tracking of targets in infrared images. In general, existing methods for infrared images work for a limited number of situations due to various practical constraints. An infrared sensor detects infrared radiation and converts it to an image based on the temperature difference between an object and the surrounding background. The temperature scale is mapped to a color scale or a gray scale on a display, and in this way an image is obtained. This type of sensor or camera can image an object through smoke in a burning house, heat leaking from a house, or objects in the absence of any reflected light (at night) [4]. The images captured by infrared sensors are becoming an integral part of ongoing research on automatic target recognition (ATR).

Forward-looking infrared (FLIR) images are frequently used in ATR applications. The detection and discrimination of targets in infrared imagery have been a challenging problem due to the low signal-to-noise ratio (SNR) and the variability of target and clutter signatures. The FLIR sequences tested in this work are recorded from a moving platform and include independently moving objects under various distortions and background variations. Thus, sensor ego-motion and object motions induce coupled motions into the FLIR images, which make the detection and tracking of the objects extremely complicated. To detect independently moving objects in FLIR image sequences, the sensor properties must also be taken into account.

Real-life FLIR imagery presents a number of well-known challenges, such as a significantly high level of variability of target thermal signatures, size/aspect, and locations within the scene; a large number of target classes; lack of prior information; obscured targets; competing cluttered background scenery; different geographic, meteorological and weather conditions; time of the day; high ego-motion; sensor noise; and variations caused by translation, rotation, and scaling of the targets. Furthermore, inconsistencies in the signature of targets, similarities between the signatures of different targets, limited training and testing data, camouflaged targets, non-repeatability of target signatures, and difficulty in exploiting contextual information make the recognition problem even more challenging in FLIR imagery. In the case of FLIR images, additional challenges arise from the following important differences [5–7] from visual sequences:

  • The thermal images are obtained by sensing the radiation in the infrared spectrum, which is either emitted or reflected by the object in the scene. Due to this property, the images obtained from an infrared sensor have extremely low SNR, which results in limited information for performing detection or tracking tasks.

  • FLIR imagery smoothes out object edges and corners leading to a reduction of distinct features.

  • The generation and maintenance of kinetic energy usually heats up a moving object (e.g., friction, engine combustion). Consequently, moving objects often appear brighter than the background.

  • FLIR images are noisy and have low contrast. Moreover, they are often affected by dirt on the lens or by local sensor failures at certain pixel locations.

  • FLIR sequences are not easily available (especially not from controlled experiments) and have a lower resolution. The sequences available to us are 128 × 128 pixels, compared with the 512 × 512 pixels or more of standard visual cameras.

  • FLIR sequences are often captured under difficult circumstances and may have abrupt discontinuities in motion.

Due to these limitations and difficulties, FLIR imagery demands more robust techniques than visual sequences. This paper presents some widely used pattern recognition and target tracking techniques adapted for FLIR imagery. In particular, we discuss several target detection and tracking algorithms based on recently reported matched filter-based correlation techniques such as the MACH, EMACH, DCCF, and PDCCF filters. The performance of these algorithms was tested using real-life FLIR image sequences supplied by the Army Missile Command.

2. Matched Filter-Based Correlation

The matched filter-based correlator was first introduced in 1964 [8]. In this technique, the input signal f(σ,ε) is Fourier transformed to yield

$$F(o,\upsilon) = \Im\left[f(\sigma,\varepsilon)\right] \tag{1}$$

where ℑ represents the Fourier transform operation, σ and ε represent the spatial domain variables, and o and υ represent the frequency domain variables, respectively. The correlation output is obtained by the inverse Fourier transform operation given by

$$g(\sigma,\varepsilon) = \Im^{-1}\left[F(o,\upsilon)\,H(o,\upsilon)\right] \tag{2}$$

where ℑ⁻¹ represents the 2D inverse Fourier transform operation and H(o,υ) is the frequency plane filter function.

If the input object is moved laterally in the input plane, its Fourier transform remains fixed in space but is multiplied by a phase factor that depends on the lateral movement. Therefore, the coordinates of the bright correlation output are proportional to the coordinates of the signal f(σ,ε) located at the input plane. The intensity of the bright correlation spot is proportional to the degree to which the input and the filter functions are matched. This is also valid for multiple objects present at different locations of the input plane. This correlation system provides a great deal of sensitivity since it is both phase matched and amplitude matched [9–11].
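To make the frequency-plane correlation concrete, the following Python/NumPy sketch (our own illustration, not code from the original paper) builds a complex matched filter from a reference patch, correlates it with an input scene through the FFT, and locates the correlation peak. The scene size, the patch placement, and the random data are arbitrary example choices.

```python
import numpy as np

def matched_filter_correlate(scene, reference):
    """Frequency-plane correlation of a scene with a reference patch.

    Illustrative sketch only: the reference is zero-padded to the scene size
    and the correlation is computed via the FFT (circular correlation).
    Returns the correlation intensity plane and the peak location.
    """
    ref = np.zeros_like(scene, dtype=float)
    ref[:reference.shape[0], :reference.shape[1]] = reference

    F = np.fft.fft2(scene)              # F(o, v): Fourier transform of the input
    R = np.fft.fft2(ref)                # R(o, v): Fourier transform of the reference
    H = np.conj(R)                      # complex matched filter H = R*
    g = np.fft.ifft2(F * H)             # correlation output g(sigma, epsilon)
    intensity = np.abs(g) ** 2

    peak = np.unravel_index(np.argmax(intensity), intensity.shape)
    return intensity, peak

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.normal(size=(128, 128))
    target = rng.normal(size=(12, 16))
    scene[40:52, 60:76] += 3.0 * target      # embed the target at (40, 60)
    _, peak = matched_filter_correlate(scene, target)
    print("correlation peak at", peak)       # expected near (40, 60)
```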

For the complex matched filter (CMF), the frequency plane filter function is expressed by

$$H_{\mathrm{cmf}}(o,\upsilon) = R^{*}(o,\upsilon) = |R(o,\upsilon)|\exp\left[-j\phi(o,\upsilon)\right] \tag{3}$$

where |R(o,υ)| is the amplitude and φ(o,υ) is the phase of the Fourier spectrum of the reference function r(σ,ε). When the input is similar to r(σ,ε), the phase variation is canceled at the Fourier plane, thus producing a plane wave of light. The correlation peak corresponding to the CMF is not very sharp because the magnitude of the reference spectrum is effectively squared in the correlation product. Consequently, the resulting diffraction efficiency of a CMF is very poor. This filter is also unacceptably sensitive to even small changes in the reference signal or image. Currently available spatial light modulators (SLMs) cannot accommodate the full complex frequency response needed by CMFs.

3. Phase Only Filter (POF)

The optimum case with respect to light efficiency for a matched filter is realized by a phase only filter (POF) structure [10]. This filter is obtained by omitting the amplitude information and is defined as

$$H_{\mathrm{pof}}(o,\upsilon) = \frac{R^{*}(o,\upsilon)}{|R(o,\upsilon)|} = \exp\left[-j\phi(o,\upsilon)\right] \tag{4}$$

Besides the improvement in light efficiency, the correlation peak intensity is also enhanced with a POF. However, although the autocorrelation peak intensity is higher than that of a CMF, it is not as sharp as might be produced by an inverse filter (IF), since the product Hpof(o,υ)R(o,υ) is not necessarily a constant.

4. Amplitude-Modulated Phase Only Filter (AMPOF)

The amplitude-modulated phase-only filter (AMPOF) [12] is given by

$$H_{\mathrm{ampof}}(o,\upsilon) = \frac{A}{B + |R(o,\upsilon)|}\exp\left[-j\phi(o,\upsilon)\right] \tag{5}$$

where A and B are either constants or functions of o and υ. The gain control factor A guarantees that the transmittance of the filter is less than unity. With the inclusion of B, the pole problem is solved and, at the same time, a very high autocorrelation peak can be obtained.
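Since the CMF, POF, and AMPOF differ only in how the reference spectrum R(o,υ) is used, all three frequency responses can be built side by side from one reference patch. The NumPy sketch below is illustrative only; the AMPOF constants A and B are arbitrary example values.

```python
import numpy as np

def build_filters(r, A=1.0, B=1e-3):
    """Build CMF, POF, and AMPOF frequency responses from a reference image r.

    A and B are illustrative AMPOF constants: B avoids division by zero
    (the 'pole problem') and A scales the overall transmittance.
    """
    R = np.fft.fft2(r)
    mag = np.abs(R)
    conj_phase = np.exp(-1j * np.angle(R))     # exp(-j*phi), the conjugate phase

    H_cmf = np.conj(R)                         # |R| exp(-j*phi)
    H_pof = conj_phase                         # exp(-j*phi) only
    H_ampof = (A / (B + mag)) * conj_phase     # amplitude-modulated phase-only
    return H_cmf, H_pof, H_ampof
```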

5. Synthetic Discriminant Functions (SDF)

The SDF-based correlation filters have shown robust performance for distortion-tolerant pattern recognition [13–15]. Assume x1(σ,ε), x2(σ,ε), …, xN(σ,ε) denote N training images representing possible distortions to a reference image x(σ,ε). The 2D Fourier transform of x(σ,ε) may be expressed as

$$X(o,\upsilon) = \iint x(\sigma,\varepsilon)\exp\left[-j2\pi(o\sigma + \upsilon\varepsilon)\right]\,d\sigma\,d\varepsilon \tag{6}$$

A composite image h(σ,ε) is designed from the training images such that when the complex conjugate of its Fourier transform, denoted as H*(o,υ), is correlated with the input Fourier transform, a similar output is obtained for all N inputs, x1(σ,ε), x2(σ,ε), …, xN(σ,ε). This type of correlator is known as a frequency plane correlator. For this filter, the resulting spatial domain correlation output c(τσ,τε) may be expressed as [13]

$$c(\tau_{\sigma},\tau_{\varepsilon}) = \iint F(o,\upsilon)H^{*}(o,\upsilon)\exp\left[j2\pi(o\tau_{\sigma} + \upsilon\tau_{\varepsilon})\right]\,do\,d\upsilon = \iint h^{*}(\sigma,\varepsilon)\,f(\sigma+\tau_{\sigma},\varepsilon+\tau_{\varepsilon})\,d\sigma\,d\varepsilon = h(\sigma,\varepsilon) \odot f(\sigma,\varepsilon) \tag{7}$$
where ⊙ denotes a two-dimensional cross-correlation operation.

In the Equal Correlation Peak SDF (ECP-SDF) design, the objective is to select a filter impulse response h(σ,ε) such that the resulting crosscorrelations with all the N training images are the same, which is impossible in practice. Hester and Casasent [13] introduced a technique that requires that only the values at the origin of these crosscorrelations should be the same as shown in the following equation,

$$h(\sigma,\varepsilon) \odot x_{i}(\sigma,\varepsilon)\Big|_{\tau_{\sigma}=0,\,\tau_{\varepsilon}=0} = \iint h^{*}(\sigma,\varepsilon)\,x_{i}(\sigma,\varepsilon)\,d\sigma\,d\varepsilon = c, \quad i = 1, 2, \ldots, N \tag{8}$$

where c is a prespecified constant. Equation (8) shows that such an h(σ,ε) would yield the same constant value c at the origin (location of the autocorrelation peak) for all N training images (i.e., x1(σ,ε) to xN(σ,ε)). When the input is a non-training image from the same class, the crosscorrelation output at the origin will be similar to this constant c and it can be recognized. The success of this approach depends on selecting a proper training set.

Assume that h(σ,ε) is a linear combination of the N training images, given by

$$h(\sigma,\varepsilon) = a_{1}x_{1}(\sigma,\varepsilon) + a_{2}x_{2}(\sigma,\varepsilon) + \cdots + a_{N}x_{N}(\sigma,\varepsilon) \tag{9}$$
where the coefficients a1, a2, … , aN are determined in a way to satisfy the constraints of Equation (8). Substituting Equation (9) into Equation (8), we get
$$\sum_{i=1}^{N} a_{i}R_{ij} = c, \quad j = 1, 2, \ldots, N \tag{10}$$
where
$$R_{ij} = \iint x_{i}^{*}(\sigma,\varepsilon)\,x_{j}(\sigma,\varepsilon)\,d\sigma\,d\varepsilon \tag{11}$$

is the inner product, i.e., the crosscorrelation at the origin of the training images xi(σ,ε) and xj(σ,ε). If the training images are real, then there is no need for the conjugate operation shown in Equation (11). Equation (10) represents N complex linear equations with N complex unknowns, a1, a2, …, aN, respectively. These equations can be solved by standard techniques, such as Gaussian elimination.
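A minimal NumPy sketch of Equations (8)–(11) is given below. It assumes real-valued training images and a common constraint value c, and solves the resulting linear system directly rather than by elimination.

```python
import numpy as np

def ecp_sdf(training_images, c=None):
    """Equal Correlation Peak SDF: h as a linear combination of training images.

    training_images: array of shape (N, rows, cols), assumed real.
    c: desired correlation value at the origin for every image (default 1.0).
    Illustrative sketch of Equations (8)-(11), not a production implementation.
    """
    N = training_images.shape[0]
    X = training_images.reshape(N, -1)          # each row is one vectorized image
    if c is None:
        c = np.ones(N)
    R = X @ X.T                                 # R[i, j]: inner product of images i and j
    a = np.linalg.solve(R, c)                   # coefficients a_1 ... a_N
    h = (a[:, None] * X).sum(axis=0)            # h = sum_i a_i x_i
    return h.reshape(training_images.shape[1:])
```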

6. Modified Synthetic Discriminant Functions

The ECP-SDF is designed to produce a value ci at the origin of the output plane when the i-th training image is used as the input. However, there are some practical problems in using this filter for pattern recognition applications [16–18]. Consequently, some improvements on the ECP-SDF have been suggested, which are introduced in the following subsections.

6.1. Generalized SDF

Assume that the training images x1(σ,ε), x2(σ,ε), …, xN(σ,ε) are sampled to yield arrays x1(m,n), x2(m,n), …, xN(m,n) each with ρ = ρ1ρ2 pixels, where ρ1 is the number of pixels in the vertical direction while ρ2 is the number of pixels in the horizontal direction of each image. It is also assumed that ρ-dimensional column vectors x1, x2, …, xN are obtained by placing the elements in these training images in vectors where the scanning direction is from left to right and from top to bottom. Similarly, the ρ-dimensional column vector h is used to denote the composite image h(m,n). Then the constraints in Equation (8) can be rewritten as

$$h^{+}x_{i} = c_{i}, \quad i = 1, 2, 3, \ldots, N \tag{12}$$
where the superscript + denotes the conjugate transpose operation. The data matrix X is assumed to have the vector xi as its i-th column and is thus a ρ × N matrix. It is also assumed that ρ ≫ N, i.e., the number of pixels in the training images is much larger than the number of training images, and that the columns of this matrix are linearly independent. Using this notation, Equation (12) can be rewritten as
$$X^{+}h = c \tag{13}$$

The ECP-SDF assumes that the composite image h is of the following form

$$h = Xa \tag{14}$$
where a is the vector of coefficients. Substituting Equation (14) into Equation (13) and solving for a yields
$$a = (X^{+}X)^{-1}c \tag{15}$$

The filter vector h can be obtained by substituting Equation (15) into Equation (14) to get

$$h_{\mathrm{ECP}} = X(X^{+}X)^{-1}c \tag{16}$$

A general expression for h satisfying Equation (13) is given by

$$h_{\mathrm{GSDF}} = X(X^{+}X)^{-1}c + \left[I_{d} - X(X^{+}X)^{-1}X^{+}\right]z \tag{17}$$
where Id is the ρ × ρ diagonal identity matrix and z is any column vector with ρ complex entries. The ECP-SDF is obtained from Equation (17) when z = 0. The filter vector in Equation (17) is known as the generalized SDF [17].

6.2. Minimum Variance SDF

Consider a situation where the input image is one of the training images xi corrupted by additive noise n. Then the resulting output value y, i.e., the value of the crosscorrelation at the origin is given by

$$y = h^{+}(x_{i} + n) = c_{i} + h^{+}n \tag{18}$$

where h is designed to satisfy Equation (12). From Equation (18) it is evident that the output y is the desired output ci corrupted by the random variable h⁺n. The minimum variance synthetic discriminant function (MVSDF) [18] attempts to design h such that the variance in the output caused by input noise is minimized while satisfying the constraints in Equation (13).

Assume that the real noise vector n is a zero-mean vector with a ρ × ρ covariance matrix Σ. The variance of y corresponding to h+n can be expressed as

$$\sigma_{y}^{2} = E\left\{|h^{+}n|^{2}\right\} = E\left\{h^{+}nn^{+}h\right\} = h^{+}\Sigma h \tag{19}$$

It is desired that σy² in Equation (19) is as small as possible, which will ensure that the output values are close to the constrained values even in the presence of noise. Minimizing σy² in Equation (19) subject to the constraints in Equation (13) leads to the following MVSDF [18]

$$h_{\mathrm{MVSDF}} = \Sigma^{-1}X(X^{+}\Sigma^{-1}X)^{-1}c \tag{20}$$

This MVSDF is indeed optimal from noise tolerance considerations. One difficulty in using the MVSDF is that Σ is often not known. Even when it is known, inverting the ρ × ρ matrix Σ is computationally impractical. Another problem is that the MVSDF controls only one point (the origin) in the output-correlation plane. Thus, large sidelobes may be observed in the correlation output.

6.3. Frequency-Domain SDFs

It is often more convenient to design the filters in the frequency domain [16–18]. Assume f(σ,ε) is the image and H*(o,υ) is the complex filter function. Then the resulting correlation output c(τσ,τε) at the origin is given by

$$c(0,0) = \iint H^{*}(o,\upsilon)F(o,\upsilon)\,do\,d\upsilon = \hat{h}^{+}\hat{f} \tag{21}$$

where ˆ indicates that the corresponding vector or matrix is obtained by sampling frequency domain functions and the superscript + indicates a conjugate transpose operation. Because ci(0,0) is constrained to be ci, i = 1, 2, …, N, the constraints can be rewritten as

$$\hat{F}^{+}\hat{h} = c \tag{22}$$

where F̂ is a matrix with N columns, the i-th column containing f̂i. It is obvious that Equations (13) and (22) are similar.

6.4. Minimum Average Correlation Energy (MACE) Filter

The correlation filters discussed so far control only one point in the correlation plane. For good location accuracy and discrimination, it is necessary to design filters capable of producing sharp correlation peaks. One such filter is the minimum average correlation energy (MACE) filter [19]. Assume x1(σ,ε), x2(σ,ε), …, xN(σ,ε) denote the N training images and X1(ο,υ), …, XN(ο,υ) denote their 2D Fourier transforms, respectively. If H*(ο,υ) denotes the transmittance of the filter function, then the filter may be constructed to satisfy the following condition.

$$\iint X_{i}(o,\upsilon)H^{*}(o,\upsilon)\,do\,d\upsilon = c_{i}, \quad i = 1, 2, \ldots, N \tag{23}$$

In addition, the MACE filter minimizes the average correlation plane energy as shown below.

$$E_{\mathrm{ave}} = \frac{1}{N}\sum_{i=1}^{N}\iint |c_{i}(\tau_{\sigma},\tau_{\varepsilon})|^{2}\,d\tau_{\sigma}\,d\tau_{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N}\iint |X_{i}(o,\upsilon)|^{2}|H(o,\upsilon)|^{2}\,do\,d\upsilon \tag{24}$$

By minimizing Eave, it is possible to keep the sidelobes in the correlation plane as low as possible. This is essentially an indirect attempt at reducing the problem of sidelobes. To carry out the minimization of Eave, the usual vector notation is used. If x̂i denotes the ρ-dimensional complex column vector obtained by sampling Xi(o,υ), then the constraints in Equation (23) can be rewritten as

$$\hat{X}^{+}\hat{h} = c \tag{25}$$

where X̂ is a ρ × N matrix with x̂i as its i-th column. The Eave in Equation (24) can be expressed as

$$E_{\mathrm{ave}} = \hat{h}^{+}\hat{D}\hat{h} \tag{26}$$

where D̂ is a ρ × ρ diagonal matrix. The entries along the diagonal are obtained by averaging |Xi(o,υ)|², i = 1, 2, …, N, and then scanning the average from left to right and from top to bottom. Minimizing Eave in Equation (26) subject to the constraints in Equation (25) leads to the following filter

$$\hat{h}_{\mathrm{MACE}} = \hat{D}^{-1}\hat{X}\left(\hat{X}^{+}\hat{D}^{-1}\hat{X}\right)^{-1}c \tag{27}$$

In many simulation studies, filters designed using this approach produced sharp correlation peaks. However, MACE filters appear to have two drawbacks. The first is that there is no noise tolerance built into these filters. The second is that these filters seem to be more sensitive to intra-class variations. Casasent et al. [20] proposed Gaussian MACE filters to reduce the sensitivity of the MACE filters to intra-class variations. The idea behind Gaussian MACE filters is to reduce the sharpness of the resulting correlation peak and thus improve its noise tolerance. MACE filters appear to be the first set of composite filters that attempt to control the entire correlation plane.
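The closed form of Equation (27) is simple to evaluate because D̂ is diagonal. The NumPy sketch below is our own illustration; the default all-ones constraint vector c is an assumption.

```python
import numpy as np

def mace_filter(training_images, c=None):
    """MACE filter of Equation (27), sketched with NumPy.

    training_images: (N, rows, cols) real array of registered training images.
    Returns the frequency-domain filter H of the same size as one image.
    """
    N, rows, cols = training_images.shape
    # rho x N matrix whose i-th column is the vectorized spectrum of image i
    X = np.stack([np.fft.fft2(img).ravel() for img in training_images], axis=1)
    if c is None:
        c = np.ones(N)

    d = np.mean(np.abs(X) ** 2, axis=1)         # diagonal of D: average power spectrum
    Dinv_X = X / d[:, None]                     # D^(-1) X (D is diagonal)
    A = X.conj().T @ Dinv_X                     # X^+ D^(-1) X  (N x N)
    h = Dinv_X @ np.linalg.solve(A, c)          # D^(-1) X (X^+ D^(-1) X)^(-1) c
    return h.reshape(rows, cols)
```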

6.5. Minimum Squared Error SDF (MSE-SDF) Filter

This SDF design approach yields a better approximation of arbitrary output correlation shapes in the minimum squared error (MSE) sense than the MACE filter, and the resulting filter is termed the MSE-SDF [21]. Like the MACE filter, this filter must satisfy the usual SDF constraint of Equation (25). In addition, in the MSE-SDF, the filter function H(o,υ) must make the correlation output ci approximate a prespecified desired shape ti, i = 1, 2, …, N. One measure of how well ci approximates ti is the average squared error E defined as

$$E = \frac{1}{N}\sum_{i=1}^{N}\iint \left|t_{i}(\tau_{\sigma},\tau_{\varepsilon}) - c_{i}(\tau_{\sigma},\tau_{\varepsilon})\right|^{2}\,d\tau_{\sigma}\,d\tau_{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N}\iint \left|T_{i}(o,\upsilon) - X_{i}(o,\upsilon)H^{*}(o,\upsilon)\right|^{2}\,do\,d\upsilon \tag{28}$$

where Ti is the Fourier transform of ti. If X̂iD is obtained by converting the vector x̂i into a diagonal matrix, then the average squared error of Equation (28) becomes

$$E = E_{d} - \hat{h}^{+}\hat{p} - \hat{p}^{+}\hat{h} + \hat{h}^{+}\hat{M}\hat{h} \tag{29}$$

where

$$E_{d} = \frac{1}{N}\sum_{i=1}^{N} t_{i}^{+}t_{i} \tag{30}$$

$$\hat{p} = \frac{1}{N}\sum_{i=1}^{N} \left(\hat{X}_{i}^{D}\right)^{*}t_{i} \tag{31}$$

$$\hat{M} = \frac{1}{N}\sum_{i=1}^{N} \left(\hat{X}_{i}^{D}\right)^{*}\hat{X}_{i}^{D} \tag{32}$$

Minimizing E in Equation (29) subject to the constraints in Equation (25) leads to the following MSE-SDF filter

$$\hat{h}_{\mathrm{MSE\text{-}SDF}} = \hat{M}^{-1}\hat{p} + \hat{M}^{-1}\hat{X}\left(\hat{X}^{+}\hat{M}^{-1}\hat{X}\right)^{-1}\left[c - \hat{X}^{+}\hat{M}^{-1}\hat{p}\right] \tag{33}$$

The MSE-SDF filter allows approximating arbitrary correlation shapes rather than the zero shape implied in the MACE filter design. This explicit control has two benefits [21]. First, correlation shapes can be selected with the linear/nonlinear postprocessing in mind, and second, correlation shapes that lead to better filter designs can be used instead of simply minimizing the average correlation energy.

7. Maximum Average Correlation Height (MACH) Filter

The primary objective of the correlation filters is to achieve distortion-tolerant recognition of objects in the presence of clutter. This problem is easier to solve for in-plane rotations and scale changes. However, the prevalent method for handling out-of-plane distortions is to use a training set of representative views of the object. Traditionally, in the design of SDF-type correlation filters, linear constraints are imposed on the training images to yield a known value at specific locations in the correlation plane. However, placing such constraints in the correlation plane satisfies conditions only at isolated points in the image space but does not explicitly control the filter's ability to generalize over the entire domain of the training images. Various filters exhibit different levels of distortion tolerance even with the same training set and constraints.

The MACH filter adopts a statistical approach for filter design [22,23]. In addition to yielding sharp peaks and being computationally simple, this filter offers improved distortion tolerance. The reason lies in the fact that training images are not treated as deterministic representations of the object but as samples of a class whose characteristic parameters should be used in encoding the filter.

It is assumed that the training set consists of N images, and that each image of size ρ1 × ρ2 contains ρ = ρ1ρ2 pixels. The i-th training image for the target class is denoted by xi(m,n) in the spatial domain, which is represented in the frequency domain by a ρ × 1 vector xi, obtained by lexicographically reordering its two-dimensional discrete Fourier transform, Xi(k,l). The Fourier domain filter is denoted by the ρ × 1 vector h. The two-dimensional filter H(k,l) is obtained by rearranging h into a two-dimensional image. In this paper, matrices are denoted by uppercase bold-face and vectors by lowercase bold-face characters. The correlation of the i-th training image and the filter can be expressed in the frequency domain as

$$g_{i} = X_{i}h \tag{34}$$
where Xi is a ρ × ρ diagonal matrix containing the elements of xi. Here, gi denotes the discrete Fourier transform of the i-th correlation output. The deviation in the shape of the correlation plane with respect to some ideal shape vector f is quantified by the average squared error (ASE), defined as
$$\mathrm{ASE} = \frac{1}{N}\sum_{i=1}^{N}(g_{i} - f)^{+}(g_{i} - f) \tag{35}$$

Thus, ASE is a measure of distortion with respect to reference shape f, which can be chosen as desired.

In fact, the shape vector f can be treated as a free parameter in the distortion minimization problem. In the design of MSE-SDF [21], f is specified as Gaussian or ring-like shapes in order to sculpt the correlation surface into these forms. In MACH, the choice of f is such that it causes least variation among the correlation planes and offers minimum ASE. To find the optimum shape fopt, the gradient of ASE with respect to f is set to zero, given by

$$\nabla_{f}(\mathrm{ASE}) = -\frac{2}{N}\sum_{i=1}^{N}(g_{i} - f) = 0 \tag{36}$$

or,

$$f_{\mathrm{opt}} = \frac{1}{N}\sum_{i=1}^{N} g_{i} = \bar{g} \tag{37}$$

where

$$\bar{g} = \frac{1}{N}\sum_{i=1}^{N} X_{i}h = Mh \tag{38}$$

is the average correlation plane and M = (1/N)∑ Xi is the average training image expressed as a diagonal matrix. Thus, among all possible reference shapes, the average correlation plane ḡ offers the smallest possible ASE and the least distortion (in the squared error sense) among the correlation planes.

Substituting f = ḡ in the ASE expression, the average similarity measure (ASM) is obtained as

$$\mathrm{ASM} = \frac{1}{N}\sum_{i=1}^{N}(g_{i} - \bar{g})^{+}(g_{i} - \bar{g}) = \frac{1}{N}\sum_{i=1}^{N}(X_{i}h - Mh)^{+}(X_{i}h - Mh) = h^{+}\left[\frac{1}{N}\sum_{i=1}^{N}(X_{i} - M)^{*}(X_{i} - M)\right]h = h^{+}S_{x}h \tag{39}$$

where

$$S_{x} = \frac{1}{N}\sum_{i=1}^{N}(X_{i} - M)^{*}(X_{i} - M) \tag{40}$$
is a diagonal matrix measuring the similarity of the training images to the class mean in the frequency domain. For example, if all training images are identical, then Sx would be an all-zero matrix. From Parseval's theorem, it is easy to show that the average squared distance from the correlation planes to their mean is the same as that defined by Equation (39) in the frequency domain [22].

The ASM is one possible metric for distortion since it represents the average deviation of the correlation planes from the mean correlation shape ḡ. It is also a measure of the compactness of the class. If filter h is viewed as a linear transform, then ASM measures the distances of the training images from the class center under this transform. Minimizing ASM, therefore, leads to a compact set of correlation planes that resemble each other and exhibit the least possible variations. The distortions of the object in the input plane are represented by the training images, xi. These distortions are reflected in the output as variations in the structure and shape of the corresponding correlation planes, gi, and are quantified by ASM. If the filter successfully reduces the distortions, then distorted input images should yield similar output planes, leading to a small value of ASM. Conversely, if ASM is minimized and ḡ is well shaped by design, then all true-class correlation planes are expected to resemble ḡ and to exhibit well-shaped structures.

The MACH filter relaxes the correlation peak constraints and maximizes the peak intensity of the average training image. The peak intensity of the average training image is |ḡ(0,0)|², expressed as

$$|\bar{g}(0,0)|^{2} = \left|\frac{1}{N}\sum_{i=1}^{N} h^{+}x_{i}\right|^{2} = |h^{+}m|^{2} = h^{+}mm^{+}h \tag{41}$$
where m is the Fourier transform of the average training image expressed as a vector.

Here, it is assumed without the loss of generality that the peak occurs at the origin of the correlation plane.

The smaller the value of ASM, the more invariant the response of the filter is. In other words, if ASM is small, then all true-class correlation planes are expected to resemble ḡ. Therefore, h is required to produce a high correlation peak with the mean image while keeping ASM small. In addition, some degree of noise tolerance is also required to reduce the output noise variance (ONV). For additive input noise, ONV = h⁺Dh, where D is the diagonal power spectral density matrix [23]. While practical noise may be multiplicative and more complicated than implied by a simple additive noise model, this model at least provides some robustness. The performance criterion used to optimize the MACH filter may be expressed as

$$J(h) = \frac{(\text{Average peak height})^{2}}{\mathrm{ASM} + \mathrm{ONV}} = \frac{|\bar{g}(0,0)|^{2}}{\mathrm{ASM} + \mathrm{ONV}} = \frac{|h^{+}m|^{2}}{h^{+}Sh + h^{+}Dh} = \frac{h^{+}mm^{+}h}{h^{+}(S + D)h} \tag{42}$$

The optimum solution is found by setting the derivative of J(h) in Equation (42) with respect to h to zero and is given by [22,23]

$$h = (S + D)^{-1}m \tag{43}$$

The filter in Equation (43) is referred to as the MACH filter because it maximizes the height of the mean correlation peak relative to the expected distortions. For cases where an estimate of D is not available, the white noise covariance matrix is substituted for D, i.e., D = σ2I, where I is a diagonal identity matrix. Hence the simplified MACH filter becomes

$$h = (S + \sigma^{2}I)^{-1}m \tag{44}$$

Replacing σ2 by another constant γ, the filter equation becomes

$$h = (S + \gamma I)^{-1}m \tag{45}$$

The robustness of the MACH filter is attributed to the inclusion of the ASM criterion, which reduces the filter sensitivity to distortions, and to the removal of hard constraints on the peak. The latter enables the correlation planes to adjust to suitable values for optimizing the performance criterion. However, the MACH filter can only handle those distortions that are well represented in the training set.

The Fourier domain MACH filter obtained in Equation (45) may be rearranged into a 2D array of the same size as the input training images, expressed as H(k,l). A 2D test image z(m,n) is Fourier transformed to obtain Z(k,l), which is then correlated with the 2D filter in the Fourier domain using the expression

$$G(k,l) = Z(k,l)H^{*}(k,l) \tag{46}$$

The output spatial domain correlation is obtained by applying the inverse Fourier transform operation to Equation (46) and recording the intensity, given by

$$g(m,n) = \left|\Im^{-1}\left[G(k,l)\right]\right|^{2} \tag{47}$$
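A compact NumPy sketch of Equations (40), (45)–(47) is given below (our own illustration). Because Sx and the identity matrix are diagonal in the frequency domain, the matrix inverse reduces to an elementwise division; the value of γ is an arbitrary example.

```python
import numpy as np

def mach_filter(training_images, gamma=1e-2):
    """Simplified MACH filter of Equation (45): h = (S + gamma*I)^(-1) m.

    S and the identity are diagonal in the frequency domain, so the matrix
    inverse reduces to an elementwise division. gamma is an illustrative
    regularization constant standing in for the noise variance.
    """
    X = np.array([np.fft.fft2(img) for img in training_images])   # (N, rows, cols)
    m = X.mean(axis=0)                                             # mean training spectrum
    S = np.mean(np.abs(X - m) ** 2, axis=0)                        # diagonal of S_x, cf. Eq. (40)
    return m / (S + gamma)

def mach_correlate(scene, H):
    """Correlation intensity plane of Equations (46)-(47)."""
    Z = np.fft.fft2(scene)
    return np.abs(np.fft.ifft2(Z * np.conj(H))) ** 2
```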

8. Extended MACH (EMACH) Filter

The average training image used in the MACH filter design is good in representing the average behavior of the desired class, but it fails to capture the finer details of the desired class [24,25]. In fact, the average of training images sometimes looks like a clutter image. Thus, the MACH filter may be inadequate in discriminating the desired class from the clutter, leading to increased false alarm rate. The extended MACH (EMACH) filter is aimed at improving this clutter rejection capability.

The MACH filter is designed to maximize the intensity of the average correlation output at the origin due to training images. The average of correlation peaks is the correlation output due to the average training image. It also maximizes the similarity between the average training image correlation output and those outputs due to all training images from the desired class. Thus, the MACH filter forces all images from the desired class to follow the behavior of the average training image from that class. The MACH filter relies heavily on the mean training image. It amplifies the high-energy (usually low-frequency) components, and at the same time, attenuates the low-energy (usually high-frequency) components of the training set. Thus, by using the mean image m as the only example that represents all training images, a filter may be obtained that does not capture the finer details of the training images. Therefore, this filter may fail to discriminate the desired class from the clutter. The MACH filters possess attributes that may lead them to detect clutter images as targets. One such attribute is that all training images follow the same behavior as the average training image. However, the average training image is not necessarily a good representative of the desired class.

To control the relative contribution of the desired class training images as well as their average, a new metric, called all image correlation height (AICH) [25], is introduced and defined as

$$\mathrm{AICH} = \frac{1}{N}\sum_{i=1}^{N}\left|h^{+}(x_{i} - \beta m)\right|^{2} = \frac{1}{N}\sum_{i=1}^{N}\left|h^{+}x_{i} - \beta h^{+}m\right|^{2} \tag{48}$$

where β is a parameter that takes a value between 0 and 1 and governs the relative significance of the average training image in the filter design. By controlling β, the designed filter is prevented from being overwhelmed by the biased treatment of the low-frequency components represented by the average image. Here, the AICH must be optimized, and to do so, Equation (48) may be rewritten as

$$\mathrm{AICH} = \frac{1}{N}\sum_{i=1}^{N}(h^{+}x_{i} - \beta h^{+}m)(h^{+}x_{i} - \beta h^{+}m)^{+} = h^{+}\left[\frac{1}{N}\sum_{i=1}^{N}(x_{i} - \beta m)(x_{i} - \beta m)^{+}\right]h = h^{+}C_{x}^{\beta}h \tag{49}$$

where

$$C_{x}^{\beta} = \frac{1}{N}\sum_{i=1}^{N}(x_{i} - \beta m)(x_{i} - \beta m)^{+} \tag{50}$$

Thus, AICH can be described as the average of the correlation peak intensities of N exemplars, where the i-th exemplar (xi − βm) is the i-th training image with part of the mean subtracted. Hence, it is desirable for all images in the training set to follow these exemplars' behavior. This can be done by forcing every image in the training set xi to have a correlation output plane similar to an ideal correlation output shape f. To find the f that best matches all these exemplars' correlation output planes, its deviation from their correlation planes is minimized. This deviation can be quantified by ASE, defined as

$$\mathrm{ASE} = \frac{1}{N}\sum_{i=1}^{N}(g_{i} - f)^{+}(g_{i} - f) \tag{51}$$

where

$$g_{i} = (X_{i} - \beta M)^{*}h \tag{52}$$

In Equation (52), the superscript * represents the complex conjugate operation. To find the optimum shape vector fopt, the gradient of ASE with respect to f is set to zero, yielding

$$f_{\mathrm{opt}} = \frac{1}{N}\sum_{i=1}^{N} g_{i} = \frac{1}{N}\sum_{i=1}^{N}(X_{i} - \beta M)^{*}h = (1 - \beta)M^{*}h \tag{53}$$

The ASM is modified such that it measures the dissimilarity of the training images to (1 − β)M*h. This new measure is called the modified ASM (MASM), given by

$$\mathrm{MASM} = \frac{1}{N}\sum_{i=1}^{N}\left[X_{i}^{*}h - (1-\beta)M^{*}h\right]^{+}\left[X_{i}^{*}h - (1-\beta)M^{*}h\right] = h'\left\{\frac{1}{N}\sum_{i=1}^{N}\left[X_{i} - (1-\beta)M\right]\left[X_{i} - (1-\beta)M\right]^{*}\right\}h^{*} = h'S_{x}^{\beta}h^{*} = h^{+}S_{x}^{\beta}h \tag{54}$$

where the superscript ′ represents the transpose operation and where it is considered that MASM is real in deriving the last equality in Equation (54). The diagonal matrix S_x^β is given by

$$S_{x}^{\beta} = \frac{1}{N}\sum_{i=1}^{N}\left[X_{i} - (1-\beta)M\right]\left[X_{i} - (1-\beta)M\right]^{*} \tag{55}$$

The ASM is a good measure for distortion tolerance; however, it lacks some discrimination capability, which explains part of the MACH filter's inability to reject some clutter images. On the other hand, the MASM measure captures finer details of the training set, which makes the EMACH filter better at rejecting clutter.

By maximizing the AICH and minimizing the MASM while controlling the parameter β, it is expected to explicitly keep a balance between the distortion tolerance and clutter rejection performance. Therefore, it is necessary to optimize the following new criterion

$$J_{\beta}(h) = \frac{\mathrm{AICH}}{h^{+}\gamma Ih + h^{+}S_{x}^{\beta}h} = \frac{h^{+}C_{x}^{\beta}h}{h^{+}(\gamma I + S_{x}^{\beta})h} \tag{56}$$

where h⁺γIh is the ONV term assuming an additive white noise with variance γ. The ONV helps to maintain noise tolerance when β increases, especially at the low energy components. By maximizing the preceding criterion, the following condition is obtained for the EMACH filter

$$(\gamma I + S_{x}^{\beta})^{-1}C_{x}^{\beta}h = \lambda h \tag{57}$$

where λ is a scalar identical to Jβ(h). Thus, h must be an eigenvector of (γI + S_x^β)⁻¹C_x^β with the corresponding eigenvalue λ. Since λ is identical to Jβ(h), h should be the eigenvector that corresponds to the maximum eigenvalue. The other eigenvectors corresponding to the other nonzero eigenvalues provide smaller Jβ(h) values. However, they may provide better discriminatory performance, as β is not known a priori. So the EMACH filter may be expressed as

$$h = \text{Dominant eigenvector}\left\{(\gamma I + S_{x}^{\beta})^{-1}C_{x}^{\beta}\right\} \tag{58}$$
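Because C_x^β has rank at most N while S_x^β is diagonal, the dominant eigenvector in Equation (58) can be obtained from an N × N reduced eigenproblem rather than a ρ × ρ one. The NumPy sketch below is our own illustration; the β and γ values are arbitrary examples.

```python
import numpy as np

def emach_filter(training_images, beta=0.3, gamma=1e-2):
    """EMACH filter of Equation (58), sketched via a reduced eigenproblem.

    If h = T Y a with T = (gamma*I + S_x^beta)^(-1) and Y the matrix whose
    columns are x_i - beta*m, then a is the dominant eigenvector of the
    N x N Hermitian matrix (Y^+ T Y)/N. beta and gamma are example values.
    """
    rows, cols = training_images[0].shape
    X = np.stack([np.fft.fft2(img).ravel() for img in training_images], axis=1)  # rho x N
    rho, N = X.shape
    m = X.mean(axis=1, keepdims=True)

    Y = X - beta * m                                        # columns x_i - beta*m
    s = np.mean(np.abs(X - (1.0 - beta) * m) ** 2, axis=1)  # diagonal of S_x^beta, Eq. (55)
    t = 1.0 / (gamma + s)                                   # diagonal of (gamma*I + S_x^beta)^(-1)

    TY = t[:, None] * Y
    G = (Y.conj().T @ TY) / N                               # reduced N x N problem (Hermitian)
    _, V = np.linalg.eigh(G)
    a = V[:, -1]                                            # eigenvector of the largest eigenvalue
    h = TY @ a                                              # lift back: h = T Y a
    return h.reshape(rows, cols)
```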

9. Distance Classifier Correlation Filter (DCCF)

This is a correlation-based distance classifier scheme for the recognition and classification of multiple similar or dissimilar objects. The underlying theory uses shift-invariant filters to compute distances between the input image and ideal references under an optimum transformation. The two ideas of relaxing the constraints on the correlation values at the origin and looking at the entire correlation plane rather than just the peak value led to the development of distance classifier correlation filters (DCCFs) [26–28]. The DCCF formulation can be used with any number of classes.

In the DCCF design, a global transformation is determined such that the transformed images from the same class are close to each other, whereas transformed images from different classes are separated from each other. This global transform leads to one correlation filter for each class. The use of these correlation filters is similar to the use of other correlation filters except in the final step. The test image is correlated with the correlation filter, and the resulting correlation peak is determined. This correlation-peak value is used to determine the distance of the test image to this class. Distances of the test image to all classes are determined and the class yielding the smallest distance is chosen. This paradigm allows for the relaxation of the correlation output constraints and the use of the entire correlation output.

It is important to realize that the use of DCCFs is similar to the use of other correlation filters. This means that the correlation peaks move by the same amount corresponding to the shift in the input, i.e., this is a shift-invariant operation. The DCCF concept has demonstrated promising performance on both infrared as well as synthetic aperture radar imagery [28]. Test results show that DCCFs outperform other correlation filters in recognizing targets while rejecting noise, clutter, and other confusing objects.

It is assumed that the training images are segmented and registered at a desired point. Fourier transform of an image x(m,n) of size ρ1 × ρ2 containing ρ = ρ1ρ2 pixels can be expressed as a ρ × 1 dimensional column vector x or as a ρ × ρ diagonal matrix X with the elements of x as its diagonal elements, i.e., diagonal {x} = X. Sometimes, the same quantity may be expressed both as a vector, say mx, and as a diagonal matrix Mx. This implies that Hmx and Mxh are equivalent.

The distance classifier uses a global transform denoted by H to separate the classes maximally while making them as compact as possible. For shift invariance, this transform matrix must be diagonal in the frequency domain. Multiplication of a vector x by a diagonal matrix H is equivalent to multiplying X(k,l) by H(k,l).

Here, a general C-class distance-classifier problem is analyzed by assuming that the peak correlation values are as different as possible for each of the classes although hard constraints are not used to enforce this. In addition, for each class, the correlation planes or their inverse Fourier transforms should be almost similar to the transformed ideal reference shape for this class. The correlation peaks are most likely at the origin for the registered training images but can occur elsewhere in the test cases depending on the location of the target.

Assume xik is the ρ-dimensional column vector containing the Fourier transform of the i-th image of the k-th class, 1 ≤ iN and 1 ≤ kC, where each class contains N training images. If mk is the mean Fourier transform of class k, then

$$m_{k} = \frac{1}{N}\sum_{i=1}^{N} x_{ik}, \quad 1 \le k \le C \tag{59}$$

The correlation peak at the origin between the mean training image from the k-th class and the filter h is given by m_k⁺h. Thus, the overall mean Fourier transform of the entire training set becomes

$$m = \frac{1}{C}\sum_{k=1}^{C} m_{k} \tag{60}$$

The correlation peak of the overall mean image m with the filter h is given by

$$m^{+}h = \frac{1}{C}\sum_{k=1}^{C} m_{k}^{+}h \tag{61}$$
which represents the overall average of the origin values of all correlation planes.

If the transformation h makes the in-class correlation planes similar, then the in-class peak values should be similar to each other and to their mean. Thus, to make the interclass separation between the correlation peaks large, the mean peak values of the classes are made as different as possible. Although several possible criteria might achieve this objective, the approach here is to increase the distance of all classes from the central mean. Toward this end, the following distance measure, called class separation, has been formulated

$$A(h) = \frac{1}{C}\sum_{k=1}^{C}\left|m_{k}^{+}h - m^{+}h\right|^{2} = \frac{1}{C}\sum_{k=1}^{C} h^{+}(m - m_{k})(m - m_{k})^{+}h = h^{+}Wh \tag{62}$$

where

$$W = \frac{1}{C}\sum_{k=1}^{C}(m - m_{k})(m - m_{k})^{+} \tag{63}$$

is a ρ × ρ, full (i.e., non-diagonal) matrix of rank less than or equal to (C − 1). The rank of W is less than or equal to (C − 1) because it is obtained by the addition of C outer products of the vectors (m − m_k), but these C vectors add up to a zero vector. If A(h) of Equation (62) is maximized, the class mean correlation peaks (m_k⁺h) will differ significantly. It is also desired that the distance of transformed inputs to their average be small. This distance B(h), which measures the compactness of each class, is the same as the ASM defined for each class as

$$\mathrm{ASM}_{k} = \frac{1}{N}\sum_{i=1}^{N}\left|g_{ik} - \bar{g}_{k}\right|^{2}, \quad 1 \le k \le C \tag{64}$$
where
$$g_{ik} = X_{ik}h, \qquad \bar{g}_{k} = M_{k}h \tag{65}$$
are the Fourier transforms of the correlation outputs due to the i-th training image xik and the average training image mk, respectively from class k. Note that Xik and Mk are diagonal matrices with xik and mk along the diagonal. The ASM is a measure of the similarity of the training images of a class to their mean and hence a measure of the compactness of the class after transformation by H. Using Equations (64) and (65), ASMk can be rewritten as
$$\mathrm{ASM}_{k} = \frac{1}{N}\sum_{i=1}^{N}(X_{ik}h - M_{k}h)^{+}(X_{ik}h - M_{k}h) = h^{+}\left[\frac{1}{N}\sum_{i=1}^{N}(X_{ik} - M_{k})^{*}(X_{ik} - M_{k})\right]h = h^{+}S_{k}h \tag{66}$$

where

$$S_{k} = \frac{1}{N}\sum_{i=1}^{N}(X_{ik} - M_{k})^{*}(X_{ik} - M_{k}) \tag{67}$$

In Equation (67), Sk is a ρ × ρ diagonal matrix where each training image contains ρ pixels. The overall ASM for C classes is defined as

$$\mathrm{ASM} = B(h) = \frac{1}{C}\sum_{k=1}^{C} h^{+}S_{k}h = h^{+}Sh \tag{68}$$

where

$$S = \frac{1}{C}\sum_{k=1}^{C} S_{k} \tag{69}$$

To make the in-class metric B(h) small and to make the inter-class distance metric A(h) large, the filter h is designed to maximize the ratio

$$J(h) = \frac{A(h)}{B(h)} = \frac{h^{+}Wh}{h^{+}Sh} \tag{70}$$

with respect to h. The filter h that maximizes J(h) in Equation (70) is the eigenvector of S⁻¹W with the largest eigenvalue [28]. Because W is a non-diagonal matrix of rank less than or equal to (C − 1), finding the dominant eigenvector of S⁻¹W requires a special algorithm when the training images, and thus the desired filter, are of larger size [28]. When J(h) is maximum, the correlation shape produced by an input image is expected to be similar to the mean shape for its true class, with a peak value different from the average peak value of any other class. The DCCF filter may be expressed as

$$h = \text{Dominant eigenvector}\left\{S^{-1}W\right\} \tag{71}$$

The DCCF is the first technique proposed for shift-invariant transform-domain distance calculations with a correlator, and it is specifically designed to accommodate multiple targets at different locations in the same image. This filter deals with the entire correlation plane and not just one point at the origin. It transforms the input image into a new space in which the distance of a test input from the classes is computed. Given a test input z, the distance dk between the transformed input and the ideal shape for class k is computed as

$$d_{k} = \left|H^{*}z - H^{*}m_{k}\right|^{2} = \left|H^{*}z\right|^{2} + \left|H^{*}m_{k}\right|^{2} - 2\Re\left\{z^{+}HH^{*}m_{k}\right\} = p + b_{k} - 2\Re\left\{z^{+}h_{k}\right\} \tag{72}$$

In Equation (72), p = |H*z|² is the energy (independent of the class) of the transformed input test image z; bk = |H*mk|² is the energy (independent of z) of the transformed mean of class k; and hk = HH*mk is viewed as the effective filter for class k. Because there are only C classes for which distances must be computed, only C such filters are required.

In general, the targets may be anywhere in the input image. For a shift-invariant distance calculation, the interest is in the smallest value of dk over all possible shifts of the target with respect to the class references. In Equation (72), because p and bk are both positive and independent of the position of the target, the smallest value of dk over all shifts is obtained when the third term (i.e., z⁺hk) is as large as possible. Therefore, this term is taken as the peak value of the full cross-correlation of z and hk.
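The following NumPy sketch illustrates both stages just described: the DCCF transform of Equation (71) is computed through a C × C reduced eigenproblem (W has rank at most C − 1 and S is diagonal), and the class distance of Equation (72) is made shift-invariant by taking the peak of the full cross-correlation. It is our own illustration under these assumptions, not the authors' implementation.

```python
import numpy as np

def dccf_design(class_images):
    """DCCF transform h of Equation (71), sketched with NumPy.

    class_images: list of C arrays, each of shape (N_k, rows, cols).
    Returns the frequency-domain filter h and the class mean spectra m_k.
    """
    rows, cols = class_images[0].shape[1:]
    C = len(class_images)
    means, s = [], np.zeros(rows * cols)
    for imgs in class_images:
        Xk = np.stack([np.fft.fft2(im).ravel() for im in imgs], axis=1)
        mk = Xk.mean(axis=1)
        means.append(mk)
        s += np.mean(np.abs(Xk - mk[:, None]) ** 2, axis=1)    # diagonal of S_k, Eq. (67)
    s /= C                                                      # S = (1/C) sum_k S_k
    M = np.stack(means, axis=1)                                 # rho x C matrix of class means
    U = M - M.mean(axis=1, keepdims=True)                       # columns m_k - m
    SU = U / s[:, None]                                         # S^(-1) U
    G = (U.conj().T @ SU) / C                                   # reduced C x C eigenproblem
    _, V = np.linalg.eigh(G)
    h = SU @ V[:, -1]                                           # dominant eigenvector of S^(-1) W
    return h.reshape(rows, cols), [mk.reshape(rows, cols) for mk in means]

def dccf_distance(test_image, h, class_mean_spectrum):
    """Shift-invariant distance of Equation (72) between a test image and one class."""
    Z = np.fft.fft2(test_image)
    Mk = class_mean_spectrum
    H2 = np.abs(h) ** 2
    p = np.sum(H2 * np.abs(Z) ** 2)                             # energy of transformed input
    bk = np.sum(H2 * np.abs(Mk) ** 2)                           # energy of transformed class mean
    cross = np.real(np.fft.ifft2(np.conj(Z) * H2 * Mk)) * Z.size  # full cross-correlation plane
    return p + bk - 2.0 * cross.max()                           # smallest d_k over all shifts
```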

10. Polynomial DCCF (PDCCF)

Linear transformations such as the DCCF are attractive because of their optimality when the underlying statistics are Gaussian with equal covariances [29]. However, the DCCF uses transformations based on second-order statistics and does not capture higher-order statistics in images. Hence, it does not necessarily capture all of the discrimination information in some cases. Examples of such cases are encountered when signal-dependent or multiplicative noises are present, and when inputs have non-Gaussian statistics. In such cases, DCCF capabilities may be improved by applying different nonlinearities to the input image. By using nonlinear transformations, it may be possible to extract more useful information for discrimination. Thus, classes that are not well separated in the original image space may become more separated in the nonlinearly mapped space. Point nonlinearities are preferred because they reduce the computational complexity: a simple nonlinearity is applied to each pixel without considering the points in its neighborhood.

The polynomial DCCF (PDCCF) extends the DCCF to include point nonlinear mappings of the input patterns [29]. Examples of such nonlinear mappings correspond to powers of the pixels of the input images. Even though the resulting PDCCF system is not linear with respect to the input patterns, it is still linear in the kernel. This property allows frequency domain techniques for the design, analysis, and implementation of this filter. Another important property is that this system works on different powers of input image pixels, which corresponds to a multi-dimensional correlation operation and thus extends the linear DCCF classification optimization criterion to a nonlinear one, and no nonlinear optimization is involved. Moreover, the PDCCF system provides a new framework for combining different correlation filters, where each filter in the system is optimized jointly with the other filters.

The PDCCF first maps the input image x(m,n) into xj(m,n) via point nonlinearities ηj, where j = 1, 2, …, n. Thus xj(m,n) is related to x(m,n) through the following relationship

$$\eta_{j} : x(m,n) \rightarrow x_{j}(m,n) \tag{73}$$

In Equation (73), all xj(m,n) are assumed to have the same size as the input x(m,n). Examples of such mappings include various powers, logarithms, cosines, etc. Each nonlinearly mapped input image xj(m,n) is transformed by the filter hj(m,n) built using xj(m,n). The overall distance is obtained by adding the distances resulting from the shift-invariant minimum mean squared error computations between every transformed image of the input and its respective ideal transformed reference. Those ideal transformed references are computed a priori by using the nonlinear functions η1, η2, …, and ηn followed by the application of the filters h1(m,n), h2(m,n), …, and hn(m,n), respectively.

In the linear DCCF, the filtered image g(m,n) resulting from transformation of x1(m,n) by h1(m,n) can be written as

$$g(m,n) = h_{1}(m,n) \odot x_{1}(m,n) \tag{74}$$

where ⊙ denotes the crosscorrelation and x1(m,n) = x(m,n). By augmenting the input image with x2(m,n), another term can be added involving the crosscorrelation of x2(m,n) with h2(m,n) to obtain an output transformed by the two-term PDCCF, as shown below.

$$g(m,n) = h_{1}(m,n) \odot x_{1}(m,n) + h_{2}(m,n) \odot x_{2}(m,n) \tag{75}$$

Thus, the PDCCF has more terms at its disposal with which it can achieve better discrimination. Clearly, these new nonlinear versions of inputs are completely dependent on the original inputs, and in that sense no new information is being created. However, the new representations enable the correlation filters to provide better recognition. By continuing to add more terms such as hj(m,n)⊙xj(m,n), the n-term PDCCF can be obtained as

$$g(m,n) = \sum_{j=1}^{n} h_{j}(m,n) \odot x_{j}(m,n) \tag{76}$$

If the focus is on point-wise power nonlinearities for all the ηj's, the nonlinear mapping of the input, xj(m,n), can be defined as

$$x_{j}(m,n) = \left[x(m,n)\right]^{j} \tag{77}$$

where j ∈ ℝ. The power nonlinearity plays an important role in SAR images as it can enhance the bright scatterers or the overall contrast [29]. The filters hj(m,n) are computed jointly, and thus the advantages of closed form solutions and nonlinear systems are combined and exploited. In the following analysis, Ψ represents the set of all powers used to construct a particular PDCCF. Although the nonlinearity is applied in the spatial domain, in the analyses to follow (e.g., filter formation, distance calculation, etc.) all the quantities are actually in the Fourier domain. For example, x is a vector obtained by lexicographical rearrangement of the Fourier transform of x(m,n). All vectors and matrices with superscript j (e.g., xj and Xj) represent these vectors and matrices (x and X) with each of their elements in the image (spatial) domain raised to the j-th power.
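As a small illustration of Equation (77) and the block-vector notation that follows, the sketch below raises an image to each power in an example set Ψ = {1, 2} (an arbitrary assumption) and stacks the Fourier transforms of the channels into one block vector.

```python
import numpy as np

def pdccf_channels(image, powers=(1, 2)):
    """Point power nonlinearities of Equation (77), stacked as one block vector.

    powers is the set Psi of exponents (illustrative choice: {1, 2}).
    Each channel is raised to the j-th power in the spatial domain and Fourier
    transformed; the block vector concatenates all channels, cf. Eqs. (78)-(79).
    """
    channels = [np.fft.fft2(image.astype(float) ** j).ravel() for j in powers]
    return np.concatenate(channels)            # shape (n * rho,)
```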

Assume m_k^j is the class-k mean of the Fourier transforms of the training images resulting from raising all their pixels to the j-th power, and hj is the filter built for images raised to the j-th power. Each of the Fourier transforms of the original image along with the Fourier transforms of its variations can be represented by use of a single block vector. Thus, the Fourier transform of the mean images of class k after augmentation becomes

$$m_{k} = \begin{pmatrix} m_{k}^{1} \\ m_{k}^{2} \\ \vdots \\ m_{k}^{n} \end{pmatrix} \tag{78}$$

Further, the filters, h1, h2, … and hn can be combined into one filter, h as follows

$$h = \begin{pmatrix} h_{1} \\ h_{2} \\ \vdots \\ h_{n} \end{pmatrix} \tag{79}$$

With Equations (75), (78), and (79), the correlation peak at the origin produced in response to the mean image of class k is given by

$$\bar{g}_{k}(0,0) = \sum_{j=1}^{n} h_{j}^{+}m_{k}^{j} = h^{+}m_{k} \tag{80}$$

The distance between the classes, after being augmented and then transformed by the filters h1, h2, …, and hn can be expressed as

$$A(h) = \frac{1}{C}\sum_{k=1}^{C}\left|\sum_{j=1}^{n} h_{j}^{+}m_{k}^{j} - \sum_{j=1}^{n} h_{j}^{+}m^{j}\right|^{2} = \frac{1}{C}\sum_{k=1}^{C}\left|h^{+}m_{k} - h^{+}m\right|^{2} = h^{+}Wh \tag{81}$$

where

$$W = \frac{1}{C}\sum_{k=1}^{C}(m_{k} - m)(m_{k} - m)^{+} \tag{82}$$

To separate the classes as much as possible, the filter is required to produce a large A(h). Simultaneously, the compactness of the classes needs to be increased after transformation by h1, h2, …, and hn. The compactness is measured by the similarity of the training images of a class to their mean. It can be represented by the ASM of that class. In general, the ASM for class k is defined as

$$\mathrm{ASM}_{k} = \frac{1}{N}\sum_{i=1}^{N}\left|g_{ik} - \bar{g}_{k}\right|^{2}, \quad 1 \le k \le C \tag{83}$$

where

$$g_{ik} = \sum_{j=1}^{n} X_{ik}^{j}h_{j}, \qquad \bar{g}_{k} = \sum_{j=1}^{n} M_{k}^{j}h_{j} \tag{84}$$
are the Fourier transforms of the filtered images produced by the transform filters in response to the i-th training image of class k and the mean image mk, respectively. Thus, from Equations (83) and (84), ASMk can be written as
$$\begin{aligned}\mathrm{ASM}_{k} &= \frac{1}{N}\sum_{i=1}^{N}\left|X_{ik}^{1}h_{1} + X_{ik}^{2}h_{2} + \cdots + X_{ik}^{n}h_{n} - M_{k}^{1}h_{1} - M_{k}^{2}h_{2} - \cdots - M_{k}^{n}h_{n}\right|^{2} \\ &= \frac{1}{N}\sum_{i=1}^{N}\left[(X_{ik}^{1} - M_{k}^{1})h_{1} + \cdots + (X_{ik}^{n} - M_{k}^{n})h_{n}\right]^{+}\left[(X_{ik}^{1} - M_{k}^{1})h_{1} + \cdots + (X_{ik}^{n} - M_{k}^{n})h_{n}\right] \\ &= \sum_{u=1}^{n}\sum_{v=1}^{n} h_{u}^{+}S_{kuv}h_{v} = \begin{pmatrix} h_{1}^{+} & h_{2}^{+} & \cdots & h_{n}^{+} \end{pmatrix}\begin{bmatrix} S_{k11} & S_{k12} & \cdots & S_{k1n} \\ S_{k21} & S_{k22} & \cdots & S_{k2n} \\ \vdots & & & \vdots \\ S_{kn1} & S_{kn2} & \cdots & S_{knn} \end{bmatrix}\begin{pmatrix} h_{1} \\ h_{2} \\ \vdots \\ h_{n} \end{pmatrix} = h^{+}S_{k}h \end{aligned} \tag{85}$$

where

$$S_{kuv} = \left[\frac{1}{N}\sum_{i=1}^{N} X_{ik}^{u}\left(X_{ik}^{v}\right)^{*}\right] - M_{k}^{u}\left(M_{k}^{v}\right)^{*} \tag{86}$$

are all diagonal matrices and

$$S_{k} = \begin{bmatrix} S_{k11} & S_{k12} & \cdots & S_{k1n} \\ S_{k21} & S_{k22} & \cdots & S_{k2n} \\ \vdots & & & \vdots \\ S_{kn1} & S_{kn2} & \cdots & S_{knn} \end{bmatrix} \tag{87}$$

The overall ASM for C classes is then defined as

$$\mathrm{ASM} = B(h) = \frac{1}{C}\sum_{k=1}^{C} h^{+}S_{k}h = h^{+}Sh \tag{88}$$

where

$$S = \frac{1}{C}\sum_{k=1}^{C} S_{k} \tag{89}$$

The filter h that maximizes the ratio A(h)/B(h) is the dominant eigenvector of S⁻¹W, as in the case of the DCCF, and is given by

$$h = \text{Dominant eigenvector}\left\{S^{-1}W\right\} \tag{90}$$

Given a test input z, the distance dk between the transformed input and the ideal shape for class k is computed by using the MSE-then-total approach [30]. In this approach, n distances are computed for class k. The j-th distance, djk, is defined as follows

$$d_{jk} = \left|H_{j}^{*}z^{j} - H_{j}^{*}m_{k}^{j}\right|^{2}, \quad 1 \le j \le n \tag{91}$$

The distance djk can be rewritten as

$$d_{jk} = p_{j} + b_{jk} - 2\Re\left\{(z^{j})^{+}h_{jk}\right\} \tag{92}$$

where

$$p_{j} = \left|H_{j}^{*}z^{j}\right|^{2} \tag{93}$$

$$b_{jk} = \left|H_{j}^{*}m_{k}^{j}\right|^{2} \tag{94}$$

$$h_{jk} = H_{j}H_{j}^{*}m_{k}^{j} \tag{95}$$

The inner products, shown as the third term in Equation (92), are between the j-th variation of the input, z^j, and the corresponding j-th filter for class k, hjk. The total distance dk to a class is then found by

$$d_{k} = \sum_{j=1}^{n} d_{jk} \tag{96}$$

The input image is assigned to the class with the least total distance.
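The sketch below illustrates this per-channel distance computation (our own illustration; the data layout and the power set {1, 2} are assumptions). The class-independent energy terms of Equations (93)–(94) are accumulated across channels, and the cross-correlation planes are summed before a single peak is taken, so the resulting total distance is shift-invariant with one common target location.

```python
import numpy as np

def pdccf_distance(test_image, filters, class_means, powers=(1, 2)):
    """Total PDCCF distance of Equations (91)-(96) for one class, sketched.

    filters[j] and class_means[j] hold the frequency-domain filter h_j and the
    class mean m_k^j for the j-th power channel (an assumed data layout).
    """
    const = 0.0
    cross = np.zeros_like(test_image, dtype=float)
    for j, power in enumerate(powers):
        Z = np.fft.fft2(test_image.astype(float) ** power)      # z^j in the frequency domain
        H2 = np.abs(filters[j]) ** 2
        const += np.sum(H2 * (np.abs(Z) ** 2 + np.abs(class_means[j]) ** 2))  # p_j + b_jk
        cross += np.real(np.fft.ifft2(np.conj(Z) * H2 * class_means[j])) * Z.size
    return const - 2.0 * cross.max()                             # shift-invariant total distance
```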

11. Target Tracking in FLIR Imagery

In general, tracking of a moving pattern or target requires recognizing and then locating the target in a scene, estimating the target motion, understanding the direction of motion of the target, and then following that target as it moves through the sequence of image frames. The detection and tracking of desired targets in a real-life image corrupted by noise, clutter, illumination and other three-dimensional (3D) artifacts poses a very complex problem and demands sophisticated solutions using pattern recognition and motion estimation methods [31,32]. The problem becomes more complicated if there is more than one target in the scene and simultaneous tracking of multiple targets is required.

Forward-looking infrared (FLIR) images are frequently used in automatic target recognition (ATR) applications. It is challenging to detect and track targets in FLIR imagery. To detect independently moving objects in FLIR image sequences, the sensor properties have to be taken into account. Additional challenges are caused by several important differences between FLIR images and visual sequences [33,34]. Many researchers have investigated various approaches for the detection, recognition, classification and pose estimation of targets from FLIR images, including both matched spatial filter (MSF) based correlators and joint transform correlators [33–40]. However, the application of MSFs or their variants (e.g., MACH, DCCF) to FLIR imagery is very limited, although they have been used for simulated and real synthetic aperture radar (SAR) and laser radar (LADAR) imagery [24,41–47].

In this section, three different algorithms are demonstrated for pattern recognition and tracking based on combinations of detection and classification filters [48–51]:

  • MACH filter-based detection and DCCF-based classification (MACH-DCCF)

  • MACH filter-based detection and PDCCF-based classification (MACH-PDCCF)

  • EMACH filter-based detection and PDCCF-based classification (EMACH-PDCCF)

The detection filters are trained by target images of the expected size and orientation variations, embedded in frames of the expected input scene size. The classification filters are formulated with the expected size of the target images and trained by target images of the expected size and orientation variations. The first step of the real time system is detection, which involves correlating the input scene with all detection filters (one for each desired or expected target class) and combining the correlation outputs. In the second step, a predefined number of regions of interest (ROIs) having the expected size of the target images are selected based on the regions having higher correlation peak values in the combined correlation output. To ensure that all desired or expected targets are included in the ROIs, the number of ROIs should be at least three times the number of expected targets. Classification filters are then applied to these ROIs, and target types, along with clutter, are identified based on a distance measure and a threshold. Moving target detection and tracking are accomplished by following this technique for all incoming image frames by applying the same filters.

Multiple detection filters and classification filters are formulated for each target based on different size ranges or aspect angles. All of the filters for different ranges can be applied simultaneously through the whole image sequence, and the decision can be made based on the output of the filter corresponding to the highest correlation peak or minimum distance. However, in this illustration, one detection filter and one classification filter are used for each class of targets for a particular range. A block diagram of the method for real time pattern recognition and tracking is shown in Figure 1 for a two-class detection system.
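The per-frame flow just described can be summarized in Python as follows. Everything below is an illustrative sketch: the function names, the peak-suppression rule used to pick successive ROIs, and the thresholding are assumptions, not the authors' exact implementation.

```python
import numpy as np

def detect_and_classify(frame, detection_filters, classify, roi_size, num_rois, threshold):
    """One frame of the detection-then-classification pipeline (illustrative sketch).

    detection_filters: list of frequency-domain MACH/EMACH filters, one per class.
    classify: a function returning (class_label, distance) for an ROI, e.g. a
    DCCF/PDCCF distance classifier.
    """
    Z = np.fft.fft2(frame)
    # Step 1: correlate with every detection filter and combine the outputs.
    combined = sum(np.abs(np.fft.ifft2(Z * np.conj(H))) ** 2 for H in detection_filters)

    detections = []
    h, w = roi_size
    for _ in range(num_rois):
        r, c = np.unravel_index(np.argmax(combined), combined.shape)
        combined[max(0, r - h):r + h, max(0, c - w):c + w] = 0      # suppress this peak
        roi = frame[max(0, r - h // 2):r + h // 2 + h % 2,
                    max(0, c - w // 2):c + w // 2 + w % 2]
        # Step 2: classify the ROI; reject it as clutter if the distance is too large.
        label, distance = classify(roi)
        if distance < threshold:
            detections.append((label, (r, c)))
    return detections
```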

11.1. Image Dataset

The FLIR image database used in this research was supplied by the Army Missile Command (AMCOM). This image database has a total of 50 real-life infrared video sequences, some of which contain a single target in the scene and some of which contain multiple targets. In general, the image sequences are closing sequences, i.e., the targets come closer to the observer in the later frames. Thus, the size and signature of the targets change from the first frame to the last frame. Moreover, as the targets move, there are changes in the targets' orientations from one frame to the next. The database is also associated with ground truth data files containing the list of targets in each frame of each image sequence and their size and location in the frame. Among the 50 sequences, the techniques have been applied to several single- and multiple-target image sequences, and they are still being tested on the remaining sequences. For this paper, the analyses of 4 single-target sequences (L1415, L2018, L2312, M1406), 2 two-target sequences (L1701, L1911) and 1 three-target sequence (L1618) are reported.

11.2. Single-Target Image Sequences

First, consider Sequence L2018, which has the highest difficulty level among the selected single-target image sequences. This sequence contains a total of 448 frames, each of which contains the same target (tank1). The target size is 3 × 4 pixels in the first frame and 12 × 23 pixels in the final frame. For effective detection and classification, proper selection of the training images is important; a single detection or classification filter trained over such a large variation in size may lose its selectivity. Therefore, for this sequence, two different detection filters (MACH or EMACH) are formed for the target, to be used in two ranges of image frames depending on the target size.

The Range-1 detection filters are trained using target patches taken from the first 200 frames at an interval of 5 frames (the 1st, 5th, 10th, 15th, …, 200th frames). The ground truth data files are used to read the coordinates and sizes of the targets and to segment the target patches from the original frames. Each patch is mean-subtracted and then normalized by dividing it by the RMS value of the mean-subtracted patch. Thereafter, each patch is placed at the center of a 118 × 118-pixel zero-padded matrix to form a full-size training image for the detection filters. A sample training image of this type for Sequence L2018 is shown in Figure 2. The spatial domain representations of the Range-1 MACH and EMACH filters, trained with the above training images, are shown in Figure 3a,b, respectively.
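The following NumPy sketch illustrates this training-image preparation under assumed ground-truth conventions (a top-left corner plus a patch size); the function name, arguments and default scene size are illustrative rather than part of the original implementation.

```python
import numpy as np

def make_detection_training_image(frame, top, left, height, width, scene_size=(118, 118)):
    """Sketch of the detection-filter training image described above: the target
    patch is mean-subtracted, normalized by its RMS value, and centered in a
    zero-padded scene of the expected input size."""
    patch = frame[top:top + height, left:left + width].astype(float)
    patch -= patch.mean()                              # mean subtraction
    rms = np.sqrt(np.mean(patch ** 2))
    if rms > 0:
        patch /= rms                                   # RMS normalization
    scene = np.zeros(scene_size)
    r0 = (scene_size[0] - height) // 2
    c0 = (scene_size[1] - width) // 2
    scene[r0:r0 + height, c0:c0 + width] = patch       # zero-padded full-size training image
    return scene
```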

The Range-2 detection filters, used from Frame 201 to Frame 448, are trained with target patches taken from Frames 201 to 300 at an interval of 5 frames. The corresponding Range-2 MACH and EMACH filters are shown in Figure 4a,b, respectively.

In general, the classification filters (DCCF or PDCCF) also need to be formulated for different target ranges to achieve better selectivity. The size of each classification filter is usually chosen as the expected target size in the corresponding range of image frames. Target patches are extracted as before, but they are not normalized for the classification filters. Each patch is placed at the center of a zero-padded background having the size of the classification filter; if the patch is larger than the filter, it is truncated at the sides. For Sequence L2018, a 12 × 16-pixel classification filter (either DCCF or PDCCF), trained with target patches from Frames 1 to 200 at an interval of 4 frames, works well for almost all frames. A 12 × 16-pixel sample training image of tank1 of Sequence L2018 for the classification filters is shown in Figure 5.
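A minimal sketch of this classification-filter training-image construction is given below, assuming the 12 × 16-pixel filter size used for Sequence L2018; the symmetric truncation is one plausible reading of "truncated at the sides" and is an assumption.

```python
import numpy as np

def make_classification_training_image(patch, filter_size=(12, 16)):
    """Sketch: the raw (unnormalized) target patch is centered in a zero
    background of the classification-filter size and truncated if larger."""
    fh, fw = filter_size
    ph, pw = patch.shape
    # Truncate symmetrically if the patch exceeds the filter size (assumed policy).
    if ph > fh:
        cut = (ph - fh) // 2
        patch = patch[cut:cut + fh, :]
    if pw > fw:
        cut = (pw - fw) // 2
        patch = patch[:, cut:cut + fw]
    ph, pw = patch.shape
    out = np.zeros((fh, fw))
    r0, c0 = (fh - ph) // 2, (fw - pw) // 2
    out[r0:r0 + ph, c0:c0 + pw] = patch                # centered, zero-padded
    return out
```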

It is assumed that the single-target sequences are known to contain only one target of interest. Hence, in this work, single-class (1-class) classification filters (DCCF, PDCCF) are formulated for each sequence using the corresponding target patches. Although the classification filters normally need to be formulated with at least two classes of training images, 1-class filters are obtained here by slightly modifying the basic formulation. These 1-class approximated filters have been found to work well in most cases.

The training data and parameters for the four filter types (MACH, EMACH, DCCF, PDCCF) used by the three algorithms for detecting and tracking the target (tank1) in Sequence L2018 are summarized in Table 1. The first column of the table gives the name of the target along with the number of frames for which ground truth data is available. The second column shows the filter names with their range indices, and the third column shows the range of image frames to which a particular filter is applied. The fourth and fifth columns give the range of frames and the frame interval used to extract the target patches for training. In Table 1, γ is the filter parameter for MACH, β and γ are the filter parameters for EMACH, and Ψ is the set of nonlinear powers used in the PDCCF formulation.

The filter formulations for the three other single-target sequences (L1415, L2312 and M1406) are similar to those for Sequence L2018. To cope with the size variation from the initial frames to the final frames, multiple filters over multiple range bins are required in some cases. The training data for these three single-target sequences are given in Tables 2–4, respectively.

For analysis, consider the first frame (Frame 1) of Sequence L2018 shown in Figure 6a. Applying the Fourier transform to this frame and correlating it with the Fourier-domain MACH Range-1 filter yields the correlation output of Figure 6b. The correlation output contains false peaks and high-energy low-frequency diffraction terms that make the actual correlation signal negligible. For this reason, 33 ROIs are required to include the target of interest, as shown in Figure 7b.

To eliminate the strong low-frequency components, a notch filter is applied before the inverse Fourier transform operation; it suppresses the Fourier-domain components along the frequency axes to zero. The correlation output after applying this notch filter is shown in Figure 8a. Five ROIs in Frame 1, selected based on this correlation output, are shown in Figure 8b, and the target (tank1) is included within these ROIs. This notch filter is used in the detection stage of every frame, with either the MACH or the EMACH filter, for all image sequences analyzed in this work. The detection results using the EMACH filter are similar to those of the MACH filter, with or without the notch filter.
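The sketch below shows one way such a notch could be realized in the frequency domain; zeroing the u = 0 and v = 0 lines of the unshifted spectrum before the inverse transform is an assumed form, illustrative rather than the exact filter used in the cited works.

```python
import numpy as np

def correlate_with_notch(frame, H):
    """Illustrative detection-stage correlation with an axis notch: Fourier
    components along the frequency axes are set to zero before the inverse
    transform, suppressing the strong low-frequency terms that otherwise
    mask the correlation peak."""
    F = np.fft.fft2(frame)
    product = F * np.conj(H)     # H: frequency-domain detection filter (MACH or EMACH)
    product[0, :] = 0            # zero the v-axis (u = 0 row of the unshifted spectrum)
    product[:, 0] = 0            # zero the u-axis (v = 0 column)
    return np.fft.fftshift(np.abs(np.fft.ifft2(product)))
```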

It is obvious that detection and classification cannot be performed correctly with the detection filters alone. Because multiple identical or different targets and clutter may be present, the highest correlation peak or PSR value is not always produced by the desired target. Therefore, classification filters are used for improved discrimination in the single-target image sequences as well, and the 1-class classification filters are found to work well in rejecting clutter and background. To decide between target and clutter, the ROI with the smallest distance is taken as the potential target; however, if the second smallest distance does not exceed a prescribed percentage of the smallest distance, the candidate ROI is rejected as clutter or background. Since the EMACH filter has improved clutter rejection capability, applying it in the detection stage helps lower the number of ROIs passed to the classification stage. For this particular sequence (L2018), the simulations show that 8 ROIs are required to include the target in almost all frames when the MACH filter is used, whereas 6 ROIs are sufficient with the EMACH filter. For generality, however, 8 ROIs are also considered for the EMACH filter in this sequence.
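One plausible form of this distance-ratio test is sketched below; the ratio_threshold parameter and the exact comparison are assumptions consistent with the description above, not the authors' exact rule.

```python
def decide_target(distances, ratio_threshold):
    """Illustrative distance-based decision: the ROI with the smallest class
    distance is the candidate target, accepted only if the second-smallest
    distance is sufficiently larger; otherwise no target is declared.
    Expects at least two ROI distances."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    best, second = order[0], order[1]
    if distances[best] <= ratio_threshold * distances[second]:
        return best          # index of the ROI declared as the target
    return None              # rejected as clutter or background
```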

All single-target sequences are tested with the three developed algorithms: MACH-DCCF, MACH-PDCCF and EMACH-PDCCF. Some frames of Sequence L2018 showing the results of the EMACH-PDCCF algorithm are given in Figure 9. The tracking algorithm inserts a “T” mark at the location of the detected target (tank1) in each frame, as shown in Figure 9; the number at the lower left corner of each frame indicates the frame number. Figure 9 also includes some sample frames where the classification is incorrect or no decision is reached. The detection and tracking results for all single-target sequences are summarized for the three algorithms in Table 5. In the threshold column of the table, a single value indicates that the same threshold is used for all ranges. From Table 5, it is observed that the poorest performance among the four single-target sequences is obtained for Sequence L2018. This may be attributed to the scene complexity and to the use of the same classification filter throughout all ranges of the sequence.

11.3. Two-Target Image Sequences

To assess the performance of the algorithms when two targets are present in the scene, consider Sequences L1701 and L1911. The two targets in Sequence L1701 are Bradley and pickup. Out of the 388 frames of this sequence, ground truth data is available for Bradley in 371 frames and for pickup in 43 frames. The pickup disappears from the scene at Frame 31, reappears at Frame 81 and disappears again in a later frame. The two targets in Sequence L1911, APC1 and tank1, are present in all 165 frames of the sequence. In these image sequences, the target sizes increase significantly from the first frame to the final frame, so different filters must be used for different ranges. For a particular range, two detection filters (MACH or EMACH) and a 2-class classification filter (DCCF or PDCCF) are required for a two-target sequence. Table 6 lists the training data and parameters selected for the different filters for Sequence L1701, while Table 7 gives the same information for Sequence L1911. Note that even if one target disappears after a few frames, both detection filters and the 2-class classification filter continue to be applied to the remaining image frames, because in practice it is not known whether a target has left the scene permanently or will return. This ensures the detection of a target that may reappear after a few frames.

Using the detection and classification filters with the design parameters of Tables 6 and 7, all algorithms are tested for the detection, classification and tracking of the objects in Sequences L1701 and L1911. The results of the EMACH-PDCCF algorithm on Sequence L1701 are shown for some sample frames in Figure 10. The tracking algorithm inserts “TI” for detected Class-1 targets (Bradley) and “TII” for detected Class-2 targets (pickup) at their corresponding locations in the image frame; the frame number is shown at the lower left corner of each displayed frame. From Figure 10, it is evident that the tracking algorithm can successfully detect and classify the targets when they are present in the input scene. The complete tracking results for both sequences are summarized in Table 8 for all algorithms. In the threshold column of the table, a single value indicates that the same threshold is used for all ranges; otherwise, the listed values are the thresholds used for the various ranges of the classification filters. It is observed that the MACH-DCCF algorithm fails for Sequence L1911 in detecting and classifying the targets and rejecting the clutter. On the other hand, the EMACH-PDCCF algorithm provides the best results when all factors are considered simultaneously, such as the required number of ROIs, the percentage of successful detections and the total number of false alarms.

11.4. Three-Target Image Sequences

The three targets in Sequence L1618 are APC1, M60 and truck. Out of the 300 frames in this sequence, ground truth data is available for APC1 in 291 frames, for M60 in 101 frames and for truck in 6 frames. The truck disappears from the sequence at Frame 18 and M60 disappears at Frame 104. As in the other sequences, the size of the targets increases significantly from the first frame to the final frame, so different filters are required for different ranges. Assuming three expected targets in the scene for any particular range, three detection filters (MACH or EMACH) and a three-class classification filter (DCCF or PDCCF) are required for every frame in each range of this sequence. The training data and parameters of the different filters chosen for Sequence L1618 are given in Table 9.

To evaluate the performance of all algorithms for detection, clutter rejection, classification and tracking of the three objects, the detection and classification filters of Table 9 are applied to Sequence L1618 for the different ranges. The results obtained with the EMACH-PDCCF algorithm on Sequence L1618 are shown for some sample frames in Figure 11. The tracking algorithm places “TI” for detected Class-1 targets (APC1), “TII” for detected Class-2 targets (M60) and “TIII” for detected Class-3 targets (truck) at their corresponding locations in the frames; the frame numbers are shown at the lower left corner of each frame in Figure 11. The complete tracking results for the sequence are summarized in Table 10 for all algorithms. In the threshold column of the table, a single value indicates that the same threshold is used for all ranges; otherwise, the listed values are the thresholds used for the various ranges of the classification filters.

12. Conclusions

Pattern recognition and tracking in FLIR imagery is a challenging problem due to factors such as low resolution, low signal-to-noise ratio, different 3D orientations of the targets, the effects of global motion, and close proximity to similar objects. In this paper, we reviewed recent trends and advancements in distortion-invariant pattern recognition algorithms for single- and multiple-target detection and tracking in FLIR imagery using correlation filters. Each detection/tracking algorithm utilizes different properties of the targets and image frames of a given FLIR sequence. Test results using real-life FLIR image sequences are presented to verify the effectiveness of the filter-based pattern recognition and tracking techniques. Future work in this area would include a review of techniques beyond correlation that are particularly useful for high-resolution targets, as well as the development and inclusion of techniques that dynamically update the target model.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Schalkoff, R. Pattern Recognition, Statistical, Structural and Neural Approaches; John Wiley & Sons: New York, NY, USA, 1992. [Google Scholar]
  2. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley-Interscience: New York, NY, USA, 2000. [Google Scholar]
  3. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 2nd ed.; Prentice-Hall, Inc.: Taiwan, 2002. [Google Scholar]
  4. Brunndstrom, K.; Schenkman, B.N.; Jacobson, B. Object detection in cluttered infrared images. Opt. Eng. 2003, 42, 388–399. [Google Scholar]
  5. Strehl, A.; Aggarwal, J.K. Detecting moving objects in airborne forward looking infra-red sequences. J. Mach. Vis. Appl. 2000, 11, 267–276. [Google Scholar]
  6. Yilmaz, A.; Shafique, K.; Shah, M. Target tracking in airborne forward looking infrared imagery. Image Vis. Comput. J. 2003, 21, 623–635. [Google Scholar]
  7. Alam, M.S.; Haque, M.; Khan, J.F.; Kettani, H. Fringe-adjusted JTC based target detection and tracking in FLIR image sequence. Opt. Eng. 2004, 43, 1407–1413. [Google Scholar]
  8. VanderLugt, A. Signal detection by complex spatial filtering. IEEE Trans. Inf. Theory 1964, 10, 139–146. [Google Scholar]
  9. Casasent, D.; Furman, A. Sources of correlation degradation. Appl. Opt. 1977, 16, 1652–1661. [Google Scholar]
  10. Gianino, P.D.; Horner, J.L. Phase-only matched filtering. Appl. Opt. 1984, 23, 812–816. [Google Scholar]
  11. Mu, G.G.; Wang, X.M.; Wang, Z.Q. Amplitude-compensated matched filtering. Appl. Opt. 1988, 27, 3461–3463. [Google Scholar]
  12. Awwal, A.A.S.; Karim, M.A.; Jahan, S.R. Improved correlation discrimination using an amplitude modulated phase-only filter. Appl. Opt. 1990, 29, 233–236. [Google Scholar]
  13. Hester, C.F.; Casasent, D. Multivariant technique for multiclass pattern recognition. Appl. Opt. 1980, 19, 1758–1761. [Google Scholar]
  14. Alam, M.S.; Awwal, A.A.S. Scale invariant amplitude modulated phase-only filtering. J. Opt. Laser Technol. 2000, 32, 231–234. [Google Scholar]
  15. Alam, M.S.; Chen, X.; Karim, M.A. Distortion invariant fringe-adjusted joint transform correlator. J. Appl. Opt. 1997, 36, 7422–7427. [Google Scholar]
  16. Casasent, D. Unified synthetic discriminant function computational formulation. Appl. Opt. 1984, 23, 1620–1627. [Google Scholar]
  17. Bahri, Z.; Kumar, B.V.K.V. Generalized synthetic discriminant functions. J. Opt. Soc. Am. A 1988, 5, 562–571. [Google Scholar]
  18. Kumar, B.V.K.V. Minimum variance synthetic discriminant function. J. Opt. Soc. Am. A 1986, 3, 1579–1584. [Google Scholar]
  19. Mahalanobis, A.; Kumar, B.V.K.V.; Cassasent, D. Minimum average correlation energy filters. Appl. Opt. 1987, 26, 3633–3640. [Google Scholar]
  20. Casasent, D.; Ravichandran, G.; Bollapraggada, S. Gaussian MACE correlation filters. Appl. Opt. 1991, 30, 5176–5181. [Google Scholar]
  21. Kumar, B.V.K.V.; Mahalanobis, A.; Song, S.; Sims, S.R.F.; Epperson, J.F. Minimum squared error synthetic discriminant functions. Opt. Eng. 1992, 31, 915–922. [Google Scholar]
  22. Mahalanobis, A.; Kumar, B.V.K.V.; Sims, S.R.F.; Epperson, J. Unconstrained correlation filters. Appl. Opt. 1994, 33, 3751–3759. [Google Scholar]
  23. Mahalanobis, A.; Kumar, B.V.K.V. Optimality of the maximum average correlation height filter for detection of targets in noise. Opt. Eng. 1997, 36, 2642–2648. [Google Scholar]
  24. Alkanhal, M.; Kumar, B.V.K.V.; Mahalanobis, A. Improved clutter rejection in automatic target recognition (ATR) synthetic aperture radar (SAR) imagery using the extended maximum average correlation height (EMACH) filter. Proc. SPIE. 2000, 4053, 332–339. [Google Scholar]
  25. Alkanhal, M.; Kumar, B.V.K.V.; Mahalanobis, A. Improving the false alarm capabilities of the maximum average correlation height correlation filter. Opt. Eng. 2000, 39, 1133–1141. [Google Scholar]
  26. Mahalanobis, A.; Carlson, D.W.; Kumar, B.V.K.V.; Sims, S.R.F. Distance classifier correlation filters. Proc. SPIE 1994, 2238, 2–13. [Google Scholar]
  27. Mahalanobis, A.; Kumar, B.V.K.V.; Sims, S.R.F. Distance classifier correlation filters for distortion tolerance, discrimination and clutter rejection. Proc. SPIE 1993, 2026, 325–335. [Google Scholar]
  28. Mahalanobis, A.; Kumar, B.V.K.V.; Sims, S.R.F. Distance classifier correlation filters for multiclass target recognition. Appl. Opt. 1996, 35, 3127–3133. [Google Scholar]
  29. Alkanhal, M.; Kumar, B.V.K.V. Polynomial distance classifier correlation filter for pattern recognition. Appl. Opt. 2003, 42, 4688–4708. [Google Scholar]
  30. Muise, R.; Mahalanobis, A.; Mohapatra, R.; Li, X.; Han, D.; Mikhael, W. Constrained quadratic correlation filters for target detection. Appl. Opt. 2004, 43, 304–314. [Google Scholar]
  31. Hwang, J.; Ooi, Y.; Ozawa, S. Visual feedback control system for tracking and zooming a target. Proc. Int. Conf. Power Electron. Motion Control IEEE 1992, 2, 740–745. [Google Scholar]
  32. Lipton, A.J.; Fujiyoshi, H.; Patil, R.S. Moving target classification and tracking from real-time video. Proceedings of the Workshop of the Application of Computer Vision, Princeton, NJ, USA, 19–21 October 1998; pp. 8–14.
  33. Bal, A.; Alam, M.S. Automatic Target tracking in FLIR image sequences using intensity variation function and template modeling. IEEE Trans. Instrum. Meas. 2005, 54, 1846–1852. [Google Scholar]
  34. Dawoud, A.; Alam, M.S.; Bal, A.; Loo, C. Target tracking in infrared imagery using weighted composite reference function based decision fusion. IEEE Trans. Image Process. 2006, 15, 404–410. [Google Scholar]
  35. Alam, M.S.; Bal, A.; Horache, E.; Goh, S.F.; Loo, C.; Regula, S.; Sharma, A. Metrics for evaluating the performance of joint transform correlation based target recognition and tracking algorithms. Opt. Eng. 2005, 44, 067005. [Google Scholar]
  36. Bal, A.; Alam, M. Dynamic target tracking using fringe-adjusted joint transform correlation and template matching. Appl. Opt. 2004, 43, 4874–4881. [Google Scholar]
  37. Dawoud, A.; Alam, M.S.; Bal, A.; Loo, C. Decision fusion algorithm for target tracking in infrared imagery. Opt. Eng. 2005, 44. [Google Scholar] [CrossRef]
  38. Alam, M.S.; Khan, J.; Bal, A. Heteroassociative multiple-target tracking by fringe-adjusted joint transform correlation. Appl. Opt. 2004, 43, 358–365. [Google Scholar]
  39. Mahalanobis, A. Correlation filters for object tracking target re-acquisition and smart aimpoint selection. Proc. SPIE 1997, 3073, 25–32. [Google Scholar]
  40. Mahalanobis, A.; Muise, R. Advanced detection and correlation based automatic target detection. Proc. SPIE 2001, 4379, 466–471. [Google Scholar]
  41. Mahalanobis, A.; Carlson, D.W.; Kumar, B.V.K.V. Evaluation of MACH and DCCF correlation filters for SAR ATR using MSTAR public data base. Proc. SPIE 1998, 3370, 460–468. [Google Scholar]
  42. Mahalanobis, A.; Ortiz, L.A.; Kumar, B.V.K.V. Performance of the MACH filter and DCCF algorithms on the 10-class public release MSTAR data set. Proc. SPIE 1999, 3721, 285–289. [Google Scholar]
  43. Carlson, D.W.; Riddle, J.G. Clutter background spectral density estimation for SAR target recognition with composite correlation filters. Proc. SPIE 2003, 5106, 64–71. [Google Scholar]
  44. Perona, M.T.; Mahalanobis, A.; Zachery, K.N. LADAR automatic target recognition using correlation filters. Proc. SPIE 1999, 3718, 388–396. [Google Scholar]
  45. Perona, M.T.; Mahalanobis, A. System-level evaluation of LADAR ATR using correlation filters. Proc. SPIE 2000, 4050, 69–75. [Google Scholar]
  46. Nevel, A.V.; Mahalanobis, A. Comparative study of maximum average correlation height filter variants using ladar imagery. Opt. Eng. 2003, 42, 541–550. [Google Scholar]
  47. Sims, S.R.F.; Mahalanobis, A. Performance evaluation of quadratic correlation filters for target detection and discrimination in infrared imagery. Opt. Eng. 2004, 43, 1705–1711. [Google Scholar]
  48. Bhuiyan, S.; Alam, M.S.; Alkanhal, M. A new two-stage correlation based approach for target detection and tracking in FLIR imagery using EMACH and PDCCF filters. Opt. Eng. 2007, 46, 086401. [Google Scholar]
  49. Islam, M.F.; Alam, M.S. Improved clutter rejection in automatic target recognition and tracking using eigen-extended maximum average correlation height (EEMACH) filter and polynomial distance classifier correlation filter (PDCCF). Proc. SPIE 2006, 6245, 62450B. [Google Scholar]
  50. Bhuiyan, S.M.A.; Alam, M.S.; Sims, S.R.F. Target detection, classification and tracking using MACH and PDCCF filter combination. Opt. Eng. 2006, 45, 116401. [Google Scholar]
  51. Bhuiyan, S.; Khan, J.F.; Alam, M.S. Power enhanced extended maximum average correlation height filter for target detection. Proceedings of the IEEE SoutheastCon, Lexington, KY, USA, 13–16 March 2014.
Figure 1. Schematic diagram of the proposed technique for (a) detection stage; and (b) classification stage.
Figure 2. (a) A full size (118 × 118-pixel) training image for detection filters created from the target patch of Frame 100 of L2018; and (b) 3D mesh plot of (a).
Figure 3. Spatial domain representation of Range-1 detection filters for the target (tank1) of Sequence L2018 (a) MACH filter; and (b) EMACH filter.
Figure 4. Spatial domain representation of Range-2 detection filters for the target (tank1) of Sequence L2018 (a) MACH filter; and (b) EMACH filter.
Figure 5. (a) A 12×16-pixel training image for classification filters created from the target patch of Frame 100 of L2018; and (b) 3D mesh plot of (a).
Figure 6. (a) Frame 1 of Sequence L2018; and (b) correlation output with the filter maximum average correlation height (MACH) Range-1.
Figure 7. (a) 32 ROIs in Frame 1 of Sequence L2018; and (b) 33 ROIs in Frame 1 of Sequence L2018.
Figure 8. (a) Correlation output for Frame 1 of Sequence L2018 using MACH Range-1 filter along with notch filter; and (b) 5 ROIs in Frame 1.
Figure 9. Target detection and classification results of Sequence L2018.
Figure 10. Target detection and classification results of Sequence L1701.
Figure 11. Target detection and classification results of Sequence L1618.
Table 1. Training data for Sequence L2018.

| Target (No. of Frames) | Filter | Working Frame Range | Training Frames: Range | Training Frames: Interval | Filter Size | β | γ | Ψ |
| tank1 (448) | MACH Range-1 | 1–200 | 1–200 | 5 | 118 × 118 | - | 0.1 | - |
| | MACH Range-2 | 201–448 | 201–300 | 5 | 118 × 118 | - | 1.0 | - |
| | EMACH Range-1 | 1–200 | 1–200 | 5 | 118 × 118 | 0.2 | 0.1 | - |
| | EMACH Range-2 | 201–448 | 201–300 | 5 | 118 × 118 | 0.1 | 1.0 | - |
| - | DCCF | 1–448 | 1–200 | 4 | 12 × 16 | - | - | - |
| - | PDCCF | 1–448 | 1–200 | 4 | 12 × 16 | - | - | 1.0, 1.5, 2.0 |
Table 2. Training data for Sequence L1415.

| Target (No. of Frames) | Filter | Working Frame Range | Training Frames: Range | Training Frames: Interval | Filter Size | β | γ | Ψ |
| Mantruck (281) | MACH | 1–281 | 1–100 | 5 | 118 × 118 | - | 1 | - |
| | EMACH | 1–281 | 1–100 | 5 | 118 × 118 | 0.1 | 1 | - |
| - | DCCF | 1–281 | 1–100 | 5 | 12 × 16 | - | - | - |
| - | PDCCF | 1–281 | 1–100 | 5 | 6 × 8 | - | - | 1.0, 1.5, 2.0 |
Table 3. Training data for Sequence L2312.

| Target (No. of Frames) | Filter | Working Frame Range | Training Frames: Range | Training Frames: Interval | Filter Size | β | γ | Ψ |
| APC1 (368) | MACH | 1–368 | 1–100 | 5 | 118 × 118 | - | 1 | - |
| | EMACH | 1–368 | 1–300 | 10 | 118 × 118 | 0.1 | 1 | - |
| - | DCCF | 1–368 | 1–300 | 10 | 12 × 16 | - | - | - |
| - | PDCCF | 1–368 | 1–300 | 10 | 8 × 12 | - | - | 1.0, 1.5, 2.0 |
Table 4. Training data for Sequence M1406.

| Target (No. of Frames) | Filter | Working Frame Range | Training Frames: Range | Training Frames: Interval | Filter Size | β | γ | Ψ |
| Bradley (380) | MACH | 1–380 | 1–100 | 5 | 118 × 118 | - | 1 | - |
| | EMACH | 1–380 | 1–100 | 5 | 118 × 118 | 0.1 | 1 | - |
| - | DCCF | 1–380 | 1–300 | 10 | 14 × 16 | - | - | - |
| - | PDCCF Range-1 | 1–200 | 1–100 | 5 | 8 × 10 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-2 | 201–380 | 201–300 | 5 | 8 × 10 | - | - | 1.0, 1.5, 2.0 |
Table 5. Tracking results of single-target sequences.

| Seq. Name | Total Frames | No. of ROIs Taken | Threshold | Target Name | No. of Frames Target Present | No. of Frames Detected Correctly | Total No. of False Alarms | Percentage of Successful Detection |

MACH-DCCF Algorithm
| L1415 | 281 | 4 | 1.00 | mantruck | 281 | 281 | 0 | 100 |
| L2018 | 448 | 8 | 0.99 | tank1 | 448 | 357 | 36 | 80 |
| L2312 | 368 | 4 | 1.00 | APC1 | 368 | 366 | 2 | 99 |
| M1406 | 380 | 4 | 1.00 | Bradley | 380 | 380 | 0 | 100 |

MACH-PDCCF Algorithm
| L1415 | 281 | 4 | 1.00 | mantruck | 281 | 263 | 9 | 94 |
| L2018 | 448 | 8 | 0.99 | tank1 | 448 | 371 | 37 | 83 |
| L2312 | 368 | 4 | 1.00 | APC1 | 368 | 368 | 0 | 100 |
| M1406 | 380 | 4 | 1.00 | Bradley | 380 | 353 | 27 | 93 |

EMACH-PDCCF Algorithm
| L1415 | 281 | 4 | 1.00 | mantruck | 281 | 274 | 1 | 98 |
| L2018 | 448 | 8 | 1.00 | tank1 | 448 | 386 | 60 | 86 |
| L2312 | 368 | 4 | 1.00 | APC1 | 368 | 368 | 0 | 100 |
| M1406 | 380 | 4 | 1.00 | Bradley | 380 | 354 | 26 | 93 |
Table 6. Training data for Sequence L1701.

| Target (No. of Frames) | Filter | Working Frame Range | Training Frames: Range | Training Frames: Interval | Filter Size | β | γ | Ψ |
| Bradley (371) | MACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | - | 1.0 | - |
| | MACH Range-2 | 101–200 | 101–200 | 5 | 118 × 118 | - | 1.0 | - |
| | MACH Range-3 | 201–300 | 201–300 | 5 | 118 × 118 | - | 0.1 | - |
| | MACH Range-4 | 301–388 | 301–370 | 5 | 118 × 118 | - | 0.1 | - |
| | EMACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | 0.1 | 1.0 | - |
| | EMACH Range-2 | 101–200 | 101–200 | 5 | 118 × 118 | 0.1 | 1.0 | - |
| | EMACH Range-3 | 201–300 | 201–300 | 5 | 118 × 118 | 0.1 | 0.1 | - |
| | EMACH Range-4 | 301–388 | 301–370 | 5 | 118 × 118 | 0.1 | 0.1 | - |
| Pickup (43) | MACH Range-1 | 1–388 | 1–43 | 3 | 118 × 118 | - | 1.0 | - |
| | EMACH Range-1 | 1–388 | 1–43 | 3 | 118 × 118 | 0.1 | 1.0 | - |
| - | DCCF Range-1 | 1–100 | 1–100 | 5 | 6 × 8 | - | - | - |
| - | DCCF Range-2 | 101–200 | 201–300 | 5 | 8 × 12 | - | - | - |
| - | DCCF Range-3 | 201–300 | 201–300 | 5 | 10 × 18 | - | - | - |
| - | DCCF Range-4 | 301–388 | 301–370 | 5 | 10 × 18 | - | - | - |
| - | PDCCF Range-1 | 1–100 | 1–100 | 5 | 8 × 8 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-2 | 101–200 | 101–200 | 5 | 8 × 12 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-3 | 201–300 | 201–300 | 5 | 10 × 18 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-4 | 301–388 | 301–370 | 5 | 10 × 18 | - | - | 1.0, 1.5, 2.0 |
Table 7. Training data for Sequence L1911.

| Target (No. of Frames) | Filter | Working Frame Range | Training Frames: Range | Training Frames: Interval | Filter Size | β | γ | Ψ |
| APC1 (165) | MACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | - | 0.1 | - |
| | MACH Range-2 | 101–165 | 101–150 | 5 | 118 × 118 | - | 0.1 | - |
| | EMACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | 0.1 | 0.1 | - |
| | EMACH Range-2 | 101–165 | 101–150 | 5 | 118 × 118 | 0.1 | 0.1 | - |
| tank1 (165) | MACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | - | 0.1 | - |
| | MACH Range-2 | 1–165 | 101–150 | 5 | 118 × 118 | - | 0.1 | - |
| | EMACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | 0.1 | 0.1 | - |
| | EMACH Range-2 | 1–165 | 101–150 | 5 | 118 × 118 | 0.1 | 0.1 | - |
| - | DCCF Range-1 | 1–100 | 1–100 | 5 | 8 × 16 | - | - | - |
| - | DCCF Range-2 | 101–130 | 101–130 | 5 | 8 × 16 | - | - | - |
| - | DCCF Range-3 | 131–165 | 131–160 | 5 | 16 × 36 | - | - | - |
| - | PDCCF Range-1 | 1–130 | 1–100 | 5 | 8 × 16 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-2 | 1–130 | 1–100 | 5 | 8 × 16 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-3 | 131–165 | 131–160 | 5 | 16 × 36 | - | - | 1.0, 1.5, 2.0 |
Table 8. Tracking results of two-target sequences.

| Seq. Name | Total Frames | No. of ROIs Taken | Threshold | Target Name | No. of Frames Target Present | No. of Frames Detected Correctly | Total No. of False Alarms | Percentage of Successful Detection |

MACH-DCCF Algorithm
| L1701 | 388 | 6 | 0.70, 0.30, 0.05, 0.04 | Bradley | 388 | 230 | 248 | 59 |
| | | | | pickup | 45 | 26 | 2 | 58 |
| L1911 | 165 | 6 | 0.40 | APC1 | - | - | - | Fails |
| | | | | tank1 | - | - | - | Fails |

MACH-PDCCF Algorithm
| L1701 | 388 | 6 | 0.40, 0.05, 0.06, 0.06 | Bradley | 388 | 369 | 12 | 95 |
| | | | | pickup | 45 | 28 | 1 | 62 |
| L1911 | 165 | 6 | 0.40 | APC1 | 165 | 152 | 0 | 92 |
| | | | | tank1 | 165 | 165 | 3 | 100 |

EMACH-PDCCF Algorithm
| L1701 | 388 | 4 | 0.40, 0.05, 0.06, 0.06 | Bradley | 388 | 369 | 9 | 95 |
| | | | | pickup | 45 | 31 | 8 | 69 |
| L1911 | 165 | 4 | 0.40 | APC1 | 165 | 160 | 0 | 97 |
| | | | | tank1 | 165 | 165 | 5 | 100 |
Table 9. Training data for Sequence L1618.

| Target (No. of Frames) | Filter | Working Frame Range | Training Frames: Range | Training Frames: Interval | Filter Size | β | γ | Ψ |
| APC1 (291) | MACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | - | 1.0 | - |
| | MACH Range-2 | 101–200 | 101–200 | 5 | 118 × 118 | - | 1.0 | - |
| | MACH Range-3 | 201–300 | 201–290 | 5 | 118 × 118 | - | 1.0 | - |
| | EMACH Range-1 | 1–100 | 1–100 | 5 | 118 × 118 | 0.1 | 1.0 | - |
| | EMACH Range-2 | 101–200 | 101–200 | 5 | 118 × 118 | 0.1 | 1.0 | - |
| | EMACH Range-3 | 201–300 | 201–290 | 5 | 118 × 118 | 0.1 | 1.0 | - |
| M60 (101) | MACH Range-1 | 1–300 | 1–100 | 5 | 118 × 118 | - | 1.0 | - |
| | EMACH Range-1 | 1–300 | 1–100 | 5 | 118 × 118 | 0.1 | 1.0 | - |
| Truck (6) | MACH Range-1 | 1–300 | 1–6 | 1 | 118 × 118 | - | 1.0 | - |
| | EMACH Range-1 | 1–388 | 1–6 | 1 | 118 × 118 | 0.1 | 1.0 | - |
| - | DCCF Range-1 | 1–100 | 1–100 | 5 | 6 × 12 | - | - | - |
| - | DCCF Range-2 | 101–200 | 201–300 | 5 | 8 × 16 | - | - | - |
| - | DCCF Range-3 | 201–300 | 201–290 | 5 | 8 × 16 | - | - | - |
| - | PDCCF Range-1 | 1–100 | 1–100 | 5 | 6 × 12 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-2 | 101–200 | 101–200 | 5 | 8 × 16 | - | - | 1.0, 1.5, 2.0 |
| - | PDCCF Range-3 | 201–300 | 201–290 | 5 | 8 × 16 | - | - | 1.0, 1.5, 2.0 |
Table 10. Tracking results of three-target sequences.

| Seq. Name | Total Frames | No. of ROIs Taken | Threshold | Target Name | No. of Frames Target Present | No. of Frames Detected Correctly | Total No. of False Alarms | Percentage of Successful Detection |

MACH-DCCF Algorithm
| L1618 | 300 | 6 | 0.30, 0.52, 0.52 | APC1 | 300 | 273 | 54 | 91 |
| | | | | M60 | 103 | 71 | 0 | 69 |
| | | | | truck | 17 | 0 | 0 | 0 |

MACH-PDCCF Algorithm
| L1618 | 300 | 6 | 0.52 | APC1 | 300 | 297 | 20 | 99 |
| | | | | M60 | 103 | 99 | 0 | 96 |
| | | | | truck | 17 | 12 | 0 | 71 |

EMACH-PDCCF Algorithm
| L1618 | 300 | 4 | 0.52 | APC1 | 300 | 297 | 23 | 99 |
| | | | | M60 | 103 | 99 | 0 | 96 |
| | | | | truck | 17 | 12 | 0 | 71 |
