1. Introduction
Synthetic aperture radar (SAR) is widely applied in various civil and military fields, such as aerial remote sensing for target detection [1], environmental monitoring [2], and maritime surveillance [3]. Automatic target-recognition systems using SAR sensors continue to be developed for a number of applications, particularly in the area of military defense. The goal of these systems is to detect and classify military targets using various image- and signal-processing techniques. The conventional architecture of a target-recognition system consists of three separate stages. First, a pre-screener identifies local regions of interest using a constant false alarm rate (CFAR) detector, allowing all targets and numerous false alarms to pass. It is followed by a discriminator that aims to eliminate all natural false alarms. Finally, the classifier receives all man-made objects and attempts to categorize each input image as a specific target type contained in the training set [4]. General reviews of automatic target-recognition concepts and SAR target-detection technologies can be found in [4,5]. This paper focuses on the final classification stage of the SAR automatic target-classification system. Target images obtained by SAR differ significantly from optical target images because microwave imaging is based on a scattering mechanism. Usually, SAR images are not as intuitive and fine as optical images. Moreover, the characteristics of SAR target images are very sensitive to azimuth and elevation angles [4]. If these angles change significantly, the characteristics of the SAR target images also change significantly, which makes their classification challenging.
There are three mainstream paradigms for SAR target image classification: template-based methods [5], model-based methods [6], and machine-learning-based methods [7]. Template-based methods compare the test image with training images to determine the target category. These methods require a large number of target image templates and expensive computation. Moreover, background clutter also creates greater interference in template matching. In model-based methods, target images are described by a complex scattering model derived from the scattering mechanism, and the target type is determined according to the likelihood estimation of the model parameters. However, since these scattering models are non-linear, it is difficult to obtain accurate estimates of the parameters. This approach is easily affected by clutter interference and noise corruption. In addition, its performance under extended operating conditions decreases significantly because of the obvious difference between training samples and test samples.
Recently, machine-learning-based methods have attracted considerable interest for SAR target image classification. One of the most popular methods is representation learning [8,9]. The most typical representation learning method is sparse representation. The sparse representation of a signal is based on the following theory: the observed signal can be represented by a linear combination of a series of known signals called atoms. Under sparse constraints ($\ell_1$-norm minimization) on the representation coefficient vector, a unique representation coefficient solution can be obtained, and the target type can be identified according to the minimum reconstruction error for each class. Sparse representation has been widely applied in a variety of recognition tasks, such as face recognition [10,11], speech recognition [12], and hyperspectral image classification [13]. For SAR target image classification, researchers have investigated a number of approaches based on sparse representation. Zhang et al. proposed a multi-view joint sparse representation method that exploits the correlation between multi-view target images [14]. Combining low-rank matrix recovery, Cheng et al. proposed an improved joint sparse representation approach to classify SAR target images [15]. Dong et al. investigated a method based on joint sparse representation with monogenic features, and developed the method on a Grassmann manifold [16,17]. Furthermore, Liu et al. studied the Dempster–Shafer fusion of multiple sparse representations for target image recognition [18]. Sun et al. proposed a SAR target-recognition method combining dynamic sparse representation and dictionary learning [19]. Song et al. reported an approach of supervised discriminative dictionary learning and sparse representation for SAR target recognition [20]. Lately, Liu et al. developed a new scattering center feature-extraction and target-recognition method based on sparse representation and a refinement dictionary [21].
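To make the sparse-representation classifier concrete, the following is a minimal Python sketch (NumPy/scikit-learn). It is illustrative only: the names `X_train`, `labels`, and the sparsity level are hypothetical, and greedy orthogonal matching pursuit stands in for the $\ell_1$-minimization solvers used in the cited works.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(y, X_train, labels, n_nonzero=10):
    """Sparse-representation classification (SRC) sketch.

    y       : (d,) test feature vector
    X_train : (d, n) dictionary whose columns are training feature vectors
    labels  : (n,) class label of each dictionary column (atom)
    Returns the label with the minimum class-wise reconstruction error.
    """
    # Greedy sparse coding as a stand-in for l1-norm minimization.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(X_train, y)
    alpha = omp.coef_

    best_label, best_err = None, np.inf
    for c in np.unique(labels):
        # Keep only the coefficients that belong to class c.
        alpha_c = np.where(labels == c, alpha, 0.0)
        err = np.linalg.norm(y - X_train @ alpha_c)
        if err < best_err:
            best_label, best_err = c, err
    return best_label
```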
Another alternative representation learning method is collaborative representation [22]. Collaborative representation can be interpreted as an optimization problem under an $\ell_2$-norm minimization constraint. It has also been widely used in face recognition, hyperspectral classification, etc. [23]. Compared to sparse representation, collaborative representation can greatly reduce the computational cost while maintaining high classification accuracy [23].
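For contrast, the collaborative-representation classifier replaces the sparse code with a closed-form ridge solution, which is why its cost is essentially one regularized least-squares solve. A minimal sketch under the same hypothetical names as above:

```python
import numpy as np

def crc_classify(y, X_train, labels, gamma=0.01):
    """Collaborative-representation classification (CRC) sketch.

    Solves min_b ||y - X b||_2^2 + gamma * ||b||_2^2 in closed form,
    then assigns the class with the smallest class-wise residual.
    """
    n = X_train.shape[1]
    # Closed-form ridge solution: b = (X^T X + gamma I)^{-1} X^T y
    b = np.linalg.solve(X_train.T @ X_train + gamma * np.eye(n), X_train.T @ y)

    residuals = {c: np.linalg.norm(y - X_train @ np.where(labels == c, b, 0.0))
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)
```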
In recent years, the combination of multi-task learning and representation learning has become an important trend in pattern recognition. The idea behind multi-task learning is that, when tasks are sufficiently similar or correlated to a certain extent, exploiting the correlation between them improves the generalization of recognition. Several studies have confirmed the advantages of multi-task learning [24]. Yan et al. extracted various types of features from images and treated the sparse representation recognition of each feature type as a task, thereby modeling multi-feature joint sparse representation recognition as a multi-task sparse learning problem [25]. Fang et al. proposed a face-recognition method based on local Gabor features and adaptive multi-task sparse representation [26]. Li et al. proposed a hyperspectral image-classification method combining multi-task learning with collaborative representation [27]. Luo et al. investigated a manifold regularized multi-task learning (MRMTL) algorithm, which can effectively control the model complexity and ensure that the functions in the shared hypothesis space are smooth along the data manifold [28]. Furthermore, Luo et al. proposed a novel large-margin multi-modal multi-task feature extraction framework, which can not only handle correlated and noisy features but also exploit the complementarity of different modalities to reduce feature redundancy [29]. Both multi-task sparse representation and multi-task collaborative representation can exploit shared training sample patterns between different tasks to ensure that the correct training samples are selected while interfering atoms are excluded. In addition, some recent SAR target-recognition technologies are based on deep learning. Ding et al. developed a target-recognition algorithm combining data augmentation with a convolutional neural network [30]. Chen et al. investigated the application of deep convolutional networks to SAR target recognition in detail [31].
In this work, we propose a two-stage multi-task representation learning method for SAR target image classification. Figure 1 shows a schematic view of the approach. The approach first extracts three kinds of features from all training and test samples: principal component analysis (PCA) features, wavelet transform features, and 2D slice Zernike moment (2DSZM) features [32]. The first stage represents each feature of the test sample as a linear combination of the corresponding features of the training set, and determines the $K$ nearest neighbor samples of the test sample in the training set by multi-task sparse representation. In principle, the test sample and its neighboring samples should come from the same class, which means that the $K$ neighboring samples make the greatest contribution to identifying the test sample. Thus, the first stage of the algorithm detects training samples that are far from the test sample, on the assumption that these samples should have no effect on the classification decision. This is helpful for accurate classification of test samples. In fact, using a subset of the training samples rather than all of them to identify a test sample greatly reduces the interference of training samples that are far away from the test sample. The second stage represents the test sample with a new dictionary consisting of the $K$ nearest neighbors in the framework of multi-task collaborative representation, and the representation results are used to infer the test sample label. We adopt multi-task collaborative representation in the second stage because of its simple closed-form solution and low computational cost. The proposed method rests on the following reasoning: the first stage selects the training samples that are most relevant to the current test sample. Since the class labels of the chosen training samples are usually a subset of all class labels, the final classification reduces to determining the test sample label from a small number of candidate classes, which makes an accurate inference in the second stage more likely, as the true label of the test sample is expected to be among the labels of the training subset. The proposed approach not only exploits the power of combined multi-feature representation learning but also greatly reduces the interference of irrelevant atoms in the dictionary, which leads to enhanced classification performance. The proposed method is evaluated on the Moving and Stationary Target Acquisition and Recognition (MSTAR) benchmark data sets, and the experimental results validate its effectiveness and superiority.
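To make the three feature channels concrete, here is a hedged sketch of how the PCA and wavelet features might be computed with standard libraries (scikit-learn, PyWavelets). The 2DSZM features are specific to reference [32] and are only stubbed; the function names and parameter values are hypothetical.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA

def pca_features(train_imgs, test_imgs, n_components=80):
    """Project vectorized images onto the leading principal components
    learned from the training set."""
    Xtr = train_imgs.reshape(len(train_imgs), -1)
    Xte = test_imgs.reshape(len(test_imgs), -1)
    pca = PCA(n_components=n_components).fit(Xtr)
    return pca.transform(Xtr), pca.transform(Xte)

def wavelet_features(img, wavelet="db4", level=2):
    """Use the low-frequency approximation subband of a 2D wavelet
    decomposition as the feature vector."""
    coeffs = pywt.wavedec2(img, wavelet=wavelet, level=level)
    return coeffs[0].ravel()  # approximation coefficients only

def zernike_features(img):
    """Placeholder for the 2D slice Zernike moment (2DSZM) features;
    see reference [32] for the actual construction."""
    raise NotImplementedError("compute 2DSZM as described in [32]")
```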
The paper is organized as follows. In Section 2, we briefly describe the three types of feature-extraction methods used in this work, and review basic sparse representation and collaborative representation. In Section 3, the two-stage multi-task representation learning algorithm is developed in detail. In Section 4, experiments are carried out on the MSTAR database, and the performance of the proposed approach is reported. Finally, we conclude the paper in Section 5.
3. Two-Stage Multi-Task Representation Learning
For image-classification tasks, taking the patterns shared among various tasks into account is beneficial for improving the generalization performance of classification. Numerous studies have confirmed the superior performance of the multi-task learning framework in both theory and practice. For example, multi-task sparse representation has been applied to face recognition, and multi-task collaborative representation has been used in hyperspectral image classification. In this work, classification by each type of feature is considered an individual task; multi-feature classification thus constitutes a multi-task model in which the tasks should share feature subsets of the training set for a given test sample.
Both multi-task sparse representation and multi-task collaborative representation utilize all training samples for classification. However, many studies show that a local subset, rather than all of the training samples, contributes most to classification; that is, many training samples act as interference. For instance, it has been shown that local PCA is superior to global PCA in classification [38]. Vural proposed using local dependencies of samples to conduct classification [39], while Xu et al. investigated two-phase sparse representation with a local subset of training samples [40].
In this section, we propose the two-stage multi-task representation learning (TSMRL) algorithm for SAR target image classification. The first stage of the algorithm represents each feature of the test sample as a linear combination of the corresponding features of all training samples; $\ell_{2,1}$-norm regularized multi-task sparse learning is adopted to determine the $K$ nearest neighbor training samples for the test sample. Using local training samples instead of all training samples to identify a test sample greatly reduces the interference of training samples far away from the test sample. A new dictionary is then constructed from the $K$ nearest neighbor training samples. In the second stage, multi-task collaborative representation is used for classification with the new dictionary, leading to the final decision for the test sample. The remainder of this section describes the proposed method in detail.
3.1. The First Stage: $\ell_{2,1}$-Norm Regularized Multi-Task Sparse Representation Learning
Firstly, it is assumed that the features of the test sample and of the training samples approximately satisfy the following equation:

$$y^k \approx X^k \alpha^k, \quad k = 1, 2, 3, \tag{12}$$

where $y^k$ is the $k$-th pattern feature vector of the test sample, $X^k$ is the dictionary whose columns are the $k$-th pattern feature vectors of the training samples, and $\alpha^k$ is the corresponding representation coefficient vector. Equation (12) can be rewritten in the following matrix form:

$$Y \approx \left[ X^1 \alpha^1, X^2 \alpha^2, X^3 \alpha^3 \right], \quad Y = \left[ y^1, y^2, y^3 \right], \quad A = \left[ \alpha^1, \alpha^2, \alpha^3 \right], \tag{13}$$

where the coefficient vectors are collected as the columns of the matrix $A$.
According to the sparse representation principle, each pattern feature of the test sample should draw on features from the same training samples. Taking into account errors caused by noise, we obtain the following $\ell_{2,1}$-norm regularized multi-task sparse representation model:

$$\hat{A} = \arg\min_{A} \; \frac{1}{2} \sum_{k=1}^{3} \left\| y^k - X^k \alpha^k \right\|_2^2 + \lambda \left\| A \right\|_{2,1}, \tag{14}$$

where $\lambda$ is the balance parameter, the data-fidelity term equals $\frac{1}{2} \left\| Y - \left[ X^1 \alpha^1, X^2 \alpha^2, X^3 \alpha^3 \right] \right\|_F^2$ with $\| \cdot \|_F$ denoting the Frobenius norm of a matrix, and $\| A \|_{2,1}$ denotes the $\ell_{2,1}$-norm, $\| A \|_{2,1} = \sum_{j} \| A_{j,:} \|_2$, i.e., the sum of the $\ell_2$-norms of the rows of $A$.
The regularization term in Equation (14) is chosen as the $\ell_{2,1}$-norm of $A$. This is because, for all feature types extracted from the same test sample, the locations of the non-zero coefficients in the corresponding sparse coefficient vectors should be similar, while the coefficient values of these shared atoms differ because the feature types differ. Under this assumption, the non-zero coefficients of the representation coefficient matrix should lie in the same rows, and the $\ell_{2,1}$-norm regularization imposed on $A$ selects a small number of non-zero rows. The optimization problem (14) can be solved by the accelerated proximal gradient algorithm [41].
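The key ingredient of such a proximal-gradient solver is the proximal operator of the $\ell_{2,1}$-norm, which is row-wise soft-thresholding. The following is a minimal, plain (non-accelerated) sketch under the notation above; the accelerated variant of [41] adds a momentum step but uses the same proximal operator. All names and step-size choices here are illustrative assumptions.

```python
import numpy as np

def prox_l21(A, t):
    """Row-wise soft-thresholding: the proximal operator of t * ||A||_{2,1}."""
    row_norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(row_norms, 1e-12))
    return scale * A

def mtl_sparse_coding(ys, Xs, lam=0.1, n_iter=200):
    """Proximal-gradient solver for
        min_A 0.5 * sum_k ||y^k - X^k a^k||_2^2 + lam * ||A||_{2,1},
    where column k of A is the code for feature type k."""
    n = Xs[0].shape[1]
    A = np.zeros((n, len(Xs)))
    # Step size from the Lipschitz constant of the smooth part: max_k ||X^k||_2^2.
    L = max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(n_iter):
        # Per-column gradients X^kT (X^k a^k - y^k), stacked as a matrix.
        G = np.column_stack([X.T @ (X @ A[:, k] - y)
                             for k, (X, y) in enumerate(zip(Xs, ys))])
        A = prox_l21(A - G / L, lam / L)
    return A
```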
As noted, each column of $\hat{A}$ is the representation coefficient vector of one feature type. In representing the test sample, each training sample contributes to classification differently, and the contribution of a training sample can be estimated from the corresponding representation coefficient value: a large coefficient value means that the training sample contributes strongly to the representation. Since there are three representation coefficient vectors, we adopt the following procedure to obtain a local training subset comprising the $K$ nearest neighbor samples. First, the atoms with the $q$ largest coefficient values are selected from each coefficient vector, producing three groups. Due to the $\ell_{2,1}$-norm regularized multi-task sparse constraints, most of these atoms come from the same training samples, although a few atoms may come from different training samples. Mixing the three groups and merging duplicate atoms yields a candidate subset of atoms. Let $x_j^k$ denote the $k$-th pattern feature vector of the $j$-th atom in this subset, and let $e_j = \sum_{k=1}^{3} \| y^k - x_j^k \|_2$. Sorting the candidates by $e_j$ in ascending order, we select the first $K$ atoms to form the final nearest neighbor local subset. For each feature type, the final $K$ nearest neighbor atoms are retained, while all other atoms are set to zero vectors. In this way, each dictionary $X^k$ is updated to a new dictionary $\tilde{X}^k$. Thus, the interference caused by irrelevant atoms is greatly reduced, which improves the correctness of the final judgement.
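A sketch of this neighbor-selection step, under the same notation assumptions (per-feature coefficient columns in `A`, `q` candidates per feature, final subset size `K`; all names hypothetical):

```python
import numpy as np

def select_neighbors(A, ys, Xs, q=20, K=10):
    """Select the K nearest-neighbor atoms from the multi-task codes.

    A  : (n, 3) coefficient matrix from the first stage
    ys : list of per-feature test vectors y^k
    Xs : list of per-feature dictionaries X^k (columns are atoms)
    """
    # Union of the q largest-magnitude atoms from each coefficient vector.
    candidates = set()
    for k in range(A.shape[1]):
        candidates.update(np.argsort(-np.abs(A[:, k]))[:q].tolist())
    candidates = np.array(sorted(candidates))

    # Rank candidates by total feature distance e_j = sum_k ||y^k - x_j^k||_2.
    dist = np.zeros(len(candidates))
    for y, X in zip(ys, Xs):
        dist += np.linalg.norm(X[:, candidates] - y[:, None], axis=0)
    keep = candidates[np.argsort(dist)[:K]]

    # Retain the K neighbors; all other atoms become zero vectors.
    Xs_new = []
    for X in Xs:
        Xn = np.zeros_like(X)
        Xn[:, keep] = X[:, keep]
        Xs_new.append(Xn)
    return keep, Xs_new
```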
3.2. The Second Stage: Multi-Task Collaborative Representation Learning
The second stage of the TSMRL algorithm represents the test sample by multi-task collaborative representation with the new dictionaries, as follows:

$$\hat{B} = \arg\min_{B} \; \sum_{k=1}^{3} \left\| y^k - \tilde{X}^k \beta^k \right\|_2^2 + \gamma \left\| B \right\|_F^2, \tag{15}$$

where $\gamma$ is the balance parameter and $B = \left[ \beta^1, \beta^2, \beta^3 \right]$ is the collaborative representation coefficient matrix. The optimization problem of multi-task collaborative representation has the following analytical solution [27]:

$$\hat{\beta}^k = \left( (\tilde{X}^k)^{\mathrm{T}} \tilde{X}^k + \gamma I \right)^{-1} (\tilde{X}^k)^{\mathrm{T}} y^k, \quad k = 1, 2, 3. \tag{16}$$
We employ multi-task collaborative representation because of its simplicity in computation and its accurate classification ability.
Then, the class label of the test sample $y$ is predicted as the class with the lowest total reconstruction error accumulated over all tasks, i.e.,

$$\operatorname{class}(y) = \arg\min_{c} \sum_{k=1}^{3} \left\| y^k - \tilde{X}_c^k \hat{\beta}_c^k \right\|_2, \tag{17}$$

where $\tilde{X}_c^k$ and $\hat{\beta}_c^k$ denote, respectively, the atoms of $\tilde{X}^k$ belonging to class $c$ and the corresponding entries of $\hat{\beta}^k$.
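Under the same assumptions, the second stage reduces to one ridge solve per feature type (Equation (16)) followed by the accumulated-residual decision of Equation (17). A hypothetical sketch:

```python
import numpy as np

def tsmrl_second_stage(ys, Xs_new, labels, gamma=0.01):
    """Multi-task collaborative representation on the pruned dictionaries.

    Computes beta^k = (X^kT X^k + gamma I)^{-1} X^kT y^k for each feature
    type, then returns the class with the smallest total residual.
    """
    betas = []
    for y, X in zip(ys, Xs_new):
        n = X.shape[1]
        betas.append(np.linalg.solve(X.T @ X + gamma * np.eye(n), X.T @ y))

    classes = np.unique(labels)
    total_err = np.zeros(len(classes))
    for y, X, b in zip(ys, Xs_new, betas):
        for i, c in enumerate(classes):
            b_c = np.where(labels == c, b, 0.0)  # class-c coefficients only
            total_err[i] += np.linalg.norm(y - X @ b_c)
    return classes[np.argmin(total_err)]
```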
In summary, the steps of TSMRL are shown in Algorithm 1.
Algorithm 1 Two-stage multi-task representation learning for SAR target image classification

Input: all training samples; all test samples.
Output: the identity of each test sample.
Steps:
1) Extract the three types of features (PCA, wavelet, and 2DSZM) from all training samples, forming the dictionaries $X^1$, $X^2$, and $X^3$.
2) Select a test sample $y$ from the test set and extract its three types of features $y^1$, $y^2$, and $y^3$.
3) Using $\ell_{2,1}$-norm regularized multi-task sparse representation, represent $\{y^1, y^2, y^3\}$ with the dictionaries $\{X^1, X^2, X^3\}$ and obtain the representation matrix $\hat{A}$. Then determine the local subset comprising the $K$ nearest neighbors and construct the new dictionaries $\{\tilde{X}^1, \tilde{X}^2, \tilde{X}^3\}$.
4) Using multi-task collaborative representation, represent $\{y^1, y^2, y^3\}$ with the new dictionaries $\{\tilde{X}^1, \tilde{X}^2, \tilde{X}^3\}$ and obtain the coefficient matrix $\hat{B}$.
5) Decide the label of the test sample by the criterion of the total minimum reconstruction error of multi-task collaborative representation (Equation (17)).
6) If all test samples have been classified, go to step 7); otherwise, return to step 2).
7) End.
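Finally, a hypothetical top-level loop tying Algorithm 1 to the sketches above (`mtl_sparse_coding`, `select_neighbors`, `tsmrl_second_stage`; these are our illustrative stand-ins, not the authors' code):

```python
def tsmrl_classify(ys, Xs, labels, lam=0.1, gamma=0.01, q=20, K=10):
    """End-to-end TSMRL sketch for a single test sample.

    ys     : per-feature test vectors [y^1, y^2, y^3]
    Xs     : per-feature dictionaries [X^1, X^2, X^3]
    labels : class label of each dictionary column
    """
    # Stage 1: l2,1-regularized multi-task sparse coding + neighbor pruning.
    A = mtl_sparse_coding(ys, Xs, lam=lam)
    _, Xs_new = select_neighbors(A, ys, Xs, q=q, K=K)
    # Stage 2: multi-task collaborative representation + decision rule.
    return tsmrl_second_stage(ys, Xs_new, labels, gamma=gamma)
```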