1. Introduction
Synthetic aperture radar (SAR) can operate day and night and in all weather conditions to provide high-resolution images, so it plays a significant role in surveillance and battlefield reconnaissance [
1,
2]. Automatic target recognition (ATR) is the process of automatically acquiring and classifying targets or other objects based on sensor data, and it has good application prospects in both military and civilian areas [
3]. The process of SAR ATR can be summarized as finding regions of interest (ROIs) in the observed SAR image and classifying the category of each ROI (e.g., T72 or BTR70) [
4]. Some earlier methods of SAR ATR can be found in [
5,
6,
7,
8,
9].
Traditional SAR ATR techniques mainly include four steps: detection, discrimination, feature extraction, and target recognition/classification [
10]. For target detection, potential ROIs are extracted from the input SAR image according to the local brightness or the shape of targets; the constant false alarm rate (CFAR) detector [
11] is a classical algorithm that detects targets against a background of noise, clutter, and interference by testing every pixel of the SAR image. In the discrimination phase, the ROIs obtained from the detection step are processed to remove false alarms, with the purpose of reducing the classification cost. The feature extractor is specific to the particular SAR interpretation task and reduces the dimension of the feature space. Some researchers use a feature-based approach to deal with the problem of SAR ATR [
12,
13]. After detection and discrimination, the remaining ROIs are input into the recognition/classification stage to obtain the type of target (e.g., armored personnel carrier, howitzer, or tank). There are two main traditional approaches: the most common is based on template matching, and the second is based on classifier models, such as support vector machines (SVM) [
5] and adaptive boosting [
14]. However, traditional SAR ATR methods depend heavily on handcrafted features and have a large computational burden or poor generalization performance [
15]. The accuracy will also decrease significantly if any stage of the SAR ATR is not well designed or not suitable for the current operating conditions [
16].
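The per-pixel CFAR test mentioned above can be sketched as follows. This is a minimal illustration that assumes the cell-averaging variant of CFAR; the detector in the cited work may differ. The function name `ca_cfar` and the parameters `guard`, `train`, and `scale` are hypothetical and chosen only for this example.

```python
def ca_cfar(image, guard=1, train=2, scale=3.0):
    """Cell-averaging CFAR sketch (assumed variant, illustrative parameters).

    image: 2D list of non-negative pixel intensities.
    Returns a 2D mask with 1 where a pixel exceeds scale * local clutter mean.
    """
    rows, cols = len(image), len(image[0])
    half = guard + train  # half-width of the full window around the cell under test
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for i in range(max(0, r - half), min(rows, r + half + 1)):
                for j in range(max(0, c - half), min(cols, c + half + 1)):
                    # Skip the guard window, which contains the cell under test,
                    # so that target energy does not bias the clutter estimate.
                    if abs(i - r) <= guard and abs(j - c) <= guard:
                        continue
                    total += image[i][j]
                    count += 1
            # Adaptive threshold: a multiple of the mean of the training cells.
            if count and image[r][c] > scale * total / count:
                mask[r][c] = 1
    return mask
```

Because the threshold follows the local clutter level, the false alarm rate stays roughly constant across regions of different brightness, which is the property the name refers to.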
Recently, deep learning (DL) algorithms have advanced significantly. Girshick proposed regions with CNN features (R-CNN) [
17] in 2014, and object detection based on deep learning began to come into favor. Subsequently, many improved algorithms based on R-CNN have been proposed, such as Fast R-CNN [
18] and Faster R-CNN [
19], which have achieved high accuracies in recognizing targets in optical images. However, these methods have been too computationally intensive for embedded systems and, even with high-end hardware, too slow for real-time applications.
To speed up computation, some researchers proposed methods based on a single network that predicts bounding boxes directly, without region proposals. Redmon proposed You Only Look Once (YOLO) [
20], a regression-based method that directly recognizes objects of different kinds and sizes in optical images and outputs confidence scores. However, it suffered from inaccurate localization. Liu proposed the Single Shot MultiBox Detector (SSD) [
21], which offered a trade-off between accuracy and speed in optical object detection.
Inspired by the successful application of deep learning methods to optical images, some researchers introduced DL methods for SAR image processing. Refs. [
22,
23] effectively extracted a high-level feature representation for SAR images by using a Deep Convolutional Neural Network (DCNN) which learned high-level features automatically, rather than requiring handcrafted features. Ref. [
24] proposed an efficient feature extraction and classification algorithm, based on a visual saliency model. Ref. [
25] proposed a target detection and discrimination method based on a visual attention model. Experimental results on synthetic images and the miniSAR image data set demonstrated that the method could detect and discriminate targets from complex background clutter with high accuracy and speed in high-resolution SAR images, which provides an effective way to overcome the drawbacks of target detection and discrimination in SAR images of large, complex scenes. Ref. [
15] used a CNN to recognize SAR targets and achieved classification performance competitive with existing state-of-the-art methods [
22]. These works showed that deep learning methods can be used in every stage of SAR ATR. However, each of these methods mainly focuses on only one of the four steps of SAR ATR.
Wang [
26] applied Faster R-CNN, which integrates detection and recognition for optical images, to integrate detection and recognition in SAR ATR, and obtained a system that handles large-scene SAR images. Ref. [
27] proposed a region-based convolutional neural network to address SAR target recognition in large-scene images. However, the processing time of these systems can be reduced further.
To integrate the traditional four steps of SAR ATR into a single system, and encouraged by previous work on deep learning for target detection in optical images, we adapt these methods to SAR images. By encapsulating all computation in a single deep neural network, target detection and recognition in large-scene SAR images can be integrated.
The proposed D-ATR system can directly recognize targets from complex background clutter with high accuracy and speed in large-scene SAR images. Because the SAR images available for training are limited, transfer learning and data augmentation methods, such as horizontal flip and random crop, are used in this paper. To meet the input-size requirement of the neural network, a fast sliding method cuts the large-scene SAR images into sub-images of a suitable size for the network, while guaranteeing that every target lies completely within one of the sub-images. Finally, non-maximum suppression between sub-images (NMSS) is proposed to suppress redundant predicted boxes across the sub-images, for more accurate recognition performance.
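The sliding and suppression steps described above can be illustrated with a short sketch. It is not the implementation used in this paper: it assumes that overlapping tiles with an overlap at least as large as the largest target are enough to keep every target whole in some sub-image, and it uses ordinary greedy, class-agnostic non-maximum suppression after mapping boxes to full-image coordinates. The names `tile_origins`, `iou`, and `merge_detections`, and the threshold value, are hypothetical.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles along one axis.

    Consecutive tiles share at least `overlap` pixels, so any target no
    larger than `overlap` along this axis fits entirely in some tile.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the image border
    return origins


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(per_tile, iou_thresh=0.5):
    """Map per-tile boxes to full-image coordinates, then apply greedy NMS.

    per_tile: list of ((ox, oy), [(box, score, label), ...]) where (ox, oy)
    is the tile origin and each box is (x1, y1, x2, y2) in tile coordinates.
    """
    detections = []
    for (ox, oy), dets in per_tile:
        for (x1, y1, x2, y2), score, label in dets:
            detections.append(((x1 + ox, y1 + oy, x2 + ox, y2 + oy), score, label))
    # Highest score first; a box is kept only if it does not overlap a kept box.
    detections.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for det in detections:
        if all(iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept
```

The suppression has to run in full-image coordinates because a target inside the overlap region is detected once per tile that contains it, and those duplicates only coincide after the tile offsets are added back.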
The organization of this paper is as follows.
Section 2 introduces the structure and components of the deep convolutional neural network.
Section 3 presents the results of several experiments that evaluate the performance of the proposed method. Finally,
Section 4 concludes the paper and discusses prospects for future work.