1. Introduction
Synthetic aperture radar (SAR) is a microwave remote sensing system that can obtain high-resolution images thanks to its all-day, all-weather operating capability, and it therefore plays an irreplaceable role in both military and civilian fields [1,2]. As important SAR applications, target detection and target classification long developed slowly owing to limitations of hardware and hand-crafted feature extraction [3,4,5]. In 2006, Hinton et al. proposed the concept of deep learning, noting that multi-layer convolutional neural networks (CNNs) can extract features automatically, which is of great significance for target classification [6]. In 2012, Krizhevsky et al. proposed the first CNN model for large-scale image classification, with a top-five error rate of only 17.0%, which popularized deep learning in the field of image classification [7,8]. In 2014, Chen et al. [9] used an unsupervised sparse auto-encoder in place of typical back-propagation to apply CNNs to SAR image recognition, and since then increasing attention has been paid to CNN-based radar image processing [10,11]. After CNNs became widely used in image processing, researchers began to investigate whether they could achieve comparable performance in target detection. At present, CNN-based target detection frameworks can be divided into two main types: one-stage frameworks [12,13,14] and two-stage frameworks [15,16,17,18]. The first two-stage framework, the region-based convolutional neural network (RCNN), was introduced by Girshick et al. in 2014 [15], and improved variants such as Fast RCNN [16], Faster RCNN [17] and Mask RCNN [18] followed. Compared with two-stage frameworks, one-stage frameworks directly output the locations and categories of targets, which improves detection efficiency. Algorithms represented by the YOLO (You Only Look Once) series and the single-shot multibox detector (SSD) [12] are widely used in target detection. In 2015, Redmon et al. [13] proposed the YOLO algorithm based on a single neural network, which formulates target detection as a regression problem and uses the entire image to predict target locations and categories. Then, in 2020, Bochkovskiy et al. developed YOLOv4 [14], which integrates several advanced detection techniques, improving the detection ability of the YOLO family while supporting real-time processing.
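As a concrete illustration of this regression formulation, the sketch below shows the YOLOv1-style output layout described for the original YOLO [13]: an S x S grid where each cell regresses B boxes and C class probabilities. YOLOv4 uses a different, anchor-based head, so this is only an illustrative decoding of the basic idea, not the model used later in this paper.

```python
import numpy as np

# Illustrative YOLOv1-style output layout (not YOLOv4's exact head):
# an S x S grid, B boxes per cell, C classes -> tensor of shape (S, S, B*5 + C).
S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)      # stand-in for a network output

cell = pred[3, 4]                           # predictions of one grid cell
boxes = cell[:B * 5].reshape(B, 5)          # each row: (x, y, w, h, confidence)
class_probs = cell[B * 5:]                  # conditional class probabilities
best_box = boxes[np.argmax(boxes[:, 4])]    # keep the most confident box
print(best_box, class_probs.argmax())
```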
When it comes to target classification, SAR images differ from optical images in that they contain both amplitude and phase information. However, traditional CNNs use only the amplitude information for SAR target classification, just as in optical image classification. In recent years, many methods that exploit the phase information of SAR images have been proposed and have shown better performance than amplitude-based methods [19,20,21]. In 2017, Zhang et al. proposed a classification network called the complex-valued CNN (CV-CNN) to extract the phase information of SAR images [19]; compared with typical CNNs that use only amplitude information, CV-CNN achieves a lower classification error rate in experiments on polarimetric SAR datasets. In 2018, Coman et al. formed the network input from three channels, namely the amplitude, real part and imaginary part, and achieved about 90% accuracy on the MSTAR dataset, which alleviated the over-fitting caused by the lack of training data [20]. In 2020, Yu et al. proposed a new framework based on CV-CNN, named the complex-valued fully convolutional neural network (CV-FCNN), in which the pooling layers and fully connected layers are replaced by convolutional layers to avoid complex-valued pooling operations and over-fitting [21]. Experiments on MSTAR demonstrated that CV-FCNN improves classification accuracy and outperforms CV-CNN.
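To make the amplitude–real–imaginary input concrete, the following is a minimal NumPy sketch (not the authors' implementation) of how a complex-valued SAR image chip can be converted into the three-channel tensor fed to such a network; the per-channel standardization is an illustrative assumption, not a step specified by the cited works.

```python
import numpy as np

def amplitude_real_imag(img: np.ndarray) -> np.ndarray:
    """Stack the amplitude, real part and imaginary part of a complex SAR
    image chip into a three-channel array of shape (H, W, 3)."""
    channels = np.stack([np.abs(img), img.real, img.imag], axis=-1)
    # Illustrative per-channel standardization (not specified by the paper).
    mean = channels.mean(axis=(0, 1), keepdims=True)
    std = channels.std(axis=(0, 1), keepdims=True) + 1e-8
    return (channels - mean) / std

# Example with a random complex-valued 128 x 128 "chip".
chip = np.random.randn(128, 128) + 1j * np.random.randn(128, 128)
print(amplitude_real_imag(chip).shape)  # (128, 128, 3)
```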
In the last decade or so, sparse signal processing technologies have been widely used in SAR imaging. Sparse SAR imaging algorithms break the limitation of conventional Shannon–Nyquist sampling theory: they can achieve high-quality recovery of sparse scenes from less data, reducing the complexity of radar systems [22,23]. However, the typical sparse recovery algorithms used in sparse SAR imaging, such as orthogonal matching pursuit (OMP) [24,25] and iterative soft thresholding (IST) [26,27], improve the image quality of the observed scene at the cost of ruining its background statistical characteristics and phase information. This loses feature information of the focused targets and hinders the further development of sparse SAR image processing. The introduction of complex approximate message passing (CAMP) [28,29] and a novel iterative soft thresholding algorithm (BiIST) [30,31] into sparse SAR imaging solves these problems. Both CAMP- and BiIST-based sparse SAR imaging methods can produce two kinds of sparse SAR images, i.e., a sparse solution and a non-sparse solution of the scene of interest. The sparse solutions of BiIST and CAMP are similar to the results of typical sparse reconstruction algorithms. However, the non-sparse solution of CAMP retains background statistical distributions similar to those of matched filtering (MF)-based images, and the non-sparse solution of BiIST retains phase information; both therefore offer more feature information for SAR target detection and classification and can, in theory, improve the performance of the proposed methods.
In this paper, experiments are carried out on sparse SAR images recovered by the CAMP and BiIST algorithms. For the CAMP-based sparse SAR imaging method, a target detection framework based on sparse SAR images is introduced: it first constructs sparse SAR image datasets from the results of CAMP and then applies YOLOv4 to target detection on the reconstructed datasets. For the BiIST-based sparse imaging method, a classification network based on both the amplitude and phase information of SAR images is used to classify targets in a sparse SAR image dataset composed of the non-sparse solutions of BiIST.
The rest of this paper is organized as follows. Sparse SAR imaging methods based on complex image data are introduced in Section 2. Section 3 describes the models of YOLOv4 and the amplitude–real–imaginary classification network. Experimental results based on the MSTAR dataset are shown in Section 4, and performance analysis under different situations is discussed in Section 5. Finally, Section 6 concludes our work.
5. Experimental Analysis
Experimental results on SAR target detection can be divided into two scenarios. Under SOC, the non-sparse estimation dataset shows performance similar to the MF-based dataset in terms of mAP, but it outperforms MF by 0.91% in terms of IOU. The sparse estimation dataset, however, underperforms the MF-based dataset by 2.56% in mAP and 11.35% in IOU. Under EOC, both the sparse and non-sparse estimation datasets perform well in target detection. The non-sparse estimation dataset obtains a 92.00% mAP and a 65.21% IOU, higher than the 90.80% mAP and 55.36% IOU of the MF-recovered images, while the sparse estimation dataset achieves the best performance with a 96.32% mAP and a 70.85% IOU, outperforming the MF-based dataset by 5.52% in mAP and 15.49% in IOU. The different results under SOC and EOC can be explained as follows: the sparse estimations recovered by the CAMP algorithm highlight the main features while ruining many detailed features, whereas the non-sparse estimations highlight the main features and retain the detailed features at the same time. Under SOC, the difference between the training set and the testing set is small, so almost all the features learned by the network match the features in the testing set; the non-sparse estimations, which retain more detailed characteristics, therefore show better detection performance. Under EOC, the difference between the training set and the testing set is large, and some features learned from the training set may be changed or may not even appear in the testing set; the detection performance of the non-sparse estimations therefore degrades, and the sparse estimations, which retain the main features, are better suited to target detection in this case. In summary, the non-sparse estimations of CAMP can be used for routine target detection tasks, while the sparse estimations are more suitable for more complex target detection tasks.
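For clarity on the IOU figures quoted above, the sketch below shows the standard intersection-over-union computation for a pair of axis-aligned bounding boxes in (x1, y1, x2, y2) format; how IoU is aggregated over detections in the experiments (e.g., averaged over matched boxes) is not restated here, so this is only the per-pair definition.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two partially overlapping boxes: intersection 400, union 2800 -> IoU ~ 0.143.
print(box_iou((10, 10, 50, 50), (30, 30, 70, 70)))
```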
In addition, the SAR target classification results show that the amplitude–real–imaginary CNN performs better than typical CNNs on sparse SAR datasets. Compared with the MF-based dataset, higher classification accuracy can be achieved with sparse SAR images as the input data. Moreover, it should be noted that the accuracy of the proposed classification framework is 95.47% under SOC when the training samples are reduced to 100 per class, outperforming the amplitude-based CNN on the sparse dataset and the amplitude–real–imaginary CNN on the MF-based dataset by 1.53% and 1.45%, respectively. Similar conclusions are obtained when the number of training samples is reduced to 800 under EOC, which shows the great application potential of the proposed classification network when training samples are limited.