1. Introduction
Synthetic Aperture Radar (SAR) offers all-weather, day-and-night observation over long distances with a small physical aperture. Whereas clouds and fog degrade the monitoring capability of optical sensors, SAR imaging is not disturbed by extreme weather. SAR is therefore increasingly used in both civilian and military applications.
In the field of marine monitoring and rescue, ship target detection based on SAR images is undoubtedly a meaningful topic: it is conducive to safeguarding maritime rights and provides information support for the management and supervision of marine vessels. As SAR technology has developed further, the resolution of SAR imaging has gradually improved, and ship target detection in SAR images has evolved from point-target detection to distributed-target detection, which demands higher detection performance. However, several problems make ship detection in SAR images challenging. The echo signal is the vector sum of the echoes of many ideal point scatterers, so the scattered echo intensity of the target fluctuates randomly around the value given by the scattering coefficient, which manifests as speckle noise in SAR images. Moreover, the azimuth defocusing of a moving ship in a high-resolution SAR image seriously hampers recognition of the ship target. Ship detection in SAR images is therefore both challenging and meaningful.
According to [1], as early as the 1980s, Lincoln Laboratory proposed a hierarchical, modular three-level processing chain for SAR Automatic Target Recognition (ATR), comprising three stages: detection, identification, and classification. This three-level process is clearly conceived and reasonably structured, and it has become a general pipeline widely used in SAR ATR systems. The first stage, called target detection or prescreening, extracts small suspected areas that may contain targets of interest from the large SAR scene and eliminates areas that contain no targets; however, this stage produces a large number of clutter false alarms. The second stage, called target identification, is essentially a binary classification problem (distinguishing the target class from the clutter class). It is a post-processing stage after detection whose purpose is to retain the real targets while removing natural clutter false alarms and some man-made clutter false alarms, yielding the target Regions of Interest (ROIs). The third stage, called target classification/recognition, applies more complex processing such as feature extraction, feature selection, and classifier computation to the ROIs, further eliminating man-made target false alarms and finally obtaining the target category, model, and other information.
The constant false alarm rate (CFAR) is the most widely used method in SAR target detection. The CFAR algorithm requires that the target has a strong contrast relative to the background. The threshold is calculated by fitting the hypothetical distribution and setting a constant false alarm rate. Since the distribution of targets and clutters must overlap, false alarms and missed detections are inevitable. By setting a small false alarm rate and using the threshold to separate the targets from the clutters, a fairly good detection result can be obtained.
Typical CFAR algorithms include the two-parameter CFAR, the global CFAR, and the two-stage CFAR. In the two-parameter CFAR, the detector fits a given distribution by estimating the shape and scale parameters of the K distribution [2] or the mean and standard deviation of the Gaussian distribution [3]. Once the hypothetical distribution is determined, the threshold-calculation equation can be derived. The global CFAR is an algorithm for quickly extracting suspicious targets; it detects targets through a single global threshold instead of sliding windows with local statistics [4]. Combining the advantages of the local CFAR and the global CFAR, the two-stage CFAR was proposed: the first stage uses a global CFAR to coarsely filter ROIs from the SAR image, and a fine local CFAR is then used to achieve fine detection [5]. Typical CFAR detectors include the Cell-Averaging CFAR (CA-CFAR) [6], Greatest-Of CFAR (GO-CFAR) [7], Smallest-Of CFAR (SO-CFAR) [7], and Order-Statistic CFAR (OS-CFAR) [8] detectors.
Traditional machine learning differs from deep learning. Machine learning uses statistical methods, linear algebra, and optimization algorithms to classify or predict unknown data; once features are manually designed and extracted, they can be used to train the models. Classic machine learning classifiers include random forests [9], support vector machines (SVMs) [10], naive Bayes classifiers [11], logistic regression [12], etc.
Compared with machine learning, deep learning can automatically extract and learn features from data. Common deep learning models include Convolutional Neural Networks (CNNs) [13] and Recurrent Neural Networks (RNNs) [14]. For object detection, deep learning networks include YOLO [15], Fast R-CNN [16], Cascade R-CNN [17], and others. Although object detection in optical images has achieved significant success, many challenges remain in SAR image object detection; for example, the speckle noise and defocusing specific to SAR images greatly interfere with detection.
Following the SAR ATR procedure proposed by Lincoln Laboratory, our work proposes a steady CFAR algorithm to locate targets in the first stage, utilizes an Active Contour Model (ACM) to obtain finer ROIs from which more accurate features are extracted, and modifies the Gradient Boosting Decision Tree (GBDT) to achieve better classification performance in the second stage. Since this paper aims at detection rather than the classification of target types, the third stage is beyond our scope. The procedure of our work is shown in Figure 1.
Our work makes contributions in the following areas:
The proposal of a steady CFAR algorithm, which performs steadily because the influence of bright targets is rejected;
The design and extraction of four useful features and the use of ACM to refine the ROI;
The proposal of a knowledge-oriented GBDT classifier, which generates the base learner based on certain prior criteria.
The other sections of this paper are arranged as follows:
Section 2 introduces the related work about our approach;
Section 3 reveals the methodology and its principle;
Section 4 shows the setting of the experiments and the results; and
Section 5 concludes the whole work and looks into future work.
3. Methodology
In the novel algorithm, we follow the basic module design and processing procedure of our previous work [36] but change some details. Moreover, a knowledge-oriented GBDT structure is proposed, aiming to solve the specific problem while complying with prior knowledge. The adjusted algorithm flow chart is shown in Figure 3.
In short, the input SAR images are fed into a steady CFAR detector, which first eliminates the land and then extracts the targets from the remaining sea region. The feature extraction module then extracts four reliable features: rectangularity, length–width ratio, area, and contrast. Finally, the four features are used to train and test the knowledge-oriented GBDT, and the detection results are output.
3.1. A Steady CFAR Detector
A CFAR detector is used to discover ships and other targets in the image. First, a mean filter is applied: each pixel is assigned the mean of the pixels in its neighborhood. In most cases, each sea-clutter pixel is therefore assigned the mean of the sea-clutter pixels in its neighborhood, and by the central limit theorem the filtered sea-clutter value is approximately a Gaussian random variable regardless of its original distribution. The probability density function of the sea clutter can thus be modeled as

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), (1)

where x is the grayscale value, \mu is the mean of the Gaussian distribution, and \sigma is its standard deviation. The mean \mu is estimated as the peak position of the histogram, and the standard deviation \sigma can be estimated by

\hat{\sigma} = \left(\frac{\sum_{i=0}^{\hat{\mu}} (i-\hat{\mu})^2\, h(i)}{\sum_{i=0}^{\hat{\mu}} h(i)}\right)^{1/2}, (2)

where i is the grayscale level, I is the maximum grayscale level (so that h(i) is defined for i = 0, 1, \ldots, I), and h(i) is the histogram. Here, we use the peak and the left part of the histogram to estimate the parameters of the Gaussian distribution so as to remove the effect of land, ships, and other targets (Figure 4). The false alarm rate is written as

P_{fa} = \int_{T}^{\infty} p(x)\,\mathrm{d}x, (3)

where T is the detection threshold. Substituting (1) into (3) yields

P_{fa} = \frac{1}{2}\,\mathrm{erfc}\left(\frac{T-\mu}{\sqrt{2}\,\sigma}\right), \quad \text{i.e.,} \quad T = \mu + \sqrt{2}\,\sigma\,\mathrm{erfc}^{-1}(2P_{fa}). (4)

For a given P_{fa}, T is calculated according to (4). Typically, P_{fa} is chosen as a small value.
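The threshold estimation above can be sketched as follows. This is a minimal illustration, not the paper's code: the variable names are ours, and a real implementation would operate on the histogram of the mean-filtered image.

```python
# Sketch of the steady CFAR parameter and threshold estimation:
# mu is the histogram peak, sigma comes from the left part of the
# histogram only (so bright targets and land do not bias it).
from statistics import NormalDist

def estimate_gaussian_params(hist):
    """Estimate (mu, sigma) of the sea-clutter Gaussian from the
    grayscale histogram, using only the peak and its left part."""
    mu = max(range(len(hist)), key=lambda i: hist[i])  # peak position
    left_mass = sum(hist[: mu + 1])
    var = sum(hist[i] * (i - mu) ** 2 for i in range(mu + 1)) / left_mass
    return mu, var ** 0.5

def cfar_threshold(mu, sigma, p_fa):
    """Detection threshold T such that P(x > T) = p_fa under N(mu, sigma^2)."""
    return mu + sigma * NormalDist().inv_cdf(1.0 - p_fa)
```

Pixels brighter than the returned T are declared target (or land) candidates; using only the left half-histogram makes the estimate insensitive to the bright outliers on the right.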
After the land extraction, we generate a sea mask and eliminate the land by masking the raw image with it. We then detect the targets in the target-extraction stage on the land-free images. The morphological processing in the land-elimination stage follows the same procedure as in [36]: through a combination of erosion, dilation, hole filling, and logical inverse selection, the land connected to the edge of the image is eliminated. A similar operation helps to extract the targets.
The ACM is worth discussing. We utilize the snake model to refine the region masks and set a high smoothness factor to reject sidelobe clutter. Moreover, the initial mask of the ACM is of great importance, so we apply a morphological opening with a small disk-shaped structuring element to the initial ROI to weaken the sidelobes; the disk template size is 5 in our experiments, and a disk is chosen for smoothness. In addition, the initial image slice is filtered by a mean filter of the same size as the one applied to the whole image. Since the opening makes the initial mask smaller than the original one, the contraction/expansion bias of the snake is set to favor expansion. However, because of their intensity distribution or shape, the masks of tenuous clutter and loosely clustered bright targets often become all-zero matrices after the opening; in such cases, we restore the original masks to keep the basic shape information. The ACM data-processing pipeline is shown in Figure 5.
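The opening step can be sketched as below. All names are illustrative, the radius parameter is an assumption (a size-5 disk template corresponds roughly to a radius-2 disk), and a real implementation would use a library such as OpenCV or scikit-image rather than pure Python.

```python
# Sketch of the morphological opening (erosion then dilation) with a
# disk structuring element, including the all-zero recovery rule.

def disk(radius):
    """Disk-shaped structuring element as (dy, dx) offsets."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]

def erode(mask, se):
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w
                     and mask[y + dy][x + dx] for dy, dx in se))
             for x in range(w)] for y in range(h)]

def dilate(mask, se):
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w
                     and mask[y + dy][x + dx] for dy, dx in se))
             for x in range(w)] for y in range(h)]

def opening(mask, radius=2):
    """Thin sidelobe-like structures narrower than the disk are
    removed; compact regions survive. If the opening wipes the mask
    out entirely, the original mask is restored (as in the paper)."""
    se = disk(radius)
    eroded = erode(mask, se)
    if not any(any(row) for row in eroded):
        return mask
    return dilate(eroded, se)
```

The opening removes structures narrower than the disk (sidelobes, tenuous clutter) while leaving compact ship-like regions largely intact, which is why it is a suitable way to build the initial ACM mask.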
3.2. Feature Extraction
After the fine processing of the ACM pipeline, four reliable hand-crafted features can be extracted from the refined targets. Three are shape features, and one is a grayscale feature. The area feature carries information about the concentrated extent of the strong scattering points and represents the size of the target. The first feature is obtained as

A = \sum_{(x,y)\in \Omega} 1,

where \Omega is the target mask region, i.e., A is the number of pixels in the target mask. As for the grayscale feature, we keep the contrast feature of our previous work [36], calculated by dividing the standard deviation of the target by its mean:

C = \frac{\sigma_t}{\mu_t},

where \sigma_t is the standard deviation of the target area and \mu_t is the mean of the target area. A larger contrast reflects stronger fluctuation over the target, which indicates that the target is more likely to be a ship.
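A minimal sketch of how the area and contrast features could be computed from a grayscale image slice and its binary target mask (helper names are illustrative, not the paper's code):

```python
# Sketch of the area feature (pixel count of the target mask) and the
# contrast feature (target standard deviation divided by target mean).

def area_feature(mask):
    """Number of pixels in the binary target mask."""
    return sum(sum(row) for row in mask)

def contrast_feature(image, mask):
    """Standard deviation of the target pixels divided by their mean."""
    vals = [image[y][x]
            for y in range(len(mask))
            for x in range(len(mask[0])) if mask[y][x]]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return (var ** 0.5) / mean
```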
The other two shape features are obtained by fitting a rectangle to the region. In detail, we first obtain the orientation of the ellipse that has the same second-order moments as the region. Then, we rotate the region so that the major axis lies along the horizontal direction. In the next step, we calculate the standard deviations of the target along the two axes, normalized by its area. The grayscale-weighted second-order moment calculation has better robustness; the weight of the target refers to the sum of its grayscale values. A typical result is shown in Figure 6.
The length–width ratio of the fitted rectangle can be obtained by

r = \frac{\sigma_x}{\sigma_y},

where \sigma_x is the root of the second-order central moment along the x-axis and \sigma_y is the root of the second-order central moment along the y-axis. The length L and the width W can be described as

L = 2\sqrt{3}\,\sigma_x, \quad W = 2\sqrt{3}\,\sigma_y,

so that a uniform rectangle of length L and width W has exactly the second-order moments \sigma_x^2 and \sigma_y^2. The centroid of the rectangle is determined by the centroid of the final mask. We define rectangularity as the intersection area of the target mask and the fitted rectangle divided by the geometric mean of the two region areas:

R = \frac{S_{\cap}}{\sqrt{S_{rect}\, S_{t}}},

where S_{\cap} refers to the intersection area, S_{rect} refers to the rectangle area, and S_{t} refers to the target area.
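The moment-based length–width ratio described above can be sketched as follows. This is an unweighted, illustrative version: the axis rotation is omitted by assuming the major axis is already horizontal, and the paper's actual implementation uses grayscale-weighted moments.

```python
# Sketch of the second-order central moments of a binary mask and the
# length-width ratio r = sigma_x / sigma_y derived from them.

def central_moment_roots(mask):
    """Roots of the second-order central moments along x and y."""
    pts = [(y, x) for y in range(len(mask))
           for x in range(len(mask[0])) if mask[y][x]]
    n = len(pts)
    cy = sum(p[0] for p in pts) / n
    cx = sum(p[1] for p in pts) / n
    sx = (sum((x - cx) ** 2 for _, x in pts) / n) ** 0.5
    sy = (sum((y - cy) ** 2 for y, _ in pts) / n) ** 0.5
    return sx, sy

def length_width_ratio(mask):
    sx, sy = central_moment_roots(mask)
    return sx / sy
```

For an elongated horizontal region the ratio is well above one, which is what makes it useful for separating ship-shaped targets from blob-like clutter.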
The appropriate length–width ratio r of ships ranges from 2.5 to 7.5, although r can be close to one when only parts of a ship are detected. Besides, the closer the rectangularity R is to 1, the more likely the target is to be a ship.
3.3. Knowledge-Oriented GBDT
As mentioned above, the GBDT is a serial computation structure in which each tree fits the residual between the previous model's prediction and the true label. Different strategies, such as subsampling, dropout, and a small learning rate, are introduced to prevent overfitting. We reconsider the generation criterion of the regression tree and set a customized attribute split order to make the regression tree structure closer to the ideal structure determined by prior knowledge.
According to their intensity, the targets are divided into strong and weak targets; according to their area, they are divided into large and small targets. Combining the two criteria yields four basic target types. As this GBDT is knowledge-oriented, the first two levels of the regression trees are meaningful: for example, after the area split at the first level and the contrast split at the second level, the weak and small targets are assigned to one of the four nodes. The detection of weak and small targets is usually a difficult problem, and this structure lets us focus on solving it.
The structure of the novel regression tree is as follows. On the first level, we split the parent node according to the area, allocating the targets to a large-target node and a small-target node. On the second level, we further split the nodes into strong-target and weak-target nodes. Furthermore, to prevent the contrast threshold from drifting, we set a fixed contrast threshold determined by prior knowledge; it is 0.79 in our experiments, which minimizes the contrast classification error on the training set. After the generation of these two levels, a combination of the remaining features, the length–width ratio and the rectangularity, is utilized to generate the deeper levels. The loss function is the Mean Square Error (MSE), which represents the purity of a split; in detail, it is the sum of the MSE of the left node and the MSE of the right node:

L = \sum_{x_i \in D_L} (y_i - \bar{y}_L)^2 + \sum_{x_i \in D_R} (y_i - \bar{y}_R)^2,

where D_L and D_R are the sample sets of the left and right child nodes and \bar{y}_L and \bar{y}_R are their mean labels. The negative gradient error, which becomes the fitting truth of the next iteration, can be expressed as

r_i = y_i - f(x_i),

where y_i is the fitting truth of the previous iteration and f(x_i) is the current prediction value. By applying this loss function, we obtain more precise predictions by summing the prediction value at the corresponding node of each tree, which reduces the residual iteration by iteration.
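The boosting idea with the fixed first two levels can be sketched as follows. This is a deliberately simplified illustration: the area threshold, learning rate, and all names are assumptions (only the 0.79 contrast threshold comes from the text), and the deeper, MSE-driven levels on the shape features are omitted.

```python
# Sketch of knowledge-oriented boosting: every base learner first
# splits on area (level 1), then on a fixed contrast threshold
# (level 2), and each leaf fits the mean of the current residuals.

AREA_T = 100.0      # hypothetical area threshold (level 1)
CONTRAST_T = 0.79   # fixed contrast threshold from prior knowledge (level 2)

def leaf_index(sample):
    """Route a (area, contrast) sample to one of four level-2 leaves."""
    big = sample[0] > AREA_T
    strong = sample[1] > CONTRAST_T
    return 2 * int(big) + int(strong)

def fit_tree(samples, residuals):
    """One knowledge-oriented base learner: mean residual per leaf."""
    sums, counts = [0.0] * 4, [0] * 4
    for s, r in zip(samples, residuals):
        k = leaf_index(s)
        sums[k] += r
        counts[k] += 1
    return [sums[k] / counts[k] if counts[k] else 0.0 for k in range(4)]

def fit_gbdt(samples, labels, n_trees=10, lr=0.5):
    """Gradient boosting with MSE loss: each tree fits y - f(x)."""
    pred = [0.0] * len(labels)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(labels, pred)]  # negative gradient
        leaves = fit_tree(samples, residuals)
        trees.append(leaves)
        pred = [p + lr * leaves[leaf_index(s)] for s, p in zip(samples, pred)]
    return trees

def predict(trees, sample, lr=0.5):
    return sum(lr * leaves[leaf_index(sample)] for leaves in trees)
```

Because the first two split levels are fixed, no threshold search is needed there, which is what speeds up training relative to a fully data-driven tree.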
Based on the novel regression tree, we keep the Gradient Boosting (GB) method unchanged and combine the regression trees in an additive model. The experiments and the ablation study show the effectiveness of each part of the modified GBDT. The novel regression tree model is shown in Figure 7. Our novel tree simulates the judgment of the human brain in ship detection and has strong interpretability compared with neural networks. The human brain usually judges a target in a specific order: first, the area of the target is considered; second, the grayscale fluctuation relative to the background is regarded as an important feature; then, basic shape features such as rectangularity are used to confirm whether the target is a ship.
4. Experiments and Results
4.1. Data Preparation
For the convenience of conducting the CFAR algorithm, we select the AIR-SARShip-1.0 dataset [37] as our data source. Note that some images were rejected because of their imaging quality and level of noise pollution. We split the image indices into a training set and a testing set, as shown in Table 1. To construct a complete feature space, an appropriate split is required; we therefore chose the split in Table 1 to train our modified GBDT.
As for label preparation, we transform the bounding-box information in the annotation files into target labels by checking whether each target lies in a bounding box. The classification is binary: the label is 1 if the target is a ship and 0 otherwise. Moreover, the raw 16-bit images are visualized by clipping them at three times their grayscale mean, and the detection results and some intermediate results are shown on the visualized data.
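The visualization step can be sketched as below (the function name is illustrative):

```python
# Sketch of the 16-bit visualization: clip each pixel at three times
# the image's grayscale mean, then rescale to the 8-bit range.

def visualize_16bit(image):
    """Clip a 2-D 16-bit image at 3x its mean and rescale to 0-255."""
    pixels = [v for row in image for v in row]
    limit = 3.0 * (sum(pixels) / len(pixels))
    return [[round(min(v, limit) / limit * 255) for v in row]
            for row in image]
```

Clipping at a multiple of the mean keeps the dim sea clutter visible while preventing a few very bright scatterers from compressing the rest of the dynamic range.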
4.2. Parameter Settings
Parameters for the steady CFAR detector are set as follows. As for the land-elimination stage of the steady CFAR detector, the mean filter size is 11 × 11, and the morphological filter size is 61 × 61. At the target-extraction stage, the mean filter size is 7 × 7 and the morphological filter size is 3 × 3. The hyperparameters for GBDT are discussed next.
4.3. Analysis of the Influence of Some Hyperparameters on the Model Performance
We choose the maximum depth, the number of base learners, the subsample rate, and the learning rate to be analyzed. All final accuracy results are obtained by calculating the mean of five experiments.
The influence of the maximum depth is shown in
Table 2. As we can see, when the maximum depth is five, the model has the best performance. If the depth is too shallow, the classification ability of the model is lacking. If the depth is too deep, the model overfits. The number of base learners is 25, the subsample rate is 0.5, and the learning rate is 0.1.
The influence of the number of base learners is shown in
Table 3. As it indicates, an appropriate number of base learners helps to improve the accuracy of the model. Here, the maximum depth is five and the other settings remain unchanged. We can infer from Table 3 that a sudden jump in accuracy, from 0.793 to 0.905, occurs when the number of base learners changes from six to seven. This demonstrates the necessity of a sufficient number of base learners: the accuracy improves with a step effect at a certain number of base learners and then converges to an upper limit as the number increases.
The influence of the subsample rate is shown in
Table 4. As it indicates, the performance of the proposed GBDT is best when the subsample rate is 0.1. Subsampling helps to prevent the model from overfitting, and the best subsample rate is lower than that of the original GBDT. We infer that the knowledge-determined design of the first two levels and the fixed contrast threshold help the model fit well on less data, and a reasonably smaller sample fraction relieves overfitting. The superiority of the novel structure is discussed in the next section. Here, the number of base learners is 15, and the other settings remain unchanged.
The influence of the learning rate is shown in
Table 5. As we can see, the performance is best when the learning rate is 0.1. A lower learning rate needs more base learners and a higher learning rate leads to overfitting. The subsample rate is 0.1, and the other settings remain unchanged.
Globally, there is a trade-off in the selection of all four factors. Through the contrast experiments, we determined the best setting: a maximum depth of five, 15 base learners, a subsample rate of 0.1, and a learning rate of 0.1. The following experiments are carried out with this setting.
4.4. Contrast Experiments with the Original GBDT
We conducted contrast experiments with the original GBDT. The results show a stronger classification ability of our modified GBDT than the original one. An ablation study shows the necessity of our improvement. The accuracy is also obtained by calculating the mean of five experiments. The results are shown in
Table 6.
As we can see from Table 6, the classification accuracy gradually improves as components are added. The baseline is a decision tree for binary classification. After the gradient boosting method is introduced to implement ensemble learning, the residual is gradually reduced and fitted at each iteration. Considering the brain-like judgment process, we use a custom split order so that the nodes at the first two levels have an explicit meaning, which achieves better performance. To prevent the contrast threshold at the second level from drifting, we adopt a prior contrast threshold, and the final result shows the improvement from this component. The significance of our improvements is thus verified, and the knowledge-oriented GBDT achieves satisfactory results on the present task. Aided by prior knowledge, we restrict the generation of the novel GBDT classifier with useful criteria, and no threshold calculation is needed at the second level of the regression trees, which accelerates the training of our GBDT.
4.5. Contrast Experiments with the Advanced Techniques
Many scholars focus on applying deep learning to ship detection. Even though the data are limited and SAR images are noisy, deep learning is powerful and full of possibilities. Our approach combines a classical machine learning method with a steady CFAR detector; its advantages are high reliability and low training cost compared to deep learning methods. Taking the training of the three deep learning models in the contrast experiments as an example, it takes several hours to achieve a good performance, whereas training the GBDT on four low-dimensional features takes several seconds. In addition, the training of deep networks on large scenes is expensive, but our method is less sensitive to image size. In offshore ship detection, the detection accuracy of our method is competitive with the advanced techniques.
The advanced techniques to be compared are YOLOv3 [21], Faster R-CNN [16], and soft teacher [26]. Among them, YOLOv3 and Faster R-CNN are supervised object-detection methods. Soft teacher is a practical semi-supervised object-detection method designed to cope with annotations being expensive and scarce: some images are annotated while others are unlabeled. For fairness of comparison, the proportion of annotated data is set to 0.1. The hyperparameters of all three methods are tuned for better performance. The SAR images are cropped into 512 × 512 slices according to the locations of the annotations, and inference is then conducted on the large-scene test images. The optimizer is Stochastic Gradient Descent (SGD). As for unsupervised methods, the steady CFAR in our work can be regarded as a representative.
To compare the four methods quantitatively, we adopt the following indices:

Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall},

where TP denotes the true positive samples, FP denotes the false positive samples, and FN denotes the false negative samples. Precision is related to false alarms, and recall is related to missed detections. F1 is the harmonic mean of precision and recall and is widely used in binary classification. The closer the three indices are to one, the better the detection performance. Additionally, we calculate the three indices in all scenes and in offshore scenes to verify the excellence of the proposed method.
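A minimal sketch of the index computation from TP/FP/FN counts (the counts here are illustrative inputs, not results from the paper):

```python
# Sketch of precision, recall, and F1 from detection outcome counts.

def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 score from TP/FP/FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```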
The detection results of the four methods are presented in
Figure 9. The results show the effectiveness of our work, especially in the detection of offshore ships: in Figure 9, we achieve 100% precision and recall in the offshore scene. However, the inshore ships are missed because they are treated as part of the land in the land-elimination stage of the steady CFAR detector; our method therefore works better in offshore scenes. As we can judge, YOLOv3 and Faster R-CNN achieve better accuracy in inshore ship detection, while there are some land false alarms in the detection result of Faster R-CNN. Soft teacher has a detection performance similar to our method in the offshore scene. As shown in Figure 10, in the offshore scene, the sidelobe of the ship target is rejected by the ACM data-processing pipeline, tenuous clutter is correctly classified as non-ship, and the ship-like target at the bottom left of the image is identified as a non-ship target. This fully illustrates the classification ability of our modified GBDT and the representativeness and accuracy of the extracted features. Moreover, if a ship lies on the edge of the image, it is also eliminated by the morphological filter in the steady CFAR detector. A typical case is shown in Figure 11: the ship at the bottom left of the image suffers very strong sidelobe interference, which leads to its disappearance from the sea mask. It is believed that, as the imaging platform moves, the missed ship will be detected. Such ships are counted as land in the calculation of inshore detection performance.
After obtaining the visualized data of the four methods, the three indices are calculated. The benchmark is shown in
Table 7. M1 to M4 represent YOLOv3, Faster R-CNN, soft teacher, and our method, respectively.
As shown in
Table 7, our method performs well, even across all scenes. With few false alarms and a high detection rate, the F1 score of our method is high in offshore scenes. Although the method is free of neural networks and ignores inshore ships, it is competitive with the deep learning methods. Among the deep learning methods, soft teacher has the highest F1 score in all scenes, and YOLOv3 has the highest F1 score in offshore scenes; however, YOLOv3 and soft teacher generate some land false alarms in certain scenes. Our method rejects land false alarms by eliminating the land and rejects sea false alarms by training a powerful and robust classifier. The merit of our approach is its reliability and pertinence.
5. Conclusions
To conclude, we present a ship-detection approach combining the advantages of a steady CFAR detector and a knowledge-oriented GBDT. The CFAR detector locates the latent targets and eliminates most land false alarms. The novelty of the steady CFAR lies in its clutter-modeling method and its well-designed processing pipeline.
The knowledge-oriented GBDT is modified according to the brain-like judgment process embodied in prior knowledge. The targets are divided into large and small targets according to their area and then into strong and weak targets according to their intensity. The contrast threshold is fixed so that fitting the data does not override the prior knowledge. The experiments prove that our improvements achieve remarkable results, and the whole method performs competitively with the deep learning methods.
Future work includes adding an inshore ship-detection module, classifying the detected ships, and improving the stability and reliability of the whole procedure. An inshore module would allow all ships in a scene to be detected, and further classification of the detected ships would complete the whole SAR ATR process. A more stable classifier and more accurate feature extraction will help improve the robustness of the detector.