1. Introduction
Synthetic aperture radar (SAR), a microwave imaging sensor, is widely used in marine monitoring, ocean development, terrain classification, and disaster prevention and control [1] owing to its all-day, all-weather, and long-range imaging capabilities. Ship detection is a crucial SAR maritime application [2] and is of great significance for monitoring water transport in coastal areas, managing ports, and ensuring maritime safety and security [3].
Traditional SAR ship detection methods rely primarily on manual feature extraction and typically involve stages such as land–sea segmentation, image preprocessing, and target pre-screening. They mainly comprise constant false alarm rate (CFAR)-based methods [4], visual-saliency-based methods [5], polarization-decomposition-based methods [6], transform-domain-based methods [7], and global-threshold-based methods [8]. Among these, the CFAR method, which is based on the statistical distribution of sea clutter, is the most widely used. It detects ship targets by statistically modeling the sea clutter, setting an adaptive threshold, and comparing the gray value of each pixel under test with this threshold. However, the CFAR algorithm generalizes poorly: the sea clutter distribution must be re-modeled for different conditions. In practice, this distribution is influenced by numerous factors, such as wind and waves, which makes it difficult to model accurately. Consequently, the CFAR algorithm struggles to deliver good detection results in complex inshore environments, and traditional SAR ship detection methods can no longer meet current detection requirements.
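As an illustration of the adaptive-threshold idea described above, the following minimal sketch implements one-dimensional cell-averaging CFAR (CA-CFAR) in plain Python. It is not taken from any of the cited methods: the function name `ca_cfar_1d`, the window sizes, and the assumption that clutter power is exponentially distributed (which yields the threshold multiplier used below) are illustrative choices, and real SAR detectors operate on 2D windows.

```python
def ca_cfar_1d(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1D list of power samples.

    num_train: training cells on EACH side of the cell under test.
    num_guard: guard cells on each side, excluded from the clutter estimate.
    pfa: desired probability of false alarm.
    Returns the indices of the cells declared as detections.
    """
    n = 2 * num_train  # total number of training cells
    # Threshold multiplier, ASSUMING exponentially distributed clutter power.
    alpha = n * (pfa ** (-1.0 / n) - 1.0)
    half = num_train + num_guard
    detections = []
    # Edge cells without a full window are skipped in this sketch.
    for i in range(half, len(power) - half):
        left = power[i - half : i - num_guard]
        right = power[i + num_guard + 1 : i + half + 1]
        clutter_level = (sum(left) + sum(right)) / n  # local clutter estimate
        if power[i] > alpha * clutter_level:          # adaptive threshold test
            detections.append(i)
    return detections


# Flat clutter of power 1.0 with one strong scatterer at index 20.
profile = [1.0] * 40
profile[20] = 50.0
print(ca_cfar_1d(profile))  # prints [20]
```

Because the multiplier is derived from an assumed clutter distribution, a mismatch between that assumption and the actual sea state shifts the realized false alarm rate, which is the re-modeling burden noted above.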
In recent years, with the advancement of deep learning, target detection algorithms based on convolutional neural networks (CNNs) [9] have developed rapidly and have been applied to SAR ship detection. Current SAR ship detection methods fall into two categories: horizontal-bounding-box-based and oriented-bounding-box-based methods. Among the horizontal-bounding-box-based methods, Ke et al. [10] replaced some of the traditional rigid convolutional kernels in Faster R-CNN [11] with deformable convolutional kernels, which adaptively learn additional 2D offsets for the original kernels to better model the shapes of ships, thus improving the detection performance. Wang et al. [12] proposed an improved Faster R-CNN method for SAR ship detection in ports based on the MSER decision criterion, which replaces the threshold decision criterion of Faster R-CNN with the maximally stable extremal regions (MSER) method to re-evaluate the higher-scoring region proposals, effectively reducing the false alarm rate and improving the detection accuracy. Zhu et al. [13] introduced an improved residual module and deformable convolution into the feature extraction network to enhance its feature extraction capability and redesigned the anchor box regression method to improve the target localization accuracy. Hu et al. [14] designed an anchor-free detection algorithm that balances local and nonlocal attention mechanisms, making better use of contextual information to extract the semantic information of the image and improve multi-scale ship detection. Li et al. [15] proposed an attention-guided balanced feature pyramid network, which mitigates the effects of complex background clutter and noise on ship detection and enhances the detection performance for multi-scale ships. Cui et al. [16] introduced the spatial shuffle-group enhance (SSE) module into CenterNet, which extracts stronger semantic features while suppressing part of the noise, reducing the false alarms caused by offshore and inland interference. Bai et al. [17] proposed an anchor-free detection network based on feature balancing and united attention, which aggregates and balances the semantic information at different levels of the feature pyramid using a global-context-guided feature balance pyramid (GC-FBP). Zhou et al. [18] proposed a step-by-step feature refinement backbone and pyramid network, which sequentially refines the position and silhouette of ships through a step-by-step spatial information decoupling function to reduce the multi-scale high-level semantic loss between neighboring feature layers and improve the ship detection performance.
Most of the aforementioned methods employ horizontal bounding boxes (HBBs) for ship detection. However, ship targets in SAR images vary considerably in orientation, and an HBB drawn around an inclined ship with a large aspect ratio encloses more background clutter. In areas such as harbors, where ships are densely distributed, the HBB of one ship may also overlap neighboring ships, resulting in lower detection accuracy. In contrast, oriented bounding box (OBB)-based detectors provide more precise localization and orientation information for ship targets, are better suited to ship detection in SAR images, and have attracted significant attention. For instance, Guo et al. [19] used the upper-left offsets of two horizontal anchor points and an inclination factor to directly infer the coordinates of the four OBB vertices and designed a feature-adaptive module to enhance the target’s feature information. An et al. [20] proposed the DRBox-v2 algorithm, which employs a multilayer anchor box generation strategy and an improved anchor box coding method for ship target detection in SAR images, yielding better detection results. Yang et al. [21] approached the issue from the perspective of feature matching, decoupling the feature optimization processes of different tasks to alleviate the conflicts between their learning objectives, and proposed an improved rotated-box RetinaNet detection algorithm to address the mismatch between the ship targets and the algorithm’s features. Wang et al. [22] combined the multi-scale contextual semantic information fusion (MCSIF) module and the scattering points information learning (SPIL) module, proposing a two-stage network that incorporates ship scattering information learning for enhanced detection robustness. Chen et al. [23] designed a nonlocal attention module together with a feature-oriented alignment module, addressing the feature misalignment in the cascade optimization scheme and balancing the quality of the bounding box prediction with the speed of single-stage algorithms. Xu et al. [24] introduced an attention-weighted feature pyramid network to achieve high-quality semantic interaction and soft feature selection among ship features of different resolutions and scales and designed a Triangle Distance IoU Loss to generate more accurate bounding boxes while accelerating model convergence.
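The claim that an HBB around an inclined, elongated ship encloses more background can be made concrete with a short geometric sketch. The parameterization below, an OBB given by center, width, height, and rotation angle in degrees, and the helper names `hbb_of_obb` and `background_fraction` are assumptions chosen for illustration; they are not the box encoding of any method cited above.

```python
import math


def hbb_of_obb(cx, cy, w, h, angle_deg):
    """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing an oriented box.

    The oriented box has center (cx, cy), side lengths w and h, and is
    rotated by angle_deg about its center (hypothetical convention).
    """
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    half_w = (w * c + h * s) / 2.0  # half extent along the x axis
    half_h = (w * s + h * c) / 2.0  # half extent along the y axis
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def background_fraction(w, h, angle_deg):
    """Fraction of the enclosing HBB area that lies outside the OBB."""
    xmin, ymin, xmax, ymax = hbb_of_obb(0.0, 0.0, w, h, angle_deg)
    hbb_area = (xmax - xmin) * (ymax - ymin)
    return 1.0 - (w * h) / hbb_area


# A 100 x 20 box (aspect ratio 5:1) at several orientations.
for angle in (0, 15, 45):
    print(angle, round(background_fraction(100.0, 20.0, angle), 3))
```

For this 5:1 box, the enclosing HBB is tight at 0 degrees, whereas at 45 degrees roughly 72% of the HBB area is background, which illustrates why oriented boxes are attractive for inclined ships.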
Although the aforementioned SAR ship detection methods achieve high detection accuracy, they often do so at the cost of high network complexity, which poses challenges for subsequent application and deployment. In response, some researchers have studied lightweight ship detection models. Chen et al. [25] proposed a dense connection method that integrates the outputs of three different scales through upsampling and cascading operations, fully merging low-resolution features with high-resolution features, and combined network pruning [26] and knowledge distillation to construct a high-precision, small-scale, real-time SAR ship detector. Guo et al. [27] combined depthwise separable convolutions with MobileNet and proposed the depthwise adaptive spatial feature fusion (DSASFF) module, achieving a lightweight, fast, and accurate SAR ship target detection algorithm. Liu et al. [28] introduced the Kullback–Leibler Divergence (KLD) loss function and the BRA attention mechanism, enhancing the detection accuracy for small ship targets, and designed the lightweight P-ELAN structure by adjusting the width and depth of the model, reducing the network parameters and saving computational resources.
Although these methods partially address the high computational complexity of the models, several problems remain. Firstly, detecting ship targets in complex backgrounds is still a challenge. The SAR imaging mechanism generates speckle noise, which makes it difficult to extract ship feature information and to distinguish ship targets from nearshore buildings and other interference. Secondly, ship target sizes vary across images, owing both to the different imaging resolutions of the various SAR working modes and to the physical sizes of the ships themselves, and these multi-scale differences increase the detection difficulty.
In summary, detecting arbitrarily oriented ship targets at a low computational cost while enhancing the multi-scale detection capability is an urgent problem. To this end, we propose a lightweight anchor-free method for arbitrary-direction ship detection in SAR images, named LSR-Det. Firstly, we introduce a lightweight backbone network (LCGNet) based on contour and spatial information guidance. This network is mainly constructed from the contour-guided feature aggregation module (CGFAM) and the lightweight feature extraction module (LFEM), which efficiently extract ship features from SAR images. Secondly, to improve the fusion of ship target features, we design a lightweight adaptive feature pyramid network (LAFPN). By incorporating the adaptive ship feature fusion module (ASFM) into different feature layers of the feature pyramid network (FPN), the network can adaptively learn the subtle features in ship images and suppress background clutter. Finally, we propose a lightweight rotation detection head (LRDHead) network that shares convolutional parameters to address the imbalance in the number of samples from ships of different scales, reducing the computational cost while enhancing the network’s ability to detect multi-scale ship targets in SAR images.
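To give a rough sense of why sharing convolutional parameters across pyramid levels shrinks a detection head, the sketch below counts the weights of a generic convolutional head with and without sharing. The channel width, number of convolutions, and number of pyramid levels are hypothetical values chosen for illustration and do not describe the actual LRDHead configuration; the sketch counts parameters only and says nothing about runtime.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Number of learnable parameters in one k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def head_params(channels, num_convs, k=3):
    """Parameters of a head made of num_convs stacked k x k convolutions."""
    return num_convs * conv_params(channels, channels, k)


def total_head_params(channels, num_convs, num_levels, shared):
    """Total head parameters over all pyramid levels.

    shared=True: one set of weights is reused at every level.
    shared=False: each level has its own copy of the head.
    """
    per_head = head_params(channels, num_convs)
    return per_head if shared else per_head * num_levels


# Hypothetical setting: 64 channels, 2 convolutions per head, 3 pyramid levels.
print(total_head_params(64, 2, 3, shared=False))  # prints 221568
print(total_head_params(64, 2, 3, shared=True))   # prints 73856
```

In this toy setting, sharing reduces the head parameters by a factor equal to the number of pyramid levels, and every level contributes gradients to the same weights regardless of how many ships fall at each scale.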
The main contributions of this paper are as follows:
A lightweight contour-guided backbone network (LCGNet) is designed. Built from the contour-guided feature aggregation module (CGFAM) and the lightweight feature extraction module (LFEM), it extracts ship features from SAR images more efficiently and with lower computational complexity.
A lightweight adaptive feature pyramid network (LAFPN) is designed. It improves the model’s ability to perceive ship position information and achieves adaptive multi-scale fusion of ship features in SAR images with fewer parameters.
A lightweight rotation detection head (LRDHead) network is designed. Sharing convolutional parameters reduces the number of parameters and the computational cost of the detection head while enhancing the multi-scale detection capability of the model.
The experimental results on the SAR ship detection dataset (SSDD) and the rotated ship detection dataset in SAR images (RSDD-SAR) show that the proposed modules are effective. Compared with other arbitrary-direction target detectors, LSR-Det obtains higher AP50 and F1 scores with fewer parameters and less computation.
The rest of the paper is organized as follows. Section 2 describes the proposed methodology in detail. Section 3 provides and analyzes the experimental results. Section 4 discusses the impact of the three proposed modules. Finally, Section 5 concludes the paper.