Article

ORPSD: Outer Rectangular Projection-Based Representation for Oriented Ship Detection in SAR Images

State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(9), 1511; https://doi.org/10.3390/rs17091511
Submission received: 15 February 2025 / Revised: 10 April 2025 / Accepted: 15 April 2025 / Published: 24 April 2025

Abstract

Ship object detection in synthetic aperture radar (SAR) images is both an important and challenging task. Previous methods based on horizontal bounding boxes struggle to accurately locate densely packed ships oriented in arbitrary directions, due to variations in scale, aspect ratio, and orientation, thereby requiring other forms of object representation, such as oriented bounding boxes (OBBs). However, most deep learning-based OBB detection methods adopt a single-stage paradigm to improve detection speed, often at the expense of accuracy. In this paper, we propose a simple yet effective two-stage detector dubbed ORPSD, which enjoys good accuracy and efficiency owing to two key designs. First, we design a novel encoding scheme based on outer-rectangle projection (ORP) for the OrpRPN stage, which can efficiently generate high-quality oriented proposals. Second, we propose a convex quadrilateral rectification (CQR) method that rectifies distorted proposals into rectangles by finding their minimum-area outer rectangles, ensuring correct proposal orientation. Comparative experiments on the challenging public benchmarks RSSDD and RSAR demonstrate the superiority of our ORPSD over previous OBB-based detectors in terms of both detection accuracy and efficiency.

1. Introduction

Synthetic aperture radar (SAR) is an all-weather, all-time remote sensing imaging sensor widely used in various fields such as marine rescue and marine traffic monitoring [1,2,3,4,5,6]. Ship detection in SAR images is a specialized yet crucial object detection task that has attracted significant attention in recent years.
Advancements in deep learning and the availability of large SAR image datasets such as the SAR ship detection dataset (SSDD) [7] have led to significant development of SAR ship detection [8,9,10,11]. Compared to traditional ship detection methods such as the constant false alarm rate (CFAR) [12] approach, deep learning-based methods offer more precise results due to their superior representation and automatic feature extraction abilities in complex and challenging scenes [11,13,14]. However, most of these methods rely on horizontal bounding boxes (HBBs) to locate objects, which presents challenges, as shown in Figure 1a. In SAR images, slender and densely distributed ships lead to overlapping HBBs, resulting in optimization difficulties, particularly in distinguishing objects from redundant background information in the overlapped HBBs.
To address the above challenge, researchers have turned to methods based on oriented bounding boxes (OBBs), which precisely represent the shape and orientation of ship targets, mitigating object overlap and background redundancy for enhanced precision, as shown in Figure 1b. These methods are broadly categorized into anchor-based and keypoint-based methods. Although keypoint-based methods [15,16,17] excel in fast object detection, their loss function design poses challenges: poorly designed loss functions can lead to keypoint misalignment during regression, reducing accuracy and causing missed detections of densely packed objects, and their complex structure hinders model generalization. In contrast, anchor-based methods incorporate anchor priors, which facilitate parameter learning and enhance detection accuracy and recall, particularly for small objects, while also demonstrating strong generalization. However, some approaches [18,19] directly predict object orientation angles, leading to boundary discontinuity issues caused by the periodicity of angle (POA) and the exchangeability of edges (EOE) [15,16,20], as well as mismatches between labeled and predicted regions. To address these problems, some methods [21,22,23] employ angle transformations or introduce additional modules that are often complex and heavily parameterized, resulting in increased computational cost.
Most of the above methods target general remote sensing detection tasks. For ship detection in SAR images, many OBB-based methods instead aim to enhance single-stage detectors [24,25,26,27] and anchor-free detectors [28,29,30,31] to improve detection speed and alleviate the aforementioned boundary problem. Surprisingly, there is limited research on high-performance two-stage methods for ship detection in SAR images. Two-stage detectors hold significant potential in terms of generalization and performance, yet only a few teams [32,33,34] are currently working on improving them for ship detection in SAR images, and these two-stage methods introduce an angle parameter to design rich prior knowledge in the form of rotated anchors. To tackle the aforementioned challenges and enhance the versatility and effectiveness of two-stage detectors in identifying ship targets in SAR images, we propose a novel outer rectangular projection-based method (dubbed ORPSD) for ship detection in SAR images. It is a two-stage detector consisting of OrpRPN and OrpRef. Specifically, in the OrpRPN stage, we design a lightweight representation called outer rectangular projection (ORP) to encode an OBB with only six parameters. By imposing strict boundary constraints on the predicted oriented proposals, ORP enhances detection performance on dense SAR ship objects; it also mitigates the boundary discontinuity problem and greatly reduces the number of preset anchors, thereby reducing the imbalance between positive and negative samples. Furthermore, a convex quadrilateral rectification (CQR) method is devised, which identifies the minimum-area outer rectangle of a convex quadrilateral to rectify proposals that are distorted into parallelograms, guaranteeing correct orientations and removing redundant backgrounds. In addition, the Kullback-Leibler divergence (KLD) loss is employed in the second stage of the detector to alleviate boundary issues. As a result, ORPSD achieves state-of-the-art (SOTA) detection accuracy while maintaining competitive efficiency compared to one-stage and keypoint-based oriented detectors in terms of detection speed. Extensive experiments have been conducted on the challenging RSSDD [7] and RSAR [35] datasets, demonstrating that the proposed ORPSD outperforms existing methods.
In summary, this work makes contributions in three main aspects:
  • We propose a novel ORPSD model for detecting arbitrarily oriented ships in SAR images. By achieving the best performance on the challenging public RSSDD benchmark, the proposed ORPSD demonstrates its effectiveness and superiority over representative methods;
  • We design a new target representation method, ORP, which generates high-quality oriented proposals while maintaining low computational complexity;
  • We develop a CQR technique that can correct the proposals with distorted shapes, ensuring correct proposal orientations and removing redundant backgrounds.
The remainder of this paper is organized as follows. Section 2 briefly reviews previous methods for rotated target detection in SAR images. Section 3 presents the proposed ORPSD for ship detection in SAR images. We present the experimental results and analysis in Section 4, discuss limitations and future directions in Section 5, and conclude the paper in Section 6.

2. Related Work

2.1. Ship Detection and Algorithms Based on OBB

In general, traditional SAR ship detection methods consist of land and sea segmentation, image pre-processing, candidate region extraction, and false alarm feedback. Researchers have proposed various methods that can be categorized into threshold-based [36], significance-based [37], manual feature-based [38], and statistical modeling methods [39]. Despite their merits, these traditional methods have proven to be ineffective at dealing with challenging scenes due to their limited representation and generalization abilities.
Since the emergence of deep learning-based object detectors, plenty of methods [7,40,41] have been proposed. They can be broadly divided into two categories, namely, anchor-free and anchor-based. Among anchor-free methods, keypoint-based methods are the most influential. For example, Zhou et al. [42] proposed CenterNet, which detects objects by determining the center point and the length and width of the bounding box. Yi et al. [17] introduced a method named BBAVectors, which combines a point-based coding scheme with CenterNet to represent OBBs, focusing on more precise detection of objects with rotated bounding boxes. In contrast, incorporating the prior knowledge of anchors in anchor-based methods facilitates the learning of the network, enhances the recall rate, and yields significant improvements in detecting small targets. Nevertheless, generating numerous rotated anchors with angle information leads to excessive redundant computation, greatly reducing the real-time performance of the detector. Meanwhile, the introduction of angles can easily lead to boundary discontinuity and feature misalignment. To address these issues, many novel approaches have been proposed. For example, Yang and Yan [22] and Yang et al. [20] reframed the angle regression challenge as an angle classification problem. Additionally, Han et al. designed a rotation-equivariant detector (ReDet) [43] and proposed a single-shot alignment network (S2ANet), which employs a unified alignment module to better align features and alleviate the inconsistency between regression and classification [44].
For ship detection in SAR images, many OBB-based detection methods focus on enhancing detection speed and addressing the aforementioned boundary issues. These include single-stage detectors with rich features and anchor-free detectors. To enhance the precision of target detection in SAR images, single-stage detectors employ various strategies, including multi-level anchors [24], multi-scale feature fusion and calibration [25], and a frequency attention module [26]. Nevertheless, these algorithms require manual anchor specification, posing challenges in calibration; imbalanced positive and negative sample distributions can lead to prolonged learning curves and suboptimal performance; and they commonly face issues related to boundary discontinuity. To address these drawbacks, some researchers have shifted their focus to developing anchor-free detection methods. For instance, Zhang et al. [29] proposed a keypoint-based deep learning method, which utilizes a directional non-normalized Gaussian function to characterize the center point of ship targets while suppressing the imbalance of sample distribution through non-uniform weighting of loss functions at different levels. Gao et al. [30] developed an arbitrary-direction ship detection approach based on ellipse encoding and dynamic keypoints. Pan et al. [31] designed a keypoint-based detector called SRT-Net, and Zhang et al. [27] proposed an oriented ship detection approach based on soft thresholding and context information. Although two-stage detectors hold significant potential in terms of generality and performance, only a few teams are currently improving two-stage detectors for ship detection tasks in SAR images. For example, Pan et al. [33] proposed using rotation angle-related strategies in the rotated region proposal network to generate multi-angle anchors for extracting candidate target regions; they then employed a multi-layer cascaded structure with progressively increasing IoU thresholds to resample positive and negative proposals, thereby refining the OBB detection results. He et al. [34] introduced an oriented ship detector with paired branch detection heads and designed a flexible enhancement technique tailored to the specific characteristics of SAR data. However, the above two-stage detectors introduce a large number of rotated anchors as prior knowledge to improve recall; the abundance of anchors leads to a large amount of computation and memory occupation, which reduces the real-time performance of detection, and they also encounter varying degrees of boundary discontinuity. Therefore, there is merit in developing an efficient two-stage detector tailored to SAR image detection.

2.2. Outer Rectangular Projection

For most existing two-stage detectors [32,33,34], the generation of oriented proposals can pose computational challenges. An early approach to generating oriented proposals is to introduce rotated anchors. While this enhances recall rates, particularly in scenarios with sparse object distributions, an excessive number of anchors leads to significant computational and memory demands, and aligning features with rotated anchors also presents a substantial challenge. Ding et al. [45] designed a region of interest (RoI) Transformer that specializes in learning rotated RoIs derived from the horizontal ones generated by the region proposal network (RPN). However, it incorporates fully connected layers and RoI alignment operations, making the network heavy and complex and incurring expensive computational costs. Xu et al. [16] proposed a novel representation for oriented objects called the gliding vertex, which performs oriented object detection by regressing an irregular quadrilateral, learning four sets of offsets for the four vertices of the OBB relative to the HBB. However, it overlooks the inherent symmetry of rectangles, and its OBB is represented by nine parameters, which is still a relatively large number of parameters involved in the regression task. The proposed ORP leverages rectangle symmetry and employs sine and cosine projections onto the circumscribed rectangle, using the two sets of projection lengths as regression targets. It represents an OBB with only six parameters, reducing both computational complexity and model parameters. Additionally, OrpRPN is a lightweight fully convolutional network and thus incurs less computational load than a fully connected layer. The integration of ORP enables the network to learn and produce high-quality proposals from horizontal anchors.

3. Methodology

3.1. Rectification of Oriented Bounding Boxes

Existing quadrilateral detection methods like RSDet [46] and the gliding vertex [16] lack correction measures for oriented bounding boxes (OBBs), leading to integration issues with subsequent alignment methods and decreased detection accuracy. Other methods also have drawbacks [28,47], such as misalignment or fitting problems. In the proposed method, we design a new CQR technique to correct the orientations of distorted proposals and remove redundant backgrounds. The overall ORPSD framework, illustrated in Figure 2, is a two-stage detector consisting of OrpRPN and OrpRef. In the feature extraction part, our backbone follows the FPN [48], which produces five feature maps of different scales, {P2, P3, P4, P5, P6}. In the detection part, OrpRPN is primarily responsible for generating high-quality oriented proposals, while OrpRef predicts their categories and further refines their locations.

3.2. ORPSD’s Region Proposal Network (OrpRPN)

Existing OBB-based detectors often introduce an additional angle parameter θ, leading to an increased number of preset anchors and higher computational overhead. Moreover, the angle causes boundary discontinuity and slow regression convergence. To address these issues, we draw on the idea of converting horizontal proposals into oriented proposals and propose a new encoding scheme called ORP, which encodes the lengths of the projections onto the OBB's outer rectangle as the representative factors of the predicted oriented proposal.
In Figure 2, OrpRPN takes five layers of features from the backbone FPN. Meanwhile, we propose the ORP representation method to encode the ground truth box ( G T ) and derive the oriented bounding box O parameterized by O = ( x ,   y ,   w ,   h ,   p w ,   p h ) . In each feature map layer, we utilize the k-means algorithm to predefine three different aspect ratios per pixel and dynamically adjust the anchors’ width and height ratios based on the size of the bounding rectangle of the oriented bounding box O, to match the actual object distribution in the dataset.
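To make the anchor-setting step above concrete, the sketch below shows one way to derive three aspect ratios per feature level with a simple one-dimensional k-means over the width/height ratios of the GT outer rectangles. It is a minimal illustration under our own assumptions; the function name, the initialization, and the exact clustering target are not specified in the text.

import numpy as np

def anchor_ratios_from_gt(gt_whs, k=3, iters=50, seed=0):
    """Cluster the w_o / h_o ratios of GT outer rectangles into k anchor
    aspect ratios with a simple 1-D k-means (illustrative only)."""
    rng = np.random.default_rng(seed)
    ratios = gt_whs[:, 0] / np.maximum(gt_whs[:, 1], 1e-6)
    centres = rng.choice(ratios, size=k, replace=False)  # random initialisation
    for _ in range(iters):
        assign = np.argmin(np.abs(ratios[:, None] - centres[None, :]), axis=1)
        centres = np.array([ratios[assign == j].mean() if np.any(assign == j)
                            else centres[j] for j in range(k)])
    return np.sort(centres)

In practice the resulting ratios would be combined with per-level anchor scales to instantiate the three anchors per pixel mentioned above.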
During training, in the regression branch, the actual value learned by the network is the offset δ = ( d x ,   d y ,   d w ,   d h ,   d d ,   d r ) of O relative to each anchor. In each feature map layer, a fully convolutional network, comprising a 3 × 3 convolutional layer and two 1 × 1 convolutional layers at the same level, predicts proposal categories and locations. One of the 1 × 1 convolutional layers handles the regression of proposal bounding boxes, and the predicted locations are denoted by the offsets δ = ( d x ,   d y ,   d w ,   d h ,   d d ,   d r ) relative to each anchor. During the optimization process, weights are updated using iterations of the loss function, gradually aligning predicted values with true values. Subsequently, our ORP method and anchors decode the offsets, and the proposed CQR technique corrects distorted proposals, yielding predicted oriented proposals in line with the number of anchors. Another sibling 1 × 1 convolutional layer estimates the classification scores of the oriented proposals. While OrpRPN is conceptually simple, its essence lies in the encoding and decoding representation scheme for oriented objects, where we introduce a novel and straightforward scheme: ORP.

3.2.1. Outer Rectangular Projection (ORP)

As shown in Figure 3, ORP encodes the length of the projection on the OBB’s outer rectangle as the representation factor of the predicted oriented proposal. Specifically, the OBB is represented by five parameters, denoted as O g = ( x g ,   y g ,   w g ,   h g ,   θ ) , where x g , y g , w g , h g , θ are the central coordinate, width, height, and angle of the G T based on OBB, respectively. Based on this, we calculate the horizontal outer rectangle H g :
H g = ( x o , y o , w o , h o )
x_o = x_g,  y_o = y_g
w_o = |w_g · cos θ| + |h_g · sin θ|
h_o = |w_g · sin θ| + |h_g · cos θ|,
where x o , y o , w o , h o denote the center coordinate, width, and height of the G T ’s outer rectangle, respectively.
In the encoding part (Figure 3 and Algorithm 1), we use a set of 6-dimensional vectors δ = ( d x ,   d y ,   d w ,   d h ,   d d ,   d r ) from the regression branch of OrpRPN to represent the offset of the G T relative to every anchor, which can be calculated by adopting an affine transformation as follows:
d_x = (x_g - x_a) / w_a,  d_y = (y_g - y_a) / h_a
p_h = |h_g · sin θ| for θ ∈ [0, π/2],  p_h = |w_g · cos θ| for θ ∈ (-π/2, 0)
p_w = |w_g · sin θ| for θ ∈ [0, π/2],  p_w = |h_g · cos θ| for θ ∈ (-π/2, 0)
d_d = p_h / (|w_g · cos θ| + |h_g · sin θ|),  d_r = p_w / (|w_g · sin θ| + |h_g · cos θ|)
d_w = log((|w_g · cos θ| + |h_g · sin θ|) / w_a),  d_h = log((|w_g · sin θ| + |h_g · cos θ|) / h_a),
where (x_a, y_a) and (w_a, h_a) denote the central coordinate and the width and height of the anchor, respectively. For θ ∈ [0, π/2], p_h is the length of the projection of the high edge of the GT onto the wide edge of its outer rectangle, and p_w is the length of the projection of the wide edge of the GT onto the high edge of its outer rectangle. Conversely, for θ ∈ (-π/2, 0), p_h is the projection length of the wide edge of the GT onto the wide edge of its outer rectangle, and p_w is the projection length of the high edge of the GT onto the high edge of its outer rectangle. This encoding process yields the classification labels and the target offsets used for training; it is performed only during the training phase.
Algorithm 1: ORP Encoding
Input: Ground truth boxes (GTs): ( x g ,   y g ,   w g ,   h g ,   a g ) ;
 Anchors: A = ( x a ,   y a ,   w a ,   h a )
Output: Actual offsets: δ = ( d x ,   d y ,   d w ,   d h ,   d d ,   d r )
1  Calculate the outer rectangles O_t = (x_o, y_o, w_o, h_o):
2    x_o = x_g,  y_o = y_g;  w_o = |w_g cos a_g| + |h_g sin a_g|;  h_o = |w_g sin a_g| + |h_g cos a_g|
3  Calculate the projection lengths p_w and p_h:
4    p_h = |h_g sin a_g| for a_g ∈ [0, π/2], |w_g cos a_g| for a_g ∈ (-π/2, 0);  p_w = |w_g sin a_g| for a_g ∈ [0, π/2], |h_g cos a_g| for a_g ∈ (-π/2, 0)
5  Calculate the actual offsets via an affine transformation:
6    d_x = (x_g - x_a)/w_a,  d_y = (y_g - y_a)/h_a;  d_w = log(w_o/w_a),  d_h = log(h_o/h_a);  d_d = p_h/w_o,  d_r = p_w/h_o
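As a companion to Algorithm 1, the following NumPy sketch encodes a single oriented GT box against one horizontal anchor. It is our own minimal illustration, not the authors' implementation; the function name and the (-π/2, π/2] angle convention are assumptions taken from the text.

import numpy as np

def orp_encode(gt, anchor):
    """ORP encoding (Algorithm 1) for one oriented GT box and one anchor.
    gt = (x_g, y_g, w_g, h_g, theta); anchor = (x_a, y_a, w_a, h_a)."""
    xg, yg, wg, hg, theta = gt
    xa, ya, wa, ha = anchor

    # Outer horizontal rectangle of the oriented box
    wo = abs(wg * np.cos(theta)) + abs(hg * np.sin(theta))
    ho = abs(wg * np.sin(theta)) + abs(hg * np.cos(theta))

    # Projection lengths on the outer rectangle's edges
    if theta >= 0:
        ph, pw = abs(hg * np.sin(theta)), abs(wg * np.sin(theta))
    else:
        ph, pw = abs(wg * np.cos(theta)), abs(hg * np.cos(theta))

    # Six offsets relative to the anchor
    dx, dy = (xg - xa) / wa, (yg - ya) / ha
    dw, dh = np.log(wo / wa), np.log(ho / ha)
    dd, dr = ph / wo, pw / ho
    return np.array([dx, dy, dw, dh, dd, dr])

In practice the same computation would be vectorized over all GT-anchor pairs produced by the label-assignment step.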
OrpRPN directly predicts the offsets δ = ( d x ,   d y ,   d w ,   d h ,   d d ,   d r ) . The decoding part is shown in Figure 4 and Algorithm 2. Combined with the preset anchors, we obtain the coordinate set of the four corners for each oriented proposal: C = ( C 1 ,   C 2 ,   C 3 ,   C 4 ) for the regression of OBBs. The decoding process can be described as follows:
c_x = d_x · w_a + x_a,  c_y = d_y · h_a + y_a
w = w_a · e^{d_w},  h = h_a · e^{d_h}
p_h = w/2 - d_d · w,  p_w = h/2 - d_r · h
C_1 = (c_x + p_h, c_y - h/2)
C_2 = (c_x + w/2, c_y + p_w)
C_3 = (c_x - p_h, c_y + h/2)
C_4 = (c_x - w/2, c_y - p_w),
where (c_x, c_y) is the central coordinate of the oriented proposal and (w, h) denotes the width and height of its outer horizontal rectangle. p_h and p_w, derived from the encoded projection offsets d_d and d_r, locate the proposal's corner points on the corresponding edges of the outer rectangle.
Algorithm 2: ORP Decoding
Input: Predicted offsets: δ = ( d x ,   d y ,   d w ,   d h ,   d d ,   d r ) ;
 Anchors: A = ( x a ,   y a ,   w a ,   h a )
Output: Corner points: C = ( C 1 ,   C 2 ,   C 3 ,   C 4 )
1  Calculate the proposal representation O = (c_x, c_y, w, h, p_w, p_h):
     c_x = d_x · w_a + x_a,  c_y = d_y · h_a + y_a;  w = w_a · e^{d_w},  h = h_a · e^{d_h};  p_h = w/2 - d_d · w,  p_w = h/2 - d_r · h
2  Calculate the corner coordinates:
3    C_1 = (c_x, c_y) + (p_h, -h/2);  C_2 = (c_x, c_y) + (w/2, p_w);  C_3 = (c_x, c_y) + (-p_h, h/2);  C_4 = (c_x, c_y) + (-w/2, -p_w)
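Correspondingly, a minimal sketch of the decoding in Algorithm 2 is given below; again this is an illustration under the same assumptions, recovering the four corner points of a proposal from the six predicted offsets and a horizontal anchor.

import numpy as np

def orp_decode(delta, anchor):
    """ORP decoding (Algorithm 2): corner points of an oriented proposal
    from predicted offsets delta = (dx, dy, dw, dh, dd, dr) and one anchor."""
    dx, dy, dw, dh, dd, dr = delta
    xa, ya, wa, ha = anchor

    cx, cy = dx * wa + xa, dy * ha + ya          # proposal centre
    w, h = wa * np.exp(dw), ha * np.exp(dh)      # outer-rectangle size
    ph, pw = w / 2 - dd * w, h / 2 - dr * h      # corner offsets from the centre

    return np.array([
        [cx + ph, cy - h / 2],   # corner on the top edge
        [cx + w / 2, cy + pw],   # corner on the right edge
        [cx - ph, cy + h / 2],   # corner on the bottom edge
        [cx - w / 2, cy - pw],   # corner on the left edge
    ])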

3.2.2. OrpRPN’s Loss Function

During training, it is necessary to categorize the preset anchors as positive or negative samples. This is determined by the intersection over union (IoU) between the anchors and the ground truths. An anchor is considered a positive sample (label 1) in two cases: (i) its IoU with a certain ground truth is greater than 0.7, or (ii) its IoU with a ground truth is the highest among all anchors and exceeds 0.3. If the IoU between an anchor and all ground truths is lower than 0.3, it is a negative sample (label 0). The remaining anchors are treated as invalid samples and are discarded during training. It is worth noting that the ground truth box here refers to the outer rectangle obtained after encoding the oriented bounding box with ORP, represented as (x_o, y_o, w_o, h_o).
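The assignment rule above can be summarized by the following sketch, which labels each anchor as positive, negative, or ignored from an anchor-by-ground-truth IoU matrix. The matrix layout and the function name are our own assumptions.

import numpy as np

def assign_anchor_labels(ious, pos_thr=0.7, gt_min_thr=0.3, neg_thr=0.3):
    """Label anchors from an (A, G) IoU matrix: 1 positive, 0 negative, -1 ignored."""
    labels = np.full(ious.shape[0], -1, dtype=int)
    max_iou = ious.max(axis=1)
    labels[max_iou < neg_thr] = 0          # below 0.3 with every GT -> negative
    labels[max_iou > pos_thr] = 1          # case (i): IoU above 0.7 -> positive
    # case (ii): the best-matching anchor of each GT, if its IoU exceeds 0.3
    best = ious.argmax(axis=0)
    keep = ious[best, np.arange(ious.shape[1])] > gt_min_thr
    labels[best[keep]] = 1
    return labels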
Next, we define the loss function. In the classification branch, the loss function, namely L c l s , is cross-entropy [49] loss for the binary classification tasks. It measures the difference between the actual and predicted probability distributions.
In the regression part of OrpRPN, we apply SmoothL1Loss [49] as the loss function. The specific loss function for the regression part is as follows:
L_reg = (1 / N_vs) · Σ_{i=1}^{N_vs} p_i* · Σ_{j ∈ {x, y, w, h, d, r}} SmoothL1(d_j^i - d'_j^i),
where N_vs represents the total number of valid samples, and p_i* ∈ {0, 1} indicates whether the i-th anchor is a positive (1) or negative (0) sample. d^i denotes the offset of the i-th ground truth (GT) relative to the i-th anchor, represented by the 6-dimensional vector (d_x, d_y, d_w, d_h, d_d, d_r), and d'^i denotes the corresponding offset predicted by the regression branch of OrpRPN.
The overall loss function L is as follows:
L = L c l s + L r e g .
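A minimal sketch of the OrpRPN regression term is shown below; the classification term is the standard binary cross-entropy and is omitted here. The tensor shapes and the use of PyTorch are assumptions for illustration only.

import torch
import torch.nn.functional as F

def orprpn_reg_loss(pred_deltas, target_deltas, labels):
    """Regression part of the OrpRPN loss: SmoothL1 over the six ORP offsets,
    summed over positive anchors and averaged over all valid samples.
    pred_deltas, target_deltas: (N, 6) tensors; labels: (N,) with 1/0/-1."""
    valid = labels >= 0
    pos = labels == 1
    n_valid = valid.sum().clamp(min=1).float()
    loss = F.smooth_l1_loss(pred_deltas[pos], target_deltas[pos], reduction="sum")
    return loss / n_valid

# The total OrpRPN loss adds the binary cross-entropy classification term: L = L_cls + L_reg.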

3.3. ORPSD’s Refine Network (OrpRef)

In Figure 2, OrpRef takes the five layers of feature maps from the backbone and the oriented proposals from OrpRPN. It first corrects distorted oriented proposals with the proposed CQR method, then applies RotatedRoIAlign [45] to extract rotation-invariant features, and finally feeds these features to an RCNN head [49], which bifurcates into a branch predicting the category probabilities (foreground and background) and a branch regressing the boxes under the Kullback-Leibler divergence (KLD) loss [50] between the proposal's OBB and the GT's OBB.

3.3.1. Convex Quadrilateral Rectification (CQR)

The oriented proposals produced by OrpRPN, depicted as parallelograms (blue box in Figure 5), need to be adapted for RotatedRoIAlign, which requires rectangular inputs. To address this, we introduce a CQR method that converts convex quadrilaterals, including parallelograms, into oriented rectangles by determining the minimum-area outer rectangle. This transformation preserves the actual orientation of the proposals, contributing to precise and accurate detection. The workflow of CQR is illustrated in Algorithm 3.
Specifically, as shown in Figure 5, the set of the four corners of a parallelogram is denoted as V = (v_1, v_2, v_3, v_4), where v_1, v_2, v_3, and v_4 are the coordinate vectors of its four corners, with coordinates v_i = {(x_i, y_i) | i = 1, 2, 3, 4}. Here, we take the center point C = (c_x, c_y) of the parallelogram as the coordinate origin, so the vertex vectors become Cv_j = {(x_j^o = x_j - c_x, y_j^o = y_j - c_y) | j = 1, 2, 3, 4}. Then we obtain the four edge vectors v_j v_{j+1} = E_j and the two diagonal vectors v_j v_{j+2} = Cv_{j+2} - Cv_j as follows:
v_j v_{j+1} = (x_{j+1}^o - x_j^o, y_{j+1}^o - y_j^o),  j = 1, 2, 3
v_j v_{j+2} = (x_{j+2}^o - x_j^o, y_{j+2}^o - y_j^o),  j = 1, 2,
where v j v j + 1 and v j v j + 2 are the j-th edge vector and j-th diagonal vector, respectively. Then, the length of the projection of the j-th diagonal vector on the j-th edge vector is denoted as L w j , and the larger term is selected as the length w of the rectangular bounding box, described by the following:
L_{w_j} = |v_j v_{j+2} · v_j v_{j+1}| / |v_j v_{j+1}|,  j = 1, 2
w = max(L_{w_1}, L_{w_2}),
where |·| denotes the modulus of a vector. L_{w_1} and L_{w_2} are the lengths of the projections of the two diagonals of the predicted OBB onto its two edge vectors.
Next, according to the formula E_j · E_j^v = 0, we obtain the normal vector of the j-th edge vector and calculate the length of the projection of the j-th diagonal vector onto this normal vector, denoted L_{h_j}. The smaller term is chosen as the height h of the rectangular bounding box. This process is calculated as follows:
E_j = (x_{j+1}^o - x_j^o, y_{j+1}^o - y_j^o),  j = 1, 2, 3
E_j^v = (y_{j+1}^o - y_j^o, x_j^o - x_{j+1}^o),  j = 1, 2, 3
L_{h_j} = |v_j v_{j+2} · E_j^v| / |E_j^v|,  j = 1, 2
h = min(L_{h_1}, L_{h_2}),
where (E_j, E_j^v) are the j-th edge vector of the parallelogram and its corresponding normal vector, and (x_j^o, y_j^o) are the coordinates of the j-th vertex. L_{h_1} and L_{h_2} are the lengths of the projections of the two diagonals of the predicted OBB onto the normal vectors of its two edges.
Finally, the angle θ of the oriented rectangle can be calculated as follows:
θ = arccos((E_j · e_x) / (|E_j| · |e_x|)),  j = argmax(|E_1|, |E_2|),
where ( E j , e x ) denotes the vector of the long side of the parallelogram and the horizontal unit vector of the coordinate system. max ( E 1 ,   E 2 ) selects the vector with the longer modulus among the two adjacent edge vectors of the parallelogram as the long side vector.
Algorithm 3: Convex Quadrilateral Rectification (CQR)
Input: Corner points: C = ( C 1 ,   C 2 ,   C 3 ,   C 4 ) where C i = ( x i ,   y i ) ;
 Center points: O = ( c x ,   c y )
Output: Rectified proposal: P = ( c x ,   c y ,   w ,   h ,   θ )
1  Calculate the origin-centered vectors:
2    O_v_j = (x_j^o, y_j^o);  x_j^o = x_j - c_x,  y_j^o = y_j - c_y  (j = 1, 2, 3, 4)
3  Compute the edge and diagonal vectors:
4    E_j = v_j v_{j+1} = (x_{j+1}^o - x_j^o, y_{j+1}^o - y_j^o)  (j = 1, 2, 3);  v_j v_{j+2} = O_v_{j+2} - O_v_j  (j = 1, 2)
5  Find the normal vectors of the edges:
6    E_j^v ⊥ E_j,  E_j^v = (y_{j+1}^o - y_j^o, x_j^o - x_{j+1}^o)  (j = 1, 2, 3)
7  Calculate the width and height:
8    w = max(|v_1v_3 · v_1v_2| / |v_1v_2|, |v_2v_4 · v_2v_3| / |v_2v_3|);  h = min(|v_1v_3 · E_1^v| / |E_1^v|, |v_2v_4 · E_2^v| / |E_2^v|)
9  Determine the orientation angle:
10   θ = arccos((E_k · e_x) / |E_k|),  k = argmax_{j ∈ {1, 2}} |E_j|,  where e_x = (1, 0)
11  Return rectified proposal P
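The following sketch mirrors Algorithm 3 for a single quadrilateral proposal. It is an illustrative implementation under our own assumptions (corner ordering and the [0, π] range returned by arccos); the authors' actual code may differ.

import numpy as np

def cqr(corners):
    """Convex quadrilateral rectification (Algorithm 3) for one proposal.
    corners: (4, 2) array of corner points in order; returns (cx, cy, w, h, theta)."""
    cx, cy = corners.mean(axis=0)                 # centre of the quadrilateral
    v = corners - np.array([cx, cy])              # origin-centred corner vectors

    e = [v[(j + 1) % 4] - v[j] for j in range(3)]        # edge vectors E1..E3
    d = [v[j + 2] - v[j] for j in range(2)]              # diagonal vectors
    ev = [np.array([edge[1], -edge[0]]) for edge in e]   # edge normals

    # width: larger projection of a diagonal onto its edge vector
    w = max(abs(d[j] @ e[j]) / np.linalg.norm(e[j]) for j in range(2))
    # height: smaller projection of a diagonal onto the edge normal
    h = min(abs(d[j] @ ev[j]) / np.linalg.norm(ev[j]) for j in range(2))

    # orientation: angle of the longer of the first two edges w.r.t. the x-axis
    k = max(range(2), key=lambda j: np.linalg.norm(e[j]))
    theta = np.arccos(e[k][0] / np.linalg.norm(e[k]))    # angle convention assumed
    return cx, cy, w, h, theta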

3.3.2. Loss Function

The oriented proposals generated by OrpRPN provide reasonable estimations for the oriented objects. Following the RotatedRoIAlign processing, the proposals are still represented by five parameters. To avoid problems associated with angle regression, we convert these parameters into a two-dimensional Gaussian distribution. Then, the KLD measure is used to calculate the loss between the Gaussian distributions of the oriented proposals and the ground truth boxes. We transform an oriented bounding box O(x, y, w, h, θ) into a two-dimensional Gaussian distribution, denoted by X ∼ N(μ, Σ), according to the following formula:
μ = (x, y)^T
Σ^{1/2} = | (w/2)·cos²θ + (h/2)·sin²θ      ((w - h)/2)·cos θ·sin θ     |
          | ((w - h)/2)·sin θ·cos θ        (w/2)·sin²θ + (h/2)·cos²θ   |,
Then the KLD between X_a ∼ N_a(μ_a, Σ_a) and X_g ∼ N_g(μ_g, Σ_g) is calculated as follows:
D_kl(N_a ‖ N_g) = (1/2)·(μ_a - μ_g)^T Σ_g^{-1} (μ_a - μ_g) + (1/2)·Tr(Σ_g^{-1} Σ_a) + (1/2)·ln(|Σ_g| / |Σ_a|) - 1,
in which the first term depends on x_a and y_a, while the remaining terms couple h_a, w_a, and θ_a,
where D_kl represents the KLD between two Gaussian distributions. X_a ∼ N_a(μ_a, Σ_a) and X_g ∼ N_g(μ_g, Σ_g) are the Gaussian distributions of the oriented proposals and GTs, respectively. Tr(·) denotes the trace of a matrix, and (·)^{-1} denotes the matrix inverse.
It is evident that every element in D k l ( N a N g ) consists of partial parameters that are coupled together in a chained relationship. This coupling allows the parameters to interact with each other during the optimization process, resulting in joint optimization and self-adjustment of the model [50].
The final regression loss L r e g is defined as follows:
L_reg = 1 - 1 / (τ + ln(D_kl + 1)),  τ ≥ 1,
where l n ( D k l + 1 ) can transform the distance D k l into a smoother and more expressive loss function. The hyperparameter τ is employed to adjust the loss. The total loss function is as follows:
L = (λ_1 / N_pos) · Σ_{n=1}^{N_pos} L_reg(O_p^n, G_t^n) + (λ_2 / N_a) · Σ_{n=1}^{N_a} L_cls(C_p^n, C_t^n),
where N_a and N_pos are the number of all anchors and the number of anchors assigned to positive samples, respectively. O_p^n refers to the n-th predicted OBB, while G_t^n is the ground truth of the n-th object. C_t^n denotes the label of the n-th object, and C_p^n represents the predicted probability distribution over the two classes. The hyperparameters λ_1 and λ_2 balance the two losses, with a default setting of {2, 1}. We use cross-entropy as L_cls.
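To illustrate the loss defined above, the sketch below converts (x, y, w, h, θ) boxes into 2-D Gaussians and evaluates the KLD-based regression loss. It follows the formulas above under our own assumptions about tensor shapes and is not the authors' implementation.

import torch

def obb_to_gaussian(obb):
    """Convert (x, y, w, h, theta) boxes of shape (N, 5) to 2-D Gaussians."""
    x, y, w, h, theta = obb.unbind(dim=-1)
    cos, sin = torch.cos(theta), torch.sin(theta)
    R = torch.stack([cos, -sin, sin, cos], dim=-1).view(-1, 2, 2)   # rotation matrix
    S = torch.diag_embed(torch.stack([w / 2, h / 2], dim=-1))       # half side lengths
    sigma_half = R @ S @ R.transpose(-1, -2)                        # Sigma^(1/2)
    return torch.stack([x, y], dim=-1), sigma_half @ sigma_half     # mu, Sigma

def kld_reg_loss(pred_obb, gt_obb, tau=1.0):
    """KLD-based regression loss: 1 - 1 / (tau + ln(D_kl + 1))."""
    mu_a, sig_a = obb_to_gaussian(pred_obb)
    mu_g, sig_g = obb_to_gaussian(gt_obb)
    sig_g_inv = torch.inverse(sig_g)
    diff = (mu_a - mu_g).unsqueeze(-1)
    term1 = 0.5 * (diff.transpose(-1, -2) @ sig_g_inv @ diff).squeeze(-1).squeeze(-1)
    term2 = 0.5 * (sig_g_inv @ sig_a).diagonal(dim1=-2, dim2=-1).sum(-1)
    term3 = 0.5 * torch.log(torch.det(sig_g) / torch.det(sig_a))
    dkl = term1 + term2 + term3 - 1.0
    return (1.0 - 1.0 / (tau + torch.log(dkl + 1.0))).mean()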

4. Experiments

4.1. Experimental Settings

4.1.1. Dataset

Based on SSDD, the earliest open-source SAR ship detection dataset, Li et al. [7] relabeled the horizontal boxes as oriented boxes to form RSSDD, which has made significant contributions to the development of oriented detectors. It contains 1160 images and 2456 ship objects. The images in RSSDD were acquired by the TerraSAR-X, RadarSat-2, and Sentinel-1 sensors; image sizes range from 200 to 700 pixels, with resolutions of 1 to 15 m. We adopted a training/test split with a ratio of 8:2, that is, 928 images for training and 232 images for testing. To assess the detector's performance in diverse scenes, we partitioned the test set into two subsets: offshore scenes, comprising 39 images, and inshore scenes, consisting of 193 images.

4.1.2. Implementation Details

All experiments were conducted on a server with an NVIDIA GeForce RTX 3090 GPU, with a batch size of 4. The proposed ORPSD uses ResNet-101 [51] and Swin-Tiny [52] as backbones to showcase its versatility. All images were resized to 608 × 608, with additional data augmentation such as horizontal and vertical flipping. We initialized the parameters with ImageNet pre-trained weights and used the AdamW optimizer [53] for training, with a momentum of 0.9 and a weight decay of 0.0001. The models were trained for 120 epochs with an initial learning rate of 1.25 × 10−4. We compare our detector with several OBB-based detectors, including an anchor-free detector (BBAVectors [17]), a single-stage detector (S2ANet [44]), and two two-stage detectors (RoI-Transformer [45] and ReDet [43]).
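As a point of reference, the optimizer described above could be configured as in the following sketch; this is a hedged illustration (model construction and the data pipeline are assumed), and the 0.9 momentum is exposed through AdamW's first beta coefficient.

import torch

def build_optimizer(model):
    """AdamW with the hyperparameters reported above (beta1 = 0.9,
    weight decay 1e-4, initial learning rate 1.25e-4)."""
    return torch.optim.AdamW(
        model.parameters(),
        lr=1.25e-4,
        betas=(0.9, 0.999),
        weight_decay=1e-4,
    )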

4.1.3. Evaluation Metrics

As for the metrics used to evaluate model performance, we adopt several common ones [54]: precision (P_r), recall (R_e), mAP, F_1, FPS, and the precision-recall (PR) curve. Precision and recall are calculated as follows:
P_r = TP / (TP + FP),
R_e = TP / (TP + FN),
where TP represents the number of correctly detected targets, FP denotes false alarms, and FN represents missed targets.
In the precision-recall (PR) curve, the recall (R_e) is taken as the x-axis and the precision (P_r) as the y-axis. The AP metric quantitatively evaluates the overall detection performance of a detector across different thresholds by calculating the area under the PR curve:
AP = ∫_0^1 P_r(R_e) dR_e.
The experiments involve only a single category (ships), and the IoU threshold is set to 0.5; that is, if the IoU between a predicted box and the ground truth exceeds 0.5, the target is considered successfully detected. Therefore, mAP is the AP at an IoU threshold of 0.5, namely mAP = AP_0.5.
The F 1 score represents the overall performance of the detector under a single threshold. It is the harmonic mean of precision and recall, which can take both metrics into account. Since the F 1 score varies with different thresholds, we compare the maximum F 1 score across all thresholds. The F 1 score is defined as follows:
F_1 = 2 · P_r · R_e / (P_r + R_e).
FPS denotes how many images the model can process per second and is used to measure the detection speed of different detectors, defined as follows:
FPS = 1 / Time,
where Time represents the average detection time per image, expressed in seconds.
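The metrics above can be computed from per-detection confidence scores and true/false-positive flags (obtained by IoU ≥ 0.5 matching against the ground truth) as in the sketch below; the all-point trapezoidal integration of the PR curve is our own simplification of the AP computation.

import numpy as np

def pr_ap_f1(scores, is_tp, num_gt):
    """PR curve, AP (area under the PR curve), and best F1 over thresholds.
    scores: (D,) confidences; is_tp: (D,) bool flags; num_gt: number of GT ships."""
    order = np.argsort(-scores)                      # sort detections by confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    ap = np.trapz(precision, recall)                 # trapezoidal integration of the PR curve
    f1 = np.max(2 * precision * recall / np.maximum(precision + recall, 1e-12))
    return precision, recall, ap, f1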

4.2. Evaluation of OrpRPN

ResNet-101 serves as the backbone for evaluating OrpRPN’s recall performance, employing an IoU threshold of 0.5 for the ground truth box. Results, detailed in Table 1, indicate a 90.03% recall at 2000 proposals, with a marginal decrease (0.8%) at 1000 proposals and a substantial decline at 300 proposals. To strike a balance between speed and accuracy, we input 1000 proposals into OrpRef during the testing period.

4.3. Comparison with Representative Methods

Table 2 shows the quantitative comparison of our method with other methods on the RSSDD dataset, and Table 3 presents the quantitative results on the RSAR dataset. As can be seen, the F_1 and mAP metrics of our ORPSD are superior to those of the other methods in both inshore and offshore scenes. The proposed ORPSD, S2ANet, and RoI-Transformer use ResNet-101-FPN as the backbone, while ReDet uses ReR100-ReFPN. In terms of detection speed, our ORPSD (22.2 FPS) is slightly slower than S2ANet (24.4 FPS), but its mAP (86.21% and 90.49%) is much better. Among the two-stage detectors, our ORPSD is the fastest and the most accurate. In contrast, the RoI-Transformer, which has the slowest detection speed (18.2 FPS), learns rotated RoIs from the horizontal RoIs extracted by the RPN [45]; however, this involves complex fully connected layers and RoI alignment operations, making the network heavy and slowing detection. S2ANet has the fastest detection speed (24.4 FPS) but the lowest mAP (75.21% and 80.02%), because its feature alignment network is better suited to pixel-rich optical remote sensing images [44], whereas the SAR images in RSSDD have relatively uniform pixel statistics, so the algorithm is less suitable for this task. Furthermore, we evaluate ORPSD with a Transformer-based backbone, namely Swin-Tiny [52] (Swin-T). Although it runs slightly slower (19.2 FPS), it obtains a better mAP (87.56% and 90.51%), indicating that ORPSD is compatible with other backbone networks. Our method has significant advantages over S2ANet and ReDet in the offshore scenes, while the differences in F_1 and mAP between our method and BBAVectors and RoI-Transformer are not significant, owing to the inherent simplicity and reduced clutter of such scenes; detectors tend to achieve comparable performance there, narrowing the gap between methods. In addition, we compare our method with HBB-based methods in Table 4, where our method presents the best results, further demonstrating the effectiveness of our method and of the OBB-based formulation. In summary, our proposed method outperforms the others by addressing boundary discontinuity issues, incorporating the KLD loss, and leveraging the Swin-T backbone network for enhanced guidance in network training.
Figure 6 compares the precision-recall (PR) curves of the various methods in inshore and offshore scenes. As shown in Figure 6a, our method outperforms the others regardless of whether the KLD loss function is employed, which demonstrates its effectiveness in detecting inshore ships. The PR curve improves when the KLD loss function is used instead of SmoothL1 in the regression branch of the OrpRef stage, indicating that the KLD loss can further improve performance. When the Swin-T model replaces the ResNet model in the backbone, the PR curves also improve, suggesting that ORPSD's backbone is compatible with Transformer-based frameworks and can further improve detection performance. For the offshore scenes, it is obvious from Figure 6 that the PR curves of S2ANet and ReDet are lower than those of the other methods, reflecting their poorer detection performance. The proposed method surpasses all the others, affirming its effectiveness.

4.4. Visualization

To provide a visual comparison between our method and the others, Figure 7 displays the detection results of the various methods in the two scenes. The proposed ORPSD outperforms the other methods, with a low false alarm rate and a low missed detection rate in both inshore and offshore scenes. Specifically, in the first test image, all methods except our ORPSD produce erroneous detections in the background, namely detecting the coast as a ship, and ReDet and S2ANet produce false detections in many places. BBAVectors and S2ANet produce false detections in the second test image, while ReDet and RoI-Transformer show inaccurate object locations in their results. In dense scenes (see the third test image), the false positives and missed objects generated by our ORPSD are significantly reduced; S2ANet suffers from more severe missed detections, while the other three methods yield some bounding boxes that are out of bounds. There is a clear discontinuity between the head and body of the target in the top-left corner of the fourth test image, so all methods except ORPSD and BBAVectors produce incorrect detections, splitting one target into two. In the offshore scene (see the last test image), our ORPSD generates significantly fewer false alarms and missed objects than the other methods; BBAVectors and ReDet miss more targets, while RoI-Transformer and S2ANet not only miss some targets but also mistakenly detect islands as ships.

4.5. Model Efficiency

With the same settings, we compare the speed and accuracy of the various methods in Table 5 and Figure 8. Among the two-stage detectors, ORPSD outperforms the other models in terms of both accuracy and speed, owing to its efficient ORP coding scheme, which uses fewer anchors and simplifies the conversion of intermediate parameters. Although the additional RPN structure makes ORPSD slightly slower (by 2.2 FPS) than the fastest single-stage detector, S2ANet, its detection accuracy is significantly higher (by 11.94%).

4.6. Model Generalizability

To evaluate the generalizability of our method, we conducted further experiments on the optical satellite image dataset HRSC2016, and the results are shown in Table 6. Our method achieves the best mAP on the HRSC2016 dataset.

4.7. Ablation Study

In order to validate the effectiveness of each module proposed in ORPSD, we carried out ablation studies on the RSSDD dataset. The experimental results are presented in Table 7. Each component has a positive effect, and all components are combined to obtain the best performance.

4.7.1. Impact of Encoding

To verify the effectiveness of the encoding module, we replace the ORP encoding scheme with the common five-parameter encoding method in the baseline for comparison. ORP avoids direct regression of the angle parameter, thereby effectively mitigating the angular discontinuity issue and enhancing the detection accuracy from 0.8514 mAP to 0.8803 mAP, as evidenced by the results in the first two rows of Table 7.

4.7.2. Impact of Rectification

To investigate the influence of the rectification module, we replace our rectification method (CQR) with a diagonal pulling method [63] in the baseline for comparison. Although the latter method is simple to use, during the correction process, there may be slight deviations between the direction of the object box and its original direction. In contrast, our method can more accurately align with the actual direction of the object, thereby improving the detection performance. Results in the first and third rows in Table 7 show that the proposed CQR is more effective.

4.7.3. Impact of KLD

To explore the impact of the KLD loss, we replace it with the original SmoothL1Loss for comparison. Comparing the results in the last two rows of Table 7, we find that the KLD loss improves the detection accuracy from 0.8812 mAP to 0.8915 mAP. KLD can dynamically adjust the parameter gradients according to the characteristics of the object [50], such as aspect ratio, thereby facilitating the learning of better feature representations.

5. Discussion

The proposed ORPSD framework demonstrates significant advancements in oriented ship detection for SAR imagery. While achieving high detection accuracy, the iterative geometric calculations involved in the CQR module for convex hull processing and projection analysis may introduce computational overhead. In real-time processing scenarios, this computational burden could restrict practical deployment, as evidenced in Figure 8. Future work should focus on optimizing both detection accuracy and computational efficiency to address this limitation. Additionally, the ORP encoding scheme circumvents explicit angle regression through innovative projection of OBB parameters onto the edges of outer rectangles. Extending this approach by integrating adaptive weighting mechanisms to improve robustness against extreme aspect ratios emerges as another valuable research avenue.

6. Conclusions

We introduce ORPSD, a novel two-stage detector for ship detection in SAR images. In the first stage, OrpRPN incorporates an efficient target representation (ORP) to simplify proposal generation and reduce intermediate parameter computation. Additionally, a CQR method corrects distorted shapes. In the second stage, OrpRef employs KLD as the regression loss, enhancing detection accuracy, particularly for small ships. Experimental results on the RSSDD and RSAR datasets demonstrate that the proposed ORPSD outperforms representative methods in terms of detection accuracy and speed. The proposed approach holds promise for ship detection in SAR images, with potential for leveraging recent advances in self-supervised pre-training; this is a direction for future exploration. We encourage researchers to explore extensions of ORPSD to handle diverse marine structures or integrate lightweight backbones. By explicitly defining ORPSD’s niche strengths, such as oriented detection in complex, cluttered scenes, and discussing its limitations, we hope to help practitioners deploy the method in scenarios where its advantages are most impactful.

Author Contributions

Conceptualization, M.Z.; Methodology, Y.O.; Software, M.Y.; Validation, J.G.; Writing—review & editing, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant 92470108 and 62272363; in part by the Joint Laboratory for Innovation in Satellite-Borne Computers and Electronics Technology Open Fund 2023 under Grant 2024KFKT001-1.

Data Availability Statement

The RSSDD dataset was downloaded free of charge from the link in [7]. The RSAR dataset was downloaded free of charge from the link in [35].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, M.; He, C.; Zhang, J.; Yang, Y.; Peng, X.; Guo, J. SAR-to-Optical Image Translation via Neural Partial Differential Equations. In Proceedings of the IEEE Conference on International Joint Conference on Artificial Intelligence, Vienna, Austria, 23–29 July 2022; pp. 1–7. [Google Scholar]
  2. Zhang, C.; Gao, G.; Zhang, L.; Chen, C.; Gao, S.; Yao, L.; Bai, Q.; Gou, S. A novel full-polarization SAR image ship detector based on scattering mechanisms and wave polarization anisotropy. ISPRS J. Photogramm. Remote Sens. 2022, 190, 129–143. [Google Scholar] [CrossRef]
  3. Lv, J.; Zhu, D.; Geng, Z.; Han, S.; Wang, Y.; Ye, Z.; Zhou, T.; Chen, H.; Huang, J. Recognition for SAR deformation military target from a new MiniSAR dataset using multi-view joint transformer approach. ISPRS J. Photogramm. Remote Sens. 2024, 210, 180–197. [Google Scholar] [CrossRef]
  4. Wang, C.; Cai, X.; Wu, F.; Cui, P.; Wu, Y.; Zhang, Y. Stepwise Attention-Guided Multiscale Fusion Network for Lightweight and High-Accurate SAR Ship Detection. Remote Sens. 2024, 16, 3137. [Google Scholar] [CrossRef]
  5. Wu, B.; Wang, H.; Zhang, C.; Chen, J. Optical-to-SAR Translation Based on CDA-GAN for High-Quality Training Sample Generation for Ship Detection in SAR Amplitude Images. Remote Sens. 2024, 16, 3001. [Google Scholar] [CrossRef]
  6. Lu, Z.; Wang, P.; Li, Y.; Ding, B. A New Deep Neural Network Based on SwinT-FRM-ShipNet for SAR Ship Detection in Complex Near-Shore and Offshore Environments. Remote Sens. 2023, 15, 5780. [Google Scholar] [CrossRef]
  7. Li, J.; Qu, C.; Shao, J. Ship detection in SAR images based on an improved faster R-CNN. In Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; IEEE: New York, NJ, USA, 2017; pp. 1–6. [Google Scholar]
  8. Gao, F.; He, Y.; Wang, J.; Hussain, A.; Zhou, H. Anchor-free convolutional network with dense attention feature aggregation for ship detection in SAR images. Remote Sens. 2020, 12, 2619. [Google Scholar] [CrossRef]
  9. Zhu, M.; Hu, G.; Zhou, H.; Wang, S. H2Det: A high-speed and high-accurate ship detector in SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 12455–12466. [Google Scholar] [CrossRef]
  10. Gao, G.; Zhang, C.; Zhang, L.; Duan, D. Scattering Characteristic-Aware Fully Polarized SAR Ship Detection Network Based on a Four-Component Decomposition Model. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5222722. [Google Scholar] [CrossRef]
  11. Zhao, S.; Luo, Y.; Zhang, T.; Guo, W.; Zhang, Z. A domain specific knowledge extraction transformer method for multisource satellite-borne SAR images ship detection. ISPRS J. Photogramm. Remote Sens. 2023, 198, 16–29. [Google Scholar] [CrossRef]
  12. Wang, C.; Bi, F.; Zhang, W.; Chen, L. An intensity-space domain CFAR method for ship detection in HR SAR images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 529–533. [Google Scholar] [CrossRef]
  13. Gao, G.; Bai, Q.; Zhang, C.; Zhang, L.; Yao, L. Dualistic cascade convolutional neural network dedicated to fully PolSAR image ship detection. ISPRS J. Photogramm. Remote Sens. 2023, 202, 663–681. [Google Scholar] [CrossRef]
  14. Zhang, C.Q.; Deng, Y.; Chong, M.Z.; Zhang, Z.W.; Tan, Y.H. Entropy-Based re-sampling method on SAR class imbalance target detection. ISPRS J. Photogramm. Remote Sens. 2024, 209, 432–447. [Google Scholar] [CrossRef]
  15. Zhu, Y.; Du, J.; Wu, X. Adaptive period embedding for representing oriented objects in aerial images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7247–7257. [Google Scholar] [CrossRef]
  16. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed]
  17. Yi, J.; Wu, P.; Liu, B.; Huang, Q.; Qu, H.; Metaxas, D. Oriented object detection in aerial images with box boundary-aware vectors. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 2150–2159. [Google Scholar]
  18. Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
  19. Zhang, Z.; Guo, W.; Zhu, S.; Yu, W. Toward arbitrary-oriented ship detection with rotated region proposal and discrimination networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1745–1749. [Google Scholar] [CrossRef]
  20. Yang, X.; Hou, L.; Zhou, Y.; Wang, W.; Yan, J. Dense label encoding for boundary discontinuity free rotation detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 15819–15829. [Google Scholar]
  21. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. Scrdet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241. [Google Scholar]
  22. Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 677–694. [Google Scholar]
  23. Yang, X.; Yan, J.; Liao, W.; Yang, X.; Tang, J.; He, T. Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 2022. [Google Scholar] [CrossRef]
  24. An, Q.; Pan, Z.; Liu, L.; You, H. DRBox-v2: An improved detector with rotatable boxes for target detection in SAR images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8333–8349. [Google Scholar] [CrossRef]
  25. Zhao, S.; Liu, Q.; Yu, W.; Lv, J. A Single-Stage Arbitrary-Oriented Detector Based on Multiscale Feature Fusion and Calibration for SAR Ship Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8179–8198. [Google Scholar] [CrossRef]
  26. Zhang, L.; Liu, Y.; Zhao, W.; Wang, X.; Li, G.; He, Y. Frequency-Adaptive Learning for SAR Ship Detection in Clutter Scenes. IEEE Trans. Geosci. Remote Sens. 2023. [Google Scholar] [CrossRef]
  27. Zhang, C.; Gao, G.; Liu, J.; Duan, D. Oriented Ship Detection Based on Soft Thresholding and Context Information in SAR Images of Complex Scenes. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5200615. [Google Scholar] [CrossRef]
  28. He, Y.; Gao, F.; Wang, J.; Hussain, A.; Yang, E.; Zhou, H. Learning polar encodings for arbitrary-oriented ship detection in SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3846–3859. [Google Scholar] [CrossRef]
  29. Zhang, J.; Xing, M.; Sun, G.C.; Li, N. Oriented Gaussian function-based box boundary-aware vectors for oriented ship detection in multiresolution SAR imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5211015. [Google Scholar] [CrossRef]
  30. Gao, F.; Huo, Y.; Sun, J.; Yu, T.; Hussain, A.; Zhou, H. Ellipse encoding for arbitrary-oriented SAR ship detection based on dynamic key points. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5240528. [Google Scholar] [CrossRef]
  31. Pan, D.; Gao, X.; Dai, W.; Fu, J.; Wang, Z.; Sun, X.; Wu, Y. SRT-Net: Scattering Region Topology Network for Oriented Ship Detection in Large-Scale SAR Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5202318. [Google Scholar] [CrossRef]
  32. Liu, Y.; Zhang, M.H.; Xu, P.; Guo, Z.W. SAR ship detection using sea-land segmentation-based convolutional neural network. In Proceedings of the 2017 International Workshop on Remote Sensing with Intelligent Processing (RSIP), Shanghai, China, 19–21 May 2017; IEEE: New York, NJ, USA, 2017; pp. 1–4. [Google Scholar]
  33. Pan, Z.; Yang, R.; Zhang, Z. MSR2N: Multi-stage rotational region based network for arbitrary-oriented ship detection in SAR images. Sensors 2020, 20, 2340. [Google Scholar] [CrossRef]
  34. He, B.; Zhang, Q.; Tong, M.; He, C. Oriented ship detector for remote sensing imagery based on pairwise branch detection head and SAR feature enhancement. Remote Sens. 2022, 14, 2177. [Google Scholar] [CrossRef]
  35. Zhang, X.; Yang, X.; Li, Y.; Yang, J.; Cheng, M.M.; Li, X. RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark. arXiv 2025, arXiv:2501.04440. [Google Scholar]
  36. Pappas, O.; Achim, A.; Bull, D. Superpixel-level CFAR detectors for ship detection in SAR imagery. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1397–1401. [Google Scholar] [CrossRef]
  37. Gao, F.; Ma, F.; Wang, J.; Sun, J.; Yang, E.; Zhou, H. Visual saliency modeling for river detection in high-resolution SAR imagery. IEEE Access 2017, 6, 1000–1014. [Google Scholar] [CrossRef]
  38. Lin, H.; Song, S.; Yang, J. Ship classification based on MSHOG feature and task-driven dictionary learning with structured incoherent constraints in SAR images. Remote Sens. 2018, 10, 190. [Google Scholar] [CrossRef]
  39. Gao, G.; Ouyang, K.; Luo, Y.; Liang, S.; Zhou, S. Scheme of Parameter Estimation for Generalized Gamma Distribution and Its Application to Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1812–1832. [Google Scholar] [CrossRef]
  40. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access 2020, 8, 120234–120254. [Google Scholar] [CrossRef]
  41. Zhang, T.; Zhang, X.; Ke, X.; Zhan, X.; Shi, J.; Wei, S.; Pan, D.; Li, J.; Su, H.; Zhou, Y.; et al. LS-SSDD-v1. 0: A deep learning dataset dedicated to small ship detection from large-scale Sentinel-1 SAR images. Remote Sens. 2020, 12, 2997. [Google Scholar] [CrossRef]
  42. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
  43. Han, J.; Ding, J.; Xue, N.; Xia, G.S. Redet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 2786–2795. [Google Scholar]
  44. Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511. [Google Scholar] [CrossRef]
  45. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
  46. Qian, W.; Yang, X.; Peng, S.; Yan, J.; Guo, Y. Learning modulated loss for rotated object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 2458–2466. [Google Scholar]
  47. Guo, P.; Celik, T.; Liu, N.; Li, H.C. Break Through the Border Restriction of Horizontal Bounding Box for Arbitrary-Oriented Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4005505. [Google Scholar] [CrossRef]
  48. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  49. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  50. Yang, X.; Yang, X.; Yang, J.; Ming, Q.; Wang, W.; Tian, Q.; Yan, J. Learning high-precision bounding box for rotated object detection via kullback-leibler divergence. Adv. Neural Inf. Process. Syst. 2021, 34, 18381–18394. [Google Scholar]
  51. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  52. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  53. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  54. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open mmlab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155. [Google Scholar]
  55. Zhao, W.; Huang, L.; Liu, H.; Yan, C. Scattering-Point-Guided Oriented RepPoints for Ship Detection. Remote Sens. 2024, 16, 933. [Google Scholar] [CrossRef]
  56. Chen, B.; Xue, F.; Song, H. A Lightweight Arbitrarily Oriented Detector Based on Transformers and Deformable Features for Ship Detection in SAR Images. Remote Sens. 2024, 16, 237. [Google Scholar] [CrossRef]
  57. Guo, Y.; Chen, S.; Zhan, R.; Wang, W.; Zhang, J. LMSD-YOLO: A lightweight YOLO algorithm for multi-scale SAR ship detection. Remote Sens. 2022, 14, 4801. [Google Scholar] [CrossRef]
  58. Xu, Z.; Zhai, J.; Huang, K.; Liu, K. DSF-Net: A dual feature shuffle guided multi-field fusion network for SAR small ship target detection. Remote Sens. 2023, 15, 4546. [Google Scholar] [CrossRef]
  59. Wan, H.; Chen, J.; Huang, Z.; Xia, R.; Wu, B.; Sun, L.; Yao, B.; Liu, X.; Xing, M. AFSar: An anchor-free SAR target detection algorithm based on multiscale enhancement representation learning. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5219514. [Google Scholar] [CrossRef]
  60. Zhou, Z.; Chen, J.; Huang, Z.; Lv, J.; Song, J.; Luo, H.; Wu, B.; Li, Y.; Diniz, P.S. HRLE-SARDet: A lightweight SAR target detection algorithm based on hybrid representation learning enhancement. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5203922. [Google Scholar] [CrossRef]
  61. Liu, Y.; Jiang, W. OII: An Orientation Information Integrating Network for Oriented Object Detection in Remote Sensing Images. Remote Sens. 2024, 16, 731. [Google Scholar] [CrossRef]
  62. Ming, Q.; Miao, L.; Zhou, Z.; Dong, Y. CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5605814. [Google Scholar] [CrossRef]
  63. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
Figure 1. Comparison between HBB-based detection and OBB-based detection. (a) Result of HBB-based detection; (b) result of OBB-based detection. The blue boxes represent horizontal detection boxes and green boxes represent oriented detection boxes.
Figure 2. The overall architecture of the proposed two-stage detector, ORPSD. In the first stage, OrpRPN, the ground truth box (GT) is encoded using the ORP scheme to obtain O, which is then combined with the anchors to compute the regression loss. During inference, the proposed CQR corrects proposals with distorted shapes. The second stage, OrpReg, classifies the oriented proposals and refines their locations via a KLD-based regression loss.
Figure 3. Illustration of the ORP. (a) An example represented by the ORP. (b) Schematic diagram of the ORP representation. The red solid box denotes the ground truth O. The red and orange dots are the center and the four vertices of O, respectively. The orange-dotted box is the horizontal outer rectangle around O. The blue and green arrows indicate the two projection directions. The purple-dotted box is the preset anchor, and the purple arrows indicate the direction of the affine transformation.
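For illustration, the following Python sketch shows one way the projection idea in Figure 3 could be realized: an oriented box is summarized by its horizontal outer rectangle together with the signed offsets of the vertices touching the top and right edges. This is only a minimal sketch under that assumption; the exact ORP encoding used in OrpRPN may differ, and the function name encode_orp is hypothetical.

```python
import numpy as np

def encode_orp(obb_corners: np.ndarray):
    """Sketch of an outer-rectangle-projection-style encoding.

    obb_corners: (4, 2) array with the vertices of an oriented box.
    Returns the horizontal outer rectangle (cx, cy, w, h) and the signed
    offsets (dx, dy) of the vertices lying on its top and right edges,
    measured from the edge midpoints (the projections in Figure 3).
    """
    xs, ys = obb_corners[:, 0], obb_corners[:, 1]
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()

    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    w, h = x_max - x_min, y_max - y_min

    # Vertex touching the top edge (smallest y, image coordinates);
    # degenerate axis-aligned boxes would need special handling.
    top_vertex = obb_corners[np.argmin(ys)]
    dx = top_vertex[0] - cx
    # Vertex touching the right edge (largest x).
    right_vertex = obb_corners[np.argmax(xs)]
    dy = right_vertex[1] - cy

    return cx, cy, w, h, dx, dy
```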
Figure 4. Illustration of the OBB regression. The red box and dot denote the anchor and its center point, respectively. The orange-dotted box denotes the outer rectangle, while the purple solid box denotes the output oriented proposal. The orange and purple dots denote the four corners and the midpoints of the box, respectively. The green and blue arrows indicate the projection lengths.
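Conversely, a regressed prediction in this illustrative six-parameter form can be decoded back into four vertices. The sketch below assumes the encoding of the previous snippet; note that the decoded quadrilateral is, in general, a parallelogram rather than a rectangle, which is what the CQR step described next corrects (the helper decode_orp is hypothetical).

```python
import numpy as np

def decode_orp(cx, cy, w, h, dx, dy):
    """Recover four vertices from the outer rectangle (cx, cy, w, h)
    and the projection offsets (dx, dy). Illustrative sketch only.

    By central symmetry of an oriented box, the vertex on the bottom
    edge mirrors the one on the top edge, and the vertex on the left
    edge mirrors the one on the right edge, so the result is a
    centrally symmetric convex quadrilateral (a parallelogram).
    """
    top = (cx + dx, cy - h / 2.0)
    right = (cx + w / 2.0, cy + dy)
    bottom = (cx - dx, cy + h / 2.0)
    left = (cx - w / 2.0, cy - dy)
    return np.array([top, right, bottom, left], dtype=np.float32)
```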
Figure 5. Illustration of the CQR process. The blue parallelogram on the left depicts the proposal generated by OrpRPN. The red box on the right represents the rectangular proposal used for RotatedRoIAlign.
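One simple way to realize the rectification illustrated in Figure 5 is to replace the distorted quadrilateral by its minimum-area enclosing rotated rectangle, e.g., via OpenCV's cv2.minAreaRect. The snippet below is only a sketch of this idea under that assumption and is not claimed to be the exact CQR procedure.

```python
import cv2
import numpy as np

def rectify_quadrilateral(quad: np.ndarray):
    """Replace a convex quadrilateral proposal by its minimum-area
    enclosing rotated rectangle, as suggested by Figure 5.

    quad: (4, 2) float array of vertices from OrpRPN.
    Returns ((cx, cy), (w, h), angle) in OpenCV's RotatedRect format,
    which can then be converted to the angle convention expected by
    RotatedRoIAlign.
    """
    return cv2.minAreaRect(quad.astype(np.float32))

# Example: a slightly sheared parallelogram proposal.
quad = np.array([[10, 5], [40, 12], [35, 30], [5, 23]], dtype=np.float32)
(cx, cy), (w, h), angle = rectify_quadrilateral(quad)
corners = cv2.boxPoints(((cx, cy), (w, h), angle))  # back to 4 corners
```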
Figure 6. PR curves of various methods in inshore and offshore scenes. (a) PR curves in inshore scenes. (b) PR curves in offshore scenes.
Figure 7. Visualization of the detection results of various methods. The red rectangular box denotes the GT, and the green rectangular box indicates the detection results of the four comparative methods. The blue rectangular box is the detection result of our method.
Figure 8. Speed vs. accuracy on the RSSDD test set.
Table 1. Recall of OrpRPN on the RSSDD validation set.

Method    Recall-300    Recall-1000    Recall-2000
ORPSD     81.61         89.23          90.03
Table 2. Comparison of different methods on the RSSDD test set. Bold items indicate the optimal values in each column, while underlined items indicate the second-best values in each column.

Method                              Anchor-Free   Stage   Inshore: Pr / Re / F1 / mAP           Offshore: Pr / Re / F1 / mAP          All scenes: Time (ms) / FPS
BBAVectors [17]                     Yes           One     0.7881 / 0.7931 / 0.7906 / 0.7688     0.9401 / 0.9402 / 0.9401 / 0.9001     50 / 20
S2ANet [44]                         No            One     0.7822 / 0.7716 / 0.7769 / 0.7521     0.8412 / 0.7827 / 0.8109 / 0.8002     41 / 24.4
RoI-Transformer [45]                No            Two     0.7027 / 0.7939 / 0.7455 / 0.7541     0.9542 / 0.9352 / 0.9446 / 0.9026     55 / 18.2
ReDet [43]                          No            Two     0.7903 / 0.8012 / 0.7957 / 0.7625     0.8721 / 0.8345 / 0.8529 / 0.8402     50 / 20
LMSD-YOLO [57]                      Yes           One     0.7982 / 0.7901 / 0.7941 / 0.7780     0.9321 / 0.9502 / 0.9401 / 0.8819     28 / 35.2
SPG-OSD [55]                        No            Two     0.7990 / 0.8049 / 0.8019 / 0.7908     0.9509 / 0.9237 / 0.9371 / 0.9015     47 / 21.3
LD-Det [56]                         No            One     0.8231 / 0.8039 / 0.8134 / 0.8076     0.9501 / 0.9402 / 0.9451 / 0.9021     50 / 20
ORPSD+SmoothL1Loss (Ours)           No            Two     0.8771 / 0.8356 / 0.8558 / 0.8491     0.9421 / 0.9392 / 0.9406 / 0.9028     45 / 22.2
ORPSD+KLD [50] (Ours)               No            Two     0.8832 / 0.8421 / 0.8622 / 0.8621     0.9571 / 0.9531 / 0.9551 / 0.9049     45 / 22.2
ORPSD+SwinT [52]+KLD [50] (Ours)    No            Two     0.8867 / 0.8431 / 0.8644 / 0.8756     0.9582 / 0.9456 / 0.9519 / 0.9051     52 / 19.2
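As a consistency check on Table 2, F1 is the harmonic mean of precision and recall, and FPS is the reciprocal of the per-image time; for the inshore results of ORPSD+KLD, for example:

F1 = \frac{2\,Pr\,Re}{Pr + Re} = \frac{2 \times 0.8832 \times 0.8421}{0.8832 + 0.8421} \approx 0.8622, \qquad \mathrm{FPS} = \frac{1000}{45\ \mathrm{ms}} \approx 22.2.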
Table 3. Comparison of different methods on the RSAR test set.

Method                    Pr        Re        F1        mAP
BBAVectors [17]           0.8901    0.8120    0.8492    0.6127
S2ANet [44]               0.8048    0.8756    0.8386    0.6333
RoI-Transformer [45]      0.8880    0.8907    0.8894    0.6689
ReDet [43]                0.8872    0.8744    0.8807    0.6692
LMSD-YOLO [57]            0.8790    0.9024    0.8906    0.6603
SPG-OSD [55]              0.8212    0.8098    0.8155    0.6442
LD-Det [56]               0.9032    0.8917    0.8974    0.6820
Ours                      0.9303    0.9268    0.9286    0.6930
Table 4. Comparison of results with HBB-based methods on the RSSDD dataset. Bold values denote the best performance.

Method                  F1        mAP
DSF-Net [58]            0.7823    0.7588
AFSar [59]              0.7932    0.7948
HRLE-SARDet [60]        0.7940    0.7721
ORPSD+KLD (Ours)        0.8644    0.8756
Table 5. Speed vs. accuracy on the RSSDD dataset (all scenes). Bold items represent the best.

Method                    mAP       Params (M)    FPS
BBAVectors [17]           0.8501    42.5          20
S2ANet [44]               0.7721    35.02         24.4
RoI-Transformer [45]      0.8632    273           18.2
ReDet [43]                0.8045    68            20
LMSD-YOLO [57]            0.7780    7.4           35.2
ORPSD+KLD (Ours)          0.8915    15.3          22.2
Table 6. Comparison of results on the HRSC2016 dataset. Bold values denote the best performance.

Method                    F1        mAP
OIINet [61]               0.8542    0.8630
RoI-Transformer [45]      0.8745    0.8760
CFCNet [62]               0.9032    0.8851
LD-Det [56]               0.8845    0.8620
ORPSD+KLD (Ours)          0.8915    0.8891
Table 7. Ablation study of ORP, CQR, and the KLD loss on the RSSDD dataset, reported as mAP. Bold values denote the best performance.

Method      OrpRPN: Encoding (ORP)    OrpRPN: Rectification (CQR)    OrpRef: SmoothL1Loss    OrpRef: KLD    mAP
Baseline                                                                                                    0.8514
ORPSD                                                                                                       0.8803
                                                                                                            0.8721
                                                                                                            0.8812
                                                                                                            0.8915