Article

CV-YOLO: A Complex-Valued Convolutional Neural Network for Oriented Ship Detection in Single-Polarization Single-Look Complex SAR Images

1 School of Information and Communication Engineering, Hainan University, Haikou 570228, China
2 Suzhou Key Laboratory of Microwave Imaging, Processing and Application Technology, Suzhou 215123, China
3 Suzhou Aerospace Information Research Institute, Suzhou 215123, China
4 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
5 National Defense University, Beijing 100091, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(8), 1478; https://doi.org/10.3390/rs17081478
Submission received: 28 February 2025 / Revised: 15 April 2025 / Accepted: 17 April 2025 / Published: 21 April 2025

Abstract

Deep learning has significantly advanced synthetic aperture radar (SAR) ship detection in recent years. However, existing approaches predominantly rely on amplitude information while largely overlooking the critical phase component, limiting further performance improvements. Additionally, unlike optical images, which benefit from a variety of enhancement techniques, complex-valued SAR images lack effective processing methods. To address these challenges, we propose Complex-Valued You Only Look Once (CV-YOLO), an anchor-free, oriented bounding box (OBB)-based ship detection network that fully exploits both amplitude and phase information from single-polarization, single-look complex SAR images. Furthermore, we introduce novel complex-valued data augmentation strategies—including complex-valued Gaussian filtering, complex-valued Mosaic data augmentation, and complex-valued mixed sample data augmentation—to enhance sample diversity and significantly improve the generalization capability of complex-valued networks. Experimental evaluations on the Complex-Valued SAR Images Rotation Ship Detection Dataset (CSRSDD) demonstrate that our method surpasses real-valued networks with identical architectures and outperforms leading real-valued approaches, validating the effectiveness of our proposed methodology.

1. Introduction

Synthetic aperture radar (SAR) is an active microwave-based remote sensing instrument that employs virtual array and pulse compression technology to generate high-resolution imagery of Earth’s surface features. Its ability to operate regardless of weather, lighting, or environmental conditions makes SAR a crucial tool for ocean monitoring. SAR-based ship detection holds great potential for military and civilian uses [1,2].
Conventional SAR ship detection techniques primarily depend on image or signal processing technologies, including constant false alarm rate (CFAR) detection methods that estimate background noise and set dynamic thresholds [3,4], edge detection methods using gradient operators [5], texture analysis methods [6], and polarization feature analysis methods [7]. However, these traditional methods rely on expert knowledge and hand-crafted features and often yield unsatisfactory detection results.
Deep learning-based SAR ship detection algorithms do not require manually designed features; they learn features automatically from labeled samples and ground truth. Their simple end-to-end designs and excellent performance have garnered increasing attention from scholars. Deep learning-based detectors are broadly categorized into one-stage and two-stage approaches. One-stage approaches directly predict target coordinates and class probabilities, exemplified by the You Only Look Once (YOLO) series (v1–v7) [8,9,10,11,12,13,14]. For instance, Tang et al. [15] introduced noise classifying-based YOLO (N-YOLO) and Zhou et al. [16] proposed a multi-scale network, both tailored for SAR ship detection. Conversely, two-stage approaches initially generate region proposals likely to contain targets, subsequently refining these proposals through identification and prediction. This category is represented by models such as Faster Region-based Convolutional Neural Network (R-CNN) [17], feature pyramid networks (FPNs) [18], Cascaded R-CNN [19], and Mask R-CNN [20]. Kang et al. [21] integrated CFAR with Faster R-CNN, while Cui et al. [22] presented a multi-scale attention network for SAR ship detection.
These previously mentioned methods are all anchor-based algorithms, which require pre-set hyperparameters including the number, size, and aspect ratio of anchors and the intersection over union (IoU) thresholds. Additionally, generating anchor boxes to match the labels increases the computational cost. To address these issues, anchor-free algorithms such as the Corner-based Keypoint Detection Network (CornerNet) [23], the Extreme Point-based Object Detection Network (ExtremeNet) [24], the Center-based Keypoint Detection Network (CenterNet) [25], objects as points [26], fully convolutional one-stage object detection (FCOS) [27], and FoveaBox [28] predict the key points of the targets, opening up another direction for target detection. Anchor-free methods bypass anchors, decreasing the number of predicted boxes while enhancing detection performance and speed. Researchers have started applying anchor-free methods to the SAR ship detection field. Wang et al. [29] introduced CenterNet, Yang et al. [30] suggested Improved-FCOS, and Cao et al. [31] proposed lightweight high-precision YOLO (LH-YOLO), built upon YOLOv8, which employs a weight-sharing mechanism.
These algorithms are all based on horizontal bounding boxes (HBBs). However, ships in SAR images usually have large aspect ratios and arbitrary orientations; HBBs cannot tightly enclose such targets and tend to introduce substantial background information, particularly in near-shore scenarios with densely berthed ships. Furthermore, HBBs of adjacent ships overlap heavily, which greatly complicates detection. Oriented ship detection with oriented bounding boxes (OBBs) has therefore become a research focus [32]. An et al. [33] used OBBs to address the issue of excessive parameters in traditional CNN models. Chen et al. [34] developed a multi-scale architecture based on OBBs to handle the arbitrary orientation of SAR ships. Chen et al. [35] and Yang et al. [36] presented transformer-based single-stage methods for oriented SAR ship detection. Huang et al. [37] further advanced YOLOv11 by integrating an attention mechanism to enable high-performance oriented ship detection in SAR images.
Nevertheless, all the aforementioned algorithms consider only the amplitude information of SAR images while neglecting the phase information. On one hand, the amplitude images resemble optical images, making them convenient to feed directly into classic computer vision networks. On the other hand, complex-valued neural networks face challenges stemming from Liouville’s theorem, which establishes that any bounded entire function must be constant; this necessitates non-holomorphic loss functions. Prior to 1990, Liouville’s theorem posed a significant obstacle, as some researchers believed that non-differentiability would hinder tracking dynamics within complex-valued convolutional networks [38,39]. This impasse was resolved through the seminal introduction of Wirtinger Calculus [40], which established a rigorous gradient definition for non-holomorphic functions and has been widely employed in optimizing and training complex-valued models.
Complex-valued (CV) networks have gradually attracted the attention of researchers. B. Widrow et al. [41] first introduced complex domain neural networks and derived the gradient descent process for separate real and imaginary parts. D. H. Brandwood et al. [42] further generalized gradient descent using Wirtinger Calculus, allowing the gradient to be calculated with respect to complex-valued variables rather than separately for real and imaginary components. In the field of SAR, Zhang et al. [43] introduced the Complex-valued Convolutional Neural Network (CV-CNN), which outperformed real-valued networks in PolSAR image classification. Hua et al. [44] developed the Complex-valued SAR Ship Refocusing Network (CV-SSRN) for the three-dimensional rotational refocusing task. Additionally, Hua et al. [45] proposed the Complex-valued Motion Network (CV-MotionNet) for classifying moving ship targets in SAR images. Zhu et al. [46] introduced the Fully Complex-valued Lightweight Network (CVLWNet), a lightweight version of CV-CNN. For SAR target recognition, Yu et al. [47] proposed the Complex-valued Fully Convolutional Neural Network (CV-FCNN), which employs complex-valued convolutional layers. Fang et al. [48] developed the Defocusing Adaptive Complex CNN (DA-CCNN), a novel fully adaptive CV-CNN, enhanced by an image entropy measurement technique. Zhou et al. [49] constructed the Multi-scale Complex-valued Feature Attention CNN (MsCvFA-CNN), a multi-scale CV-CNN incorporating a complex-valued attention module. Wang et al. [50] proposed the Complex-valued Network Guided with Sub-aperture Decomposition (CGS-Net), a CV-CNN that leverages sub-aperture decomposition.
While complex-valued networks show promising results in SAR target recognition, they have not yet been investigated for SAR ship detection, to the best of our knowledge. To avoid using CV networks, some scholars have proposed feeding amplitude and phase separately into real-valued (RV) networks for detection or recognition, but this does not conform to the inherent characteristics of SAR and destroys the correlation within complex-valued data. Therefore, the development of a CV neural network specifically tailored for SAR ship detection is both necessary and promising. In this context, we introduce CV-YOLO, a novel architecture capable of processing single-channel complex-valued SAR data, thereby effectively exploiting both amplitude and phase information. To the best of our knowledge, this represents the first application of CV neural networks in the SAR ship detection field, providing valuable insights for advancing research in SAR target detection. The main contributions of this work are as follows:
(1)
This paper proposes a novel complex-valued transformation method for a real-valued detection network. By extending fundamental operations including convolution, batch normalization, upsampling, and max pooling to the complex domain, our approach achieves significant performance improvements in detection tasks.
(2)
We designed several complex-valued modules to support the operation of CV-YOLO, such as a complex-valued convolution, batch normalization, and activation (CCBA) module for complex-valued feature extraction, the Complex-valued Cross Stage Partial Network with 2 Convolutions (CC2f) module for feature fusion, the Complex-valued Spatial Pyramid Pooling-Fast (CSPPF) module for spatial pyramid feature extraction, a complex-valued upsampling (CUpsampling) module for upsampling, and the Complex-valued Detect (CDetect) module for detection. These modules lay the foundation for complex-valued detection networks.
(3)
Considering the challenges in complex-valued SAR image processing, we studied special data augmentation methods for complex-valued images. These enhancement techniques effectively augment the diversity of the training dataset, resulting in a significant improvement in detection performance.
(4)
Ablation analysis revealed that the improvements in the performance of CV networks are not due to an increase in network parameters, but rather to the inclusion of phase information. CV networks maintain the one-to-one correspondence between amplitude and phase in SAR data, which is a critical factor in their effectiveness.
The remainder of this paper is structured as follows: Section 2 provides a comprehensive description of the proposed methodology. Section 3 presents a detailed analysis of the experimental results. Section 4 discusses the key findings, and Section 5 concludes the paper.

2. Materials and Methods

2.1. Dataset

We conducted experiments using the Complex-valued SAR Ship Rotation Detection Dataset (CSRSDD) [51]. CSRSDD is based on GaoFen-3, China’s first C-band polarimetric SAR satellite, which supports 12 imaging modes and quad-polarization. The dataset contains 1-m resolution single-look complex (SLC) horizontal–horizontal (HH) polarized SAR images, each 1024 × 1024 pixels. It includes 514 images featuring 10 ship categories (Ship1–Ship7, Light boat, Cargo, and Other) with an imbalanced distribution—Ship1, Ship2, Ship4, Ship6, and Ship7 are underrepresented compared to other types. The dataset is currently not publicly available. The original SAR data were stored as 16-bit, 2-channel TIFF files; following geometric correction and radiometric calibration, the images were annotated with oriented bounding boxes in the dataset for object detection in aerial images (DOTA) format. The dataset was randomly partitioned into a training set and a test set at a ratio of 4:1, as shown in Table 1.

2.2. Overall Architecture

The YOLO series has garnered significant recognition in the field of computer vision, with researchers continually refining the methodology and incorporating innovative components, leading to the development of several influential models. YOLO versions 1 through 7 are anchor-based and utilize HBBs, while YOLOv8 is a one-stage, anchor-free detector that eliminates the need for specialized anchor computations, resulting in faster training and inference. Its native support for OBBs aligns perfectly with our OBB-annotated complex-valued dataset, eliminating the need for additional rotation angle calculations. Furthermore, research has demonstrated that YOLOv8 is a stable and well-established version within the YOLO series. Building on these advancements, we introduce CV-YOLO, a complex-valued SAR ship detection network based on the YOLOv8n-obb framework.
Unlike optical images, synthetic aperture radar (SAR) data are inherently complex-valued, consisting of both amplitude and phase information. The amplitude component reflects the backscattering intensity between radar waves and targets, encoding physical properties such as surface roughness, material composition, and geometric structure. The phase component records the phase difference between the transmitted and received radar waves, which encodes the target’s distance from the radar and provides elevation cues. Moreover, distinctive phase signatures emerge from various scattering mechanisms, such as single-bounce, double-bounce, and triple-bounce reflections. By fully leveraging both amplitude and phase information, we can extract more discriminative features, significantly enhancing detection performance.
In contrast to RV networks, CV networks explicitly incorporate both the real and imaginary components, which implicitly encode amplitude and phase information. This results in a model with twice the number of parameters compared to RV networks. Channel expansion in CV networks facilitates more sophisticated nonlinear interactions between the real and imaginary parts, thereby improving feature separability in the complex domain.
Motivated by these principles, we propose CV-YOLO, which comprises a complex-valued backbone (CBackbone), a complex-valued neck (CNeck), and a complex-valued head (CHead) as shown in Figure 1.

2.3. Complex-Valued Backbone

CV-YOLO integrates the complex-valued CSPDarknet53 as its backbone, as outlined in Figure 1a. In the complex-valued backbone (CBackbone), we designed several complex-valued modules, such as CCBA, CC2f, and CSPPF.

2.3.1. CCBA

CCBA is the fundamental building block of the complex-valued detection network, with its structure shown in Figure 2, where k denotes the kernel size, s represents the stride, and p indicates the padding.
CCL represents a complex-valued convolutional layer. To directly engage with complex-valued data, Trabelsi et al. [52] introduced the concept of complex-valued convolution. It represents an innovative expansion of traditional convolution techniques into the complex domain. The operation is formally defined as follows:
$$W * X = (W_R + W_I i) * (X_R + X_I i) = (W_R * X_R - W_I * X_I) + (W_R * X_I + W_I * X_R)\, i$$
In the aforementioned equation, $*$ denotes traditional (real-valued) convolution. $X_R$ and $X_I$, respectively, signify the real and imaginary parts of the complex-valued input $X$, while $W_R$ and $W_I$, respectively, represent the two components of the complex-valued convolutional kernel $W$. The phase angle $\theta$ and amplitude $r$ of $W * X$ can be calculated from the real and imaginary parts:
$$\theta = \arctan\left(\frac{W_R * X_I + W_I * X_R}{W_R * X_R - W_I * X_I}\right)$$
$$r = \sqrt{(W_R * X_R - W_I * X_I)^2 + (W_R * X_I + W_I * X_R)^2} = \sqrt{(W_R * X_R)^2 + (W_I * X_I)^2 + (W_R * X_I)^2 + (W_I * X_R)^2}$$
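To make the complex convolution concrete, the sketch below implements it with two real-valued PyTorch convolutions applied to the real and imaginary feature maps. It is a minimal illustration under our own naming conventions (e.g., ComplexConv2d), not the released CV-YOLO code.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution W * X realized with two real-valued convolutions:
    (W_R + i W_I) * (X_R + i X_I) = (W_R*X_R - W_I*X_I) + i (W_R*X_I + W_I*X_R)."""

    def __init__(self, in_ch, out_ch, k=3, s=1, p=1):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, k, s, p, bias=False)  # W_R
        self.conv_i = nn.Conv2d(in_ch, out_ch, k, s, p, bias=False)  # W_I

    def forward(self, x_r, x_i):
        out_r = self.conv_r(x_r) - self.conv_i(x_i)   # real part
        out_i = self.conv_r(x_i) + self.conv_i(x_r)   # imaginary part
        return out_r, out_i

# toy usage: a single-channel SLC patch split into real/imaginary tensors
x_r, x_i = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
y_r, y_i = ComplexConv2d(1, 16)(x_r, x_i)
```

This also makes explicit why a CV layer carries twice the parameters and roughly four times the multiply-accumulates of its RV counterpart, a point revisited in Section 3.3.1.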
Regarding the complex-valued batch normalization layer, batch normalization in deep neural networks standardizes the data in each mini-batch. This process stabilizes layer outputs, speeds up convergence, and mitigates overfitting, as referenced in [53]. Trabelsi introduced a complex-valued batch normalization technique in which the normalized output $\tilde{X}$ and the covariance matrix $V$ of the complex-valued mini-batch $X$ are calculated as
$$\tilde{X} = (V)^{-\frac{1}{2}} \left( X - \mathbb{E}(X) \right)$$
$$V = \begin{pmatrix} \mathrm{cov}(X_R, X_R) & \mathrm{cov}(X_R, X_I) \\ \mathrm{cov}(X_I, X_R) & \mathrm{cov}(X_I, X_I) \end{pmatrix}$$
where $\mathrm{cov}$ represents the covariance. This normalization method necessitates the computation of a matrix square root, resulting in substantial computational overhead. Furthermore, we found that this batch normalization technique fails to adequately facilitate the convergence of complex-valued detection networks.
Previous studies have utilized a simplified complex-valued batch normalization (CBN) approach, wherein real-valued batch normalization (BN) is independently applied to the real and imaginary components of the input $X$, followed by their recombination. In this work, we adopted this established simplified CBN methodology.
$$\mathrm{CBN}(X) = \mathrm{BN}(X_R) + \mathrm{BN}(X_I)\, i$$
Regarding the complex-valued activation function layer, researchers have developed various complex activation functions for representing complex-valued data, including the Modulus Rectified Linear Unit (ModReLU) [54], the Complex Rectified Linear Unit (CReLU) [52], the Zoneout Rectified Linear Unit (ZReLU) [55], and the cardiac activation function [56]. There is no consensus in the literature on the ideal activation function for complex-valued neural networks. Building on the outstanding performance of the Rectified Linear Unit (ReLU) in RV networks, we previously applied CReLU to the Complex-Valued Visual Geometry Group Network (CVGG-Net) [57] and obtained strong results. Therefore, we opted to continue using CReLU in this study. Although CReLU is not holomorphic, Wirtinger Calculus enables back propagation with full complex-valued activation, allowing for effective gradient propagation.
$$\mathrm{CReLU}(X) = \mathrm{ReLU}(X_R) + \mathrm{ReLU}(X_I)\, i$$
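For completeness, the simplified CBN and CReLU used inside the CCBA module can be sketched as follows; the class and function names are ours and are only meant to mirror the two equations above.

```python
import torch
import torch.nn as nn

class ComplexBatchNorm2d(nn.Module):
    """Simplified CBN: real-valued BatchNorm applied separately to the real and imaginary parts."""
    def __init__(self, num_features):
        super().__init__()
        self.bn_r = nn.BatchNorm2d(num_features)
        self.bn_i = nn.BatchNorm2d(num_features)

    def forward(self, x_r, x_i):
        return self.bn_r(x_r), self.bn_i(x_i)

def crelu(x_r, x_i):
    """CReLU: ReLU applied independently to the real and imaginary parts."""
    return torch.relu(x_r), torch.relu(x_i)

# a CCBA-style block then chains: ComplexConv2d -> ComplexBatchNorm2d -> crelu
```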

2.3.2. CC2f

CC2f represents the complex-domain extension of the Cross Stage Partial Network with 2 Fusion (C2f) from YOLOv8. As depicted in Figure 3, the CCBA module extracts features with c channels, which are then split into two halves along the channel dimension (each with c/2 channels). One half is processed through n bottleneck modules to generate n intermediate outputs (each maintaining c/2 channels), while the other half remains unchanged. The original two halves are then concatenated with the n bottleneck outputs, followed by another CCBA module to restore the channel dimension to c. Here, the value of n was set to 1. The CC2f module splits the feature map into two branches: one branch preserves the original feature to avoid information loss, while the other extracts deeper features through a series of bottleneck blocks. This design reduces computational redundancy and ultimately fuses shallow and deep features via concatenation, thereby enhancing the network’s multi-scale representation capability.
A bottleneck module provides two types of mappings controlled by a shortcut. When the shortcut is set to true, it performs an identity mapping, where the residual is added before the activation function. Conversely, when the shortcut is set to false, it applies a convolutional mapping, where the activation is applied without residual addition. Notably, the shortcut is set to true in the CBackbone and false in the CNeck.

2.3.3. CSPPF

Spatial pyramid pooling (SPP) is a highly effective structure proposed by He [58], which addresses the repeated extraction of related features in convolutional neural networks. Building upon the SPP framework and integrating Complex-valued MaxPool2d (CMaxPool2d), we developed the CSPPF as shown in Figure 4. By employing multiple small-sized pooling kernels in a cascading manner rather than a single large kernel, CSPPF blends feature maps of multi-scale receptive fields, thereby enriching their feature representation capability and further accelerating the processing speed.
CMaxPool2d is calculated as
$$\mathrm{CMaxPool2d}(z) = \mathrm{MaxPool2d}(r)\cos\theta + \mathrm{MaxPool2d}(r)\, i\sin\theta = \mathrm{MaxPool2d}(r)\, e^{i\theta}$$
where $\theta$ represents the phase, $r$ represents the amplitude, and $\mathrm{MaxPool2d}$ denotes real-valued max pooling.
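A possible implementation of CMaxPool2d pools the amplitude map and reattaches a phase; in the sketch below the phase is taken at the location of the pooled maximum amplitude, which is one reasonable reading of the equation above rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def complex_max_pool2d(x_r, x_i, kernel_size=5, stride=1, padding=2):
    """Max-pool the amplitude r and rebuild real/imaginary parts as r * e^{i*theta}."""
    r = torch.sqrt(x_r ** 2 + x_i ** 2)        # amplitude
    theta = torch.atan2(x_i, x_r)              # phase
    r_pool, idx = F.max_pool2d(r, kernel_size, stride, padding, return_indices=True)
    # take the phase at the positions selected by the amplitude pooling
    theta_pool = theta.flatten(2).gather(2, idx.flatten(2)).view_as(r_pool)
    return r_pool * torch.cos(theta_pool), r_pool * torch.sin(theta_pool)
```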

2.4. Complex-Valued Neck

Inspired by the Path Aggregation Network (PANet), CV-YOLO includes a double-flow CNeck, as shown in Figure 1b. In the top-down flow, the output features from layer 6 are fused with those from layer 10, and similarly layer 4 combines with layer 13; in the bottom-up flow, feature maps from layer 12 integrate with layer 16, while layer 9 merges with layer 19. This allows the network to integrate features from different levels more effectively.
In the CNeck, we designed the complex-valued upsampling (CUpsampling) module for upsampling. CUpsampling, which scales a low-resolution image or feature map to a desired dimension, is an extension of RV upsampling techniques. It is performed with the following calculations:
$$\mathrm{CUpsample}(X) = \mathrm{Upsample}(X_R) + \mathrm{Upsample}(X_I)\, i$$
In this paper, CUpsampling uses nearest-neighbor interpolation, as shown in Figure 5.
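A minimal sketch of CUpsampling, upsampling the real and imaginary parts independently with nearest-neighbor interpolation (the function name is assumed):

```python
import torch.nn.functional as F

def complex_upsample(x_r, x_i, scale_factor=2):
    """CUpsampling: nearest-neighbor upsampling applied to the real and imaginary parts separately."""
    up = lambda t: F.interpolate(t, scale_factor=scale_factor, mode="nearest")
    return up(x_r), up(x_i)
```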

2.5. Complex-Valued Head

The CHead processes the multi-scale feature maps emanating from the neck by feeding them into CDetect. This step is crucial for distilling higher-level features, which are then utilized to produce feature maps for predicted bounding boxes (Box), class labels (Cls), and orientation angles (Angle). The CHead structure is depicted in Figure 1c, and the CDetect module was designed in this part as shown in Figure 6, in which reg_max is set to 16, and nc, representing the number of classes, is set to 10.
As depicted in Figure 6, CDetect employs three distinct branches that receive inputs from the CBackbone and CNeck. It further refines the complex-valued features via two CCBA modules. Subsequently, the complex-valued features are translated into the real domain by the Abs layer, culminating in target recognition, bounding box regression, and rotation angle estimation. The CHead predicts five parameters: x, y, w, h, and θ. Here, (x, y) denotes the center coordinates, w and h are scaling factors relative to predefined anchor box dimensions across different feature maps, and θ represents the rotation angle. The anchor boxes that most closely match the ground truth boxes are identified through IoU calculations. Since the labels are in the real domain, we convert the outputs to real values for loss computation.
The overall loss function is a composite of three distinct terms: a classification loss, a distribution focal loss, and a specially designed loss for rotated bounding boxes. The classification loss is binary cross-entropy with logits (BCEWithLogits), which is computed as follows:
$$L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \left( T_i \cdot \log(\mathrm{sigmoid}(P_i)) + (1 - T_i) \cdot \log(1 - \mathrm{sigmoid}(P_i)) \right)$$
where $T_i$ denotes the i-th ground truth value, $P_i$ represents the i-th predicted value, and $\mathrm{sigmoid}$ indicates the sigmoid activation function.
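In PyTorch this classification term corresponds directly to the built-in BCE-with-logits loss; a toy example with assumed tensor shapes:

```python
import torch
import torch.nn.functional as F

# 8 predictions over the 10 ship classes of CSRSDD (shapes are illustrative)
P = torch.randn(8, 10)                                   # raw logits
T = torch.zeros(8, 10)
T[torch.arange(8), torch.randint(0, 10, (8,))] = 1.0     # one-hot targets
L_cls = F.binary_cross_entropy_with_logits(P, T)         # averages over all entries
```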
For the rotated bounding box loss, we adopt probabilistic IoU (ProbIoU) [59], following the implementation in YOLOv8. Different from the commonly used complete IoU (CIoU) [60], ProbIoU effectively mitigates the boundary discontinuity issue by framing the predicted bounding boxes and the ground truth as probability distributions. It assesses the overlap between these distributions, which serves as a metric for the congruence between the estimated and actual bounding boxes. To calculate ProbIoU, it is essential to transform both the ground truth box and the prediction box into a Gaussian distribution $(\mu, \Sigma)$, in which $\mu$ represents the mean vector and $\Sigma$ denotes the covariance. $R_\theta$ is a two-dimensional rotation matrix related to the rotation angle $\theta$, and $a$ and $b$ are the variances after decorrelation, which can be calculated by converting a rotated box to a horizontal box. With $(x_0, y_0)$ denoting the center point of the rotated box, $\Sigma$ and $\mu$ are determined as
$$\Sigma = \begin{pmatrix} a & c \\ c & b \end{pmatrix} = R_\theta \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} R_\theta^T = \begin{pmatrix} a\cos^2\theta + b\sin^2\theta & \frac{1}{2}(a-b)\sin 2\theta \\ \frac{1}{2}(a-b)\sin 2\theta & a\sin^2\theta + b\cos^2\theta \end{pmatrix}$$
$$\mu = (x_0, y_0)^T$$
By designing a network that regresses the three parameters $(a, b, \theta)$, ProbIoU drives the shape and orientation of the predicted rotated box toward the ground truth. The Bhattacharyya coefficient ($BC$) is then calculated to assess the similarity between the two Gaussian distributions, which can be defined as
$$BC(P_{box}, T_{box}) = \int_{\mathbb{R}^2} \sqrt{P_{box}(x)\, T_{box}(x)}\, dx$$
where $P_{box}$ and $T_{box}$ represent the probability density functions of the prediction bounding box (PB) and the ground truth (GT), both conforming to a Gaussian distribution. The Hellinger distance ($HD$) is then used to quantify the divergence between these two Gaussian distributions, aiming to maximize the overlap between PB and GT. The calculations for the Hellinger distance, ProbIoU, and the ProbIoU loss are as follows:
$$HD(P_{box}, T_{box}) = \sqrt{1 - BC(P_{box}, T_{box})}$$
$$\mathrm{ProbIoU} = 1 - HD(P_{box}, T_{box})$$
$$L_{prob} = \begin{cases} L_1(P, T) = 1 - \mathrm{ProbIoU}(P, T) \in [0, 1] \\ L_2(P, T) = -\ln\left(1 - L_1^2(P, T)\right) \in [0, \infty) \end{cases}$$
When the Gaussian distributions are significantly separated, $L_1$ can yield values close to 1, potentially leading to minimal gradients and consequently slow convergence. In contrast, $L_2$ avoids this issue but is geometrically disconnected from the IoU. Thus, ProbIoU suggests an initial phase with the $L_2$ loss followed by a transition to the $L_1$ loss to optimize performance.
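To make the chain BC -> HD -> ProbIoU concrete, the sketch below evaluates the L1/L2 ProbIoU losses for two rotated boxes using the closed-form Bhattacharyya distance between 2-D Gaussians; it is a simplified NumPy illustration (with the w²/12, h²/12 variance convention of ProbIoU), not the YOLOv8 implementation.

```python
import numpy as np

def box_to_gaussian(x, y, w, h, theta):
    """Map a rotated box (center, size, angle) to a 2-D Gaussian (mu, Sigma)."""
    a, b = w ** 2 / 12.0, h ** 2 / 12.0            # variances along the box axes
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])                # rotation matrix R_theta
    return np.array([x, y]), R @ np.diag([a, b]) @ R.T

def probiou_loss(pred, target, mode="l1"):
    """ProbIoU loss for two rotated boxes given as (x, y, w, h, theta) tuples."""
    mu1, s1 = box_to_gaussian(*pred)
    mu2, s2 = box_to_gaussian(*target)
    s = 0.5 * (s1 + s2)
    d = (mu1 - mu2).reshape(2, 1)
    # closed-form Bhattacharyya distance between the two Gaussians
    bd = 0.125 * (d.T @ np.linalg.inv(s) @ d).item() + 0.5 * np.log(
        np.linalg.det(s) / np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    bc = np.exp(-bd)                               # Bhattacharyya coefficient
    l1 = np.sqrt(max(1.0 - bc, 0.0))               # Hellinger distance = 1 - ProbIoU
    return l1 if mode == "l1" else -np.log(max(1.0 - l1 ** 2, 1e-7))

print(probiou_loss((10, 10, 4, 2, 0.3), (10, 10, 4, 2, 0.3)))  # identical boxes -> 0.0
```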
Building upon ProbIoU, we maintain the utilization of distribution focal loss (DFL) [61], which reformulates bounding box coordinate regression as discrete probability distribution estimation. Unlike conventional focal loss functions, which concentrate on differentiating positive and negative samples, DFL focuses the predicted distribution on values near the continuous regression target. By aligning the model’s predicted probability distribution with the actual target distribution, DFL diminishes the model’s learning uncertainty, enhancing its practical recognition accuracy. The DFL loss is formulated as follows, where $T$ is the continuous regression target, $T_i$ and $T_{i+1}$ are the discrete bins adjacent to $T$, and $S_i$ and $S_{i+1}$ are the predicted probabilities of those bins:
$$L_{dfl} = -\left( (T_{i+1} - T)\log(S_i) + (T - T_i)\log(S_{i+1}) \right)$$
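A compact sketch of this interpolation between the two bins adjacent to the continuous target (assuming targets already lie in the bin range [0, reg_max]):

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_logits, target):
    """DFL: box regression as classification over discrete bins, weighting the two bins
    adjacent to the continuous target by their distance to it.

    pred_logits: (N, reg_max + 1) raw bin scores; target: (N,) continuous values in [0, reg_max]."""
    t_low = target.clamp(0, pred_logits.size(1) - 2).long()   # bin T_i
    t_high = t_low + 1                                         # bin T_{i+1}
    w_low = t_high.float() - target                            # (T_{i+1} - T)
    w_high = target - t_low.float()                            # (T - T_i)
    loss = F.cross_entropy(pred_logits, t_low, reduction="none") * w_low \
         + F.cross_entropy(pred_logits, t_high, reduction="none") * w_high
    return loss.mean()
```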
The final loss function is a weighted combination of the classification loss component, the distribution focal loss component, and the ProbIoU loss component:
$$Loss = \lambda_1 L_{cls} + \lambda_2 L_{dfl} + \lambda_3 L_{prob}$$
with $\lambda_1$, $\lambda_2$, and $\lambda_3$ being the weight coefficients for the respective losses. Moreover, a task-aligned assigner [62] is employed to refine model performance. It dynamically adjusts the sampling strategy to ensure that both classification and localization losses are concurrently considered.

2.6. Complex-Valued Data Augmentation

To enhance detection performance, we introduce data augmentation strategies tailored for complex-valued SAR data. Addressing the limitations of traditional enhancement techniques, we adapted several standard methods to the complex domain, encompassing complex-valued Gaussian filtering (CGaussian), complex-valued Mosaic data augmentation (CMosaic), and complex-valued mixed sample data augmentation (CMixUp).

2.6.1. Complex-Valued Gaussian Filter

Gaussian filtering is a prevalent image smoothing technique employed to mitigate noise and reduce high-frequency image details. Histogram analysis was conducted on the images within the CSRSDD dataset, as presented in Figure 7. The observed distribution suggests that the noise present in SAR images can be approximated by a Gaussian distribution. Thus, we aimed to extend Gaussian filtering into the complex domain to effectively remove noise. For a real-valued signal $f(x)$, a Gaussian filter with standard deviation $\sigma$ can be represented as follows:
$$G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
For complex-valued signals, standard deviation computation is intricate; thus, we preserved the phase $\theta$ while applying Gaussian filtering to the amplitude $r$. The procedure is as follows:
$$CG(z) = G(r)\cos\theta + G(r)\, i\sin\theta = G(r)\, e^{i\theta}$$
Utilizing the CGaussian filter, we could suppress noise in SAR images, as shown in Figure 8, while also maintaining and enhancing the phase information present in the imagery.
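A minimal NumPy/SciPy sketch of this filter, smoothing the amplitude of an SLC image while keeping its per-pixel phase (the function name and sigma value are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def complex_gaussian_filter(slc, sigma=1.0):
    """CGaussian: Gaussian-filter the amplitude r and reattach the original phase theta."""
    amplitude = np.abs(slc)                              # r
    phase = np.angle(slc)                                # theta
    smoothed = gaussian_filter(amplitude, sigma=sigma)   # G(r)
    return smoothed * np.exp(1j * phase)                 # G(r) * e^{i*theta}
```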

2.6.2. CMosaic and CMixUp

We extended Mosaic, a prevalent technique in YOLOv8, to the complex domain. By stitching together different complex-valued images, CMosaic forges new samples. This approach exposes the model to a diverse range of backgrounds, scales, and intensities, thus improving the diversity of training data samples. This, in turn, significantly reduces the propensity for overfitting and enhances the model’s robustness.
Similarly, mixed sample data augmentation (MixUp), another data augmentation method utilized in YOLOv8, was adapted for complex-valued images. CMixUp generates augmented training samples by linearly interpolating pairs of images and their corresponding labels. This technique introduces model priors, improving the model’s generalization capability. Specifically, for two given complex-valued images $x_i$, $x_j$ and their labels $y_i$, $y_j$, the generation of new images $\tilde{x}$ and labels $\tilde{y}$ is conducted as follows:
$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j$$
$$\tilde{y} = \lambda y_i + (1 - \lambda) y_j$$
where $\lambda$ is a mixture coefficient randomly sampled from a Beta distribution on the interval [0, 1]. In this paper, we employed complex-valued Gaussian filtering, CMosaic, and CMixUp to enhance the training samples.
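A sketch of CMixUp for complex-valued detection samples is given below; since the images are complex arrays, the interpolation applies to the real and imaginary parts simultaneously. The Beta parameter and the weighted concatenation of the oriented-box labels are our assumptions for illustration.

```python
import numpy as np

def complex_mixup(img_a, img_b, boxes_a, boxes_b, alpha=32.0):
    """CMixUp: linearly blend two complex-valued SAR images and combine their labels."""
    lam = np.random.beta(alpha, alpha)                 # mixture coefficient in [0, 1]
    mixed = lam * img_a + (1.0 - lam) * img_b          # works directly on complex arrays
    # for detection, boxes from both images are kept, weighted by lam and (1 - lam)
    mixed_boxes = [(box, lam) for box in boxes_a] + [(box, 1.0 - lam) for box in boxes_b]
    return mixed, mixed_boxes
```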
Analogously, data augmentation techniques commonly employed in optical images can be extended to the complex domain for the enhancement of complex-valued images, provided that the integrity of the complex-valued data representation is preserved during the extension process.

3. Results

In this section, we present a series of experiments designed to validate the efficacy of our proposed CV-YOLO. Initially, we provide a detailed configuration of the experimental environment and articulate the metrics for evaluation. Subsequently, we compare our proposed network with RV networks of the same architecture and with the same number of scalar parameters. Further, we compare our approach against several prevalent SAR ship detection algorithms to evaluate its relative standing. Finally, we conduct ablation experiments to evaluate the effects of complex-valued modules and of amplitude-only or phase-only input on network performance.

3.1. Implementation Details

All experiments reported in this paper were conducted on a system running Ubuntu 18.04. The hardware configuration comprised a Quadro RTX 8000 GPU (40GB RAM), an AMD Ryzen 9 3950X CPU, and 64 GB of system RAM. The software environment included PyTorch 1.9.0, CUDA 11.1, and Python 3.8.

3.2. Performance Metrics

In evaluating the effectiveness of CV-YOLO, three key performance indicators were introduced: precision (P), recall (R), and average precision (AP).
P and R are calculated as follows:
$$\mathrm{precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}$$
$$\mathrm{recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}$$
where $N_{TP}$, $N_{FP}$, and $N_{FN}$ represent the number of true positives (TPs), false positives (FPs), and false negatives (FNs), respectively. If the IoU between the PB and the GT exceeds the predefined threshold, the detection is typically labeled as a TP. In cases where the IoU falls short of this threshold, the detection is classified as an FP, signifying a false alarm. Conversely, when a ground truth bounding box lacks a corresponding detection, it is deemed an FN, representing a missed ship. At each threshold, precision and recall are computed, yielding a precision–recall curve (PRC). The average precision (AP) metric is defined as the area under the PRC, as detailed below:
$$AP = \int_0^1 P(R)\, dR$$
For a comprehensive performance assessment, evaluation metrics adopted from the Microsoft Common Objects in Context (MSCOCO) dataset were utilized, encompassing AP50 and AP50:95, where AP50:95 represents the mean of ten IoU thresholds spanning from 0.5 to 0.95 in steps of 0.05, and AP50 is determined at an IoU threshold of 0.5.
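For reference, AP can be approximated numerically as the area under the sampled precision-recall curve, and AP50:95 is simply the mean of the APs at the ten IoU thresholds; a small sketch (evaluator outputs are assumed):

```python
import numpy as np

def average_precision(precisions, recalls):
    """AP as the area under the precision-recall curve (trapezoidal approximation)."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))

# AP50:95 = mean of APs computed at IoU thresholds 0.50, 0.55, ..., 0.95
ap_per_threshold = {t: 0.0 for t in np.arange(0.5, 1.0, 0.05)}   # filled by the evaluator
ap_50_95 = float(np.mean(list(ap_per_threshold.values())))
```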

3.3. Comparison with Real-Valued Networks Without Data Augmentation

3.3.1. Comparison with Real-Valued Networks of the Same Structure Without Data Augmentation

To verify the effectiveness of CV-YOLO, we compared it with an RV network of the same structure without data augmentation. The experiments were carried out under the unified framework Ultralytics 8.1.30, and the parameters involved are shown in Table 2. Cls, Dfl, and Box represent the weights of the three types of loss functions. Detailed comparative results are reported in Table 3. YOLOBase stands for YOLOv8 without data augmentation, and CVBase stands for CV-YOLO without data augmentation. In this experiment, all models were trained from scratch without pre-trained weights.
Table 3 illustrates the comparative performance of CVBase and YOLOBase. Without data enhancement, CVBase outperformed YOLOBase by 4% in AP50, 0.7% in AP50:95, and 3.6% in precision (P) and 10.9% in precision (P), although it lagged by 3.8% in recall (R). However, CVBase has twice the number of parameters and four times the FLOPs of YOLOBase.
The results indicate that a CV network with an equivalent structure offers superior detection performance over an RV network which relies solely on amplitude information. In contrast, the CV network simultaneously takes into account phase information as well as amplitude. However, as indicated by Formula (1), since complex-valued convolution is equivalently implemented through real-valued convolution, a doubling of parameters and a quadrupling of computational complexity are unavoidable. The combination of real and imaginary parts means that complex-valued numbers inherently contain more information than RV numbers. Nevertheless, the weights for the real and imaginary parts in a CV network are not independent and, strictly speaking, cannot be equated to twice the weights of an RV network. But we continue to use the parameter calculation method from RV networks for the sake of clarity.
Table 4 reveals that due to the dataset’s category imbalance, the detection performance for certain ship types is poor. Notably, the AP for Ship4 across all four models is zero, leading to initial suspicions that this ship type might be absent from the test dataset. However, a count of ship instances, detailed in Table 1, confirms the presence of a Ship4 sample in the test dataset.
In terms of detecting different types of ships, we observed that CVBase was weaker only on Ship3, Light_boat, and Other, but outperformed the RV network for the other seven ship types. This indicates that CV networks excel in the detection of large ships. Upon analysis, we found that various angles on the decks generate reflections and scattering information, which are embedded in the phase data. Since CV networks simultaneously consider both the amplitude and phase information in complex-valued data, they extract more effective features and achieve superior detection performance.

3.3.2. Comparison with Real-Valued Networks of the Same Number of Scalar Parameters Without Data Augmentation

As previously mentioned, the number of parameters in a CV network is twice that of a comparable RV one. Therefore, the observed performance improvement could be attributed to the increase in parameters rather than the use of complex-valued processing. To address this, we constructed an RV network equivalent to YOLOBase in depth, with approximately double the parameters. We called this network DoubleBase. Specifically, the scale settings were [0.33, 0.25, 1024] for YOLOBase and [0.33, 0.36, 1024] for DoubleBase, maintaining consistency in depth and max channels, with a width increase of 0.11. The experimental results for DoubleBase on the CSRSDD dataset are shown in Table 3 and Table 4.
The results in Table 3 show that CVBase outperformed DoubleBase, with an improvement of 1.4% in AP50, 0.5% in AP50:95, 13.2% in P, and 5% in R, while using 0.297M fewer parameters. Additionally, DoubleBase improved AP50 by 2.6%, AP50:95 by 0.2%, and R by 5% compared to YOLOBase, although P decreased by 9.6% and the number of parameters increased by 3.367M. Increasing the parameters in DoubleBase resulted in a noticeable improvement in detection performance. However, compared to CVBase, DoubleBase still fell slightly short. This indicates that the performance improvement of the CV network was not due to an increase in parameters, but rather to the incorporation of amplitude and phase information through complex-valued processing.
From Table 4, it is evident that DoubleBase outperformed YOLOBase in detecting the Ship5, Ship6, Ship7, and Other categories. However, it shows declines in performance for the Ship1, Ship3, Light_boat, and Cargo categories. On the other hand, CVBase demonstrated considerable improvements over DoubleBase in the Ship1, Ship6, Light_boat, and Cargo categories. Meanwhile, CVBase experienced declines in performance for the Ship3, Ship5, Ship7, and Other classes. Overall, CVBase demonstrated the best detection performance.
The results from Table 3 and Table 4 suggest that in conditions with both amplitude and phase information, all convolutional networks can enhance their detection performance by transitioning to CV networks, despite the increase in parameters and computational cost.

3.4. Comparison with Representative SAR Ship Detection Networks with Data Augmentation

As is well known, RV networks can also improve performance through various data augmentation techniques, whereas augmentation techniques for complex-valued data are nearly non-existent. Therefore, we propose a complex-valued SAR ship detection network with complex-valued data augmentation, CV-YOLO. To further showcase the capabilities of CV-YOLO, we conducted comparative evaluations with several representative detection algorithms within the SAR ship detection domain.
These included the Refined Rotation RetinaNet (R3Det) [63], Region of Interest Transformer (ROI-Transformer) [64], Rotated Faster R-CNN [65], Oriented R-CNN [66], Rotated Representative Points (Rotated RepPoints) [67], Single-Shot Alignment Network (S2ANet) [68], and YOLOv8n. R3Det, a one-stage detector, incorporates a feature refinement module (FRM) and excels in detecting objects with large aspect ratios and dense arrangements. S2ANet, another one-stage detector, achieves oriented target detection through a feature alignment module. Rotated RepPoints offers an anchor-free directional detection approach. The RoI-Transformer module, integrated into a light-head R-CNN, minimizes the generation of excessive detection boxes. Rotated Faster R-CNN builds upon the strengths of Faster R-CNN [69], addressing its high computational and structural complexity. Oriented R-CNN, an extension of traditional R-CNN [70], introduces an oriented region proposal network (RPN) and oriented detection heads. Both ROI-Transformer and Rotated Faster R-CNN are two-stage algorithms. The comparative experiment was executed on MMRotate 0.3.4, with the results summarized in Table 5.
Among the seven algorithms, the one-stage detectors R3Det and S2ANet showed relatively poor performance. The Rotated RepPoints detector, which forgoes the use of anchor frames, achieved a more promising performance. Two two-stage detectors, ROI-Transformer and Rotated Faster R-CNN, exhibited average performance. Oriented-RCNN, also a two-stage detector, did not reach the optimal performance level. As for one-stage and anchor-free detectors, YOLOv8 achieved the best results in P and was second only to CV-YOLO in R and mAP, while CV-YOLO demonstrated superior performance across R and mAP.
Next, we compare the best and second-best detectors through confusion matrices and model inferences, to present the detection efficacy of our proposed CV-YOLO in an intuitive and accessible manner.
To graphically illustrate the proficiency of our method in classifying target classes, we present the confusion matrices for both YOLOv8 and CV-YOLO in Figure 9. From left to right, the columns correspond to ship1–7, light_boat, cargo, other, and background, respectively, and the rows correspond to predicted ships. The diagonal entries indicate the proportion of correctly identified classes, whereas the off-diagonal entries signify the proportion of misclassified instances. It is clear that in the detection tasks across various ship categories, CV-YOLO demonstrates significantly superior performance compared to YOLOv8.
Figure 10 illustrates the comparative detection capabilities of CV-YOLO and YOLOv8 on images. The leftmost column represents the ground truth, the middle column displays YOLOv8’s predictions, and the right column shows CV-YOLO’s predictions. It is apparent that in the first row, indicated by the red circle, YOLOv8 failed to detect a ship target. In the middle row, highlighted in blue, YOLOv8 identified an extra, non-existent ship target. In the third row, enclosed in yellow, YOLOv8 incorrectly detected two additional ships, whereas CV-YOLO provided an accurate prediction.
Upon reviewing the aggregate experimental outcomes, the effectiveness and robustness of our proposed CV-YOLO for SAR ship detection were confirmed.

3.5. Ablation Analysis

In this section, we focus on examining the effect of complex-valued modules, as well as the performance of CV-YOLO compared to YOLOv8 when only amplitude is available, only phase is available, and both amplitude and phase coexist. To guarantee the rigor and accuracy of the experimental results, all networks involved in the comparison employed identical experimental parameters as in Table 2.

3.5.1. The Impact of Complex-Valued Modules

To support complex-valued operations in CV-YOLO, we designed several complex-valued modules, including CCBA, CC2f, CSPPF, CUpsampling, and CDetect. Due to the interconnected nature of these modules within CV-YOLO, isolating the impact of each individual module is impractical. Therefore, we focused on evaluating the influence of three major components—CBackbone, CNeck, and CHead—on the network’s overall performance. These components are themselves composed of the aforementioned modules.
Given that converting complex-valued data to real-valued data introduces irreversible information loss, we adopted a systematic replacement strategy. First, we replaced the CHead with a real-valued head (RH), creating a configuration we denote as CB + CN + RH. This involved converting the integrated complex-valued features to the real domain using an absolute value (Abs) layer after the CNeck. Subsequently, we replaced both the CNeck and CHead with their real-valued counterparts (RN), converting features after the CBackbone. We refer to this architecture as CB + RN + RH. Finally, we compared YOLOv8, which represents the fully real-valued configuration (RB + RN + RH), and our complete complex-valued CV-YOLO (CB + CN + CH). The experimental results are summarized in Table 6.
The results presented in Table 6 demonstrate a clear trend. Specifically, the CB+RN+RH configuration outperformed YOLOv8 by 1.5% in AP50 and 1.7% in AP50:95, suggesting that the CBackbone effectively extracts richer complex-valued features, which significantly enhances detection performance. Furthermore, the CB + CN + RH combination yielded an additional improvement of 2.3% in AP50 and 1.7% in AP50:95 compared to CB + RN + RH. This indicates that the CNeck further integrates these complex-valued features, leading to a continued boost in detection accuracy. Finally, CV-YOLO achieved gains of 1.9% in AP50 and 0.3% in AP50:95 over CB + CN + RH, demonstrating the CHead’s ability to refine the integrated complex-valued features and substantially improve overall detection performance. These experimental findings strongly support the effectiveness of the proposed complex-valued modules.

3.5.2. The Impact of Complex-Valued Data Augmentation

To rigorously evaluate the contribution of each proposed complex-valued data augmentation technique, we conducted a series of ablation studies focusing on CGaussian, CMosaic, and CMixUp. The results of these experiments are summarized in Table 7.
Table 7 shows that with complex-valued Gaussian filtering (CGaussian) applied to the complex-valued network (CVBase), AP50 increased by 1.1%. With CMixUp applied, AP50 increased by 2.2%, while application of CMosaic enhanced AP50 by 10.4%. The combination of CGaussian and CMixUp improved AP50 by 3%, the combination of CGaussian and CMosaic improved AP50 by 13.6%, the combination of CMixUp and CMosaic enhanced AP50 by 14.7%, and the combination of all three techniques further increased AP50 by 15.7%.
The experimental results reveal that CMosaic had the most significant impact on performance, which can be attributed to its ability to process four times the input data compared to the other augmentation techniques. This echoes the benefits observed with real-valued Mosaic, where increased data diversity leads to improved model generalization. CMixUp demonstrated a moderate but noticeable improvement, while CGaussian exhibited the smallest contribution, yet still positively influenced the overall detection performance of the CV network. These findings provide compelling evidence for the validity and potential of the proposed complex-valued data augmentation methods.

3.5.3. The Impact of Amplitude and Phase

To further validate the efficiency of CV networks, we compared the detection performance of YOLOv8 and CV-YOLO under conditions where only amplitude is input, only phase is input, and both amplitude and phase are input simultaneously. For YOLOv8, when both amplitude and phase are input simultaneously, the real part was used as the R channel, the imaginary part as the G channel, and a zero matrix as the B channel, input as a three-channel (RGB) image to the network. For CV-YOLO, the real-valued amplitude and phase were input as complex-valued data for amplitude-only and phase-only, respectively. The experimental results are shown in Table 8.
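The channel packing used to feed complex data into the real-valued YOLOv8 baseline can be sketched as follows (the function name is assumed):

```python
import numpy as np

def slc_to_rgb(slc):
    """Pack an SLC image for a real-valued detector: real part -> R, imaginary part -> G, zeros -> B."""
    return np.stack([slc.real, slc.imag, np.zeros_like(slc.real)], axis=-1)
```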
We found that, for amplitude-only input, CV-YOLO improved AP50 by 0.1% and AP50:95 by 1.1% compared to YOLOv8. This suggests that, with amplitude-only input, the imaginary part in CV-YOLO becomes ineffective, effectively reducing it to YOLOv8, with both models showing comparable recognition performance. For phase-only input, CV-YOLO outperformed YOLOv8 by 24.4% in AP50 and by 18.5% in AP50:95. This indicates that the phase contains rich features from which both CV-YOLO and YOLOv8 can extract some detection information. Phase images, which appear noise-like to the human eye, are limited in the information they can provide to YOLOv8, a vision-based model. In contrast, CV-YOLO, designed for complex-valued images, can achieve decent detection results solely from phase information. When both amplitude and phase were input simultaneously, YOLOv8 improved AP50 by 0.8% and AP50:95 by 0.7% compared to amplitude-only, while CV-YOLO improved AP50 by 4.9% and AP50:95 by 3% over YOLOv8. The inclusion of phase information provided a modest enhancement to YOLOv8’s detection performance. However, the gap compared to CV-YOLO remained significant.
This illustrates that phase, in addition to amplitude, holds significant discriminative information. However, the RV network showed insensitivity to phase. We hypothesize that this is because real-valued convolution kernels can only form linear combinations of the input values and therefore cannot directly model phase rotation operations. Furthermore, while real-valued activation functions process information in a nonlinear manner, they often disrupt the phase’s periodicity and continuity. In contrast, CV networks use complex-valued convolutions that simultaneously enable amplitude scaling and phase rotation of the inputs. They also incorporate complex-valued activation functions that allow the amplitudes to be processed while the phases are preserved or smoothly transformed.
The superiority of CV networks stems from their mathematical operations in the complex domain: complex-valued convolution inherently couples amplitude scaling with phase rotation, which mathematically aligns with the propagation characteristics of electromagnetic waves. Moreover, Wirtinger Calculus-based back propagation enables joint optimization of both amplitude and phase, significantly enhancing the network’s sensitivity to phase.

4. Discussion

The experimental results in Section 3 demonstrate several significant findings. First, in the absence of data augmentation, the CV network consistently outperformed its real-valued counterpart, achieving superior detection performance compared to RV networks with an equivalent structure and with the same number of scalar parameters. We attribute this performance enhancement not only to the incorporation of phase information but also to the CV network’s ability to preserve the intrinsic one-to-one correspondence between amplitude and phase in the original complex-valued data. Second, with the implementation of our proposed complex-valued data augmentation techniques, the CV network exhibited remarkable performance advantages over seven state-of-the-art ship detection networks, which effectively validates the efficacy of our novel augmentation approach. Furthermore, a series of comprehensive ablation studies provided robust empirical evidence supporting the individual effectiveness of both our CV network and the data augmentation method. These findings collectively suggest that, when equipped with our complex-valued data augmentation, existing real-valued detection networks may potentially achieve enhanced performance through their transformation into complex-valued variants.

5. Conclusions

Traditional SAR ship detection methods often disregard phase information, inadvertently sacrificing beneficial detection cues. Therefore, we proposed a new complex-valued convolutional neural network based on complex-valued data enhancement for SAR ship detection, which considers both amplitude and phase information. This method excelled at identifying ships in challenging scenarios such as nearshore areas. Using the CSRSDD dataset, we compared CVBase (CV-YOLO without data augmentation) with the structurally similar YOLOBase, noting a 4% increase in AP50. Compared to the RV network DoubleBase with the same number of scalar parameters, the AP50 increased by 1.4%. This indicates that the superior performance of the CV network was not merely due to an increase in parameters, but also to the incorporation of phase information and the handling of complex values. In addition, we introduced several complex-valued SAR data augmentation strategies. In comparative evaluations with other established ship detection methods such as YOLOv8, R3Det, Oriented R-CNN, Rotated RepPoints, and S2ANet, CV-YOLO, integrated with our enhancement strategy, achieved 71.3% in AP50, highlighting its superiority.
We have innovatively proposed a complex-valued ship detection network based on HH single-polarization SAR data, achieving the first breakthrough in applying complex-valued convolutional neural networks to object detection tasks. At present, CV networks have demonstrated remarkable success in fully polarimetric SAR image classification, and we believe their extension to detection tasks is feasible, though challenges remain—primarily the lack of publicly available complex-valued fully polarimetric detection datasets. In addition, we recognize that CV networks face issues like high computational demand and lengthy training times in resource-constrained settings. Thus, we will focus on the lightweight design of complex-valued models in our future work.

Author Contributions

Conceptualization, Y.W. and Z.Z.; methodology, D.Z. and Z.Z.; software, D.Z. and D.L.; validation, D.Z.; formal analysis, X.Q.; investigation, D.Z.; resources, Z.Z.; data curation, D.Z.; writing—original draft preparation, D.Z.; writing—review and editing, D.Z.; visualization, D.Z.; supervision, H.L. and W.L.; project administration, Z.Z.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Key R&D Program of China under grant 2023YFB3904900.

Data Availability Statement

The dataset is subject to restricted access and cannot be publicly disclosed due to data privacy and confidentiality agreements.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A tutorial on synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–43. [Google Scholar] [CrossRef]
  2. Li, J.; Xu, C.; Su, H.; Gao, L.; Wang, T. Deep learning for SAR ship detection: Past, present and future. Remote Sens. 2022, 14, 2712. [Google Scholar] [CrossRef]
  3. Hou, B.; Chen, X.; Jiao, L. Multilayer CFAR Detection of Ship Targets in Very High Resolution SAR Images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 811–815. [Google Scholar]
  4. Tao, D.; Anfinsen, S.N.; Brekke, C. Robust CFAR Detector Based on Truncated Statistics in Multiple-Target Situations. IEEE Trans. Geosci. Remote Sens. 2016, 54, 117–134. [Google Scholar] [CrossRef]
  5. Akbari, V.; Yazdi, M. Edge Detection in SAR Images Using Wavelet Transform. In Proceedings of the International Conference on Computer and Communication Engineering, Kuala Lumpur, Malaysia, 13–15 May 2008; pp. 749–753. [Google Scholar]
  6. Zhu, X.X.; Bamler, R. Very High Resolution Spaceborne SAR Tomography in Urban Environment. IEEE Trans. Geosci. Remote Sens. 2010, 48, 4296–4308. [Google Scholar] [CrossRef]
  7. Chen, J.; Zhang, H.; Raney, R.K.; Lang, R.H. Segmentation-based ship detection using polarimetric SAR. IEEE Trans. Geosci. Remote Sens. 2003, 41, 697–705. [Google Scholar]
  8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
  9. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
  10. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  11. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  12. Wang, J.F.; Chen, Y.; Dong, Z.K.; Gao, M.Y. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2023, 35, 7853–7865. [Google Scholar] [CrossRef]
  13. Li, C.Y.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  14. Wang, C.Y.; Bochkovskiy, A.; Liao, H.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  15. Tang, G.; Zhuge, Y.; Claramunt, C.; Men, S. N-YOLO: A SAR Ship Detection Using Noise-Classifying and Complete-Target Extraction. Remote Sens. 2021, 13, 871. [Google Scholar] [CrossRef]
  16. Zhou, K.; Zhang, M.; Wang, H.; Tan, J. Ship Detection in SAR Images Based on Multi-Scale Feature Extraction and Adaptive Feature Fusion. Remote Sens. 2022, 14, 755. [Google Scholar] [CrossRef]
  17. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst. 2016, 29, 379–387. [Google Scholar]
  18. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
  19. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
  20. He, K.M.; Gkioxari, G.; Dollár, P.; Girshick, R.B. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397. [Google Scholar] [CrossRef] [PubMed]
  21. Kang, M.; Leng, X.; Lin, Z.; Ji, K. A modified faster r-CNN based on CFAR algorithm for SAR ship detection. In Proceedings of the International Workshop on Remote Sensing with Intelligent Processing, Shanghai, China, 18–21 May 2017; pp. 1–4. [Google Scholar]
  22. Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense attention pyramid networks for multi-scale ship detection in SAR images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997. [Google Scholar] [CrossRef]
  23. Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 734–750. [Google Scholar]
  24. Zhou, X.; Zhuo, J.; Krähenbuhl, P. Bottom-Up Object Detection by Grouping Extreme and Center Points. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 July 2019; pp. 850–859. [Google Scholar]
  25. Duan, D.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6569–6578. [Google Scholar]
  26. Zhou, X.; Wang, D.; Krähenbuhl, P. Objects as Points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
  27. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
  28. Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Shi, J. FoveaBox: Beyond anchor-based object detector. IEEE Trans. Image Process. 2020, 29, 7389–7398. [Google Scholar] [CrossRef]
  29. Wang, X.; Cui, Z.; Cao, Z.; Dang, S. Dense Docked Ship Detection via Spatial Group-Wise Enhance Attention in SAR Images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 1244–1247. [Google Scholar]
  30. Yang, S.; An, W.; Li, S.; Wei, G.; Zou, B. An improved FCOS method for ship detection in SAR images. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2022, 15, 8910–8927. [Google Scholar] [CrossRef]
  31. Cao, Q.; Chen, H.; Wang, S.; Wang, Y.; Fu, H.; Chen, Z.; Liang, F. LH-YOLO: A Lightweight and High-Precision SAR Ship Detection Model Based on the Improved YOLOv8n. Remote Sens. 2024, 16, 4340. [Google Scholar] [CrossRef]
  32. Gong, W.; Shi, Z.; Wu, Z.; Luo, J. Arbitrary-oriented ship detection via feature fusion and visual attention for high-resolution optical remote sensing imagery. Int. J. Remote Sens. 2021, 42, 2622–2640. [Google Scholar] [CrossRef]
  33. An, Q.; Pan, Z.; You, H.; Hu, Y. Transitive Transfer Learning Based Anchor Free Rotatable Detector for SAR Target Detection With Few Samples. IEEE Access 2021, 9, 24011–24025. [Google Scholar] [CrossRef]
  34. Chen, C.; He, C.; Hu, C.; Pei, H.; Jiao, L. MSARN: A Deep Neural Network Based on an Adaptive Recalibration Mechanism for Multiscale and Arbitrary-oriented SAR Ship Detection. IEEE Access 2019, 7, 159262–159283. [Google Scholar] [CrossRef]
  35. Chen, B.; Xue, F.; Song, H. A Lightweight Arbitrarily Oriented Detector Based on Transformers and Deformable Features for Ship Detection in SAR Images. Remote Sens. 2024, 16, 237. [Google Scholar] [CrossRef]
  36. Yang, Z.; Xia, X.; Liu, Y.; Wen, G.; Zhang, W.; Guo, L. LPST-Det: Local-Perception-Enhanced Swin Transformer for SAR Ship Detection. Remote Sens. 2024, 16, 483. [Google Scholar] [CrossRef]
  37. Huang, Y.; Wang, D.; Wu, B.; An, D. NST-YOLO11: ViT Merged Model with Neuron Attention for Arbitrary-Oriented Ship Detection in SAR Images. Remote Sens. 2024, 16, 4760. [Google Scholar] [CrossRef]
  38. Tanaka, G. Complex-Valued Neural Networks: Advances and Applications [Book Review]. IEEE Comput. Intell. Mag. 2013, 8, 77–79. [Google Scholar] [CrossRef]
  39. Barrachina, J.A.; Ren, C.; Vieillard, G.; Morisseau, C.; Ovarlez, J.P. Theory and implementation of complex-valued neural networks. arXiv 2023, arXiv:2302.08286. [Google Scholar]
  40. Wirtinger, W. On the formal theory of functions of several complex variables. Math. Ann. 1927, 97, 357–375. [Google Scholar] [CrossRef]
  41. Widrow, B.; McCool, J.; Ball, M. The complex LMS algorithm. Proc. IEEE 1975, 63, 719–720. [Google Scholar] [CrossRef]
  42. Brandwood, D.H. A complex gradient operator and its application in adaptive array theory. IEE Proc. H Microwaves Opt. Antennas 1983, 130, 11–16. [Google Scholar] [CrossRef]
  43. Zhang, Z.; Wang, H.; Xu, F.; Jin, Y.Q. Complex-Valued Convolutional Neural Network and Its Application in Polarimetric SAR Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7177–7188. [Google Scholar] [CrossRef]
  44. Hua, Q.; Zhang, Y.; Li, H.; Jiang, Y.; Xu, D. Refocusing on SAR ship targets with three-dimensional rotating based on complex-valued convolutional gated recurrent unit. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  45. Hua, Q.; Zhang, Y.; Wei, C.; Ji, Z. CV-RotNet: Complex-Valued Convolutional Neural Network for SAR three-dimensional rotating ship target recognition. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3552–3555. [Google Scholar]
  46. Zhu, Y.; Li, T.; Peng, D.; Wang, H.; Shi, S. A Novel SAR Automatic Target Recognition Method Based on Fully Complex-Valued Networks. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2023, 16, 6160–6171. [Google Scholar] [CrossRef]
  47. Yu, L.; Hu, Y.; Xie, X.; Wang, L.; Liu, R.; Hao, Y. Complex-Valued Full Convolutional Neural Network for SAR Target Classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1374–1378. [Google Scholar] [CrossRef]
  48. Fang, C.; Song, Y.; Guan, F.; Liang, F.; Yang, L. A Robust Complex-Valued Deep Neural Network for Target Recognition of UAV SAR Imagery. IEEE J. Miniatur. Air Space Syst. 2023, 4, 175–185. [Google Scholar] [CrossRef]
  49. Zhou, X.; Luo, C.; Ren, P.; Zhang, B. Multiscale Complex-Valued Feature Attention Convolutional Neural Network for SAR Automatic Target Recognition. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2024, 17, 2052–2066. [Google Scholar] [CrossRef]
  50. Wang, R.; Wang, Z.; Chen, Y.; Kang, H.; Luo, F.; Liu, Y. Target Recognition in SAR Images Using Complex-Valued Network Guided with Sub-Aperture Decomposition. Remote Sens. 2023, 15, 4031. [Google Scholar] [CrossRef]
  51. Lei, S.; Qiu, X.; Ding, C.; Lei, S. A Feature Enhancement Method Based on the Sub-Aperture Decomposition for Rotating Frame Ship Detection in SAR Images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Brussels, Belgium, 11–16 July 2021; pp. 3573–3576. [Google Scholar]
  52. Trabelsi, C.; Bilaniuk, O.; Zhang, Y.; Serdyuk, D.; Subramanian, S.; Santos, J.F.; Mehri, S.; Rostamzadeh, N.; Bengio, Y.; Pal, C.J. Deep complex networks. arXiv 2018, arXiv:1705.09792. [Google Scholar]
  53. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  54. Xiao, C.; Yang, S.; Feng, Z. Complex-Valued Depthwise Separable Convolutional Neural Network for Automatic Modulation Classification. IEEE Trans. Instrum. Meas. 2023, 72, 2522310. [Google Scholar] [CrossRef]
  55. Mohammadi Asiyabi, R.; Datcu, M.; Anghel, A.; Nies, H. Complex-Valued End-to-End Deep Network with Coherency Preservation for Complex-Valued SAR Data Reconstruction and Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5206417. [Google Scholar] [CrossRef]
  56. Lei, Z.; Gao, S.; Hasegawa, H.; Zhang, Z.; Zhou, M.; Sedraoui, K. Fully Complex-Valued Gated Recurrent Neural Network for Ultrasound Imaging. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 14918–14931. [Google Scholar] [CrossRef] [PubMed]
  57. Zhao, D.; Zhang, Z.; Lu, D.; Kang, J.; Qiu, X.; Wu, Y. CVGG-Net: Ship Recognition for SAR Images Based on Complex-Valued Convolutional Neural Network. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4010805. [Google Scholar] [CrossRef]
  58. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  59. Llerena, J.M.; Zeni, L.F.; Kristen, L.N.; Avila, S. Gaussian bounding boxes and probabilistic intersection-over-union for object detection. arXiv 2021, arXiv:2106.06072. [Google Scholar]
  60. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2022, 52, 8574–8586. [Google Scholar] [CrossRef]
  61. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012. [Google Scholar]
  62. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499. [Google Scholar]
  63. Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; pp. 3163–3171. [Google Scholar]
  64. Ding, J.; Xue, N.; Long, Y.; Xia, G.; Lu, Q. Learning ROI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
  65. Yang, S.; Pei, Z.; Zhou, F.; Wang, G. Rotated Faster R-CNN for Oriented Object Detection in Aerial Images. In Proceedings of the International Conference on Robot Systems and Applications, Dalian, China, 16–18 August 2020; pp. 35–39. [Google Scholar]
  66. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. arXiv 2021, arXiv:2108.05699. [Google Scholar]
  67. Li, W.; Chen, Y.; Hu, K.; Zhu, J. Oriented reppoints for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1829–1838. [Google Scholar]
  68. Han, J.; Ding, J.; Li, J.; Xia, G.S. Align Deep Features for Oriented Object Detection. arXiv 2020, arXiv:2008.09397. [Google Scholar] [CrossRef]
  69. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  70. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef]
Figure 1. Overall structure of CV-YOLO. (a) Complex-valued backbone (CBackbone); (b) complex-valued neck (CNeck); (c) complex-valued head (CHead).
Figure 2. Composition of CCBA.
Figure 3. Composition of CC2f.
Figure 4. Composition of CSPPF.
Figure 5. Implementation of complex-valued upsampling.
Figure 6. Composition of CDetect.
Figure 7. Effect of complex-valued Gaussian filtering. (a) Amplitude image histogram; (b) real part image histogram; (c) imaginary part image histogram.
Figure 8. Effect of complex-valued Gaussian filtering. (a) Amplitude of the original image; (b) amplitude after complex-valued Gaussian filtering.
Figure 9. Confusion matrices of YOLOv8 (a) and CV-YOLO (b).
Figure 10. Predictions of CV-YOLO and YOLOv8. (a) Ground truth; (b) predictions of YOLOv8; (c) predictions of CV-YOLO.
Table 1. Distribution of ships in the train and test sets.
Instances   Ship1   Ship2   Ship3   Ship4   Ship5   Ship6   Ship7   Light_boat   Cargo   Other   All
Train10952921121323603721651538
Test22134129451582340398
Table 2. Experimental parameters on Ultralytics.
Imgsize   Batchsize   Epochs   Lr     Optimizer   Weight_decay
1024      2           200      0.01   AdamW       0.0005
Table 3. Comparison with real-valued networks with the same structure and a comparable number of parameters.
Method       P (%)   R (%)   AP50 (%)   AP50:95 (%)   Parameters (M)   FLOPs (G)
YOLOBase     65.7    41.7    51.6       33.4          3.084            8.4
DoubleBase   56.1    47.6    54.2       33.6          6.451            17.2
CVBase       69.3    52.6    55.6       34.1          6.154            33.1
Table 4. AP50 of three models on 10 types of ships.
Class        Ship1 (%)   Ship2 (%)   Ship3 (%)   Ship4 (%)   Ship5 (%)   Ship6 (%)   Ship7 (%)   Light_boat (%)   Cargo (%)   Other (%)   All (%)
YOLOBase     99.5        99.5        70.8        0           29.5        0           31.3        67.1             70.7        47.3        51.6
DoubleBase   66.3        99.5        67.1        0           44.6        4.81        86.5        61.5             62.2        49.3        54.6
CVBase       99.5        99.5        62.6        0           30          23.1        61.1        64.8             72.9        42.3        55.6
Table 5. Comparison of representative SAR ship detection algorithms.
Method                 P (%)    R (%)    mAP (%)
R3Det                  14.72    6.92     9.61
S2ANet                 18.52    13.41    4.83
Rotated RepPoints      24.81    14.52    2.76
ROI_transformer        20.83    15.94    17.36
Rotated Faster R-CNN   17.89    22.25    20.01
Oriented_RCNN          39.46    38.58    42.86
YOLOv8                 75.2     62.2     65.6
CV-YOLO                74.2     66.7     71.3
Table 6. Impact of complex-valued modules.
Method         CBackbone   CNeck   CHead   P (%)   R (%)   AP50 (%)   AP50:95 (%)
YOLOv8         ×           ×       ×       75.2    62.2    65.6       41.7
CB + RN + RH   √           ×       ×       60.7    67.1    67.1       43.4
CB + CN + RH   √           √       ×       62.7    67.5    69.4       45.1
CV-YOLO        √           √       √       74.2    66.7    71.3       45.4
Table 7. Impact of complex-valued data augmentation.
Method    CGaussian   CMixup   CMosaic   P (%)   R (%)   AP50 (%)   AP50:95 (%)
CVBase    ×           ×        ×         69.3    52.6    55.6       34.1
          √           ×        ×         74      49.4    56.7       34.6
          ×           √        ×         76.4    51      57.8       35.2
          ×           ×        √         65      60.1    66         41
          √           √        ×         76.8    53.5    58.6       36.4
          √           ×        √         62.4    68      69.2       45.4
          ×           √        √         65.3    64.8    70.3       44.6
CV-YOLO   √           √        √         74.2    66.7    71.3       45.4
Table 8. Impact of amplitude and phase.
Methods   Amplitude-Only   Phase-Only   P (%)   R (%)   AP50 (%)   AP50:95 (%)
YOLOv8    √                ×            75.2    62.2    65.6       41.7
          ×                √            58.8    33.5    36.7       19.7
          √                √            70.8    60.6    66.4       42.4
CV-YOLO   √                ×            77.2    60.8    65.7       42.8
          ×                √            62.1    55.6    61.1       38.2
          √                √            74.2    66.7    71.3       45.4
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
