Article

Context-Aware DGCN-Based Ship Formation Recognition in Remote Sensing Images

Department of Automation Engineering, Rocket Force University of Engineering, Xi’an 710025, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(18), 3435; https://doi.org/10.3390/rs16183435
Submission received: 1 July 2024 / Revised: 1 September 2024 / Accepted: 13 September 2024 / Published: 16 September 2024

Abstract

Ship detection and formation recognition in remote sensing have increasingly garnered attention. However, research remains challenging due to the arbitrary orientations, dense arrangements, and complex backgrounds of ships. To enhance the analysis of ship situations in channels, we model ships as key points and propose a context-aware DGCN-based ship formation recognition method. First, we develop a center point-based ship detection subnetwork, which employs depthwise separable convolution to reduce parameter redundancy and combines coordinate attention with an oriented response network to generate direction-invariant feature maps. The center point of each ship is predicted by regressing the offset, target scale, and angle to realize ship detection. Then, we adopt the spatial similarity of the ship center points to cluster ships into groups, utilizing the Delaunay triangulation method to establish the topological graph structure of each ship group. Finally, we design a context-aware Dense Graph Convolutional Network (DGCN) operating on this graph structure to achieve formation recognition. Experimental results on the HRSC2016 and SGF datasets demonstrate that the proposed method can detect arbitrarily oriented ships and identify formations, attaining state-of-the-art performance.

1. Introduction

Ship detection and formation recognition aim to accurately locate and classify ships within remote sensing images and to identify the formation of a group of ships. These are fundamental tasks in marine situational monitoring, with a wide range of applications in marine transportation, shipping scheduling, and fishery management [1]. Despite the advancements in ship detection and formation recognition methods facilitated by deep learning, several challenges persist due to the intricate nature of remote sensing images. These challenges are illustrated in Figure 1 and include (1) the expansive field of view and high imaging resolution of remote sensing images; (2) the wide range of object scales distributed in these images, encompassing numerous small-scale objects; and (3) the complex backgrounds in remote sensing images, replete with significant irrelevant noise interference. Such complicated backgrounds often comprise large sea-surface waves, clouds, multiple harbors featuring dry docks, and the wakes generated by swiftly moving ships. Therefore, it is crucial to investigate advanced methods for ship detection and formation recognition [2].
For ship detection, Chen et al. [3] introduced dilated attention into YOLOv3, integrating channel and spatial attention modules to extract significant features. Meroufel et al. [4] used attention modules in Mask R-CNN to enhance information propagation. Hu et al. [5] proposed a dual-attention module operating over the spatial and channel dimensions, which optimizes the expression of feature information. Moreover, remote sensing images are taken from an overhead perspective, so ships may be arranged in any direction. Some scholars have adopted rotation-invariant modules [6,7,8] and data augmentation techniques [9,10,11] to deal with this issue. Ships are often small in remote sensing images, occupying only a few pixels, so deep–shallow feature fusion, attention mechanisms, and similar techniques are generally employed for small target detection [12,13,14].
However, the above detection methods are anchor-based and require pre-designed anchor boxes. The anchor-based design increases the computational cost when performing IoU and non-maximum suppression (NMS) algorithms. In addition, some methods fail to fully use the prior information of remote sensing images. That is, the ships are distributed in multiple directions, but the centers of the ships are not affected by direction.
For ship formation recognition, it is essential to recognize that a ship group constitutes a collection of interconnected ship entities. Unlike isolated ships, a ship group exhibits distinct collective characteristics: the state of each member is influenced not only by its own attributes but also by the constraints imposed by other members of the same group. Consequently, the states of all members within a ship group are consistent. In existing research on group formation recognition, two-stage networks have been used to locate and remove false targets in order to extract optimal spatial features from high-frequency surface wave radar (HFSWR). Peng et al. [15] adopted the polynomial Fourier transform to accurately identify the number of targets. Liang et al. [16] used a convolutional neural network to determine the number of targets. However, these methods are based on HFSWR, and research on formation recognition remains limited. Lin et al. [17] proposed a formation recognition algorithm based on a long short-term memory network to solve the fleet formation recognition problem by establishing a ship motion model. Zhou et al. [18] built a formation database through simulation and input the formation data into a convolutional neural network for training and classification.
The above methods obtain a formation database through simulation and then use a CNN to classify the formations. However, with densely distributed, small-scale targets, and without considering the feature transfer between targets within a group, features are easily lost when a convolutional neural network extracts them directly. Our work explores a more reasonable paradigm to tackle the challenges in ship detection and formation recognition.
To address the aforementioned challenges, we propose a novel approach to ship detection and formation recognition in remote sensing images utilizing center-point estimation and a context-aware DGCN. This method facilitates the extraction of ship node information, delineates the topology of ship groups, and substantially augments the robustness of the resulting description. In the first stage of the two-stage architecture, we employ the center points of rotated bounding boxes to devise a one-stage, anchor-free center-point ship detection subnetwork, in which ships are represented by their center points. This allows ships in any orientation to be estimated and the corresponding center-point coordinates to be acquired. In the second stage, we design the context-aware DGCN for ship formation recognition. We exploit a ship grouping module based on feature similarity clustering among ships. Then, we employ a Delaunay triangulation network to represent the graph structure of ship groups using the obtained center points, effectively mapping the formations to graph data. The context-aware Dense Graph Convolutional Network (DGCN), integrating a layer attention mechanism and dense GCN connections, is developed for formation classification. The main contributions are as follows:
(1) A two-stage algorithmic framework, composed of a center-point detection subnetwork and a DGCN-based formation recognition subnetwork, is designed to accomplish ship detection and formation recognition by representing ships as central points.
(2) We integrate the CenterNet with the Oriented Response and Coordinate Attention modules to detect ships and obtain center-point information.
(3) We group ships based on feature similarity clustering and adopt the Delaunay triangle to represent the graph topology of ship groups. We further develop the context-aware DGCN to recognize formation.
(4) Extensive experiments demonstrate the effectiveness of our algorithm. We also construct a new dataset, Ship Group Formation (SGF), for training and evaluating formation recognition.

2. Related Work

Traditional methods for ship detection transfer techniques from natural image detection. However, the complex backgrounds of remote sensing images and the dense distribution of ships limit traditional detection models. There is relatively little vision-based research on formation recognition. As deep learning (DL) techniques are widely applied in object detection, DL-based ship detection in remote sensing images has increasingly drawn attention. Therefore, ship detection and formation recognition in remote sensing images remain an important research focus. This section sequentially introduces object detection, ship detection, and formation recognition.

2.1. Object Detection

Object detection is a basic scene understanding task that aims to locate targets in an image. Currently, there are two typical families of detection methods: handcrafted methods and DL-based methods. Some scholars have conducted in-depth research on object detection based on the characteristics of remote sensing images. Handcrafted feature-based detection methods rely on candidate region extraction, where each region undergoes feature extraction followed by feature classification to extract potential target regions from the input image. You et al. [19] introduced the Otsu algorithm [20] for coarse segmentation to obtain sea and land areas; this method requires a significant amount of manually obtained prior information. A sliding window can be applied to the image to calculate the similarity to features in a shape library and determine whether the target is present [21]. Zhang et al. [22] extracted visual features using sliding windows and used an SVM for scoring to extract candidate regions. However, these methods have low computational efficiency and poor performance. Data-driven deep learning methods, such as Faster R-CNN [23], YOLO [24], and Consistent-Teacher [25], obtain more robust semantic features. The R2-CNN network with a lightweight backbone, Tiny-Net, was proposed for feature extraction; it addresses the trade-off between speed and accuracy when processing large images in blocks [26]. Additionally, combining shallow information with pixel attention mechanisms for local information fusion enhances the detection accuracy of small targets [27]. SCRDet incorporates pixel and channel attention mechanisms, which improves the localization accuracy of dense targets [28]. Many studies [29,30,31,32] have applied attention mechanisms to feature maps to suppress backgrounds. By integrating deep learning techniques into remote sensing image processing, the aforementioned methods improve detection efficiency and reduce computational costs.

2.2. Ship Target Detection

Ship detection in remote sensing images can generally be divided into two categories: regular bounding boxes and rotated bounding boxes. Due to the unique perspective of remote sensing images, the former includes useless background information and overlooks the directional information of the ship distribution. The latter can better adapt to changes in target angles and reduce interference from background information.
Li et al. [33] generated candidate regions from the feature map through a region proposal network and employed a hierarchical selection module to map feature maps of different scales to the same spatial scale. Lei et al. [34] classified marine and non-marine areas and proposed a post-CNN method to extract ship candidate regions using morphological calculations. MSSDet [35] analyzed the inherent and spatial semantic gaps between hierarchical pyramid structures and proposed a unified recursive feature pyramid to generate multi-scale features with rich semantic information. Context-preserving region-based contrastive learning is a new unsupervised domain adaptation framework that enhances ship detection by learning from labeled and unlabeled image domains [36]. However, owing to variations in box size, aspect ratio, and orientation angle, existing methods that use regular bounding boxes for rotated, aligned ships generally introduce excessive background information.
Jiang et al. proposed R²CNN [26], which transforms text detection methods into a ship detection model that can effectively predict ship orientation. The Rotated DFPN [37] was introduced to detect ships in complex scenes within marine and shore-based areas. Nie et al. [38] presented a new method based on Mask R-CNN, which segments ships and estimates their direction by outputting the positions of key points at the bow and stern. FSFADet [39] is an arbitrarily oriented ship detection network based on feature separation and alignment principles; it includes a feature separation module to diminish background noise and a feature alignment module to enhance ship characteristics. An oriented ship detection method based on intersecting circles and deformable regions of interest [40] was proposed to describe ships with large aspect ratios and arbitrary orientations. DL-based ship detection methods benefit from the powerful feature representation capabilities of CNNs, enabling the extraction of high-level semantic features of ships. Meanwhile, oriented ship detection methods offer additional advantages when detecting ships with large orientation angles and aspect ratios. Furthermore, anchor-based methods deserve further scrutiny when computational cost is an essential metric.

2.3. Formation Recognition

The existing methods for identifying groups of ships in formation are relatively limited and can generally be divided into three categories: structural unit-based formation recognition, graph model-based formation recognition, and shape context-based formation recognition. Various basic formations can be derived from the arrangement of the ships and analyzed by understanding the basic units in the formation, such as the position of the cruiser, enabling intelligent perception of the formation. However, this approach relies on identifying basic units, and the accuracy of basic-unit recognition directly affects the effectiveness of formation identification.
To address the issue of severe scale variations and dense arrangements of ships within a group in any direction, some scholars have adopted graph models to identify ship formations by extracting center points. However, the graph models constructed using this method only consider the positional information of individual nodes and are not linked to the context of group behavior, lacking rich, high-level semantic information. Additionally, local feature descriptors cannot be directly applied in graph matching methods. Ships within a formation are closely related, so it is not feasible to describe the entire graph (formation) using a single node (ship). Deng [41] proposed a formation recognition method based on a hidden Markov model with a memory function, utilizing context descriptors and probability density functions to describe local information, achieving rotational and scale invariance. Shi et al. [42] treated the recognition problem as a graph-matching task, using shape and background context information for feature point matching, followed by calculating similarity using squared and cosine loss functions. Although these methods use histograms to count sampling points at target edges, obtaining local descriptors of shape context, they require a large number of sampling points. As the number of ships is limited for ship group formations, this method has certain limitations.

3. Materials and Methods

In this section, we describe the overall framework and then introduce each subnetwork unit, including the designed network structure and loss function.

3.1. Overview

Figure 2 provides an overview of the proposed framework for ship detection and formation recognition based on a center-point detection network and a context-aware DGCN.
As shown in Figure 2, the ship detection subnetwork is implemented to acquire the position of the center point $(c_x, c_y)$ in the first stage. Then, the ships are grouped based on feature similarity clustering, while the Delaunay triangulation method is employed to establish the graph structure of the ship formation. Finally, we introduce the graph structure to the context-aware DGCN to achieve formation recognition.

3.2. Center-Point-Based Method for Ship Detection

Employing multiple anchor boxes can improve ship detection accuracy, but it introduces a large number of hyperparameters and calculations. Inspired by CenterNet [43], we note that the center point of a ship arranged in any direction is not affected by its orientation. Given this unique characteristic, we adopt the concept of rotated box detection. Therefore, we develop a one-stage anchor-free detection model that uses center-point estimation for ships in any direction. Figure 3 provides the detailed architecture of the detection network based on center points. Its main components are the feature extraction module and the center point detection module. The former is responsible for extracting rich features from input images, while the latter generates the final output, including position coordinates and angle information $(c_x, c_y, w, h, \theta)$.

3.2.1. Feature Extraction

We adopt the DLA-Net as the backbone to extract features. The DLA [44] combines ideas from DenseNet and the feature pyramid, iteratively integrating semantic information and spatial features from different blocks to improve detection accuracy. Since ships are densely arranged in any direction, the features generated by the backbone are not rotationally invariant. We therefore introduce the Oriented Response Network (ORN) [45] into the DLA to create the feature maps. The ORN consists of Active Rotation Filters (ARFs) and Oriented Response Pooling (ORPooling). ARFs actively rotate during convolution to generate orientation- and position-encoded feature maps while producing intra-class rotation-invariant deep features.
Attention mechanisms have the potential to significantly enhance network performance. Common attention mechanisms include SENet [46], CBAM [47], and Non-local [48]. SENet overlooks the positional information crucial for generating spatially selective attention maps, while CBAM relies on large convolutional kernels to introduce spatial attention. We introduce the Coordinate Attention (CA) mechanism [49] into the intermediate layers of the encoding network. CA can capture both location and channel information. Furthermore, it encodes the generated feature maps into attention maps that are sensitive to positional information and perceptual directions. The attention maps obtained from CA are then applied to the feature map, enhancing its expression ability. The CA mechanism involves information embedding and coordinate attention generation, with the output as follows:
$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$
where $x_c(i, j)$ represents the input feature map, and $g_c^h(i)$ and $g_c^w(j)$ are the attention weights assigned to the input feature along the height and width directions, respectively.
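For illustration, the following is a minimal PyTorch sketch of the coordinate attention operation described above, assuming average pooling along each spatial direction and a 1 × 1 bottleneck; the layer names and reduction ratio are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Minimal sketch of coordinate attention: direction-wise pooling, shared bottleneck,
    and per-direction attention weights g^h(i), g^w(j) applied to the input feature map."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):                                        # x: (B, C, H, W)
        b, c, h, w = x.size()
        # Information embedding: pool along width and height separately.
        x_h = x.mean(dim=3, keepdim=True)                        # (B, C, H, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # Attention generation: g^h(i) and g^w(j) in the equation above.
        g_h = torch.sigmoid(self.conv_h(y_h))                    # (B, C, H, 1)
        g_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * g_h * g_w                                     # y_c(i,j) = x_c(i,j) * g^h(i) * g^w(j)
```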

3.2.2. Center Point Based Detection

CenterNet represents each ship by the center point of its bounding box and predicts the offset and size to recover the actual bounding box. Concurrently, a heatmap carries the classification information, with each category having its own heatmap channel. If the center point of a ship is located at a certain coordinate, a key point (represented by a Gaussian circle) is generated at that coordinate.
We input images into the feature extraction network proposed in Section 3.2.1 to generate a heatmap $\hat{Y} \in [0, 1]^{W/s \times H/s \times C}$ of the ships' center points, where $W$ and $H$ are the width and height of the input image, $s$ is the down-sampling stride, and $C$ is the number of detected categories. The Gaussian kernel function is used to splat each ground-truth center point onto the heatmap and is expressed as follows:
$Y_{xyc} = \exp\left(-\dfrac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$
where $p$ is the center point position of the ship, $(x, y)$ is a location on the heatmap, $\sigma_p$ is an object-size-adaptive standard deviation, and $\tilde{p} = p/s$ is the position of the center point after down-sampling. Most heatmap locations are negative samples, and only a few center points are positive. The focal loss is used to address this imbalance between positive and negative samples during training.
$L_c = -\dfrac{1}{N}\sum_{xyc}\begin{cases}\left(1-\hat{Y}_{xyc}\right)^{\alpha}\log\left(\hat{Y}_{xyc}\right) & Y_{xyc} = 1\\[4pt] \left(1-Y_{xyc}\right)^{\beta}\left(\hat{Y}_{xyc}\right)^{\alpha}\log\left(1-\hat{Y}_{xyc}\right) & \text{otherwise}\end{cases}$
where $N$ is the number of ships, $\alpha$ and $\beta$ are penalty coefficients used during training, $\hat{Y}_{xyc}$ is the predicted center-point heatmap, and $Y_{xyc}$ is the ground-truth heatmap. The input image is down-sampled by a factor of four through the center-point network, which causes a deviation of the central position in the feature map relative to its original position in the image and may lead to an offset in the predicted center point. We optimize this offset using the L1 loss, as shown below.
$L_{co} = \dfrac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\dfrac{p}{s} - \tilde{p}\right)\right|$
where $\hat{O}_{\tilde{p}}$ is the predicted offset at the center point and $(p/s - \tilde{p})$ is the ground-truth offset introduced by down-sampling. Meanwhile, we adopt the L1 loss to regress the width and height of the ship at each predicted center point.
$L_{cwh} = \dfrac{1}{N}\sum_{k=1}^{N}\left|S_{ck} - S_k\right|$
where $S_{ck}$ and $S_k$ are the predicted and ground-truth sizes (width and height) of the $k$th ship, respectively. In summary, the overall loss function of the proposed ship detection network is
$L_{loss1} = \lambda_c L_c + \lambda_{co} L_{co} + \lambda_{cwh} L_{cwh}$
We follow the parameter settings of the CenterNet algorithm: the heatmap loss weight is $\lambda_c = 1$, the center-point offset loss weight is $\lambda_{co} = 1$, and the width and height loss weight is $\lambda_{cwh} = 0.1$. With these settings, the model achieves a good balance between key-point prediction and size regression, resulting in better overall performance.
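The following is a compact PyTorch sketch of the overall detection loss under the weights above, assuming the heatmap head is sigmoid-activated and that a binary mask marks the feature-map cells containing ship centers; tensor names and shapes are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def focal_loss(pred_hm, gt_hm, alpha=2, beta=4, eps=1e-6):
    """Center-point focal loss L_c over a Gaussian-splatted ground-truth heatmap.
    pred_hm is assumed to contain sigmoid probabilities in (0, 1)."""
    pos = gt_hm.eq(1).float()                       # cells that hold a ship center point
    neg = gt_hm.lt(1).float()
    pos_loss = ((1 - pred_hm) ** alpha) * torch.log(pred_hm + eps) * pos
    neg_loss = ((1 - gt_hm) ** beta) * (pred_hm ** alpha) * torch.log(1 - pred_hm + eps) * neg
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def detection_loss(pred_hm, gt_hm, pred_off, gt_off, pred_wh, gt_wh, center_mask,
                   lam_c=1.0, lam_co=1.0, lam_cwh=0.1):
    """L_loss1 = lam_c * L_c + lam_co * L_co + lam_cwh * L_cwh, with L1 offset/size terms."""
    num_pos = center_mask.sum().clamp(min=1)
    l_c = focal_loss(pred_hm, gt_hm)
    # center_mask zeroes out all cells that do not contain a ship center.
    l_co = F.l1_loss(pred_off * center_mask, gt_off * center_mask, reduction='sum') / num_pos
    l_cwh = F.l1_loss(pred_wh * center_mask, gt_wh * center_mask, reduction='sum') / num_pos
    return lam_c * l_c + lam_co * l_co + lam_cwh * l_cwh
```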

3.3. Context-Aware DGCN for Ship Formation Recognition

Due to the limitations of existing research methods, the analysis of ship formations faces significant challenges. Scholars often discuss the distribution and shape of ship formations, particularly the peripheral contour. However, capturing the fine details of ship formations in remote sensing images is difficult when ships are analyzed as discrete points. To address this issue, we first employ a ship grouping method based on feature similarity clustering (see Section 3.2 for the features), then use the Delaunay triangulation method [50] to represent the graph structure of the ship formation. We also design a context-aware Dense Graph Convolutional Network (DGCN) for formation classification. The formation recognition process is shown in Figure 4.

3.3.1. Ship Grouping Based on Feature Similarity Clustering

Ships within a group share many similarities, such as close positions and consistent courses. From the perspective of group structure, ships with higher similarity can be extracted to form the central structure of a group. Each central structure represents a group and thus determines the number of groups. After the central structures are extracted, ships that do not belong to any central structure are assigned to the nearest and most similar central structure through clustering [51]. In this paper, we calculate the similarity between ships and construct a similarity matrix based on their input state information, such as target positions and navigation angles. Given two ships $S_1$ and $S_2$ with positions $(c_x^{S_1}, c_y^{S_1})$ and $(c_x^{S_2}, c_y^{S_2})$ and heading angles $\theta_{S_1}$ and $\theta_{S_2}$, we define the ship position similarity using the fuzzy membership function of a decreasing semi-normal distribution as follows:
$Sim\_dis(d_{ij}) = \begin{cases} 1 & d_{ij} \le \alpha \\ e^{-k(d_{ij} - \alpha)} & d_{ij} > \alpha \end{cases}$
where $d_{ij} = \sqrt{(c_x^{S_1} - c_x^{S_2})^2 + (c_y^{S_1} - c_y^{S_2})^2}$, $\alpha$ is the position-distance threshold between ships, usually set to $\alpha = 100$ m, and $k$ is a decay coefficient that determines how rapidly the similarity decreases once the distance exceeds $\alpha$: the larger $k$, the faster the similarity decays with distance. We also calculate the angular similarity between ships as follows:
$Sim\_angle(\theta) = \begin{cases} 1 & \theta \le \alpha_1 \\ \dfrac{\alpha_2 - \theta}{\alpha_2 - \alpha_1} & \alpha_1 < \theta \le \alpha_2 \\ 0 & \theta > \alpha_2 \end{cases}$
where $\theta = |\theta_{S_2} - \theta_{S_1}|$, $\alpha_1 = 10$, and $\alpha_2 = 20$. Figure 5 illustrates these angles and clarifies how the similarity between ships is computed. Based on this, the central structure of a group can be constructed by extracting ships with high similarity values. We define the central structure of a group as a space group $S_k$ composed of $n$ ships $\{S_1, S_2, S_3, \ldots, S_n\}$ such that, for any $S_i \in S_k$ and $S_j \in S_k$, the position and angle similarities between the ships are greater than a certain threshold. We employ the ship feature similarity matrix to identify the central structure of a group, constructing two $n \times n$ similarity matrices from Formulas (7) and (8): the position similarity matrix $A_1$ and the angle similarity matrix $A_2$. The ship feature similarity matrix is defined as follows:
$A = A_1 \odot A_2$
where $a_{ij} \in A$ is binarized as $a_{ij} = \begin{cases} 1 & a_{ij} \ge \mu \\ 0 & a_{ij} < \mu \end{cases}$, and the similarity threshold $\mu$ is set to 0.85. For the $n$ ships in $S_k$, $a_{ij} = 1$ indicates that the ship belongs to the central structure of a group; if $m$ ships satisfy $a_{ij} = 1$, then $m$ represents the number of groups. We further identify isolated points by detecting ships with low similarity to all other ships: if $a_{ij} \le \varepsilon$ for all $j \ne i$, $j = 1, 2, \ldots, n$, then ship $i$ is an isolated point. The central structure of each group serves as its clustering center, and non-isolated ships are assigned to the closest central structure. For each unassigned ship $S_j$, we find the target $S_i$ with the largest similarity to $S_j$ and assign $S_j$ to the group containing $S_i$; if $S_i$ has not yet been assigned, we take the target with the second-largest similarity to $S_j$ and complete the clustering analogously.
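As an illustration of this grouping step, the sketch below implements the position and angle similarity functions and the thresholded feature similarity matrix in NumPy; the decay coefficient k = 0.01 and the element-wise combination of A1 and A2 are assumptions made for the example.

```python
import numpy as np

def position_similarity(d, alpha=100.0, k=0.01):
    """Decreasing semi-normal fuzzy similarity over center-point distance (Sim_dis)."""
    return np.where(d <= alpha, 1.0, np.exp(-k * (d - alpha)))

def angle_similarity(dtheta, a1=10.0, a2=20.0):
    """Piecewise-linear similarity over the heading difference (Sim_angle)."""
    linear = np.clip((a2 - dtheta) / (a2 - a1), 0.0, 1.0)
    return np.where(dtheta <= a1, 1.0, linear)

def similarity_matrix(centers, headings, mu=0.85):
    """centers: (n, 2) ship center points; headings: (n,) course angles.
    Returns the binarized feature similarity matrix used to find group central structures."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    dtheta = np.abs(headings[:, None] - headings[None, :])
    a = position_similarity(d) * angle_similarity(dtheta)   # assumed element-wise combination
    return (a >= mu).astype(int)
```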

3.3.2. Formation Structure Representation Based on Delaunay Triangulation

The Delaunay triangulation can represent region boundaries of arbitrary shape and effectively captures the random interference factors present in a ship group. It represents the outer contour of the group and expresses its shape distribution effectively. Its simple structure and low data redundancy make it suitable for analyzing complex patterns of ship groups [52].
In this paper, ships are analyzed as discrete points, and the position coordinates of the center points within the ship formation are obtained as described in Section 3.2.2. Each center point serves as a node of the graph structure. The spatial distribution reflects the spatial proximity relationships within the ship group, and the Delaunay triangulation is employed as a spatial proximity analysis model to express the spatial structure. Figure 6 shows the Delaunay triangulation, in which the spatial proximity relationships between center points are represented as an interconnected regional network. We connect the nodes within the ship group using Delaunay triangles, which enables us to capture the relationships between ships and accurately represent the group's spatial structure. The graph structure of the ship formation is built on the position coordinates of the center points, and the formation can be expressed as $G_{sw} = (V_{sw}, E_{sw})$, where $V_{sw} = \{v_1, v_2, \ldots, v_i, \ldots, v_N\}$ is the node set of the graph, $N$ is the number of nodes, $E_{sw} = \{e_{i,j} = (i, j) \mid i, j \le N\}$ is the edge set, and $e_{i,j}$ is the edge between node $i$ and node $j$.
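A minimal sketch of building the formation graph $G_{sw}$ from detected center points with SciPy's Delaunay triangulation is given below; the helper name and example coordinates are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_formation_graph(centers):
    """Build G_sw = (V_sw, E_sw) from ship center points via Delaunay triangulation.
    centers: (N, 2) array of (c_x, c_y) coordinates; returns node indices and unique edges."""
    tri = Delaunay(centers)
    edges = set()
    for simplex in tri.simplices:                      # each simplex is a triangle (i, j, k)
        for a in range(3):
            i, j = sorted((int(simplex[a]), int(simplex[(a + 1) % 3])))
            edges.add((i, j))
    return list(range(len(centers))), sorted(edges)

# Example: a small group of five ships.
pts = np.array([[0, 0], [10, 2], [20, 0], [5, 8], [15, 8]], dtype=float)
nodes, edges = build_formation_graph(pts)
```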

3.3.3. Formation Classification Based on Context-Aware DGCN

The topological structure of the ship group is derived using Delaunay triangulation (see Section 3.3.2 for details), which further yields the formation graph structure. We adopt a context-aware Dense Graph Convolutional Network (DGCN) for formation recognition. The traditional graph convolution transfer function is expressed as follows:
$L^{(n+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}L^{(n)}W^{(n)}\right)$
where $L^{(n+1)}$ and $L^{(n)}$ are the outputs of layers $n+1$ and $n$, respectively, $W^{(n)}$ is the weight matrix, and $\sigma$ is the activation function. $\tilde{A}$ is the graph's adjacency matrix $A$ with self-connections added (i.e., each node is connected to itself). Concretely, the identity matrix $I$ is first added to $A$ (i.e., $\tilde{A} = A + I$), and symmetric normalization is then applied to yield $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, where $\tilde{D}$ is the degree matrix of $\tilde{A}$; this normalization ensures the numerical stability of the graph convolution operation. The adjacency matrix $A$ encodes the structure of the graph and affects the computation of the graph convolution through $\tilde{A}$. However, the traditional GCN primarily aggregates global information during convolution while ignoring intermediate information between nodes. As the number of GCN layers increases, the differences between node features gradually blur, leading to over-smoothing and degraded classification performance. To address this problem, we employ dense connections between GCN layers, which allow features to be reused effectively across layers. The process can be expressed as follows:
$f_v^{(n)} = \mathrm{DGCN}\left(f_G^{(n-1)}\right)$
where $f_v^{(n)}$ is the output feature of layer $n$, and $f_G^{(n-1)}$ is the network output of layer $n-1$. Figure 7 shows the feature preservation in the DGCN, in which the local features of three layers are retained. The output of the $n$th layer can then be determined as follows:
$f_G^{(n)} = \mathrm{DGCN}\left(f_G^{(n-1)}\right) + \sum_{k=0}^{n-1} f_v^{(k)}$
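A possible PyTorch sketch of the densely connected GCN block defined by the two equations above is shown below, assuming all layers keep the same feature dimension so that per-layer features can be summed; it is a simplified reading of the method under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix A."""
    a_tilde = adj + torch.eye(adj.size(0), device=adj.device)
    d_inv_sqrt = torch.diag(a_tilde.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt

class DenseGCN(nn.Module):
    """GCN stack with dense (additive) connections between layers, as in the equations above."""
    def __init__(self, dim, num_layers=3):
        super().__init__()
        self.weights = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(num_layers)])

    def forward(self, x, adj_norm):
        # x: (N, dim) node features; adj_norm: output of normalize_adjacency().
        f_v = [x]                                   # per-layer features f_v^(k), with f_v^(0) = x
        f_g = x                                     # running graph feature f_G^(n)
        for w in self.weights:
            v = torch.relu(adj_norm @ w(f_g))       # f_v^(n) = DGCN(f_G^(n-1))
            f_g = v + torch.stack(f_v).sum(dim=0)   # f_G^(n) = f_v^(n) + sum_{k<n} f_v^(k)
            f_v.append(v)
        return f_v                                  # layer-wise features fed to the layer attention
```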
We further propose a layer attention mechanism to obtain global context features, which helps to overcome the complexity of graph feature information redundancy as network depth increases. It can effectively extract useful features by introducing the attention mechanism, which enhances the correlation between features and strengthens the adaptability of the DGCN for formation classification.
$\omega_n = \mathrm{softmax}\left(W_2\,\lambda\left(W_1 f_v^{(n)}\right)\right)$
where $\omega_n$ is the layer attention weight, $W_1$ and $W_2$ are parameter matrices, and $\lambda$ is the Leaky ReLU nonlinear function. The weighted output over the layer features of the formation is then as follows:
$f_G = \lambda\left(\sum_{k=0}^{K}\omega_k\left(W_1 f_v^{(k)}\right)\right)$
where $f_G$ is the node feature of the formation graph with node associations, $\omega_k$ is the attention weight computed for layer $k$, and $K$ is the number of network layers. For formation classification, a feature representation of the entire graph must be produced. We therefore perform a pooling operation on $f_G$: the features of all nodes in each graph are averaged to obtain the following representation:
$f_{SW} = \dfrac{1}{\left|V_{SW}\right|}\sum_{i \in V_{SW}} f_G^i$
where $V_{SW}$ is the set of nodes in the graph and $f_G^i$ is the feature of the $i$th node. The graph pooling operation is insensitive to the node order in the ship formation graph and aggregates the information of individual nodes and edges. We further use Softmax to compute the probability of each type of ship formation.
$y = \mathrm{softmax}\left(f_{SW}\right) = \dfrac{e^{f_{SW}}}{\sum_{i=1}^{N} e^{f_{SW_i}}}$
where $N$ represents the number of nodes in the ship formation graph, and $e^{f_{SW}}$ denotes the exponential of the node feature vector of the graph structure of the ship formation. We thus transform the identification of ship formations into a graph classification problem and use the cross-entropy loss to constrain the difference between the output and the formation label.
$L_{loss2}(x, f_G) = -\sum_{j=1}^{C} x_j \log y_j$
where $x_j$ is the ground-truth label for the $j$th formation type, $y_j$ is the predicted probability that formation $x$ belongs to the $j$th type, and $C$ is the number of ship formation categories.
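One plausible reading of the layer attention, graph pooling, and classification equations above is sketched below; reducing each layer's attention score to a scalar before the softmax across layers, the final linear classifier, and the six-class output are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FormationHead(nn.Module):
    """Layer attention over DGCN outputs, mean graph pooling, and formation classification."""
    def __init__(self, dim, num_classes=6):
        super().__init__()
        self.w1 = nn.Linear(dim, dim, bias=False)      # W1 in the layer attention
        self.w2 = nn.Linear(dim, 1, bias=False)        # W2 in the layer attention
        self.classifier = nn.Linear(dim, num_classes)  # assumed linear classifier before softmax

    def forward(self, layer_feats):
        # layer_feats: list of (N, dim) node-feature tensors, one entry per DGCN layer.
        scores = torch.stack([self.w2(F.leaky_relu(self.w1(f))).mean() for f in layer_feats])
        omega = torch.softmax(scores, dim=0)                     # attention weight per layer
        f_g = F.leaky_relu(sum(w * self.w1(f) for w, f in zip(omega, layer_feats)))
        f_sw = f_g.mean(dim=0)                                   # graph readout: average over nodes
        return torch.softmax(self.classifier(f_sw), dim=-1)      # formation class probabilities
```

During training, the output probabilities would be compared with the one-hot formation label via cross-entropy, matching the loss defined above.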

4. Experimental Results and Analysis

4.1. Datasets and Annotation

To verify the effectiveness of the proposed network, we conduct experiments on two remote sensing datasets, HRSC2016 [53] and SGF. The images of HRSC2016 cover six famous ports on Google Earth, with spatial resolutions ranging from 2 m to 0.4 m and image sizes mostly around 600 × 1000 pixels. The dataset is divided into a training set (436 images), a validation set (181 images), and a test set (444 images), containing 1207, 514, and 1228 ship samples, respectively. The tasks involved in the experiments include single-class ship detection, 4-class category recognition, and 19-class type recognition of ships. It is worth mentioning that, because formation recognition treats ships as mere points, we uniformly modified all ship type labels in the dataset to “ship”.
The SGF is a novel ship group formation dataset consisting of aerial images captured by a six-rotor UAV and simulated data, encompassing 3000 images: 1890 aerial images and 1120 simulated images. Due to the limited availability of real ship formations, we constructed only six formations; however, they follow publicly documented standard formations and meet the experimental requirements [54]. Figure 8 shows examples from the HRSC2016 and SGF datasets, and Figure 9 shows the six types of public ship formations.
To further identify the ship formations, several open benchmark formations are displayed in Figure 9, where CV, CG, DD, FFG, and SSN denote different classes of ships. These formation schematics are summarized from publicly available ship formations [55]. The SGF dataset is based on these six formations.
Horizontal bounding boxes for ship detection may include redundant background areas. Additionally, densely arranged ships in arbitrary directions typically have large aspect ratios, and the NMS algorithm applied to traditional horizontal boxes is prone to missing details and scene information. Therefore, the horizontal box is not suitable for detecting ships in arbitrary directions. We annotate each ship with a rotated box $(c_x, c_y, w, h, \theta)$, where $(c_x, c_y)$ are the center-point coordinates, $w$ and $h$ are the width and height of the ship, respectively, and $\theta \in (0^\circ, 180^\circ)$ is the angle of the long side relative to the y-direction.
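For reference, the sketch below converts one $(c_x, c_y, w, h, \theta)$ annotation into its four corner points under the convention stated above ($\theta$ measured from the y-direction); the axis convention and helper name are assumptions made for illustration.

```python
import numpy as np

def rotated_box_to_corners(cx, cy, w, h, theta_deg):
    """Convert one (c_x, c_y, w, h, theta) annotation to its four corner points.
    theta is the angle of the long side (h) measured from the y-direction, in degrees."""
    t = np.deg2rad(theta_deg)
    long_dir = np.array([np.sin(t), np.cos(t)])    # unit vector along the long side
    short_dir = np.array([np.cos(t), -np.sin(t)])  # unit vector along the short side
    c = np.array([cx, cy], dtype=float)
    hh, hw = h / 2.0, w / 2.0
    return np.array([c + hh * long_dir + hw * short_dir,
                     c + hh * long_dir - hw * short_dir,
                     c - hh * long_dir - hw * short_dir,
                     c - hh * long_dir + hw * short_dir])

# Example: a ship box 40 units wide and 300 units long, rotated 30 degrees from the y-direction.
corners = rotated_box_to_corners(cx=512, cy=256, w=40, h=300, theta_deg=30)
```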

4.2. Experiment Details and Evaluation Index

The experiments are conducted on an Ubuntu 18.04 system, with all data uniformly resized to 512 × 512 pixels so that the model can handle images of different resolutions without affecting performance or accuracy. The code is tested on an Intel Core i7 CPU with an NVIDIA RTX 2080Ti GPU. The initial learning rate is 5 × 10⁻⁴ to optimize training and speed up convergence, and it is reduced by a factor of ten at the 180th and 210th iterations. This strategy helps the model converge faster while avoiding overfitting or underfitting during training. In addition, to enhance data diversity and improve the generalization capability, robustness, and accuracy of the model, we adopt various data augmentation methods during training, including color jittering, random flips, and rotations by different angles. We employ the evaluation method in the DOTA-devkit tool to calculate the mean average precision (mAP) of rotated bounding box detection and establish a set of thresholds to achieve the maximum precision at each recall rate. The per-class value is the average precision (AP), and the mean AP across all classes is the mAP. $mAP_{0.5}$–$mAP_{0.8}$ refer to the mAP values obtained with IoU thresholds from 0.5 to 0.8. The PASCAL VOC2007 metric is used to compute the mAP in all of our experiments. We also adopt frames per second (FPS) as a metric to evaluate the detection speed of the model. Higher values of FPS and mAP indicate better detection speed and accuracy, respectively.
$\mathrm{precision} = \dfrac{TP}{TP + FP}, \quad \mathrm{recall} = \dfrac{TP}{TP + FN}, \quad \mathrm{mAP} = \dfrac{1}{n}\sum_{i=1}^{n}\mathrm{AP}_i$
where $n$ is the number of classes in the HRSC2016 and SGF datasets. The precision–recall curve is drawn with recall on the horizontal axis and precision on the vertical axis, and AP is obtained by integrating the area under this curve.
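A small sketch of the PASCAL VOC2007 11-point AP computation used for the mAP metric is given below, assuming the precision and recall arrays have already been computed from score-sorted detections.

```python
import numpy as np

def voc2007_ap(recall, precision):
    """11-point interpolated AP (PASCAL VOC2007): average the maximum precision
    reached at recall levels 0.0, 0.1, ..., 1.0."""
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        p = precision[recall >= t]
        ap += (p.max() if p.size else 0.0) / 11.0
    return ap

def mean_ap(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return float(np.mean(ap_per_class))
```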

4.3. Experiment Results of Ship Detection

We compare the proposed network with other representative ship detection algorithms, such as R²CNN [56], RetinaNet-Rbb [57], and RoI-Trans [58], on the HRSC2016 and SGF datasets to further validate the effectiveness of the center-point detection algorithm. For a fair comparison, we use the same number of epochs and the same data augmentation strategy for all training runs. Table 1 summarizes the quantitative experimental results on the HRSC2016 dataset.
Table 1 presents the quantitative experimental results on the HRSC2016 dataset, comprehensively comparing the performance of different ship detection algorithms. The proposed method achieves an accuracy of 81.2%, outperforming the other state-of-the-art algorithms listed in Table 1. We further compare the proposed method with several representative algorithms that perform well on ship detection. R²CNN, which realizes rotated box detection based on Faster R-CNN with ResNet-101, achieves an AP of 71.1%. RoI-Trans transforms horizontal regions of interest into rotated regions, reducing the number of rotated anchor boxes and improving accuracy. By introducing four corner coordinates, RSDet [59] reaches a detection accuracy of 80.3%. Compared with the above algorithms, the proposed method achieves the highest mAP at both IoU thresholds of 0.5 and 0.7, as well as the highest F1-score, indicating superior precision and recall. It also exhibits the fastest processing speed (FPS), suggesting its potential for real-time applications. The ability to accurately locate the center points of objects serves as a robust basis for the subsequent formation recognition task.
The quantitative experimental results on the SGF dataset are presented in Table 2. The simplicity of the sea surface background on the SGF dataset may improve detection accuracy. The results demonstrate that the proposed detection method achieves the best performance across all metrics, including mAP at different IoU thresholds, F1-score, and FPS. SCRDet [28], which fuses multi-layer features and addresses the boundary problem of rotation angles, achieves an accuracy of 82.78% but with a relatively slow FPS of 10.3. However, the proposed method balances accuracy and speed, maintaining high performance with an accuracy of 86.92% while achieving a significantly higher FPS, indicating its suitability for real-time applications. Additionally, the slow calculation speed of methods like Hourglass [60], which may be caused by numerous parameters, can limit their practical applications in real-time scenarios.
We provide visual comparisons of the predicted bounding boxes in Figure 10, without quantitative evaluation metrics. Figure 10 compares various detection methods, including R³Det, R²CNN, and others, on the HRSC2016 and SGF datasets under inshore and open-sea conditions. The R³Det network [61] alternately uses horizontal and rotated bounding boxes, maintaining high localization accuracy and speed for ships with large aspect ratios. As shown in Figure 10a,b, the detection accuracy is significantly improved, with only a few missed and false detections on cloud-affected sea surfaces. Compared with these networks, our proposed method improves detection accuracy and obtains the ships' center-point coordinates, which is beneficial for subsequent formation identification. The qualitative results demonstrate the effectiveness and robustness of the proposed center-point network for ship detection.
The attention mechanism has demonstrated its effectiveness in enhancing feature learning and emphasizing ship features. Experiments comparing several attention mechanisms are conducted on the HRSC2016 dataset to evaluate the efficacy of the proposed CA attention. The results, shown in Table 3, clearly indicate the superiority of CA attention. Additionally, ablation experiments are conducted to provide a deeper understanding of the training process and the impact of different components on network performance.
As illustrated in Table 3, SENet [46] generates a two-dimensional feature map and effectively establishes the interdependence between channels; incorporating SENet into the network raises the accuracy to 80.92%. CBAM [47] further introduces spatial information coding through large-kernel convolution to achieve better target classification; with CBAM, the accuracy improves from 80.92% to 81.12%. The CA module in the proposed method performs better than the other attention methods while remaining lightweight, which helps improve the network's ability to express ship features.

4.4. Experiment Results of Formation Recognition

To verify the effectiveness of the ship formation recognition based on the Context-aware DGCN, we carry out the following experiments.

4.4.1. Ship Grouping and Formation Structure Representation

We adopt the clustering method based on feature similarity to realize the ship grouping before formation recognition. This section primarily focuses on obtaining the central structure of ship groups and detecting isolated points. Afterward, we further classify the remaining ships into the nearest group central structures, thereby completing the clustering process and realizing ship grouping.
We obtain ship characteristics, including center-point coordinates and angle information, as described in Section 4.2. Therefore, we treat each ship as a center point in this analysis. Figure 11 shows the test results of group central structure recognition and isolated-ship detection for different formations in the SGF dataset.
We further test the correctness of the ship grouping based on feature similarity clustering and compare it with K-means, K-means++, DBSCAN [55], AGNES [55], and Louvain [62]. The Louvain algorithm extracts attributes such as target type, offense–defense role, heading, position, and speed as the basis for clustering, applying different distance measures to the spatial, functional, and collaborative relationship hierarchies of target groups, thereby achieving multi-level classification of aerial target groups. The density-based DBSCAN algorithm can cluster data clusters of arbitrary shape. The AGNES algorithm is a hierarchical clustering approach with a bottom-up aggregation strategy; if a merge is poorly chosen at some step, it may lead to low-quality clustering results. It is worth noting that the K-means and K-means++ algorithms require the number of clusters, $K$, as an input. For test sets without isolated ships, we set $K$ to the number of ship groups present; for test sets with isolated ships, $K$ is the number of ship groups plus the number of isolated ships. We use the clustering accuracy rate (CAR) to measure the correctness of ship grouping. The CAR is calculated as follows:
$\mathrm{CAR} = \dfrac{n_{cor}}{N_{all}} \times 100\%$
where $n_{cor}$ is the number of ships that are correctly clustered, and $N_{all}$ is the total number of ships in the test set. We test the correctness of ship clustering using the datasets described in Table 4, and Figure 12 shows the clustering results across six test sets. Based on the test sets in Table 4, the clustering accuracy rates of the different methods are listed in Table 5.
The first column in Figure 12 contains only Ship Group 1 without isolated ships, and the CARs of the proposed grouping method, K-means, DBSCAN, AGNES, Louvain, and K-means++ are all 100%. The third column contains Ship Groups 2 and 3 without isolated ships; the CAR values are all over 85%. The fifth column includes Ship Groups 2 and 5 together with 13 isolated ships, and the corresponding CARs are 83.34%, 87.65%, 84.47%, 83.66%, 88.37%, and 90.30%, respectively. The last column contains Ship Groups 5 and 7 with isolated ships; our method achieves the optimal CAR of 83.62%. Figure 12 and Table 5 show the superiority of the ship grouping method based on feature similarity clustering: when there is no isolated-ship interference, the CAR of our method exceeds 90%, and when isolated ships are present, our method still outperforms the others. The clustering method in this paper is therefore helpful for the subsequent formation recognition task.
We further represent the formation structure using the Delaunay triangulation based on the results of the ship grouping above. Figure 13 shows the topological structure of different formations after grouping, which we use to obtain the graph structure data representation of formations, including the nodes, edges, and node features of a formation.

4.4.2. Formation Classification

After obtaining the graph structure data of the ship formations, we input them into the context-aware DGCN to perform formation recognition. We choose the average classification accuracy rate $F_{mCAR}$ as the performance index to evaluate the formation recognition algorithm, which is expressed as follows:
$F_{mCAR} = \dfrac{\sum_{i=1}^{m} F_{CAR}}{m}$
where $F_{mCAR}$ is the average classification accuracy rate and $m$ is the total number of ship formation categories.
$F_{CAR} = \dfrac{N_{corr}}{N_{corr} + N_{mis}}$
where $N_{corr}$ is the number of correct formation classifications, and $N_{mis}$ is the number of formation classification errors. We carry out experiments on the graph structure data from the SGF dataset. The DGCN averages the features of all nodes of each graph, and the Softmax function is then used to classify the formation. We use the cross-entropy loss to constrain the training process of formation classification. The qualitative experimental results of formation recognition are shown in Figure 14.
Figure 14 shows the qualitative experimental results of ship formation recognition. The recognition results encompass all ship detections, with the center points represented by yellow circles. Additionally, the figure visualizes the ship formations through the Delaunay triangulation representation, evident in the interconnected green lines. Figure 14a–d show the recognition results for Formations 1, 2, 6, and 5, respectively. Figure 14e exhibits two formations, Formations 2 and 6, while Figure 14f contains Formations 2 and 3. However, the classification results in the first row of Figure 14b and the second row of Figure 14e are incorrect. This inaccuracy primarily stems from isolated ships being included in the Delaunay triangulation, which alters the graph structure data of the formation. As observed in Figure 13, some isolated ships are erroneously recognized as group members, changing the graph structure features of the formation and affecting the classification results.
To quantitatively analyze the effectiveness of the proposed method, we compare methods based on image-level recognition with those based on graph-data recognition. Image-level algorithms directly predict and classify the formations by feeding images into a CNN, such as formation recognition based on VGG-16, ResNet-50, and ResNet-101. Graph-data algorithms predict and classify the formation from the position information of the ships and the graph structure data of ship groups, such as AGCN [63], GAT [64], and our proposed method. The quantitative comparison results are shown in Table 6.
As can be seen from Table 6, the $F_{mCAR}$ of the formation recognition algorithm proposed in this paper is the highest, which is consistent with the qualitative experiments above and further verifies the feasibility of formation recognition. The image-level recognition methods do not consider the group structure information of ship groups and easily lose target features in operations such as down-sampling, resulting in poor recognition results. We add dense connections to the GCN to effectively reuse features between layers, and the layer attention mechanism further promotes recognition performance.
Additionally, the Topology Similarity Calculation (TSC) method is the most commonly used algorithm for ship formation recognition, and we also evaluate it to highlight the effectiveness of the presented method. Figure 15 depicts the formations and peripheral contours of ship groups; the first row shows different formations from the SGF dataset. We identify the ship formation by calculating the similarity between the formations in Figure 15 and the publicly available formations in Figure 9. The similarity factors are discussed in Table 7. We use the number of neighbors of each ship, the area of the convex hull, the sparsity degree, and the shape distribution to obtain the formation recognition results, which are presented in Table 8.
Table 7 provides the statistical results of the factors required for the similarity calculation and serves as a comprehensive summary of the values gathered from our analysis. Table 8 lists the similarity result for each factor and an overall similarity score, which represents the average level of similarity between different formations. The overall similarity result $SIM$ represents the $F_{mCAR}$ of the formations. By counting the number of neighbors of each ship, we obtain the topology structure similarity $SIM_{Topo}$. The area $S$ of the convex hull reflects the approximate distribution range $SIM_{Area}$ of the ship groups. The distribution density $SIM_{thic}$ is calculated as the ratio of the number of ships $n$ in the formation to the area $S$ of the convex hull. We calculate the distance similarity of the ship group using the minimum bounding rectangle with length $X$ and width $Y$, which reflects the shape distribution of the formation. Table 8 shows that the similarity scores associated with Ship Group 1 and Ship Group 2 are relatively high, indicating a substantial degree of similarity between these two groups. However, this method requires calculating the similarity between every identified formation and all publicly available standard formations, which is computationally complex.

5. Conclusions

In this study, we present an innovative formation recognition method based on key-point estimation and a context-aware DGCN, drawing inspiration from the CenterNet framework. In the first stage, we design a center point-based ship detection method that utilizes oriented response networks and coordinate attention to generate direction-invariant feature maps, effectively improving the accuracy of center-point prediction. In the second stage, we propose a context-aware DGCN recognition method to classify ship formations. Specifically, we employ feature similarity clustering to group ships, leverage the Delaunay triangulation method to represent the graph structure of each formation, and then develop a context-aware DGCN for formation classification. This entire pipeline, from clustering to classification, enhances the accuracy of formation recognition. Our detection method achieves an mAP of 81.2%, an FPS of 17.8, and an F1-score of 0.89 on the HRSC2016 dataset. Additionally, the average classification accuracy rate on the SGF dataset reaches 75.59%, effectively identifying ship formations. As future work, we plan to focus on dynamic graph networks for cases where the number of ships changes dynamically. We aim to further optimize the proposed method by combining key-point information with dynamic graph convolution to recognize formations more accurately.

Author Contributions

Conceptualization, T.Z. and X.X.; methodology, T.Z.; software, X.X.; validation, S.S., X.Y. and R.L.; formal analysis, T.Z.; investigation, T.Z.; resources, R.L.; data curation, S.W.; writing—original draft preparation, T.Z.; writing—review and editing, R.L.; visualization, X.X.; supervision, X.Y.; project administration, X.Y.; funding acquisition, R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62276274 and the Aeronautical Science Foundation of China under Grant 2020JM-537.

Data Availability Statement

Northwestern Polytechnical University released the HRSC2016 dataset in 2016, which is available at https://sites.google.com/site/hrsc2016/, accessed on 28 May 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cui, X. Research on Arbitrary-Oriented Ship Detection in Optical Remote Sensing Images. Ph.D. Thesis, University of Chinese Academy of Science, Beijing, China, 2021. [Google Scholar]
  2. Dong, C. Research on the Detection of Ship Targets on the Sea Surface in Optical Remote Sensing Image. Ph.D. Thesis, University of Chinese Academy of Science, Changchun, China, 2020. [Google Scholar]
  3. Chen, L.; Shi, W.; Deng, D. Improved YOLOv3 Based on Attention Mechanism for Fast and Accurate Ship Detection in Optical Remote Sensing Images. Remote Sens. 2021, 13, 660. [Google Scholar] [CrossRef]
  4. Meroufel, H.; El Amin Larabi, M.; Amri, M. Deep Learning based Ships Detections from ALSAT-2 Satellite Images. In Proceedings of the 2022 IEEE Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS), Istanbul, Turkey, 7–9 March 2022; pp. 86–89. [Google Scholar]
  5. Hu, J.; Zhi, X.; Shi, T.; Zhang, W.; Cui, Y.; Zhao, S. PAG-YOLO: A Portable Attention-Guided YOLO Network for Small Ship Detection. Remote Sens. 2021, 13, 3059. [Google Scholar] [CrossRef]
  6. Zhang, C.; Gao, G.; Liu, J.; Duan, D. Oriented Ship Detection Based on Soft Thresholding and Context Information in SAR Images of Complex Scenes. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
  7. Bai, A.; Chen, J.; Yang, W.; Men, Z.; Zhang, Z.; Zeng, S.; Xu, H.; Cao, W.; Jian, C. Leveraging Permuted Image Restoration for Improved Interpretation of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
  8. Yu, W.; Li, J.; Wang, Z.; Yu, Z.; Luo, Y.; Liu, Y.; Feng, J. Detecting rotated ships in SAR images using a streamlined ship detection network and gliding phases. Remote Sens. Lett. 2024, 15, 413–422. [Google Scholar] [CrossRef]
  9. Mou, F.; Fan, Z.; Jiang, C.; Zhang, Y.; Wang, L.; Li, X. Double Augmentation: A Modal Transforming Method for Ship Detection in Remote Sensing Imagery. Remote Sens. 2024, 16, 600. [Google Scholar] [CrossRef]
  10. Deng, H.; Zhang, Y. FMR-YOLO: Infrared Ship Rotating Target Detection Based on Synthetic Fog and Multiscale Weighted Feature Fusion. IEEE Trans. Instrum. Meas. 2024, 73, 1–17. [Google Scholar] [CrossRef]
  11. Song, J.; Kim, D.; Hwang, J.; Kim, H.; Li, C.; Han, S.; Kim, J. Effective Vessel Recognition in High Resolution SAR Images Using Quantitative and Qualitative Training Data Enhancement from Target Velocity Phase Refocusing. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14. [Google Scholar] [CrossRef]
  12. Tang, X.; Zhang, J.; Xia, Y.; Xiao, H. DBW-YOLO: A High-Precision SAR Ship Detection Method for Complex Environments. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 7029–7039. [Google Scholar] [CrossRef]
  13. Ciocarlan, A.; Le Hégarat-Mascle, S.; Lefebvre, S.; Woiselle, A. Deep-NFA: A Deep a Contrario Framework for Tiny Object Detection. Pattern Recognit. 2024, 150, 110312. [Google Scholar]
  14. Han, Y.; Guo, J.; Yang, H.; Guan, R.; Zhang, T. SSMA-YOLO: A Lightweight YOLO Model with Enhanced Feature Extraction and Fusion Capabilities for Drone-Aerial Ship Image Detection. Drones 2024, 8, 145. [Google Scholar] [CrossRef]
  15. Chen, X.; Guan, J.; Liu, N.; He, Y. Maneuvering Target Detection via Radon-Fractional Fourier Transform-Based Long-Time Coherent Integration. IEEE Trans. Signal Process. 2014, 62, 939–953. [Google Scholar] [CrossRef]
  16. Liang, F.; Zhou, Y.; Li, H.; Feng, X.; Zhang, J. Multi-Aircraft Formation Recognition Method of Over-the-Horizon Radar Based on Deep Transfer Learning. IEEE Access 2022, 10, 115411–115423. [Google Scholar] [CrossRef]
  17. Lin, Z.; Zhang, X.; Hao, N.; He, F. An LSTM-based Fleet Formation Recognition Algorithm. In Proceedings of the 40th Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; pp. 8565–8569. [Google Scholar]
  18. Zhou, Q. Research on UAV Target Detection and Formation Recognition Based on Computer Vision. Master’s Thesis, China Academic of Electronics and Information Technology, Beijing, China, 2021. [Google Scholar]
  19. You, X.; Li, H. A Sea-Land Segmentation Scheme based on Statistical Model of Sea. In Proceedings of the 4th International Congress on Image and Signal Processing (CISP), Shanghai, China, 12 December 2011; pp. 1155–1159. [Google Scholar]
  20. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
  21. Jian, X.; Fu, K.; Xian, S. An Invariant Generalized Hough Transform Based Method of Inshore Ships Detection. In Proceedings of the 2011 International Symposium on Image and Data Fusion (ISIDF), Yunnan, China, 9–11 August 2011. [Google Scholar]
  22. Zhang, Z.; Warrell, J.; Torr, P. Proposal Generation for Object Detection Using Cascaded Ranking SVMs. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 1497–1504. [Google Scholar]
  23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  24. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  25. Wang, X.; Yang, X.; Zhang, S.; Li, Y.; Feng, L.; Fang, S.; Chen, K.; Zhang, W. Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3240–3249. [Google Scholar]
  26. Pang, J.; Li, C.; Shi, J.; Xu, Z.; Feng, H. R2-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5512–5524. [Google Scholar] [CrossRef]
  27. Wang, P.; Sun, X.; Diao, W.; Fu, K. FMSSD: Feature-Merged Single-Shot Detection for Multiscale Objects in Large-Scale Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3377–3390. [Google Scholar] [CrossRef]
  28. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241. [Google Scholar]
  29. Zhang, J.; Shi, X.; Zheng, C.; Wu, J.; Li, Y. MRPFA-Net for Shadow Detection in Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–11. [Google Scholar] [CrossRef]
  30. Xu, X.; Yang, Z.; Li, J. AMCA: Attention-Guided Multi-Scale Context Aggregation Network for Remote Sensing Image Change Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–19. [Google Scholar] [CrossRef]
  31. Zhao, L.; Zhu, M. MS-YOLOv7: YOLOv7 Based on Multi-Scale for Object Detection on UAV Aerial Photography. Drones 2023, 7, 188. [Google Scholar] [CrossRef]
  32. Liu, C.; Yang, D.; Tang, L.; Zhou, X.; Deng, Y. A Lightweight Object Detector Based on Spatial-Coordinate Self-Attention for UAV Aerial Images. Remote Sens. 2023, 15, 83. [Google Scholar] [CrossRef]
  33. Li, Q.; Mou, L.; Liu, Q.; Wang, Y.; Zhu, X. HSF-Net: Multiscale Deep Feature Embedding for Ship Detection in Optical Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2018, 56, 7147–7161. [Google Scholar] [CrossRef]
  34. Lei, F.; Wang, W.; Zhang, W. Ship Extraction Using Post CNN from High Resolution Optical Remotely Sensed Images. In Proceedings of the IEEE 3rd Information Technology, Networking, Electronic, and Automation Control Conference (ITNEC), Chengdu, China, 15–17 March 2019; pp. 2531–2535. [Google Scholar]
  35. Chen, W.; Han, B.; Yang, Z.; Gao, X. MSSDet: Multi-scale Ship-detection Framework in Optical Remote-sensing Images and New Benchmark. Remote Sens. 2022, 14, 5460. [Google Scholar] [CrossRef]
  36. Zhang, T.; Lou, X.; Wang, H.; Cheng, Y. Context-Preserving Region-based Contrastive Learning Framework for Ship Detection in SAR. J. Signal Process. Syst. Signal Image Video Technol. 2022, 95, 3–12. [Google Scholar] [CrossRef]
  37. Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.L.; Guo, Z. Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes based on Multiscale Rotation Dense Feature Pyramid Networks. Remote Sens. 2018, 10, 132. [Google Scholar] [CrossRef]
  38. Nie, M.; Zhang, J.; Zhang, X. Ship Segmentation and Orientation Estimation using Key-points Detection and Voting Mechanism in Remote Sensing Images. In Proceedings of the 16th International Symposium on Neural Networks (ISNN), Moscow, Russia, 10–12 July 2019; pp. 402–413. [Google Scholar]
  39. Zhu, M.; Hu, G.P.; Li, S.; Zhou, H.; Wang, S. FSFADet: Arbitrary-oriented Ship Detection for SAR Images based on Feature Separation and Feature Alignment. Neural Process. Lett. 2022, 54, 1995–2005. [Google Scholar] [CrossRef]
  40. Zhang, J.; Huang, R.; Li, Y.; Pan, B. Oriented Ship Detection based on Intersecting Circle and Deformable RoI in Remote Sensing Images. Remote Sens. 2022, 14, 4749. [Google Scholar] [CrossRef]
  41. Deng, C.; Cao, Z.; Xiao, Y.; Chen, Y.; Fang, Z.; Yan, R. Recognizing the Formations of CVBG based on Multi-viewpoint Context. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 1793–1810. [Google Scholar] [CrossRef]
  42. Shi, L.; Huang, Z.; Feng, X. Recognizing the Formations of CVBG based on Shape Context Using Electronic Reconnaissance Data. Electron. Lett. 2021, 57, 562–563. [Google Scholar] [CrossRef]
  43. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
  44. Yu, F.; Wang, D.; Shelhamer, E.; Darrell, T. Deep Layer Aggregation. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2403–2412. [Google Scholar]
  45. Zhou, Y.; Ye, Q.; Qiu, Q.; Jiao, J. Oriented Response Networks. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4961–4970. [Google Scholar]
  46. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [PubMed]
  47. Woo, S.; Park, J.; Lee, J.; Kweon, I. CBAM: Convolutional Block Attention Module. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  48. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
  49. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13708–13717. [Google Scholar]
  50. Liu, Y.; Zhai, J. The Representation and Identification of Spatial Graphics for Random Point Cluster. Sci. Surv. Mapp. 2005, 4, 39–42+4. [Google Scholar]
  51. Liu, T.; Du, Q.; Yan, H. Spatial Similarity Assessment of Point Clusters. Geomat. Inf. Sci. Wuhan Univ. 2011, 36, 1149–1153. [Google Scholar]
  52. Liang, Z.; Xie, H.; Xie, G. Study on the Calculation Model of Spatial Grouped Point Object Similarity and its Application. Bull. Surv. Mapp. 2016, 3, 111–114. [Google Scholar]
  53. Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A High Resolution Optical Satellite Image Dataset for Ship Recognition and Some New Baselines. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM), Porto, Portugal, 24–26 February 2017; pp. 324–331. [Google Scholar]
  54. Deng, C. Research on Collective Motion Analysis and Recognition. Ph.D. Thesis, Huazhong University of Science and Technology, Wuhan, China, 2016. [Google Scholar]
  55. Wu, Y.; Xue, H.; Yin, D. Target Clustering in Naval Battlefield Environment Based on DBSCAN Algorithm. J. Nav. Univ. Eng. 2023, 35, 71–76. [Google Scholar]
  56. Liu, Z.; Hu, J.; Weng, L.; Yang, Y. Rotated Region Based CNN for Ship Detection. In Proceedings of the 24th IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 900–904. [Google Scholar]
  57. Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef] [PubMed]
  58. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q.; Soc, I. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2844–2853. [Google Scholar]
  59. Qian, W.; Yang, X.; Peng, S.; Yan, J.; Guo, Y. Learning Modulated Loss for Rotated Object Detection. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 2458–2466. [Google Scholar]
  60. Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 483–499. [Google Scholar]
  61. Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined Single-stage Detector with Feature Refinement for Rotating Object. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 3163–3171. [Google Scholar]
  62. Qiu, Z.; Ni, L.; Yao, T.; Liang, J.; Yang, D.; Wang, J. Research on Air Target Classification Method Based on Louvain Algorithm. J. Gun Launch Control. 2023, 1–9. [Google Scholar] [CrossRef]
  63. Li, R.; Wang, S.; Zhu, F.; Huang, J. Adaptive Graph Convolution Neural Networks. arXiv 2018, arXiv:1801.03226. [Google Scholar]
  64. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903. [Google Scholar]
Figure 1. Examples of ship datasets in remote-sensing images. (a) The remote-sensing images are large in scale and require cropping during training. (b) Ships only occupy a small portion of the area in remote-sensing images, and the target features may be lost or obscured after multiple down-sampling operations. (c) The images have cluttered backgrounds (such as islands, port containers, dry docks, and other land targets), making locating the ships in complex backgrounds difficult.
Figure 2. The overall framework of the proposed method. It adopts the center-point-based detection network to detect ships and obtain the position information (c_x, c_y, w, h, θ), while designing the context-aware DGCN to recognize the ship formation.
Figure 3. The center-point-based detection model consists of feature extraction and center-point detection modules. The output is the position coordinates and angle information.
Figure 4. The context-aware dense graph convolution network for ship formation recognition. The feature similarity clustering method is mainly used for ship grouping. The Delaunay triangulation serves as the graph structure of the ship formation. The DGCN aggregates and outputs features for formation classification. Throughout the figure, the graph nodes are treated as identical; the node colors carry no additional meaning.
Figure 5. Angle diagram of ship similarity calculation.
Figure 6. The graph structure of ship formation based on the Delaunay triangulation. The input is the position coordinates of the center point within the ship formation. The output is the graph structure representation of ship formation for the downstream classification task.
Figure 7. The illustration of a three-layer Locality Preserving GCN.
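For readers who want a concrete picture of the graph convolution stage sketched in Figure 7, the snippet below is a minimal PyTorch illustration of a three-layer GCN-style classifier that consumes the Delaunay adjacency matrix of a ship group. It is an illustrative stand-in rather than the authors' context-aware DGCN; the class name SimpleFormationGCN, the hidden size of 64, the mean-pooling readout, and the six formation classes (following Figure 9) are our assumptions.

```python
# Illustrative three-layer GCN-style formation classifier (not the paper's exact DGCN).
# Assumes node features x (N x F) and an adjacency matrix adj (N x N) obtained from
# the Delaunay triangulation of ship center points; sizes and readout are assumptions.
import torch
import torch.nn as nn


def normalize_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + torch.eye(adj.size(0))
    deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    d = torch.diag(deg_inv_sqrt)
    return d @ a_hat @ d


class SimpleFormationGCN(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 64, num_classes: int = 6):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a = normalize_adjacency(adj)
        h = torch.relu(a @ self.fc1(x))   # layer 1: aggregate neighbor features
        h = torch.relu(a @ self.fc2(h))   # layer 2
        h = torch.relu(a @ self.fc3(h))   # layer 3
        g = h.mean(dim=0)                 # graph-level readout (mean pooling)
        return self.classifier(g)         # logits over formation classes
```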
Figure 8. Some typical samples from the HRSC2016 and SGF datasets. (a) Examples from the HRSC2016 dataset; (b) examples from the SGF dataset.
Figure 9. The standard formations of the ship group. (a–f) are six different ship formations arranged with different configurations, named Formation 1 to Formation 6. CV, CG, DD, FFG, and SSN represent aircraft carriers, cruisers, destroyers, frigates, and nuclear submarines, respectively.
Figure 10. Qualitative detection results on the HRSC2016 and SGF datasets with different methods. (a) The detection results of R2CNN; (b) the detection results of R3Det; (c) the detection results of the proposed method.
Figure 11. The test results of center structure recognition and isolated ship detection. (a–f) show different ship groups from the SGF dataset. The first row shows the visual results of ship center point detection, and the second row plots the corresponding test results: each graph marks the identified central structure of the group together with the isolated ships that do not belong to it. For example, in (a), the yellow dots form the central point structure of the ship group, and the blue triangles represent isolated ships.
Figure 12. The visual test results for ship grouping based on feature similarity clustering. The first row shows the visual results of ship center point detection, including different ship groups and some isolated ships. The second row shows the distribution of the ship centers. The last row shows the grouping produced by the proposed method, with different colors indicating the grouping results at successive stages of the clustering process.
Figure 13. The formation structure representation based on the Delaunay triangulation. The first row shows the visual results of ship grouping. The second row represents the distribution of ship centers. The last row is the graph structure of the different formations.
Figure 14. The qualitative experimental results of formation recognition. (a) The recognition result of formation 1. (b) The recognition result of formation 2. (c) The recognition result of formation 6. (d) The recognition result of formation 5. (e) The recognition result of formations 2 and 6. (f) The recognition result of formations 2 and 3. The yellow circles are the center points of the ships, and the groups connected in green represent ship formations.
Figure 15. The ship groups and their peripheral contours. (a–c) show the different identified formations, and (d–f) display the convex hull (green) and the outer quadrilateral (orange) of the ship formations.
Table 1. Experimental results of different detection algorithms on the HRSC2016 dataset.
Parameter | R2CNN | PPRN | R2PN | ROI-Trans | RSDET | CenterNet-Rbb | Ours ¹
Backbone | Resnet101 | Resnet50 | VGG16 | Resnet50 | Resnet50 | Hourglass | DLA
Image-size | 800 × 800 | 800 × 800 | 800 × 800 | 512 × 800 | 800 × 800 | 1024 × 1024 | 1024 × 1024
mAP_0.8 | 65.43 | 71.65 | 72.35 | 74.42 | 75.35 | 73.62 | 77.06
mAP_0.7 | 67.35 | 73.52 | 74.26 | 76.39 | 77.21 | 75.46 | 78.45
mAP_0.6 | 69.25 | 75.44 | 76.21 | 78.28 | 79.19 | 77.35 | 79.89
mAP_0.5 | 71.17 | 77.38 | 77.60 | 79.72 | 80.37 | 78.66 | 81.2
F1 score | 0.75 | 0.82 | 0.83 | 0.86 | 0.87 | 0.84 | 0.89
FPS | 5 | 1.5 | - | 6 | 15.4 | - | 17.8
¹ "Ours" stands for the proposed method based on the center point.
Table 2. Experimental results of different detection algorithms on the SGF dataset. Bold represents the best result.
Performance Parameter | R2CNN | RetinaNet-Rbb | SCRDet | CSL | CenterNet-Rbb | Ours ¹
Backbone | Resnet50 | Resnet50 | Resnet50 | Resnet50 | Hourglass | DLA
Image-size | 512 × 512 | 800 × 800 | 800 × 800 | 512 × 800 | 1024 × 1024 | 1024 × 1024
mAP_0.8 | 77.67 | 76.18 | 77.49 | 75.38 | 78.96 | 81.62
mAP_0.7 | 79.45 | 77.86 | 79.23 | 77.12 | 80.72 | 83.38
mAP_0.6 | 81.23 | 79.54 | 80.97 | 78.85 | 82.48 | 85.13
mAP_0.5 | 83.02 | 81.32 | 82.78 | 80.63 | 84.29 | 86.92
F1 score | 0.89 | 0.87 | 0.88 | 0.86 | 0.90 | 0.91
FPS | 12.3 | 32.6 | 10.3 | 12.5 | 14.7 | 28.2
¹ "Ours" stands for the proposed method based on the center point.
Table 3. Performance comparison of each attention module on the HRSC2016 dataset.
Self-Attention | Backbone | Image-Size | mAP
+SE [46] | DLA34 | 512 × 512 | 80.92
+CBAM [47] | DLA34 | 512 × 512 | 81.12
+CA | DLA34 | 512 × 512 | 81.25
Table 4. Description of data set for ship clustering correctness test.
Number | Number of Ship Groups | Central Structure of Ship Groups | Isolated Ships | Total Number of Ships | Number of Clustering Correctness
1 | 1 | Ship Group 1 | 0 | 12 | 12
2 | 1 | Ship Group 4 | 0 | 17 | 16
3 | 2 | Ship Groups 2/3 | 0 | 23 | 22
4 | 1 | Ship Group 6 | 6 | 26 | 22
5 | 2 | Ship Groups 2/5 | 13 | 46 | 42
6 | 1 | Ship Group 5 | 7 | 26 | 22
Table 5. Results of ship clustering accuracy rate (CAR). Numbers 1–3 present the data set without isolated ships, and Numbers 4–6 show the data set with isolated ships.
Method \ Number | 1 | 2 | 3 | 4 | 5 | 6
K-means | 100 | 88.63 | 85.67 | 79.61 | 83.34 | 78.36
K-means++ | 100 | 91.23 | 89.12 | 81.39 | 87.65 | 82.75
DBSCAN [55] | 100 | 89.96 | 87.54 | 79.95 | 84.47 | 79.36
AGNES [55] | 100 | 89.42 | 86.31 | 78.96 | 83.66 | 78.93
Louvain [62] | 100 | 91.49 | 88.68 | 80.98 | 88.37 | 83.84
The proposed | 100 | 91.12 | 90.62 | 82.62 | 90.30 | 83.62
Table 6. The quantitative comparison results of the different recognition methods. It mainly consists of the method based on image-level recognition and the method based on graph data recognition.
Methods | Model | F_mCAR (100%)
The method based on image-level recognition | VGG-16 | 53.26
 | ResNet-50 | 55.32
 | ResNet-101 | 56.21
The method based on graph data recognition | TSC | 68.57
 | AGCN | 74.18
 | GAT | 73.34
 | Ours | 75.59
Table 7. Statistics for similarity factor calculations.
Ship Group | Topological Neighbors | Area of Convex Hull / S | Length of the External Rectangle / X | Width of External Rectangle / Y
Ship Group 1 | 46 | 99,585 | 376 | 398
Ship Group 2 | 53 | 113,665 | 435 | 402
Ship Group 3 | 28 | 30,806 | 276 | 279
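As a rough illustration of how the similarity factors summarized in Table 7 (convex hull area S and external rectangle length X and width Y) could be derived from a group's ship center points, the sketch below uses scipy.spatial.ConvexHull and OpenCV's cv2.minAreaRect. Whether the paper's external rectangle is the minimum-area rectangle, and the helper name group_statistics, are assumptions on our part.

```python
# Sketch (assumptions noted above): similarity-factor statistics for a ship group.
import numpy as np
import cv2
from scipy.spatial import ConvexHull


def group_statistics(centers: np.ndarray) -> dict:
    """centers: (N, 2) float array of ship center points, N >= 3."""
    hull = ConvexHull(centers)
    area_s = hull.volume  # in 2D, ConvexHull.volume is the enclosed area
    # Minimum-area enclosing rectangle; returns ((cx, cy), (w, h), angle).
    (_, _), (w, h), _ = cv2.minAreaRect(centers.astype(np.float32))
    length_x, width_y = max(w, h), min(w, h)
    return {"S": area_s, "X": length_x, "Y": width_y}


if __name__ == "__main__":
    pts = np.random.rand(12, 2) * 500  # toy group of 12 ship centers
    print(group_statistics(pts))
```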
Table 8. The results of formation recognition based on the topology similarity of the ship group.
Ship Groups | SIM_Topo | SIM_Area | SIM_thic | SIM_sp | SIM
Ship Group 2\3 | 0.862 | 0.309 | 0.604 | 0.896 | 0.616
Ship Group 1\2 | 0.860 | 0.876 | 0.861 | 0.914 | 0.877
Ship Group 1\3 | 0.862 | 0.309 | 0.604 | 0.896 | 0.616
