*Article* **MSR2N: Multi-Stage Rotational Region Based Network for Arbitrary-Oriented Ship Detection in SAR Images**

#### **Zhenru Pan 1,2,\*, Rong Yang 1,2 and Zhimin Zhang <sup>1</sup>**


Received: 25 March 2020; Accepted: 17 April 2020; Published: 20 April 2020

**Abstract:** In synthetic aperture radar (SAR) images, ships are often arbitrary-oriented and densely arranged in complex backgrounds, posing enormous challenges for ship detection. However, most existing methods detect ships with horizontal bounding boxes, which leads to the redundancy of detected regions. Furthermore, the high Intersection-over-Union (IoU) between two horizontal bounding boxes of densely arranged ships can cause missing detection. In this paper, a multi-stage rotational region based network (MSR2N) is proposed to solve the above problems. In MSR2N, the rotated bounding boxes, which can reduce background noise and prevent missing detection caused by high IoUs, are utilized to represent ship regions. MSR2N consists of three modules: feature pyramid network (FPN), rotational region proposal network (RRPN), and multi-stage rotational detection network (MSRDN). First of all, the FPN is applied to combine high-resolution features with semantically strong features. Second, in RRPN, a rotation-angle-dependent strategy is employed to generate multi-angle anchors which can represent arbitrary-oriented ship regions more felicitously than horizontal anchors. Finally, the MSRDN with three sub-networks is proposed to regress proposals of ship regions stage by stage. Meanwhile, the incrementally increasing IoU thresholds are selected for resampling positive and negative proposals in sequential stages of MSRDN, which eliminates close false positive proposals successively. With the above characteristics, MSR2N is more suitable and robust for ship detection in SAR images. The experimental results on SAR ship detection dataset (SSDD) show that the MSR2N has achieved state-of-the-art performance.

**Keywords:** synthetic aperture radar (SAR) ship detection; multi-stage rotational region based network (MSR2N); rotated anchor generation; multi-stage rotational detection network (MSRDN)

#### **1. Introduction**

Ship detection is one of the most significant missions of marine surveillance. With the characteristics of working all-weather, all-time [1], and imaging relatively wide areas at constant resolution [2], synthetic aperture radars (SAR) such as Terra-X, COSMOS-SkyMed, RADARSAT-2, Sentinel-1, and GF-3 are widely applied in ship detection [3–7].

Traditional ship detection methods are mainly based on the following three aspects: (1) statistics characteristics [8–11]; (2) wavelet transform [12,13]; and (3) polarization information [14,15]. Among these methods, constant false alarm rate (CFAR) and variants thereof [8–10] are most widely utilized. CFAR detectors adaptively calculate the detection thresholds by estimating the statistics of background clutter and maintain a constant probability of false alarm. However, the determination of detection thresholds depends on the distribution of sea clutter, which is not robust enough for the detection of

multi-scale ships in multi-scenes. On the other hand, CFAR based methods require land masking and post-processing to reduce false alarms, which is insufficiently automated.

Recently, with the rapid development of deep convolutional networks [16–20], great progress has been made in deep convolutional neural networks (CNN)-based object detection [21–33]. Generally, the current CNN-based detectors can be divided into one-stage [25,26,29,31] and two-stage detectors [22–24,28,32,33]. One-stage detectors include you only look once (YOLO) [25] and its derivative versions [34,35], single shot detector (SSD) [26] and RetinaNet [29], et al. YOLO reframes object detection as a regression problem. The input images are divided into *S* × *S* grid cells and then YOLO predicts bounding boxes and class probabilities for each grid cell straightly. SSD generates a set of default boxes over different sizes per feature map location to match the shape of objects better. RetinaNet proposed the focal loss to overcome the extreme foreground-background class imbalance. On the other hand, faster region-based CNN (Faster R-CNN) [23] and region-based fully convolutional networks (R-FCN) [28] are representative two-stage detectors. Faster R-CNN generates anchors of different scales and aspect ratios through the region proposal network (RPN). In addition, then the feature map and proposals rescaled from anchors are fed into the Fast R-CNN sub-network to predict the location and class probabilities of bounding boxes. Different from the per-region sub-network of Faster R-CNN, R-FCN is a fully convolutional network with the shared computation on the entire image. It proposed position-sensitive score maps to address the problem between translation-invariance in image classification and object detection [28]. Feature pyramid network (FPN) [24] combines low-level and high-level features for more comprehensive feature expression, which has outstanding performance on multi-scale object detection. In summary, one-stage detectors show superiority in detection speed benefits from the single network of detection pipeline. However, for accuracy, the two-stage detectors are better than that of one-stage, especially for small dense object detection.

For SAR ship detection, Deep CNNs have been widely applied in recent years. As a typical one-stage detection method, YOLOv2 was utilized to detect ships in SAR imagery [36]. Wang et al. [37] utilized RetinaNet for automatic ship detection of multi-resolution GF-3 imagery. Zhang et al. [38] proposed a lightweight feature optimizing network with lightweight feature extraction and attention mechanism for better feature representation. On the other hand, many two-stage detectors were proposed for higher detection accuracy. Ref. [39] proposed an improved Faster R-CNN for SAR ship detection. A multilayer fusion light-head detector [40] was proposed to improve the detection speed. Jiao et al. [41] proposed a densely connected neural network, which utilizes a modified FPN, for multi-scale and multi-scene ship detection. Ref. [42–45] added attention mechanisms into CNNs as attention mechanisms adaptively recalibrate feature responses to increase representation power [19,46].

Though many CNN-based methods have been proposed for SAR ship detection, they still encounter bottlenecks on the following issues: (1) Quite different from natural images, in SAR images, strip-like ships are often presented under bird's eye perspectives with various rotation angles and densely arranged in an inshore complex background, as shown in Figure 1a. A ship with an inclined angle leads to a relatively large redundancy region, which would introduce background noise. Moreover, two horizontal bounding boxes of densely arranged ships have a high Intersection-over-Union (IoU) leading to missing detection after the non-maximum suppression (NMS) operation [47,48]. Under these circumstances, the limited capacity of detection with horizontal bounding boxes would be exposed. (2) In R-CNN based object detection methods, an IoU threshold is utilized to distinguish positive and negative samples in the Fast R-CNN sub-network. A relatively low IoU threshold will result in a high recall but a low precision due to the generation of noisy bounding boxes. On the contrary, a relatively high IoU threshold leads to inadequate positive samples. In this case, the overfitting model will cause missing detection. Figure 2 shows the detection results under IoU thresholds of 0.5 and 0.7, respectively. Some noisy background regions are detected as ship regions in Figure 2a, and the missing detection appears in Figure 2b.

**Figure 1.** Densely arranged ships in an inshore complex background. (**a**) marking ship regions with horizontal bounding boxes; (**b**) marking ship regions with rotated bounding boxes.

(a) (b)

**Figure 2.** Detection results under different IoU thresholds; (**a**) detection results under a IoU threshold of 0.5; (**b**) detection results under a IoU threshold of 0.7.

Due to the inherent drawback of horizontal bounding boxes, rotated bounding boxes were gradually developed in optical remote sensing [47–50]. Ref. [47] proposed a rotation dense feature pyramid network (R-DFPN) in which the dense FPN and multiscale region of interest (ROI) align are used to detect ships in different scenes. In [48], a multi-category rotation detector was proposed for small, cluttered and rotated objects. Li et al. [49] proposed a rotatable region-based residual network (R3-Net) for multi-oriented vehicle detection. The ROI Transformer [50] that is lightweight was proposed to decrease the computational complexity.

To address the problems in SAR ship detection, a multi-stage rotational region based network (MSR2N) is proposed for arbitrary-oriented ship detection. As shown in Figure 1b, rotated bounding boxes can locate ships more accurately with less redundant noise background, and would not overlap with each other even in a dense arrangement. Therefore, the rotated bounding box representation is utilized in this paper. In the feature extraction module, we apply the FPN to fuse the high-resolution features and semantically strong features from the backbone network, which enhances feature representation. To generate rotated anchors and proposals, a rotational RPN (RRPN) is utilized. In addition, a multi-stage rotational detection network (MSRDN) is applied in MSR2N. The MSRDN,

which contains an initial rotational detection network (IRDN) and two refined rotational detection networks (RRDN), is trained stage by stage. Three increasing IoU thresholds are selected to sample the positive and negative proposals in three stages, respectively. The increasing IoU thresholds guarantee sufficient positive samples to avoid overfitting, and reduce close false positive proposals in the meantime. Compared to other methods, the proposed MSR2N achieves state-of-the-art performance on SAR ship detection, especially for densely arranged ships in inshore complex backgrounds. The main contributions of proposed MSR2N are enumerated as follows:


This paper is organized as follows. Section 2 describes the proposed MSR2N in detail. In Section 3, the ablation and comparative experiments are carried out on SSDD, which verifies the effectiveness of proposed MSR2N. Section 4 draws the conclusions for this paper.

#### **2. Proposed Approach**

The overall framework of the proposed MSR2N is illustrated in Figure 3. The whole framework can be divided into three modules: FPN, RRPN, and MSRDN. The FPN fuses feature maps from different layers of the backbone to generate the feature pyramid. In the RRPN module, a rotated anchor generation strategy is utilized to produce anchors with various rotation angles. The RRPN head outputs the softmax probabilities of being a ship and the regression offsets that encode the coordinates of coarse proposals. Millions of proposals are generated, but most of them are redundant noisy proposals and the amount of computation is huge. Hence, we select *Npre* proposals with the highest probabilities per feature pyramid level and perform the skew NMS operation [51] on the selected proposals to obtain *Npost* final proposals which are fed into the next MSRDN. The MSRDN containing the IRDN and two RRDNs is trained stage by stage with three increasing IoU thresholds. During training, the MSR2N is optimized under the supervision of softmax probabilities and rotated boxes regression offsets from RRPN and each stage of MSRDN. For the inference time, the final refined proposals from the last stage of MSRDN are selected by a probability threshold and sampled by the skew NMS operation. Finally, the predicted bounding boxes are obtained.

#### *2.1. Feature Pyramid Network*

In this paper, we select ResNet50 [17] as the backbone network, as shown in Figure 3. To extract sufficient ship features in SAR images, especially for small ships, the FPN [24] is utilized to combine the high-resolution, semantically weak features with low-resolution, and semantically strong features. As illustrated in Figure 4, the FPN contains a bottom-up pathway, a top-down pathway, lateral connections, and output convolutions. The outputs of ResNet50s last residual blocks, denoted as {C2, C3, C4, C5}, are chosen as the bottom-up hierarchy. The strides of {C2, C3, C4, C5} are {4, 8, 16, 32} pixels with respect to the input image. The top-down pathway upsamples the semantically stronger features by a factor of 2 and fuses them with the spatially finer features from the bottom-up pathway by element-wise addition. The lateral connections are 1 × 1 convolution layers with 256-channels, which

reduces channel dimensions. In addition, 3 × 3 convolution layers with 256-channels are utilized as the output layers to reduce the aliasing effect of upsampling. The final feature maps of FPN denoted as {P2, P3, P4, P5}, are fed into the following stages.

**Figure 3.** Overall framework of MSR2N.

#### *2.2. Parameterization of Rotated Bounding Box*

In the conventional SAR ship detection networks, the ship region is a horizontal rectangle, which is represented by four parameters (*xmin*, *ymin*, *xmax*, *ymax*). (*xmin*, *ymin*) and (*xmax*, *ymax*) denote the coordinates of the top left and bottom right corners of a bounding box, respectively. For a rotated bounding box, five parameters (*x*, *y*, *w*, *h*, *θ*) are conducted. The coordinate (*x*, *y*) represents the center of the rotated bounding box. To avert the disorder of coordinate expression, we set *w* and *h* as the longer side and shorter side of the rotated bounding box. *θ* denotes the angle from the positive direction of the *x*-axis counterclockwise to the longer side of the rotated bounding box. As half angular space is enough for the description of the rotation angle, the range of *θ* is set to [−90◦, 90◦). Figure 5 shows the geometric representation of two rotated bounding boxes with different rotation angles.

**Figure 5.** Geometric representation of two rotated bounding boxes. The rotation angle *θ* of the bounding box on the left is 60◦, and the rotation angle *θ* of the bounding box on the right is −45◦.

#### *2.3. Rotated Anchors and Proposals*

In [24], multi-scale anchors with multiple aspect ratios are generated on the SAR images with respect to the feature pyramid. However, the generation of horizontal anchors is not sufficient enough for arbitrary-oriented bounding boxes. On the other hand, the characteristic of ships with large aspect ratios should be taken into consideration. Corresponding to the representation of rotated bounding boxes, we utilize a rotation-angle-dependent strategy to generate rotated anchors. As illustrated in Figure 6, a set of cell anchors are generated at the points of a SAR image with the strides of FPN, respectively. As shown in Figure 7, three aspects are taken into consideration in the cell anchor generation. First, referring to [24], the scales of anchors are set to {162, 322, 642, 1282} pixels on the feature pyramid levels of {P2, P3, P4, P5}, respectively. Second, the large aspect ratios {2, 5, 8} replace the usual aspect ratios {0.5, 1, 2} in [23,24] with a view to the characteristic of strip-like ships. Finally, we set multiple rotation angles for anchors of various scales and aspect ratios, which can represent proposals more effectively. Six rotation angles {−75◦, −45◦, −15◦, 15◦, 45◦, 75◦} are set in consideration of the trade-off between half angular coverage and computational efficiency. Hence, each point per level of feature pyramid generates 18 (1 × 3 × 6) anchors. As shown in Figure 3, the classification layer outputs 36 (2 × 18) softmax probabilities of being a ship and background, and the regression layer outputs 90 (5 × 18) regression offsets that encode the coordinates (*x*, *y*, *w*, *h*, *θ*) for 18 proposals per position.

#### *2.4. Multi-Stage Rotational Detection Network*

#### 2.4.1. Initial Rotational Detection Network

As shown in Figure 3, the feature pyramid {P2, P3, P4, P5} and the coarse proposals produced from RRPN are both fed into the next stage called IRDN. Different from the conventional object detection network, IRDN aims to detect rotated proposals in this paper. *NRROI* proposals are randomly sampled as an RROI mini-batch in which an IoU threshold is chosen to distinguish the positive and negative samples. Here, we utilize the skew IoU computation [51] to compute IoU between rotated bounding boxes. If the IoU between a proposal and any ground-truth box is higher than the threshold, the proposal is assigned to a positive sample and vice versa. The ratio of positive and negative proposals is 1:3. In the RROI mini-batch, if the number of positive proposals is less than 25% of *NRROI*, the negative proposals should be padded.

As illustrated in Figure 8, the rotational ROI (RROI) align layer is applied for arbitrary-oriented proposals. It converts the feature map of RROIs into fixed spatial feature maps. An RROI is divided into 7 × 7 sub-regions by the RROI align layer, and the max pooling is operated in each sub-region. Then, the feature map of an RROI with a fixed spatial extent of 7 × 7 can be obtained. The feature maps of RROIs are input to two successive fully connected (FC) layers with a dimension of 1024. Two 1 × 1 convolutional layers for classification and regression are attached to the FC layers. The classification layer outputs softmax probabilities of being a ship and background, and the regression layer outputs the regression offsets that encode the coordinates (*x*, *y*, *w*, *h*, *θ*) of refined proposals.

#### 2.4.2. Multi-Stage Detector

As mentioned in Section 1, a single IoU threshold in IRDN can't separate positive and negative samples properly, which can cause false and missing detection. Inspired by [52], a multi-stage rotational detection strategy is proposed. As illustrated in Figure 3, the MSRDN contains an IRDN and two RRDNs which share the same structure with IRDN. The MSRDN is a multi-stage regression framework, which successively resamples proposals by increasing IoU thresholds. In the first stage of MSRDN, the distribution of coarse proposals is set by a low IoU threshold, which guarantees enough positive samples. The feature pyramid and sampled coarse proposals are fed into the IRDN to produce the refined proposals. In the second stage, the refined proposals from the first stage are resampled by a medium IoU threshold. These sampled refined proposals and the feature pyramid are fed into the RRDN which outputs new refined proposals. In the third stage, the repetitive procedures occur with a high IoU threshold. The process of MSRDN can be expressed as:

$$MRRDN(f, d^1) = IRDN(f, d^1) \circ RRDN\_1(f, d^2) \circ RRDN\_2(f, d^3) \tag{1}$$

where *f* denotes the feature maps from FPN, *d*<sup>1</sup> is the initial sampled distribution of proposals in IRDN, *<sup>d</sup>*<sup>2</sup> and *<sup>d</sup>*<sup>3</sup> are respectively the resampled distribution of proposals in two RRDNs, and ◦ represents the cascade operation. The sub-networks are optimized by the resampled distributions {*d*1, *<sup>d</sup>*2, *<sup>d</sup>*3}, respectively. With a sequence of increasing IoU thresholds, the number of close false positive proposals are sequentially decreased, and the rotated bounding boxes regression is more accurate.

**Figure 8.** Structure of IRDN.

#### *2.5. Multi-Stage Loss Computation*

During the training of RRPN, a sampling strategy for rotated anchors is used. A mini-batch of *NRRPN* anchors is selected for the loss computation, where the ratio of positive and negative samples is 1:1. In the mini-batch, if positive samples are not enough, the negative samples should be padded. The positive samples are defined by following rules: (i) the IoU between the anchor and any ground-truth is larger than 0.7; and (ii) an angular difference between the anchor and the ground-truth is smaller than 15◦. The negative samples should satisfy the following rules: (i) the IoU between the anchor and any ground-truth is lower than 0.3; or (ii) the IoU between the anchor and a ground-truth is larger than 0.7, while the angular difference is larger than 15◦. We minimize an objective function of RRPN with the multi-task loss [22] defined as:

$$L\_{RRPN}(p\_{i\cdot}, p\_i^\*, t\_i, t\_I^\*) = \frac{1}{N\_{cls}} \sum\_i L\_{cls}(p\_i, p\_i^\*) + \lambda \frac{1}{N\_{reg}} \sum\_i p\_i^\* L\_{reg}(t\_i, t\_i^\*). \tag{2}$$

Here, *i* denotes the index of samples in a mini-batch, *p*∗ *<sup>i</sup>* is the label of sample *i*, where *p*<sup>∗</sup> *<sup>i</sup>* equals 1 if the sample is a positive one and *p*∗ *<sup>i</sup>* equals 0 if the sample is a negative one. *pi* is the predicted softmax probablility of sample *i* being a ship. *t* ∗ *<sup>i</sup>* represents the offsets between sample *i* and the assigned ground-truth, and *ti* represents the predicted bounding box regression offsets. In Equation (2), only if sample *i* is a positive one with *p*∗ *<sup>i</sup>* = 1 is the regression loss *Lreg* calculated. The classification loss *Lcls* is log loss which is defined as:

$$L\_{\rm cls}(p\_i, p\_i^\*) = -\log p\_i p\_i^\*.\tag{3}$$

The regression loss is defined by the robust loss function smooth L1 as:

$$L\_{reg}(t\_i, t\_i^\*) = smooth\_{L1}(t\_i - t\_i^\*),\tag{4}$$

$$smooth\_{L1}(\mathbf{x}) = \begin{cases} \quad 0.5\mathbf{x}^2, & |\mathbf{x}| < 1\\ |\mathbf{x}| - 0.5, & otherwise \end{cases}.\tag{5}$$

The classification loss *Lcls* and regression loss *Lreg* are respectively normalized by *Ncls* and *Nreg*, where *Ncls* and *Nreg* are both set to the mini-batch size. The regression loss *Lreg* is weighted by a balancing parameter *λ*, which is set to 1 in the experiments.

For rotated bounding box regression, the five parameterized coordinate regression offsets are defined as:

$$\begin{aligned} t\_x &= (\mathbf{x} - \mathbf{x}\_a) / w\_a, t\_y = (y - y\_a) / h\_{a\prime} \\ t\_w &= \log(w / w\_a), t\_{\parallel} = \log(h / h\_a), \end{aligned}$$

$$\begin{aligned} t\_\theta &= \theta - \theta\_a + k\pi, \\ t\_x^\* &= (\mathbf{x}^\* - \mathbf{x}\_a) / w\_a, t\_y = (y^\* - y\_a) / h\_{a\prime} \\ t\_w^\* &= \log(w^\* / w\_a), t\_{\parallel}^\* = \log(h^\* / h\_a), \end{aligned} \tag{6}$$

$$\begin{aligned} t\_\theta^\* &= \theta^\* - \theta\_a + k\pi, \end{aligned} \tag{7}$$

where *x*, *y*, *w*, *h* and *θ* denote the center coordinates, width, height, and rotation angle of a rotated bounding box. *x*, *xa*, *x*∗ are respectively for the predicted box, anchor box, and ground-truth box, likewise for *y*, *w*, *h*, *θ*. The parameter *k* ∈ *Z* keeps *θ* in the range of [−180◦, 180◦).

As described in Section 2.4, during the training of MSRDN, the proposals are resampled into a mini-batch and refined by a sequence of rotational detection sub-networks. Meanwhile, each sub-network outputs the softmax probabilities of a proposal being a ship and rotated bounding box regression offsets for loss computation. The multi-task multi-stage loss function of MSRDN is defined as:

$$L\_{MSRDN} = L\_{IRDN} + L\_{RRDN\_1} + L\_{RRDN\_2} \tag{7}$$

where *LIRDN*, *LRRDN*<sup>1</sup> , and *LRRDN*<sup>2</sup> represent the losses of IRDN, first RRDN and second RRDN, respectively. The losses of three stages have the same definition as the multi-task loss function of RRPN.

#### **3. Experiments and Results**

#### *3.1. Dataset*

We evaluate the proposed framework on the public SAR ship detection dataset—SSDD [39]. The SSDD contains SAR ship images of various scenes from three sensors. The details of SSDD are shown in Table 1. There are 1160 SAR images including 2456 ships in total, with an average of 2.12 ships per image. Ships in SSDD are annotated with coordinates of the four vertices of rotated bounding boxes. The SSDD is divided into a training set and a test set by a ratio of 4:1.



#### *3.2. Setup and Implementation Details*

The experiments are implemented in the deep learning framework Pytorch [53] on four NVIDIA TITIAN Xp GPUs with 12 GB memory. Our network is initialized by the pre-trained ResNet50 model for ImageNet classification. The whole network is trained for 12 k iterations in total, and a batch of 48 images is fed into the network in an iteration. The initial learning rate is 0.01 for the first 8k iterations; then, learning rates of 0.001 and 0.0001 are set for the next 2.5 k iterations and the remaining 1.5 k iterations, respectively. The stochastic gradient descent (SGD) [54] optimizer with a weight decay of 0.0001 and a momentum of 0.9 is selected for the model.

Referring to faster R-CNN, we resize images such that the shorter side is 350 pixels under the premise that ensures the longer side less than 800 pixels. For data augmentation, we flip the images horizontally with a probability of 0.5.

As mentioned in Section 2, *Npre* denotes the number of pre-selected proposals per feature pyramid level, and *Npost* represents the number of post-selected proposals. Here, we set *Npre* to 2000 and 1000 for training and inference, respectively. In addition, *Npost* is set to 1000 for both training and inference. *NRRPN* and *NRROI*, which are the sizes of RRPN and RROI mini-batches, are set to 256 and 512, respectively. In inference time, an NMS threshold is used to select final predicted bounding boxes, and a predicted bounding box can be retained if its score is higher than the score threshold. The NMS threshold and score threshold are respectively set to 0.3 and 0.1 in all experiments.

#### *3.3. Evaluation Metrics*

To quantitatively evaluate the performance of proposed MSR2N, the precision, recall, mean Average Precision (mAP), and F-measure (F1) are utilized as evaluation metrics.

Precision is the rate that correctly detected ships in all detected results, and recall means the rate that correctly detected ships in all ground-truths. The definitions of precision and recall are as follows:

$$Precision = \frac{TP}{TP + FP} \tag{8}$$

$$Recall = \frac{TP}{TP + FN} \tag{9}$$

where *TP*, *FP*, and *FN* denote the number of correctly detected ships, false alarms, and undetected ships, respectively. In addition, a bounding box is admitted as a correctly detected ship in the case that the IoU between the bounding box and a ground-truth is higher than the threshold of 0.5. The precision–recall (PR) curve shows the precision–recall pairs at different confidence score thresholds.

The *mAP* is a comprehensive metric that calculates the average value of precision under the recall in a range of [0, 1]. The definition of mAP is as follows:

$$mAP = \int\_0^1 P(R)dR\tag{10}$$

where *R* denotes a recall value and *P* represents the precision corresponding to a recall.

*F1* evaluates the comprehensive performance of a detector by taking the precision and recall into consideration. *F1* is defined as:

$$F1 = \frac{2 \ast Precision \ast Recall}{Precision + Recall} . \tag{11}$$

#### *3.4. Ablation Study*

In this part, some ablation experiments are carried out to investigate the effectiveness of the main modules of the proposed MSR2N.

#### 3.4.1. Effect of MSRDN

To illustrate the detection performance of MSRDN, the experimental results are listed in Table 2. We can find that the single-stage detectors have poor detection performance. The detector with a low IoU threshold of 0.5 achieves a high recall but a low precision due to the introduction of noisy regions. With the increase of the IoU threshold, the precision increases, but the recall decreases. When the IoU threshold reaches 0.7, the recall decrease to 84.50% because of the overfitting caused by inadequate positive proposals. When using the two-stage detectors with an IRDN and an RRDN, they can achieve better detection performance than the single-stage ones. The detection performance of three-stage detector with the increasing IoU thresholds of {0.5, 0.6, 0.7} is significantly improved, which achieves the recall of 92.05%, precision of 86.52%, mAP of 90.11%, and F1 of 89.20%. These experiments demonstrate the effectiveness of multiple stages of MSRDN. Comparing to the three-stage detectors with uniform thresholds, the detector with the increasing IoU thresholds still has superior performance, which proves the effectiveness of increasing IoU thresholds of MSRDN. With the multi-stage regression, the number of close false positive proposals is decreased and the location of proposals is more accurate in the meantime. Therefore, the MSRDN with the increasing IoU thresholds of {0.5, 0.6, 0.7} is chosen in this paper.


**Table 2.** Experimental results of MSRDN.

#### 3.4.2. Effect of Multi-Stage Loss Computation

The experimental results of multi-stage loss computation are presented in Table 3. It can be observed that the detection performance with three-stage loss computation is preferable, and the experiment only using loss from the last stage obtains the inferior detection performance. With the multi-stage loss used, the precision increases significantly, which means that the false alarms are effectively suppressed. We can conclude that the multi-stage loss computation has a strong constraint on the training of the whole network.


**Table 3.** Experimental results of multi-stage loss computation.

#### 3.4.3. Effect of Rotation Angles

Some experiments about rotation angles are carried out, and their results are shown in Table 4. It can be seen in Table 4 that the method which only generates horizontal and vertical anchors achieves inferior detection performance. As the interval of rotation angles decreases, the detection performance improves. The method with the rotation angles of {−75◦, −45◦, −15◦, 15◦, 45◦, 75◦} obtains superior detection performance, which proves the effectiveness of anchors with different rotation angles.


**Table 4.** Experimental results of rotation angles.

#### 3.4.4. Effect of FPN

A couple of ablation experiments are performed to demonstrate the effect of FPN in the proposed MSR2N, wherein the experiment of MSR2N uses the feature pyramid {P2, P3, P4, P5} as the outputs of feature extraction module, and the experiment of MSR2N without FPN only uses C5 of ResNet50 to feed into the next stages. Table 5 shows the experimental results of the FPN. Comparing to MSR2N without FPN, our MSR2N reaches higher evaluation metrics. The significant improvement of detection performance benefits from the sufficient feature expression of FPN.

**Table 5.** Experimental results of FPN.


#### *3.5. Comparison with Other Object Detection Methods*

To verify the performance of our proposed MSR2N, we compare the MSR2N with the multi-stage horizontal region based network (MSHRN), rotational Faster R-CNN [23] (Faster RR-CNN), rotational FPN [24] (R-FPN), and rotational RetinaNet [29] (R-RetinaNet). MSHRN is the horizontal variant of our proposed MSR2N, where the RPN sub-network is used and the ROI pooling layer replaces the RROI align layer for predicting horizontal bounding boxes. Faster RR-CNN and R-FPN are two-stage detectors, which are respectively rotational variants of Faster R-CNN and FPN. The one-stage detector R-RetinaNet is the rotational variant of RetinaNet. In [29], the feature pyramid with levels {P3, P4, P5, P6, P7} is used as the output of the feature extraction module. For consistency, we use the feature pyramid {P2, P3, P4, P5} in the experiment of R-RetinaNet, which is the same as the experiments of R-FPN and MSR2N. Here, ResNet50 is chosen as the backbone network of all methods.

Table 6 summarizes the experimental results of different methods on SSDD, and Figure 9 shows the PR curves of different methods. From Table 6, we can find that MSR2N achieves state-of-the-art performance: 92.05% for recall, 86.52% for precision, 90.11% for mAP, and 89.20% for F1. Comparing to the other four methods, MSR2N obtains 2.27%, 7.89%, 6.45%, and 9.54% gains in mAP, respectively. In Figure 9, the PR curve of MSR2N is higher than those of Faster RR-CNN, R-FPN, and R-RetinaNet. The PR curve of MSHRN only can reach a recall of 90.36%, but MSR2N can achieve a recall of 92.05%. This phenomenon indicates that horizontal region based detection leads to more undetected ships than rotational region based detection. To further verify the capability of handling hard cases, we construct a subset by selecting SAR images, which contain densely arranged ships in inshore complex scenes, from the original test set. Table 7 shows the experimental results of different methods on hard cases. From Table 7, it can be observed that all the values of the evaluation metrics are relatively low, which indicates that these hard cases are difficult to detect correctly. For example, the mAP of R-RetinaNet only reaches 42.91%. Nevertheless, our proposed MSR2N achieves superior performance. Comparing to the other four methods, MSR2N obtains 12.43%, 8.52%, 13.08%, and 28.30% gains in mAP, respectively.

Figure 10 illustrates the detection results of four images from the test set of SSDD using different methods. From the detection results of three inshore SAR images, it can be observed that there exist undetected ships in Figure 10e–g. This is on account of the high IoU between two overlapped horizontal bounding boxes in MSHRN. On the other hand, the background noise in the horizontal bounding boxes can interfere with the detection. In addition, the detection results of Faster RR-CNN, R-FPN, and R-RetinaNet still have problems with missing detection and false detection. On the contrary, MSR2N shows superior detection performance in Figure 10u–w. All ships are successfully detected, and the position of ships is located more accurately than other methods. For the methods of MSHRN, Faster RR-CNN, and R-RetinaNet, there exist several false alarms and undetected ships in the results of fourth SAR images. Generally speaking, all methods are competent to detect ships far from shore. Based on the above discussion, we can conclude that MSR2N has state-of-the-art performance on SAR ship detection, especially for densely arranged ships in complex backgrounds.

**Figure 9.** PR curves of different methods on SSDD.


**Table 6.** Experimental results of different methods on SSDD.

**Table 7.** Experimental results of different methods on hard cases.


**Figure 10.** Detection results of different methods. (**a**–**d**) ground-truths; (**e**–**h**) detection results of MSHRN; (**i**–**l**) detection results of Faster RR-CNN; (**m**–**p**) detection results of R-FPN; (**q**–**t**) detection results of R-RetinaNet; (**u**–**x**) detection results of MSR2N.

#### **4. Conclusions**

This paper proposes an arbitrary-oriented ship detection network called MSR2N which aims at detecting ships in various complex scenes. In our MSR2N, the rotating bounding boxes are utilized to represent ship regions, which can reduce redundant noisy regions and overlaps between bounding boxes. The SSDD is employed to verify the effectiveness of the proposed MSR2N. The ablation experiments illustrate that high precision and high recall can be achieved simultaneously due to the sequential resampling of proposals and training in MSRDN sub-networks. The experiments about loss computation show the significance of multi-stage loss computation. The experiments of rotation angles and FPN are also carried out, which proves the effectiveness of rotation angles and FPN. Compared to other methods, MSR2N obtains superior detection performance, especially when ships are densely arranged in inshore complex scenes.

**Author Contributions:** Z.P. proposed the method and performed the experiments; R.Y. performed the experiments; and Z.Z. supervised the study and reviewed this paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China 61901442.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **The Single-Shore-Station-Based Position Estimation Method of an Automatic Identification System**

#### **Yi Jiang 1,\* and Kai Zheng <sup>2</sup>**


Received: 31 January 2020; Accepted: 9 March 2020; Published: 12 March 2020

**Abstract:** In order to overcome the vulnerability of the Global Navigation Satellite System (GNSS), the International Maritime Organization (IMO) initiated the ranging mode (R-Mode) of the automatic identification system (AIS) to provide resilient position data. As the existing AIS is a communication system, the number of shore stations as reference stations cannot satisfy positioning requirements. Especially in the area near a shore station, it is very common that a vessel can only receive signals from one shore station, where the traditional positioning method cannot be used. A novel position estimation method using multiple antennas on shipborne equipment is proposed here, which provides a vessel's position even though the vessel can only receive signals from a single shore station. It is beneficial for solving positioning issues in proximity to the coast. Further, as the distances between different antennas to the shore station are not sufficiently independent, the positioning matrix can easily be near singularity or ill-conditioned; thus, an effective position solving method is derived. Furthermore, the proposed method is verified and evaluated in different scenarios by numerical simulation. We assessed the influencing factors of positioning performance, such as the vessel's heading angle, the relative position, and the distances between the shore station and the vessel. The proposed method widely expands the application scope of the AIS R-Mode positioning system.

**Keywords:** position estimation; ranging mode; single shore station; AIS

#### **1. Introduction**

The vulnerability of the Global Navigation Satellite System (GNSS) to both intentional and unintentional jamming and interference is an urgent problem to be solved in the e-Navigation Strategy Implementation Plan (SIP) of the International Maritime Organization (IMO) [1,2]. The automatic identification system (AIS) is widely used for maritime communication, and the IMO has mandated its installation since 2002 in order to avoid collisions [3–6]. By reusing the existing AIS infrastructure, the ranging mode (R-Mode) of AIS has been accepted as one of the alternative GNSS backup navigation systems of the future [7]. The resilient position data, which could be supplied by both satellite and terrestrial-based navigation systems, serve as the foundation for the e-Navigation SIP [8–10]. Its aim is to contribute to safe and reliable navigation at sea, especially for autonomous ship navigation.

Generally speaking, AIS is a very high frequency (VHF)-based communication system. AIS shore stations are the most critical components in a coastal AIS network. They can not only receive signals from shipborne equipment but also transmit signals within the coverage area. AIS R-Mode adds a ranging function without influencing the existing AIS communication capacity. The existing AIS shore stations are considered reference stations. A vessel can receive signals and derive ranging information to itself from shore stations, and as a consequence, the vessel's position can be estimated [11,12]. It should be noted that the positional information in the existing AIS is now derived from GNSS [13]. If GNSS were to fail, the whole AIS would break down. Therefore, AIS R-Mode presents an efficient

solution to overcome the dependence of AIS on GNSS. It will make AIS a comprehensive marine radio system integrated with communication and navigation for the e-navigation strategy. Therefore, many countries and scientists are researching AIS R-Mode. A European consortium of 12 research institutions, administrations, and industries in Germany, Poland, Sweden, and Norway has currently set up an R-Mode testbed in the Baltic Sea to study the feasibility of R-Mode [14,15]. The Maritime Safety Administration (MSA) and Dalian Maritime University have established the first AIS R-Mode testbed in China [11,16]. South Korea has performed some simulations of R-Mode with the integration of eLoran [17]. Further, the International Association of Marine Aids to Navigation and Lighthouse Authorities (IALA) is drafting the standard for R-Mode according to these countries' research results [18].

In AIS R-Mode, shipborne AIS equipment receives VHF signals from AIS shore stations. If using traditional positioning methods in AIS R-Mode, such as the time of arrival (TOA) [19–21] or the time difference of arrival (TDOA) [22–24], the vessel needs to measure the signal transmission delay or relative delay. Then, the distance or distance difference can be obtained by multiplying it by the speed of light in the free space. Finally, the vessel's position can be estimated according to the positioning equation. However, both methods require receiving signals from at least three different AIS shore stations. The existing AIS, though, is set up for communication without considering the requirement of positioning. Hence, a vessel cannot receive signals from three different AIS shore stations at the same time in some sea areas [25]. Therefore, the traditional positioning methods are not feasible. In this situation, if the vessel can receive signals from two AIS shore stations, a displacement correction position estimation method can be used to estimate the vessel's position [26,27]. The principle of this method is to calculate the displacement of the vessel for a period of time according to parameters, such as the heading and speed, provided by auxiliary sensors. The relationship of positional information between the adjacent moments can be derived by the displacement vector. Finally, the vessel's position is estimated by the continuous range measurements in adjacent moments. However, when a vessel can only receive signals from one shore station, the existing position estimation methods cannot be utilized.

We have proposed a position estimation method for AIS R-Mode, especially in the area close to a shore station. This method aims to estimate the vessel's position when it can only receive signals from a single AIS shore station. The contributions of this study are as follows.


The remainder of this paper is organized as follows: The architecture of the AIS R-Mode positioning system is given, and different position estimation methods that can be used in AIS R-Mode are discussed in Section 2. In Section 3, we present a novel position estimation method for when a vessel can only receive signals from a single AIS shore station. Furthermore, an effective position solving method to avoid singularity is derived for the conditions of three antennas and more than three antennas. Then, four different simulation scenarios are introduced to verify and evaluate the proposed method in Section 4. Simulation results are analyzed, and the influencing factors of positioning performance are discussed in Section 5. Finally, conclusions are put forth in Section 6.

#### **2. AIS R-Mode Positioning**

The AIS R-mode positioning system is comprised of AIS shore stations and vessels. The system framework is illustrated in Figure 1. The AIS shore stations, as positioning reference stations, transmit VHF ranging signals periodically. The circles in Figure 1 indicate the coverage areas of shore stations. Only the vessels in the coverage area of the shore station can receive its signals. Then, the vessel estimates its position according to its signals received from shore stations.

**Figure 1.** Coverage areas of automatic identification system (AIS) stations: different cases.

In Figure 1, if the vessel, such as Vessel 1, can receive signals from at least three AIS shore stations, either the TOA method or the TDOA method can be utilized to estimate the vessel's position in AIS R-mode positioning [25]. However, the layout of existing AIS shore stations was originally designed to satisfy communication requirements. A single shore station is sufficient for the vessel to communicate. However, as the AIS traffic load increased, more shore stations were established to increase the signal coverage rate. Still, areas exist where there are insufficient numbers of shore stations that can act as the reference stations for positioning [25]. For instance, some vessels can only receive signals from two shore stations, such as Vessel 2 in Figure 1. In this situation, the TOA method can be enhanced with the displacement correction so that the vessel's position can be estimated [26]. However, some vessels can only receive signals from a single shore station, shown in Figure 1, such as Vessel 3. Previous studies have based their criteria for at least two shore stations, so they were not able to estimate vessels' positions in this condition. This paper mainly focuses on the position estimation method based on a single shore station in AIS R-Mode.

#### **3. Positioning Method for a Single Shore Station**

The proposed positioning method based on a single shore station in AIS R-Mode uses multiple antennas on a vessel. Figure 2 presents the global coordinate system for R-Mode positioning and the local coordinate system for the proposed positioning method. *O* is the origin of the global system, the *Y*-axis is directed towards the north, and the *X*-axis is perpendicular to the *X*-axis. (*X*, *Y*) indicates the coordination in the global system. In Figure 2, *B* and *M* denote the shore station and the vessel, respectively. An inverted triangle depicts an antenna on the vessel. Multiple antennas on the vessel are shown in Figure 2. The global coordinates of the shore station *B* are (*XB, YB*). (*Xi, Yi*) are the global position coordinates of the *i*th antenna of the vessel. In the local coordinate system, *o* is the origin, and it is the center of the vessel; the *x*-axis is directed towards the heading direction, and the *y*-axis is perpendicular to the *x*-axis. (*xi*, *yi*) are the position coordinates of the *i*th antenna on the vessel in the local coordinate system.

**Figure 2.** Global and local coordinate systems for positioning.

The distance between the reference shore station and the *i*th antenna on the vessel can be calculated using the following simple equation [28].

$$
\overline{R}\_i = c \left( T\_{r\_i} - T\_t \right) \tag{1}
$$

where *c* is the speed of light in the free space; *Tt* is the signal transmission time, which can be obtained according to the AIS VHF signal from the shore station; and *Tri* is the signal arrival time for the *i*th antenna of vessel *M*, which can be obtained by the vessel.

However, as time synchronization between a vessel and shore stations is a difficult task to achieve in reality, *Ri* is the measured distance between the *i*th antenna and the shore station *B*, not equal to the actual distance *Ri*. The positioning equation between the *i*th antenna of the vessel and the shore station is

$$R\_i = R\_i(X\_{i\prime}, Y\_i) + c\Delta T \tag{2}$$

where subscript *i* represents the *i*th antenna, and Δ*T* is the clock offset between *M* and *B*. The accurate distance *Ri* can be calculated by the Euclidean distance formula:

$$R\_i(X\_{i\prime}, Y\_i) = \left( (X\_i - X\_B)^2 + (Y\_i - Y\_B)^2 \right)^{\frac{1}{2}} \tag{3}$$

According to Equation (3), Equation (2) can be further written as

$$\overline{R}\_{i} = \left( (X\_{i} - X\_{B})^{2} + (Y\_{i} - Y\_{B})^{2} \right)^{\frac{1}{2}} + c\Delta T \tag{4}$$

If one antenna is installed at the center of the vessel, the position coordinates of the vessel can be represented as (*X*0*, Y*0).

The heading angle θ of the vessel at any time can be obtained according to the outputs of shipborne equipment, including a magnetic compass, a gyrocompass, and so forth. In the global coordinate system, the coordinates of the *i*th antenna (*Xi*, *Yi*) can be expressed as

$$
\begin{bmatrix} X\_i \\ Y\_i \end{bmatrix} = \begin{bmatrix} X\_0 \\ Y\_0 \end{bmatrix} + T(\theta) \begin{bmatrix} x\_i \\ y\_i \end{bmatrix} \tag{5}
$$

where *T*(θ) is called the rotation matrix and is given below:

$$T(\theta) = \begin{bmatrix} -\sin\theta & \cos\theta \\ \cos\theta & \sin\theta \end{bmatrix} \tag{6}$$

substituting Equation (5) into Equation (3), we have

$$R\_i(X\_i, \mathcal{Y}\_i) = R\_i'(X\_0, \mathcal{Y}\_0) \tag{7}$$

The positioning equation of (*X*0, *Y*0) can be written as

$$R\_i = R\_i'(X\_0, Y\_0) + c\Delta T \tag{8}$$

As Equation (8) is a nonlinear equation, the first step of solving the positioning equation is linearization. A Taylor series is used, and the first-order terms are retained:

$$
\overline{R}\_i - R\_i' \Big(\hat{X}\_0, \hat{Y}\_0\big) - c\Delta \hat{T} = \left. \frac{\partial R\_i'}{\partial X\_0} \right|\_{\left(\hat{X}\_0, \hat{Y}\_0\right)} \delta X\_0 + \left. \frac{\partial R\_i'}{\partial Y\_0} \right|\_{\left(\hat{X}\_0, \hat{Y}\_0\right)} \delta Y\_0 + c\delta T \tag{9}
$$

where (*X*ˆ 0, *Y*ˆ 0) is the initial estimated coordinate of *M*, and (δ*X*0, δ*Y*0, δ*T)* are the corrections of the corresponding estimated values.

However, due to the limitation of the vessel size, the distances between different antennas to the shore station are approximately the same and not sufficiently independent. Therefore, the positioning matrix given by Equation (9) easily becomes a near-singular matrix or an ill-conditioned matrix, which is difficult to solve. Thus, the Taylor series expansion method [29,30] or the least-squares method [31,32] is widely used in position estimation. Both of these methods require good initial values; otherwise, it is difficult for the solution to achieve convergence [33]. Therefore, they are not suitable for solving the position estimation situation proposed in this paper. The Chan method is a TDOA-based localization algorithm, which can provide a closed-form solution for arbitrarily placed reference nodes [34]. Inspired by the Chan method, we solved the proposed positioning equation as follows:

According to Equations (7) and (8), we rewrote Equation (8) as

$$R\_i(X\_{i\prime}, Y\_i) = R\_i^{\prime}(X\_{0\prime}, Y\_0) = \overline{R}\_{i0} + R\_0(X\_{0\prime}, Y\_0) \tag{10}$$

where

$$
\overline{R}\_{i0} = \overline{R}\_i - \overline{R}\_0 \tag{11}
$$

Substituting Equation (10) into Equation (3), we obtained

$$
\overline{R}\_{i0}^2 + 2\overline{R}\_{i0}R\_0 + R\_0^2 = X\_i^2 + Y\_i^2 + X\_B^2 + Y\_B^2 - 2(X\_iX\_B + Y\_iY\_B) \tag{12}
$$

If *i* = 0 in Equation (3), it can be simplified as

$$R\_0 \,^2 = X\_0^2 + Y\_0^2 + X\_B^2 + Y\_B^2 - 2(X\_0 X\_B + Y\_0 Y\_B) \tag{13}$$

Then, subtracting Equation (13) from Equation (12), the result is

$$
\overline{R}\_{i0}^2 + 2\overline{R}\_{i0}R\_0 = X\_i^2 + Y\_i^2 - \left(X\_0^2 + Y\_0^2\right) - 2\left(X\_{i0}X\_B + Y\_{i0}Y\_B\right) \tag{14}
$$

according to

$$X\_i^2 + Y\_i^2 = \begin{bmatrix} X\_0 \\ Y\_0 \end{bmatrix}^T \begin{bmatrix} X\_0 \\ Y\_0 \end{bmatrix} + 2\begin{bmatrix} x\_i \\ y\_i \end{bmatrix}^T T(\theta) \begin{bmatrix} X\_0 \\ Y\_0 \end{bmatrix} + \begin{bmatrix} x\_i \\ y\_i \end{bmatrix}^T \begin{bmatrix} x\_i \\ y\_i \end{bmatrix} \tag{15}$$

Substituting Equations (5) and (15) into Equation (14), we obtained

$$\overline{R}\_{i0}^{2} + 2\overline{R}\_{i0}R\_{0} = 2\left[\begin{array}{c} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{array}\right]^{T} T(\boldsymbol{\theta}) \begin{bmatrix} \mathbf{X}\_{0} \\ \mathbf{Y}\_{0} \end{bmatrix} + \left[\begin{array}{c} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{array}\right]^{T} \begin{bmatrix} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{bmatrix} - 2\left[\begin{array}{c} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{array}\right]^{T} T(\boldsymbol{\theta}) \begin{bmatrix} \mathbf{X}\_{B} \\ \mathbf{Y}\_{B} \end{bmatrix} \tag{16}$$

#### *3.1. Condition of Three Antennas*

If there are three antennas on the vessel, according to Equation (16), the positioning matrix can be written simply as

$$
\begin{pmatrix} X\_0 \\ Y\_0 \end{pmatrix} = \left( \begin{bmatrix} \mathbf{x}\_1 & \mathbf{x}\_2 \\ & y\_1 & y\_2 \end{bmatrix}^T T(\theta) \right)^{-1} \left\{ \frac{1}{2} \left( \begin{bmatrix} \overline{R}\_{10}^2 \\ \overline{R}\_{20}^2 \end{bmatrix} - \begin{bmatrix} \begin{bmatrix} \mathbf{x}\_1 & y\_1 \end{bmatrix} \begin{bmatrix} \mathbf{x}\_1 & y\_1 \end{bmatrix}^T \\\ \begin{bmatrix} \mathbf{x}\_2 & y\_2 \end{bmatrix} \begin{bmatrix} \mathbf{x}\_2 & y\_2 \end{bmatrix}^T \end{pmatrix} \right) + \begin{bmatrix} \overline{R}\_{10} \\\ \overline{R}\_{20} \end{bmatrix} \mathbf{R}\_0 \right\} + \begin{bmatrix} X\_B \\\ Y\_B \end{bmatrix} \tag{17}
$$

where *R*<sup>0</sup> can be expressed by (*X*0, *Y*0), given by

$$R\_0^2 = \begin{bmatrix} X\_0 - X\_B \\ Y\_0 - Y\_B \end{bmatrix}^T \begin{bmatrix} X\_0 - X\_B \\ Y\_0 - Y\_B \end{bmatrix} \tag{18}$$

The equation can be solved by substituting Equation (17) into Equation (18), and we can calculate the vessel position coordinates (*X*0, *Y*0) according to Equation (17).

#### *3.2. Condition of More Than Three Antennas*

If there are *M* antennas on the vessel and *M* > 3, according to Equation (16), the positioning matrix can be written as

$$
\begin{bmatrix} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{bmatrix}^{T} T(\boldsymbol{\theta}) \begin{bmatrix} \mathbf{X}\_{0} \\ \mathbf{Y}\_{0} \end{bmatrix} - \overline{\mathbf{R}}\_{i0} \mathbf{R}\_{0} = \frac{1}{2} \mathbf{I} \begin{bmatrix} \overline{\mathbf{R}}\_{i0}^{2} - \begin{bmatrix} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{bmatrix}^{T} \begin{bmatrix} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{bmatrix} \end{bmatrix} + \begin{bmatrix} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{bmatrix}^{T} T(\boldsymbol{\theta}) \begin{bmatrix} \mathbf{X}\_{B} \\ \mathbf{Y}\_{B} \end{bmatrix} \tag{19}
$$

Substituting Equation (6) into Equation (19), we obtained

$$
\begin{bmatrix}
\begin{pmatrix}
\cos\theta & \sin\theta
\end{pmatrix}
\begin{bmatrix}
\text{x}\_{i} & \text{y}\_{i} \\
\text{x}\_{i} & \text{y}\_{i}
\end{bmatrix}^{T}
\end{bmatrix}
\begin{bmatrix}
\text{X}\_{0} \\
\text{Y}\_{0} \\
\text{R}\_{0}
\end{bmatrix} = \frac{1}{2} \begin{bmatrix}
\overline{R}\_{i0}^{2} - \begin{bmatrix}
\text{x}\_{i} \\
\text{y}\_{i}
\end{bmatrix}^{T}
\begin{bmatrix}
\text{x}\_{i} \\
\text{y}\_{i}
\end{bmatrix}
\end{bmatrix} + \begin{bmatrix}
\text{x}\_{i} \\
\text{y}\_{i}
\end{bmatrix}^{T}
\Gamma(\theta)
\begin{bmatrix}
\text{X}\_{B} \\
\text{y}\_{B}
\end{bmatrix} \tag{20}
$$

The positioning matrix can be simplified as

$$\mathbf{H}\mathbf{X}=\mathbf{b}\tag{21}$$

where

$$\mathbf{X} = \begin{bmatrix} \ \mathbf{X}\_0 & \ \mathbf{Y}\_0 & \ \mathbf{R}\_0 \end{bmatrix}^T \tag{22}$$

$$\mathbf{H} = \begin{bmatrix} \begin{pmatrix} -\sin\theta & \cos\theta \end{pmatrix} \begin{bmatrix} x\_1 \\ y\_1 \end{bmatrix} & \begin{pmatrix} \cos\theta & \sin\theta \end{pmatrix} \begin{bmatrix} x\_1 \\ y\_1 \end{bmatrix} & -\overline{\mathbf{R}}\_{10} \\\\ \begin{pmatrix} -\sin\theta & \cos\theta \end{pmatrix} \begin{bmatrix} x\_2 \\ y\_2 \end{bmatrix} & \begin{pmatrix} \cos\theta & \sin\theta \end{pmatrix} \begin{bmatrix} x\_2 \\ y\_2 \end{bmatrix} & -\overline{\mathbf{R}}\_{20} \\\\ \vdots & \vdots & \vdots \\\\ \begin{pmatrix} -\sin\theta & \cos\theta \end{pmatrix} \begin{bmatrix} x\_{M-1} \\ y\_{M-1} \end{bmatrix} & \begin{pmatrix} \cos\theta & \sin\theta \end{pmatrix} \begin{bmatrix} x\_{M-1} \\ y\_{M-1} \end{bmatrix} & -\overline{\mathbf{R}}\_{(M-1)0} \end{bmatrix} \tag{23}$$

$$
\begin{bmatrix}
\vdots & \vdots & \vdots \\
\begin{bmatrix} -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \begin{array}{ccc} \chi\_{M-1} \\ \end{array} \end{bmatrix} & \begin{array}{ccc} \begin{array}{ccc} \vdots & \vdots & \vdots \\ \end{array} \\\begin{array}{ccc} \chi\_{M-1} \end{array} \end{bmatrix} & \begin{array}{ccc} \begin{array}{ccc} \chi\_{M-1} \\ \end{array} \end{array} \end{bmatrix} & \begin{array}{ccc} \begin{array}{ccc} \chi\_{M-1} \\ \end{array} \end{bmatrix} & \begin{array}{ccc} \begin{array}{ccc} \chi\_{M-1} \\ \end{array} \end{array} \end{bmatrix}
\end{bmatrix}
$$

$$\mathbf{b} = \begin{bmatrix} \frac{1}{2} \left[ \overline{\mathbf{R}}\_{10}^{2} - \begin{bmatrix} \mathbf{x}\_{1} \\ \mathbf{y}\_{1} \end{bmatrix}^{T} \begin{bmatrix} \mathbf{x}\_{1} \\ \mathbf{y}\_{1} \end{bmatrix} \right] + \begin{bmatrix} \mathbf{x}\_{1} \\ \mathbf{y}\_{1} \end{bmatrix}^{T} T(\theta) \begin{bmatrix} \mathbf{X}\_{B} \\ \mathbf{y}\_{B} \end{bmatrix} \\\ \frac{1}{2} \left[ \overline{\mathbf{R}}\_{20}^{2} - \begin{bmatrix} \mathbf{x}\_{2} \\ \mathbf{y}\_{2} \end{bmatrix}^{T} \begin{bmatrix} \mathbf{x}\_{2} \\ \mathbf{y}\_{2} \end{bmatrix} \right] + \begin{bmatrix} \mathbf{x}\_{2} \\ \mathbf{y}\_{2} \end{bmatrix}^{T} T(\theta) \begin{bmatrix} \mathbf{X}\_{B} \\ \mathbf{y}\_{B} \end{bmatrix} \\\ \vdots \\\ \frac{1}{2} \left[ \overline{\mathbf{R}}\_{(M-1)0}^{2} - \begin{bmatrix} \mathbf{x}\_{M-1} \\ \mathbf{y}\_{M-1} \end{bmatrix}^{T} \begin{bmatrix} \mathbf{x}\_{M-1} \\ \mathbf{y}\_{M-1} \end{bmatrix} \right] + \begin{bmatrix} \mathbf{x}\_{M-1} \\ \mathbf{y}\_{M-1} \end{bmatrix}^{T} T(\theta) \begin{bmatrix} \mathbf{X}\_{B} \\ \mathbf{y}\_{B} \end{bmatrix} \end{bmatrix} \tag{24}$$

As measurement errors exist in *Ri*0, the positioning matrix of Equation (21) was rewritten as

$$
\boldsymbol{\upPsi} = \mathbf{b} - \mathbf{H}\mathbf{X} \tag{25}
$$

where

$$
\psi\_i = R\_i c n\_i + 0.5c^2 n\_i^2 \tag{26}
$$

where *ni* is an error corresponding to *Ri*0. Due to *Ri* ≈ *R*0, **R** ≈ *R*0**I**. **I** is an identity matrix. Equation (25) can be written as

$$
\mathfrak{sp} \approx cR\_0 \mathfrak{n} + 0.5c^2 \mathfrak{n} \odot \mathfrak{n} \tag{27}
$$

Therefore, solving the positioning matrix requires actually finding the value of **X** where ||**b** – **HX**|| is the minimum. Using the weighted least-squares method, the solution of Equation (25) is

$$\mathbf{X} = \left(\mathbf{H}^T \mathbf{Q}^{-1} \mathbf{H}\right)^{-1} \mathbf{H}^T \mathbf{Q}^{-1} \mathbf{b} \tag{28}$$

where **Q** is the covariance matrix of the measurement error. The covariance matrix of **X** can be expressed as

$$\text{cov}(\mathbf{X}) \approx c^2 R\_0^2 \left(\mathbf{H}^T \mathbf{Q}^{-1} \mathbf{H}\right)^{-1} \tag{29}$$

In order to improve the accuracy, assuming that the elements of **X** are independent, we defined

$$\mathbf{X} = \begin{bmatrix} \overleftarrow{X}\_0 + \mathbf{c}\_X \\ \overleftarrow{Y}\_0 + \mathbf{c}\_Y \\ \overleftarrow{R}\_0 + \mathbf{c}\_R \end{bmatrix} \tag{30}$$

where **e** represents the estimation errors of **X**. Subtracting the first two components of **X** by (*XB*, *YB*), we obtained

$$
\mathbf{q} = \mathbf{h} - \mathbf{G}\mathbf{Z} \tag{31}
$$

where

$$\mathbf{Z} = \begin{bmatrix} \left(\widetilde{\mathbf{X}}\_0 - \mathbf{X}\_B\right)^2\\ \left(\widetilde{\mathbf{Y}}\_0 - \mathbf{Y}\_B\right)^2 \end{bmatrix} \tag{32}$$

$$\mathbf{G} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} \tag{33}$$

$$\mathbf{h} = \begin{bmatrix} \left( \mathbf{X}\_0 - \mathbf{X}\_B \right)^2 \\ \left( \mathbf{Y}\_0 - \mathbf{Y}\_B \right)^2 \\ \mathbf{R}\_0^2 \end{bmatrix} \tag{34}$$

The solution of Equation (31) is

$$\mathbf{Z} = \left(\mathbf{G}^T \boldsymbol{\Phi}^{-1} \mathbf{G}\right)^{-1} \mathbf{G}^T \boldsymbol{\Phi}^{-1} \mathbf{h} \tag{35}$$

*Sensors* **2020**, *20*, 1590

where

$$\Phi = E(\boldsymbol{\varphi}^T \boldsymbol{\varphi})\tag{36}$$

we know that

$$\begin{aligned} \mathfrak{sp}\_1 &= 2(X\_0 - X\_B)\mathfrak{e}\_X + \mathfrak{e}\_X^2 \approx 2(X\_0 - X\_B)\mathfrak{e}\_X\\ \mathfrak{sp}\_2 &= 2(Y\_0 - Y\_B)\mathfrak{e}\_Y + \mathfrak{e}\_Y^2 \approx 2(Y\_0 - Y\_B)\mathfrak{e}\_Y\\ \mathfrak{sp}\_3 &= 2R\_0\mathfrak{e}\_R + \mathfrak{e}\_R^2 \approx 2R\_0\mathfrak{e}\_R \end{aligned} \tag{37}$$

Equation (36) can be derived as

$$\boldsymbol{\Phi} = 4 \mathbf{D} \text{cov}(\mathbf{X}) \mathbf{D} \tag{38}$$

where

$$\mathbf{D} = \text{diag}\{ (X\_0 - X\_B)\_\prime \left( \mathbf{Y}\_0 - \mathbf{Y}\_B \right), \mathbf{R}\_0 \} \tag{39}$$

Substituting Equation (28) into Equation (37),

$$\boldsymbol{\Phi} = \boldsymbol{E} \left( \boldsymbol{\uprho}^T \boldsymbol{\uprho} \right) = 4 \mathbf{D} \text{cov}(\boldsymbol{\uplambda}) \mathbf{D} = 4 \boldsymbol{c}^2 \boldsymbol{R}\_0^2 \mathbf{D} \left( \mathbf{H}^{0T} \mathbf{Q}^{-1} \mathbf{H}^0 \right)^{-1} \mathbf{D} \tag{40}$$

Similarly, substituting Equation (38) into Equation (34),

$$\mathbf{Z} = \left(\mathbf{G}^T \mathbf{D}^{-1} \mathbf{H}^{0T} \mathbf{Q}^{-1} \mathbf{H}^0 \mathbf{D}^{-1} \mathbf{G}\right)^{-1} \mathbf{G}^T \mathbf{D}^{-1} \mathbf{H}^{0T} \mathbf{Q}^{-1} \mathbf{H}^0 \mathbf{D}^{-1} \mathbf{h} \tag{41}$$

The covariance of **Z** is

$$\mathbf{Cov}(\mathbf{Z}) = \left(\mathbf{G}^T \boldsymbol{\Phi}^{-1} \mathbf{G}\right)^{-1} \tag{42}$$

Finally, the vessel position is estimated as

$$\mathbf{X}' = \pm \sqrt{\mathbf{Z}} + \begin{bmatrix} X\_B \\ Y\_B \end{bmatrix} \tag{43}$$

According to the approximate position of the vessel, the minus sign or the plus sign can be determined in Equation (43). The covariance matrix of position estimation **X'**, corresponding to positioning precision, is given by

$$\text{cov}\left(\mathbf{X}'\right) = \frac{1}{4}\mathbf{B}^{-1}\text{cov}(\mathbf{Z})\mathbf{B}^{-1} \approx c^2 \mathcal{R}\_0^{-2} \left(\mathbf{B}\mathbf{G}^T\mathbf{D}^{-1}\left(\mathbf{H}^{0T}\mathbf{Q}^{-1}\mathbf{H}^0\right)\mathbf{D}^{-1}\mathbf{G}\mathbf{B}\right)^{-1} \tag{44}$$

where

$$\mathbf{B} = \text{diag}\{ (X\_0 - X\_B)\_\prime \left( Y\_0 - Y\_B \right) \}\tag{45}$$

The position root-mean-square error (RMSE) is equal to the root of the sum of the diagonal elements of cov(**X'**). Furthermore, to further improve the accuracy of the estimated position, smooth filtering was also used based on the estimated position **X'** of Equation (43). We formulated this smooth problem in MATLAB by simply using the smoothdata function to smooth noisy position data.

#### **4. Simulation Scenario**

In order to verify the proposed position estimation method for a single shore station in AIS R-Mode, numerical simulation of positioning errors was performed in different scenarios using MATLAB. A real AIS shore station named Huangbaizui was used, the latitude of which is 38◦54.2850 N, and the longitude is 121◦42.9500 E, as shown in Figure 3a. The location of the vessel was very close to the Huangbaizui shore station, where the vessel could only receive signals from the Huangbaizui shore station. Its initial latitude was 38◦40.1690 N, and the longitude was 122◦10.1020 E. We used a Gauss–Krüger projection with six-degree zones [35]; the shore station coordinates (*XB, YB*) were [4.3087 <sup>×</sup> 106, <sup>−</sup>1.1139 <sup>×</sup> 105]; and the vessel's initial position (*X*0*, Y*0) was [4.3092 <sup>×</sup> <sup>10</sup>6, <sup>−</sup>1.0938 <sup>×</sup> <sup>10</sup>5]. The unit of measurement was meters. Figure 3b shows the geometric relationship between the shore

station and the antennas of the vessel when the heading angle of the vessel was 0◦. The blue star denotes the position of the AIS shore station. The red asterisk is the position of the vessel's center. The green crosses denote the other three antennas on the vessel. As the antennas' positions were linked with the vessel's dimensions, we introduced a common engineering vessel, named Maochang 526, to set up the antennas' location parameters. As Maochang 526 s cargo was 650 t, the length was 68 m, and the width was 14 m, the position coordinates of the four antennas were [0, 0], [15, 7], [30, 0], [15, −7] in the local coordinate system of the vessel, as shown in Figure 3b.

**Figure 3.** Distribution of single shore station and initial vessel location. (**a**) distribution on google map (**b**) distribution using a Gauss–Krüger projection.

In order to assess the performance of the proposed positioning method for a single shore station, positioning errors were evaluated in this study in the following four simulation scenarios.


To bring simulation closer to reality, we used the Monte Carlo method to create measurement noises [36]. We ran this 1000 times at every vessel position in the above scenarios by adding Gaussian random noise with a mean root square of 0.001.

#### **5. Simulation Results Analysis**

#### *5.1. Scenario 1: Positioning Errors Vary with Heading Angles*

In Scenario 1, the position of the vessel was fixed. Further, the heading angle of the vessel was increased from 0◦ to 90◦. Figure 4 shows the antenna locations of the vessel with different heading angles θ. The red cross in Figure 4 denotes the vessel's location—the center of the vessel. The green quadrilaterals represent the layout of the antennas in the vessel with different heading angles.

**Figure 4.** Antenna layout with different heading angles.

The theoretical RMSE of the proposed method in this study at different heading angles is given in Table 1. Moreover, the calculated RMSE and the average positioning error after smooth filtering according to the simulation results in Scenario 1 are also given.

**Table 1.** Positioning results of Scenario 1.


From Table 1, it can be seen clearly that the positioning errors increased as the heading angles increased. The reason for the increase of the RMSE was that the correlation between distances from different antennas to the shore station was increasing with the relative position change between the antennas and the shore station. These results indicate that the calculated RMSE based on the simulation positioning results is consistent with the RMSE according to the theoretical derivation. In addition, positioning errors could be reduced effectively by using the smoothdata function in MATLAB.

#### *5.2. Scenario 2: Positioning Errors Vary with Vessel's Position*

In Scenario 2, the vessel navigated around the shore station. The relative azimuth angle η shown in Figures 2 and 5 is the angle between the north vector and the shore to the vessel vector. It started at 5◦ and stopped every 10◦ to stabilize its heading angle on 0◦ and get its position. The vessel continued to navigate until η became 85◦. The shore station was the center of the motion curve. Figure 5 shows the positioning results of the vessel as its position changed. The red cross is the actual position of the vessel. The black asterisk is the estimated position using the proposed method. The vertexes of the green diamond represent the positions of antennas.

**Figure 5.** Positioning results at different locations.

η is the relative azimuth angle of the shore station shown in Figures 2 and 5. Table 2 shows the positioning results when the vessel stopped every 10◦ until η became 85◦ in Scenario 2.


**Table 2.** Positioning results of Scenario 2.

It can be seen from Figure 5 and Table 2 that the RMSE was changing with the relative position between the antennas and the shore station. The RMSE was largest and the positioning result was the worst when the relative azimuth angle between the vessel and the shore station was 25◦. In this situation, the distances between different antennas and the shore station were not sufficiently independent. Antennas -<sup>2</sup> and -4 , and the shore station, were almost on the same line, shown in Figure 5. Therefore, the positioning matrix attained singularity in this situation. The calculated RMSE based on the simulation positioning results was much larger than the theoretical RMSE, as the noise had a significant impact on the positioning matrix. Even using a smoothing filter could not reduce the position errors.

#### *5.3. Scenario 3: Positioning Errors Vary with Relative Position*

In Scenario 3, the vessel navigated around the shore station, and the shore station was still the center of the motion curve. It stopped every 10◦ to adjust its heading angle to maintain the relative positional relationship between the antennas and the shore station, and get its position. The positioning results of the vessel are shown in Figure 6. We can see that the relative position between the antennas and the shore station remained the same. In addition, the position errors were similar, although the vessel was in different locations.

**Figure 6.** Positioning results with unchanged relative position.

The numerical positioning results in Scenario 3 are given in Table 3.

**Table 3.** Positioning results of Scenario 3.


According to the simulation results shown in Table 3, the theoretical RMSE was the same. The reason for this was that the relative position between the antennas and the shore station was the same. Further, the calculated RMSE was almost the same. Positioning errors by the smoothing filter were reduced and were at the same level, substantially, when the vessel was in different positions. In Scenario 1, the heading angles of the vessel were changing. In Scenario 2, the position of the vessel was changing. However, in this scenario, the relative position was the same, although both the heading angle and the position were changing. Therefore, it can be concluded that, as long as the distance is constant and the relative positional relationship remains unchanged, the positioning performance is the same, regardless of whether the position and the heading angles of the vessel are changing.

#### *5.4. Scenario 4: Positioning Errors Vary with Distances*

In Scenario 4, the vessel moved toward or away from the shore station. The distance between the shore station and the vessel was changing from 100 to 1000 m. During the movement, the heading angle of the vessel was always 0◦. Figure 7 shows the estimated positioning results as the distance changed. Position errors varied with the distance, as depicted in Figure 8.

**Figure 7.** Positioning results with changing distance.

**Figure 8.** Positioning results at different distances.

Table 4 shows the positioning results in Scenario 4. It can be observed that the simulated RMSE and position errors increased as the distance increased. Therefore, the performance was better in the area closer to the shore station. Correspondingly, the proposed method is precisely used for positioning based on a single shore station, which is suitable for areas closer to the shore station.

**Table 4.** Positioning result of Scenario 4.


#### **6. Conclusions**

As the existing AIS is a communication system, when AIS shore stations are used as positioning reference stations in AIS R-Mode, a vessel can only receive signals from a single shore station in some areas, especially in an area close to the shore station. In this situation, traditional positioning methods are not suitable. A position estimation method using multiple antennas was proposed in this paper to solve this problem. The geometric relationship between the antennas in the local coordinate system is converted into the global positioning coordinate system using the heading angle. Then, the positioning matrix is obtained according to measurements from multiple antennas. However, due to the size of the vessel, the distances between different antennas to the shore station are not sufficiently independent. Therefore, the positioning matrix of this proposed method is easily near singularity or ill-conditioned. A novel method to solve the positioning matrix was presented here, rather than the Taylor series expansion method, which can effectively prevent singularity. Finally, we verified the validity of the proposed method in diverse scenarios by numerical simulations. The influencing factors of positioning performance were analyzed, such as heading angle, relative position, and distances. The positioning performance became worse as the distance increased. Fortunately, the proposed method was exactly suited for the area close to the shore station, where the vessel can only receive signals from this shore station. Further, the positioning errors were mainly affected by the relative positional relationship between the antennas and the shore station. As it is not necessary to establish new AIS shore stations, the proposed method can help the AIS R-Mode positioning system to expand its application scope. A possible area of future research would be to investigate the improved position estimation method based on a single shore station, which can reduce positioning errors in the far area of the shore station. This would help to further expand AIS R-Mode applications.

**Author Contributions:** Conceptualization, Y.J. and K.Z.; methodology, Y.J.; software, Y.J. and K.Z.; validation, Y.J. and K.Z.; investigation, Y.J. and K.Z.; writing—original draft preparation, Y.J.; writing—review and editing, Y.J. and K.Z.; funding acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Chinese National Science Foundation (numbers 61501079 and 61971083); the Remote Sensing Youth Science and Technology Innovative Research; the Dalian Technology Star Program (number 2017Q058); the Dalian Science and Technology Innovation Fund (number 2019J11CY015); and the Fundamental Research Funds for the Central Universities (number 017190327).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
